branch related to computer security from its earliest age is automated reasoning, particularly when applied to programs and systems. Though the SATAN program of Dan Farmer and Wietse Venema, launched in 1995, was not identified as AI, it automated the search for vulnerabilities in system configurations, a process that would otherwise require far more human effort. Ingham et al. (2007) have proposed an inductive reasoning system for the protection of web applications. The works of Vigna and co-workers (Mutz et al. 2007; Cova et al. 2007, 2010; Kirda et al. 2009; Robertson et al. 2010) have also dealt with the protection of web applications against cyberattacks.
Firewalls using deep packet inspection can be considered a sort of AI instantiation in cybersecurity. Firewalls have been part of the cyberdefense arsenal for many years. Traditional filtering relies on the port number, although in most cases more sophisticated techniques (Mishra et al. 2011; Valentín and Malý 2014; Tekerek and Bay 2019) are also used. Firewalls cannot rely on the port number alone, however, as most web applications use the same port as the rest of the web traffic. Deep packet inspection is the only option enabling the identification of malware code in a legitimate application. The idea of application layer filtering of the Transmission Control Protocol/Internet Protocol (TCP/IP) model was introduced in the third generation of firewalls in the 1990s.
The modest success of these technologies is an indication that much more is still to be done in AI before it can make a significant difference in terms of cybersecurity. Nevertheless, it is worth noting that using AI in cybersecurity is not necessarily a miracle solution. For example, malware-free attacks, which require no software download and conceal malicious activity inside legitimate cloud computing services, are on the increase, and AI is not yet able to counteract these types of network breach.
1.3. AI applied to intrusion detection
Intrusion detection is defined as the process of intelligent monitoring of events occurring in a computer system or network and their analysis in search of signs of security policy breaches (Bace 2000). The main objective of intrusion detection systems is to protect network availability, confidentiality and integrity. Intrusion detection systems are defined both by the method used to detect the attacks and by their location in the network. The intrusion detection system can be deployed as a network- or host-based system in order to detect anomalies. Misuse (abusive use) is detected by matching observed activity against known models of hostile activity drawn from a database of previous attacks. These models are very effective for identifying known attacks and vulnerabilities, but less relevant in identifying new security threats. Anomaly detection looks for something rare or uncommon, applying statistical or intelligent measurements to compare the current activity to previous knowledge. Such systems often need large amounts of data to train their machine learning algorithms. They also generally require more computing resources, as several metrics are often preserved and must be updated for each system activity (Ahmad et al. 2016). The intrusion detection expert system (IDES) (Lunt 1993) developed by Stanford Research Institute (SRI) formulates expert knowledge on the known models of attack and vulnerabilities of the system in the form of if–then rules. The time-based inductive machine (Teng and Chen 1990) learns several sequential models to ensure the detection of anomalies in a network. Several approaches using artificial neural networks for intrusion detection systems have been proposed (Kang and Kang 2016; Kim et al. 2016; Vinayakumar et al. 2017; Hajimirzaei and Navimipour 2019). AI-based techniques are categorized in various classes (Mukkamala and Sung 2003a; Novikov et al. 2006).
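The contrast between the two detection methods can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the signature strings, the login-rate baseline and the z-score threshold of 3 are all hypothetical values chosen for illustration, and a z-score is only one of many possible statistical measurements.

```python
from statistics import mean, stdev

# Hypothetical misuse signatures: substrings of known hostile requests.
KNOWN_ATTACK_SIGNATURES = ("../../etc/passwd", "' OR '1'='1")


def matches_signature(event: str) -> bool:
    """Misuse detection: flag an event that contains a known attack pattern."""
    return any(signature in event for signature in KNOWN_ATTACK_SIGNATURES)


def is_anomalous(history, current, threshold=3.0):
    """Anomaly detection: flag a value far from the historical mean (z-score)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A constant history makes any different value an outlier.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical baseline: logins per minute observed during normal operation.
baseline_logins_per_minute = [4, 5, 6, 5, 4, 6, 5, 5]

print(matches_signature("GET /index.php?file=../../etc/passwd"))
print(is_anomalous(baseline_logins_per_minute, 40))
print(is_anomalous(baseline_logins_per_minute, 6))
```

The signature check can only recognize patterns already in its list, whereas the statistical check needs a history of normal activity that has to be stored and kept up to date, which reflects the resource cost mentioned above.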
1.3.1. Techniques based on decision trees
Decision trees are powerful and widespread nonparametric learning tools used for classification and prediction problems. Their purpose is to create a model that predicts the value of the target variable, relying on a sequence of decision rules inferred from training data. Rai et al. (2016) have developed an algorithm based on the C4.5 decision tree approach. The most relevant characteristics are selected by means of information gain, and the split value is selected so that the classifier is not biased toward the most frequent values. In the work of Sahu and Babu (2015), a dataset referred to as “Kyoto 2006+” is used for the experiments. In Kyoto 2006+, each instance is labeled as “normal” (no attack), “attack” (known attack) or “unknown attack”. The Decision Tree algorithm (J48) is used to classify the packets. Experiments confirm that the generated rules operate with 97.2% accuracy. Moon et al. (2017) proposed an intrusion detection system based on decision trees using packet behavior analysis to detect the attacks. Peng et al. (2018) proposed a technique in which the data are first preprocessed by digitization and then normalized in order to improve detection efficiency. A method based on decision trees is then applied.
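Selecting characteristics by information gain can be sketched as follows. This is a generic illustration of the measure, not the algorithm of Rai et al. (2016): the four toy connection records, the feature names `protocol` and `flag`, and the labels are invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting the rows on `feature`."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy records; real datasets have many more features and rows.
rows = [
    {"protocol": "tcp", "flag": "SYN"},
    {"protocol": "tcp", "flag": "SYN"},
    {"protocol": "udp", "flag": "ACK"},
    {"protocol": "tcp", "flag": "ACK"},
]
labels = ["attack", "attack", "normal", "normal"]

# Rank the features from most to least informative.
ranked = sorted(rows[0], key=lambda f: information_gain(rows, labels, f), reverse=True)
print(ranked)
```

In this toy set the `flag` feature separates the two classes perfectly, so it ranks first and would be chosen for the root split of the tree.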
1.3.2. Techniques based on data exploration
Data exploration aims to eliminate the manual steps involved in designing intrusion detection systems. Various data exploration techniques have been developed and are widely used. The main ones are presented in the following sections.
1.3.2.1. Fuzzy logic
Fuzzy logic has been used in the field of computer network security, particularly for intrusion detection (Idris and Shanmugam 2005; Shanmugavadivu and Nagarajan 2011; Balan et al. 2015; Kudłacik et al. 2016; Sai Satyanarayana Reddy et al. 2019), for two main reasons. First, several quantitative parameters used in the context of intrusion detection, for example processor use time and connection interval, can potentially be considered as fuzzy variables. Second, the security concept is itself fuzzy. To put it differently, fuzziness avoids drawing a sharp boundary between normal and abnormal behaviors. Kudłacik et al. (2016) have applied fuzzy logic to intrusion detection. The proposed solution analyzes the user activity over a relatively short period of time, creating a local user profile. A more in-depth analysis involves the creation of a more general structure based on a defined number of local user profiles, known as a “fuzzy profile”. The fuzzy profile represents the behavior of the computer system user. Fuzzy profiles are used directly to detect user behavior anomalies, and therefore potential intrusions. Idris and Shanmugam (2005) proposed a modified FIRE system. It is a mechanism that uses AI techniques to automate the fuzzy rule generation process and reduce human intervention.
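As a minimal sketch of how processor use time and connection interval might be treated as fuzzy variables, the example below assigns each a degree of membership between 0 and 1 and combines them with a single rule. The breakpoints (20 s and 60 s of processor time, 1 s and 10 s between connections) and the rule itself are hypothetical and are not drawn from the cited works; the minimum is used as the fuzzy AND.

```python
def ramp_membership(value, low, high):
    """Degree (0..1) to which `value` is 'high': 0 below `low`, 1 above `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def abnormality(cpu_seconds, connection_interval_seconds):
    """Rule: IF processor time is high AND connection interval is short THEN abnormal."""
    # Hypothetical breakpoints; a real system would tune or learn them.
    high_cpu = ramp_membership(cpu_seconds, 20.0, 60.0)
    short_interval = 1.0 - ramp_membership(connection_interval_seconds, 1.0, 10.0)
    # The minimum acts as the fuzzy AND of the two conditions.
    return min(high_cpu, short_interval)


print(abnormality(80.0, 0.5))   # both conditions fully met
print(abnormality(40.0, 5.5))   # both conditions partially met
print(abnormality(10.0, 0.5))   # processor time is not high at all
```

The result is a degree of abnormality rather than a yes/no verdict, which is how a fuzzy approach avoids a sharp boundary between normal and abnormal behavior.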
1.3.2.2. Genetic algorithms
Genetic algorithms are techniques derived from genetics and natural evolution, which have been used to find approximate solutions to optimization and search problems. The main advantages of genetic algorithms are their flexibility and robustness as a global search method. Their main drawback is that they are computationally expensive, as they handle several solutions simultaneously. Genetic algorithms have been used in various ways in the field of intrusion detection (Hoque et al. 2012; Aslahi-Shahri et al. 2016; Hamamoto et al. 2018). Hoque et al. (2012) presented an intrusion detection system using a genetic algorithm to effectively detect anomalies in the network. Aslahi-Shahri et al. (2016) proposed a hybrid method that uses support vector machines and genetic algorithms for intrusion detection. The results indicate that this algorithm can reach a 97.3% true positive rate and a 1.7% false positive rate.
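The general mechanism (a population of candidate solutions improved through selection, crossover and mutation) can be sketched on a toy feature-selection problem. This is not the method of any cited paper: the per-feature scores, the cost per feature and all parameter values are hypothetical, and in a real intrusion detection setting the fitness would typically be a classifier's validation performance.

```python
import random

# Hypothetical usefulness score of each candidate feature.
FEATURE_SCORES = [0.9, 0.1, 0.7, 0.05, 0.6, 0.02]
COST_PER_FEATURE = 0.2  # hypothetical penalty for each selected feature


def fitness(mask):
    """Total score of the selected features minus a cost for each one."""
    return sum(s - COST_PER_FEATURE for s, bit in zip(FEATURE_SCORES, mask) if bit)


def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve bit masks (1 = feature selected); return best mask and fitness history."""
    rng = random.Random(seed)
    n = len(FEATURE_SCORES)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability `mutation_rate`.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_mask, best_history = evolve()
print(best_mask, round(best_history[-1], 3))
```

Every generation evaluates the fitness of the whole population, which is where the computational cost noted above comes from; keeping the best individual unchanged guarantees that the best fitness never decreases from one generation to the next.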