In this section, published research works are reviewed to identify the research gaps in the area. The review covers studies that have been published in reputable databases. Many researchers have demonstrated the use of DL, ML, and hybrid techniques to detect SQLI attacks [23].
A review of SQLI prevention in web applications has been presented in [1]. The authors have provided a summary of 14 different varieties of SQLI attacks and how they affect online applications. Their research's main objective was to investigate alternative SQLI prevention strategies and to offer an analysis of the most effective defense against SQLI attacks.
The authors of [2] have conducted a systematic literature review of 36 articles related to research on SQLI attacks and ML techniques. They have identified the ML techniques most widely used to classify different varieties of SQLI attacks. Their findings revealed that few studies generated new SQLI attack datasets using ML tools and techniques. Similarly, their results showed that only a few studies focused on using mutation operators to generate adversarial SQLI attack queries. In future work, the researchers aimed to cover the use of other ML and DL techniques to generate and detect SQLI attacks.
A comprehensive study on SQLI attacks, their modes, detection, and prevention has been presented in [4]. The authors have identified how attackers might exploit such a weakness to execute injected code, as well as a strategy to mitigate the detrimental effects on database systems. Their investigation noted that web applications are frequently used for online services ranging from social networking to managing transaction accounts and handling sensitive user data. The real issue, however, is that this data is exposed to unauthorized access, where attackers with malicious motives gain entry to the system using various hacking and cracking techniques. An attacker can use sophisticated queries and creative tactics to get around authentication while also gaining total control over both the server and the web application. Many algorithms have been developed to encrypt data queries and to defend against such attacks by structuring suitable query modification plans. The paper discusses the history of injection attacks, different forms of injection attacks, various case studies, and defenses against SQLI attacks, along with an appropriate illustration.
In the work of [5], a survey on SQLI attack detection and prevention has been presented. According to the authors, the research might help laypeople comprehend SQL and its hazards. It also helps researchers and programmers who want to learn about the problems that still plague web applications and the strategies that can be employed to stop SQLI attacks. The researchers anticipated that if web application developers adhered to the strategies provided in their study, online applications would be safe from such damaging attacks.
Detecting web attacks with end-to-end DL was presented in [10]. This work contributed three insights to the study of autonomous intrusion detection systems. Firstly, the authors assessed whether a method based on the resilient software modeling tool (RSMT), which autonomously monitors and characterizes the runtime behavior of web applications, was feasible for detecting web attacks.
Secondly, they have described how RSMT trains a stacked denoising autoencoder to encode and reconstruct the call graph for end-to-end DL, where a low-dimensional representation of the raw features with unlabeled request data is used to recognize anomalies by computing the reconstruction error of the request data. Thirdly, they have examined the outcomes of empirically testing RSMT on synthetic datasets as well as real-world applications that had been intentionally made vulnerable. The findings demonstrated that the suggested method could efficiently and accurately identify attacks, such as SQLI, cross-site scripting, and deserialization, with a minimum of labeled training data and domain knowledge.
According to [11], SQLI is a common and challenging network attack that can cause immeasurable damage and loss to the database, and detecting SQLI statements is one of the current research hotspots. An SQLI detection model and technique based on deep neural networks were developed based on the data properties of SQL statements. The main technique was to segment the data into words, turn them into word vectors, form a sparse matrix, and feed it into the model for training. Next, a multi-hidden-layer deep neural network model with the ReLU function was built, the traditional loss function was optimized, and a dropout method was added to increase the generalizability of the model; the final model achieved an accuracy of over 96%. Finally, a comparison of the experimental results with conventional ML and LSTM algorithms showed that the proposed technique addressed the issues of overfitting in ML and the requirement for manual screening to extract features, which significantly increased the accuracy of SQLI detection.
Black-box detection of XQuery injection and parameter tampering vulnerabilities in web applications has been presented in [8]. A black-box fuzzing approach has been proposed to identify XQuery injection and parameter tampering vulnerabilities in online applications powered by native extensible markup language (XML) databases. A working prototype, XiParam, was created and tested on vulnerable web applications that used BaseX, a native XML database, as their backend. The experimental analysis showed that the prototype was successful in detecting both XQuery injection and parameter tampering vulnerabilities.
In the work of [16], an SQLI attack detection and prevention technique using DL has been presented. Based on extensive local and international research, the authors have suggested an SQLI detection method that uses NLP and DL frameworks and does not rely on a background rule base. By allowing the machine to automatically pick up on the language model characteristics of SQLI attacks, the strategy has increased accuracy, decreased false alarm rates, and provided some protection against attacks that were never discovered in advance.
In [23], the detection of SQLI attacks has been presented and tested by comparing 23 ML classifiers using MATLAB. The authors generated their own datasets, into which they injected abnormal SQL syntax. They checked and manually verified the SQL statements. A total of 616 SQL statements were used to train and test the classifiers. They have used ML techniques such as “coarse KNN, bagged trees, linear SVM, fine KNN, medium KNN, RUS boosted trees, subspace discriminant, boosted trees, weighted KNN, cubic KNN, linear discriminant, medium tree, subspace KNN, simple tree, quadratic discriminant, cubic SVM, fine Gaussian SVM, cosine KNN, complex tree, logistic regression, coarse Gaussian SVM, medium Gaussian, and SVM”. The five best models in terms of accuracy were determined to be ensemble boosted, bagged trees, linear discriminant, cubic SVM, and fine Gaussian SVM. The results showed that their technique was able to detect SQLI attacks with an accuracy of 93.8%.
The authors of [24] have proposed a model called ATTAR to detect SQLI attacks by analyzing web access logs to extract SQLI attack features. The features were chosen based on access behavior mining and a grammar pattern recognizer. The main target of this model was the detection of unknown SQLI statements that had not been previously used in the training data. Five ML techniques were used for training: NB, random forest, SVM, ID3, and k-means. The experimental results showed that the accuracy of the models based on random forest and ID3 achieved the best results in detecting SQLI attacks.
The authors of [25] have proposed a hybrid CNN-BiLSTM-based model for SQLI attack detection. The authors presented a detailed comparative analysis of different types of ML techniques used for the detection of SQLI attacks. The CNN-BiLSTM approach provided an accuracy of approximately 98%, compared with other described ML techniques.
The authors of [26] have presented an ML classifier to detect SQLI vulnerabilities in PHP code. Multiple ML techniques were trained and evaluated, including random forest, logistic regression, SVM, multilayer perceptron (MLP), LSTM, and CNN. The authors have found that CNN provided the best precision of 95.4%, while a model based on MLP achieved the highest recall of 63.7% and the highest f-measure of 74.6%.
The authors of [27] have proposed an adaptive deep forest (ADF) model with the integration of the AdaBoost technique. AdaBoost stands for adaptive boosting, which is a statistical classification technique, and the deep forest model is a layered model based on a deep neural network. The adaptive deep forest model proposed in [27] achieved high efficiency, comparable to that of traditional ML models, such as decision trees, and better performance compared with regular deep neural network models, such as RNN and CNN.
The authors of [28] have created a dataset using symbolic finite automata to train a classifier to detect SQLI attacks. The generated data were labeled, and two SL techniques were trained on them: a two-class support vector machine (TC SVM) and a two-class logistic regression (TC LR). The generated models were evaluated using a receiver operating characteristic (ROC) curve.
The authors of [29] have proposed an SQLI detection method using ensemble learning techniques and NLP to generate a bag-of-words model used to train a random forest classifier. The prediction was also considered in this research to improve the detection ability of the classifier. In this study, DT, NB, SVM, and KNN classification models were also trained to classify the same testing dataset, and their performances were compared with that of the proposed method. The experimental results showed that the proposed method achieved better accuracy, higher TPR, and lower FNR than the other four classifiers. Classifier performance was measured using a confusion matrix, accuracy, precision, true-positive rate, false-positive rate, true-negative rate, false-negative rate, receiver operating characteristic curve, and area under the curve.
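The rate-based metrics listed above all derive from the four cells of a binary confusion matrix. The following Python sketch shows the standard definitions; the class and function names are illustrative assumptions and are not taken from [29].

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary SQLI detector (1 = attack, 0 = benign)."""
    tp: int  # attack queries flagged as attacks
    fp: int  # benign queries flagged as attacks
    tn: int  # benign queries passed as benign
    fn: int  # attack queries that were missed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def tpr(self) -> float:  # true-positive rate (recall)
        return self.tp / (self.tp + self.fn)

    def fpr(self) -> float:  # false-positive rate
        return self.fp / (self.fp + self.tn)

    def tnr(self) -> float:  # true-negative rate
        return self.tn / (self.tn + self.fp)

    def fnr(self) -> float:  # false-negative rate
        return self.fn / (self.fn + self.tp)


def confusion_from_labels(y_true, y_pred) -> ConfusionMatrix:
    """Build the matrix from parallel lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )
```

Note that TPR and FNR sum to one, as do TNR and FPR, so a method reporting a higher TPR necessarily reports a lower FNR on the same test set.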
The authors of [30] have developed a dataset by gathering and combining a large number of smaller datasets. The generated dataset was labeled, and SL was used. They trained seven ML models: DT, AdaBoost, random forest, optimized linear, TensorFlow linear, deep ANN, and a boosted trees classifier. Then, they compared the seven techniques in terms of performance and accuracy. The results showed that the random forest classifier outperformed all other classifiers and achieved an accuracy of 99.8%.
The authors of [31] have proposed a novel approach to the detection of SQLI attacks using human agent knowledge transfer (HAT) and TD ML techniques. In this model, detection was framed as a maze game in which an ML agent differentiated between normal SQL queries and malicious SQL queries. If the incoming SQL query was an SQLI attack query, it gained more rewards and was deemed an SQLI attack query before reaching the final state. Finally, the ML technique achieved an accuracy of 95%.
The authors of [32] have proposed a detection system based on two techniques. The first detection method was based on pattern matching, which is the same as a signature-based detection system whereby the classifier has a database of SQL attack signatures and only inspects the HTTP URL in an attempt to find a match. The second detection method used was based on ML techniques. To build this model, the authors have collected malicious data and trained the classifier with these data by extracting the features representing attacks. They have used techniques such as SVM, NB, and K-nearest neighbor. The performance of the classifier was measured using the total cost ratio (TCR).
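The first, signature-based stage can be illustrated with a minimal Python sketch. The three regular expressions below are hypothetical examples of attack signatures and are not the signature database used in [32]; the function only inspects the query part of the HTTP URL, as described above.

```python
import re
from urllib.parse import unquote_plus, urlsplit

# Hypothetical signature database (assumption for illustration only).
SIGNATURES = [
    # Tautology such as: ' OR '1'='1
    re.compile(r"'\s*or\s+'?\d+'?\s*=\s*'?\d+", re.IGNORECASE),
    # UNION-based extraction such as: UNION SELECT ...
    re.compile(r"\bunion\b\s+(all\s+)?\bselect\b", re.IGNORECASE),
    # Piggy-backed statement such as: ; DROP TABLE ...
    re.compile(r";\s*drop\s+table\b", re.IGNORECASE),
]


def matches_signature(url: str) -> bool:
    """Return True if the decoded query string matches any known signature."""
    query = unquote_plus(urlsplit(url).query)
    return any(sig.search(query) for sig in SIGNATURES)
```

A matcher of this kind only recognizes payloads that resemble a stored signature, which is why it is typically paired with a learned classifier such as the second stage described above.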
The authors of [33] have trained an SVM to detect malicious SQL queries by modeling the WHERE clause of a query as an interaction network of tokens and computing the centrality of the nodes. Node centralities were used to quantify the degree of importance or centrality of a node in the network. The experimental results obtained on a dataset collected from five web applications using some automated attack tools confirmed that three of the centrality measures used in this study can effectively detect SQLI attacks with minimal impact on performance.
The authors of [15] have proposed an LSTM-based SQLI attack detection method, which can automatically learn an effective representation of the data and is well suited to complex, high-dimensional, massive data. Additionally, from the standpoint of penetration, the paper has provided an injection sample generation method based on data transmission channels. This technique can produce valid positive samples and explicitly simulate SQLI attacks. The strategy, in the researchers' opinion, can successfully address the overfitting issue brought on by a lack of sufficient positive samples. The experimental findings revealed that the suggested method outperformed several classical ML techniques and widely used DL techniques in terms of improving the accuracy of SQLI attack detection and reducing the false positive rate. Finally, the experimental results showed that the accuracy, precision, and f1-score of the proposed method were all above 92%.
The authors of [34] have proposed a framework for SQLI prevention via server-side scripting using ML and compiler platforms. Four ML techniques were trained on a dataset of 1100 samples of SQL commands: boosted decision tree, DT, SVM, and an artificial neural network. The results indicated that the DT technique achieved the highest prediction efficiency among the tested models.
The author of [35] has used the AdaBoost technique to detect SQLI attacks. In this study, the data were converted into stumps, which were classified as weak stumps providing less weight to the output, or strong stumps providing the highest weight in the overall output. The experimental result showed that the proposed technique accurately and effectively detected injection attacks.
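The idea that stronger stumps carry more weight in the final output can be made concrete with the standard AdaBoost weighting rule, alpha = 0.5 * ln((1 - err) / err). The sketch below uses that textbook rule; whether [35] uses exactly this form is an assumption, and the two example stumps in the usage note are hypothetical.

```python
import math


def stump_weight(weighted_error: float) -> float:
    """Standard AdaBoost weight for a stump with the given weighted error.

    An error of 0.5 (random guessing) gives weight 0; lower errors give
    larger weights. The error is clamped to avoid division by zero.
    """
    eps = 1e-10
    err = min(max(weighted_error, eps), 1 - eps)
    return 0.5 * math.log((1 - err) / err)


def ensemble_predict(stumps, x) -> int:
    """Weighted vote over (weight, stump) pairs; each stump returns +1 or -1."""
    score = sum(alpha * stump(x) for alpha, stump in stumps)
    return 1 if score >= 0 else -1
```

For example, a stump with a weighted error of 0.1 receives a weight of about 1.10, while one with an error of 0.4 receives about 0.20, so the first dominates the vote whenever the two disagree.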
The authors of [36] have proposed a method for classifying dynamic SQL queries as either attacks or normal based on a web profile prepared during the training phase. NB, SVM, and parse tree techniques were used for the classification process. The overall detection rate using the two datasets was 91% and 90%, respectively.
The author of [37] has designed a method to detect malicious SQL queries. The DT technique was used for the classification processes to detect different levels of SQLI. The proposed model maintained an accuracy of more than 98% in detecting SQLI attacks and an accuracy of 92% in classifying the level of attack as simple, unified, or lateral.
The authors of [38] have presented a simple method for SQLI attack detection based on an artificial neural network. First, a large amount of SQLI data were analyzed to extract the relevant features. Then, a variety of neural network models, such as MLP and LSTM, were trained. The experimental results showed that the detection rate of MLP was better than that of LSTM.
The authors of [39] have automated the process of exploiting SQLI attacks through reinforcement learning agents. In this study, the problem was modeled as a Markov decision process. The experimental results showed that reinforcement learning agents can be used in the future to perform security assessment and penetration testing.
The authors of [40] have presented a detection method by modeling SQL queries as a graph of tokens and utilized the centrality measure of tokens to train single and multiple SVM classifiers. The system was tested using directed and undirected graphs with different SVM classifiers. The experimental results demonstrated that the proposed technique can effectively identify malicious SQL queries.
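A minimal sketch of the graph-of-tokens idea follows. The tokenizer, the sliding co-occurrence window, and the choice of degree centrality are all assumptions made for illustration; [40] may construct the graph and select centrality measures differently. The resulting per-token centralities would then serve as features for an SVM.

```python
import re
from collections import defaultdict

# Simplified tokenizer (assumption): identifiers, numbers, quoted strings, symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|'[^']*'|[^\s\w]")


def tokenize(query: str):
    return [tok.upper() for tok in TOKEN_RE.findall(query)]


def token_graph(tokens, window: int = 2):
    """Undirected graph linking each token to the next `window` tokens."""
    adjacency = defaultdict(set)
    for i, tok in enumerate(tokens):
        adjacency[tok]  # make sure isolated tokens still appear as nodes
        for other in tokens[i + 1 : i + 1 + window]:
            if other != tok:
                adjacency[tok].add(other)
                adjacency[other].add(tok)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}
```

For the fragment `id = 1 OR 1 = 1` with a window of one, the repeated tokens `=` and `1` end up with a higher centrality than `ID` and `OR`, which is the kind of structural signal a classifier can learn from.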
The authors of [41] have presented a model of a two-class support vector machine (TCSVM) to predict binary labeled outcomes concerning whether an SQLI attack was positive or negative in a web request. This model has intercepted web requests at the proxy level and applied ML predictive analytics to predict SQLI attacks.
The authors of [42] have presented a novel approach for classifying SQL queries. A gap-weighted string subsequence kernel technique was used to compute the similarity metric between the query strings. Then, the SVM was trained on the similarity metrics to determine whether the query strings were normal or malicious. The proposed approach was evaluated using many datasets and achieved an accuracy of 92.48%.
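The gap-weighted subsequence kernel compares two sequences by their shared (possibly non-contiguous) subsequences of length n, with each occurrence down-weighted by a decay factor raised to the span it covers. The sketch below is a brute-force version of the standard definition, written for readability; it is not the implementation from [42], the parameter values are arbitrary, and practical implementations use dynamic programming instead of enumerating every subsequence.

```python
import math
from collections import defaultdict
from itertools import combinations


def subsequence_features(seq, n: int, lam: float):
    """Map each length-n subsequence to the sum of lam**span over its occurrences."""
    features = defaultdict(float)
    for idx in combinations(range(len(seq)), n):
        span = idx[-1] - idx[0] + 1  # gaps make the span longer
        features[tuple(seq[i] for i in idx)] += lam ** span
    return features


def gap_weighted_kernel(s, t, n: int = 2, lam: float = 0.5) -> float:
    """Inner product of the two gap-weighted feature maps."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)


def normalized_kernel(s, t, n: int = 2, lam: float = 0.5) -> float:
    """Kernel value scaled to [0, 1] so that sequence length matters less."""
    kss = gap_weighted_kernel(s, s, n, lam)
    ktt = gap_weighted_kernel(t, t, n, lam)
    if kss == 0 or ktt == 0:
        return 0.0
    return gap_weighted_kernel(s, t, n, lam) / math.sqrt(kss * ktt)
```

The functions accept any sequence, so they can be applied either to raw query strings (character level) or to lists of SQL tokens. The matrix of pairwise kernel values is what an SVM with a precomputed kernel would be trained on.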
The authors of [43] have presented a new approach to the construction of a dataset with a NoSQL query database. Six classification techniques were trained and evaluated to identify SQLI attacks, which included: DT, SVM, random forest, KNN, neural network, and multilayer perceptron. The experimental results showed that the last two techniques obtained an accuracy of 97.6%.
The authors of [44] have trained a progressive neural network model with an NB classifier to successfully detect SQLI attacks. Progressive neural networks were trained using parameters such as error-based, time-based, SQL query, and union-based SQLI attacks. The proposed method has achieved an accuracy of 97.897%.
The authors of [45] have proposed a hybrid approach using tree-vector kernels in SVM to learn SQL statements. The authors used both the parse tree structure of SQL queries and the query value similarity characteristic to distinguish between malicious and benign queries. The results confirmed the benefit of incorporation to efficiently and accurately identify abnormal queries.
The work [14] has presented the detection of SQLI behaviors using word vectors and long short-term memory (LSTM). A unique technique for detecting SQLI attacks based on a word vector of SQL tokens and LSTM neural networks was presented in this paper. In the suggested approach, SQL query strings were first syntactically broken down into tokens, after which a word vector of SQL tokens was built using the likelihood ratio test, and finally, an LSTM model was trained using sequences of token word vectors. They created a tool called WOVSQLI to implement the suggested method, and it was tested using a set of data from several sources. The performed experiments showed that WOVSQLI was capable of reliably detecting SQLI attacks. Finally, the results of the experiment showed that the proposed tool achieved an accuracy of 98.60%.
The authors of [46] have proposed a DL-based approach to detect SQLI attacks in network traffic. The proposed approach selected only the target features needed by the model to be trained using a deep belief network (DBN) model. The authors also employed test data to test the performance of different models, including LSTM, CNN, and MLP. According to the experimental results, DBN achieved an accuracy of 96%.
The authors of [47] have proposed a framework that combined the EDADT (efficient data adaptive decision tree) technique and the SVM classification technique to detect SQLI attacks. The dataset used was created from the MovieLens movie recommendation dataset, which included user login and movie details. The experimental results showed that the proposed approach achieved an accuracy of 99.87%.
The authors of [48] have proposed a method for detecting SQLI using the NB ML technique. The authors have applied a tokenization process to break the query into meaningful elements called tokens. Then, the list of tokens became an input for further classification processes. The result of the NB technique was analyzed using precision, recall, and accuracy.
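The tokenize-then-classify pipeline can be sketched in a few lines of Python. The tokenizer rule, the multinomial NB with add-one smoothing, and the four training queries are all assumptions made for illustration; they are not the tokenizer, model variant, or data used in [48].

```python
import math
import re
from collections import Counter

# Simplified tokenizer (assumption): identifiers, numbers, and single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(query: str):
    return [tok.lower() for tok in TOKEN_RE.findall(query)]


class NaiveBayesQueryClassifier:
    """Multinomial NB over query tokens with add-one (Laplace) smoothing."""

    def fit(self, queries, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for query, label in zip(queries, labels):
            self.token_counts[label].update(tokenize(query))
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, query: str):
        tokens = tokenize(query)
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.token_counts[c].values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total)  # class prior
            for tok in tokens:
                score += math.log((self.token_counts[c][tok] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, query: str):
        scores = self.log_scores(query)
        return max(scores, key=scores.get)


# Toy training set (hypothetical, far too small for real use).
TRAIN = [
    ("SELECT name FROM users WHERE id = 5", "benign"),
    ("SELECT title FROM books WHERE author = 'smith'", "benign"),
    ("SELECT * FROM users WHERE id = 1 OR 1 = 1", "attack"),
    ("SELECT * FROM users WHERE name = '' OR '1' = '1'", "attack"),
]
clf = NaiveBayesQueryClassifier().fit(
    [q for q, _ in TRAIN], [y for _, y in TRAIN]
)
```

With such a small vocabulary the classifier is sensitive to unseen tokens, which illustrates why the size and diversity of the training set matter for this family of methods.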
Evading web application firewalls through adversarial machine learning has been presented in [49]. They have presented WAF-A-MoLE, a tool that models the presence of an adversary. This tool leverages a set of mutation operators that alter the syntax of a payload without affecting the original semantics. The researchers have evaluated the performance of the tool against existing WAFs, which they have trained using their publicly available SQL query dataset. Finally, they showed that WAF-A-MoLE bypasses all the considered ML-based WAFs.
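To make the notion of a semantics-preserving mutation operator concrete, the following Python sketch shows two simple operators of the kind used to test the robustness of a classifier: random keyword case changes and replacement of spaces with inline comments. The operator names and the keyword list are assumptions for illustration and do not reproduce the operator set of WAF-A-MoLE. The sketch does not parse string literals, so it only preserves semantics for payloads whose literals contain no keywords or spaces.

```python
import random
import re

# Small illustrative keyword list (assumption); SQL keywords are case-insensitive.
KEYWORDS = {"select", "from", "where", "or", "and", "union"}


def swap_keyword_case(payload: str, rng: random.Random) -> str:
    """Randomly change the letter case of SQL keywords only."""
    def flip(match):
        word = match.group(0)
        if word.lower() not in KEYWORDS:
            return word  # leave identifiers such as table names untouched
        return "".join(
            ch.upper() if rng.random() < 0.5 else ch.lower() for ch in word
        )
    return re.sub(r"[A-Za-z_]+", flip, payload)


def spaces_to_comments(payload: str) -> str:
    """Replace each space with an empty inline comment, which SQL treats as whitespace."""
    return payload.replace(" ", "/**/")
```

Applying such operators to labeled attack samples yields syntactic variants that a detector should still classify as attacks, which is one way to measure how much a model relies on surface form.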
Deep semantic learning for testing SQLI has been presented in [50]. The paper has proposed a deep natural language processing-based tool, dubbed DeepSQLI, to generate test cases for detecting SQLI vulnerabilities. By adopting a DL-based neural language model and word sequence prediction, DeepSQLI was equipped with the ability to learn the semantic knowledge embedded in SQLI attacks, allowing it to translate a user input (or a test case) into a new test case that was semantically related and potentially more sophisticated. Experiments were conducted to compare DeepSQLI with SQLmap, a state-of-the-art SQLI testing automation tool, on six real-world web applications of different scales, characteristics, and domains. The empirical results demonstrated the effectiveness and superiority of DeepSQLI over SQLmap: more SQLI vulnerabilities were identified using fewer test cases, while running much faster.
Whether applications behind a firewall are safe from SQLI attacks has been examined in [51]. The paper focused on web application firewalls and SQLI attacks. The authors have presented an ML-based testing approach to detect holes in firewalls that let SQLI attacks pass through. Initially, the approach automatically generates diverse attack payloads, which can be seeded into the inputs of web-based applications, and then submits them to a system protected by a firewall. By incrementally learning from the tests that were blocked or passed by the firewall, the approach then selects tests that exhibit characteristics associated with bypassing the firewall and mutates them to efficiently generate new bypassing attacks. In the race against cyber attacks, time is vital, and being able to promptly learn and anticipate more attacks that can circumvent a firewall is important for quickly fixing or fine-tuning it. They have developed a tool that implements the approach and evaluated it on ModSecurity, a widely used application firewall. The results suggested good performance and efficiency in detecting holes in the firewall that could let SQLI attacks go undetected.
Automatic detection of NoSQLI using SL has been presented in [52]. The authors have developed a tool for detecting NoSQLI using SL. To the best of their knowledge, their training dataset on NoSQLI was the first of its kind. They manually designed important features and applied various SL techniques. Their tool achieved an f1-score of 93.00, as established by tenfold cross-validation. They also applied their tool to the output of a NoSQLI generating tool, NoSQLMap, and found that it outperformed Sqreen, the only available NoSQLI detection tool, by 36.25% in terms of detection rate. The proposed technique was also shown to be database-agnostic, achieving similar performance with injection on MongoDB and CouchDB databases.
A framework for SQLI investigations detection, investigation, and forensics has been presented in [53]. This paper has proposed a framework of SQLI investigation architecture and has proved its feasibility in fighting against SQLI attacks. An effective and efficient approach was also proposed to prosecute SQLI aggressors and keep them away from abusing the database.
The development of a compressive framework using ML Approaches for SQLI attacks has been presented in [54]. The paper investigates the most common SQLI attack forms, their mechanisms, and a method of identifying them based on the existence of the SQL query. Furthermore, they have proposed a comprehensive framework for determining the effectiveness of the proposed techniques in addressing a variety of issues based on the type of attack using DL, ML, and hybrid techniques. A thorough examination of the model using a test set revealed that the hybrid approach and ANN outperform NB, SVM, and DT in terms of classifying injected queries. The NB, on the other hand, outperforms the other approaches in terms of web loading time during testing. The results showed an accuracy of 99.16% for ANN and the hybrid technique has an accuracy of 99.6%, making it the best trained among the others. As a result, the proposed method improved the detection and prevention of SQLI attacks. They used a small dataset for training and testing in this study, but maximizing the dataset and implementing the model in practice was recommended for future researchers.
According to the recommendation of [54], the detection and prevention rate of the system can be improved by increasing the training and testing dataset and by using the recommended techniques. Therefore, we have proposed this research work to increase the system's detection and prevention rate for SQLI attacks in different web applications.