Research Article

LoRA Based Fine Tuning of CodeBERT for SQL Injection and Cross-Site Scripting Detection in PHP Source Code

by Eka Patriya, Purwanti, Maulana Mujahidin
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 82
Published: February 2026
10.5120/ijca2026926442

Eka Patriya, Purwanti, Maulana Mujahidin. LoRA Based Fine Tuning of CodeBERT for SQL Injection and Cross-Site Scripting Detection in PHP Source Code. International Journal of Computer Applications. 187, 82 (February 2026), 63-70. DOI=10.5120/ijca2026926442

@article{ 10.5120/ijca2026926442,
  author    = { Eka Patriya and Purwanti and Maulana Mujahidin },
  title     = { LoRA Based Fine Tuning of CodeBERT for SQL Injection and Cross-Site Scripting Detection in PHP Source Code },
  journal   = { International Journal of Computer Applications },
  year      = { 2026 },
  volume    = { 187 },
  number    = { 82 },
  pages     = { 63-70 },
  doi       = { 10.5120/ijca2026926442 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2026
%A Eka Patriya
%A Purwanti
%A Maulana Mujahidin
%T LoRA Based Fine Tuning of CodeBERT for SQL Injection and Cross-Site Scripting Detection in PHP Source Code
%J International Journal of Computer Applications
%V 187
%N 82
%P 63-70
%R 10.5120/ijca2026926442
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Web application vulnerabilities such as SQL Injection (SQLi) and Cross-Site Scripting (XSS) remain critical security threats, particularly in PHP-based applications. Pretrained language models have shown strong potential for automated source code vulnerability detection, but conventional full fine-tuning updates every model weight and therefore incurs high computational and memory costs. This paper proposes a parameter-efficient vulnerability detection framework that fine-tunes CodeBERT with Low-Rank Adaptation (LoRA) to classify PHP source code into three categories: SQL Injection, XSS, and benign. The approach combines systematic source code preprocessing, Byte Pair Encoding (BPE) tokenization, and LoRA, which reduces the number of trainable parameters while preserving the representational power of the pretrained model. Experimental results show an overall accuracy of 97% while fine-tuning fewer than 1% of the total model parameters. These findings indicate that LoRA-adapted CodeBERT is an effective and computationally efficient solution for automated SQL Injection and XSS detection in PHP source code, and that it is suitable for practical deployment in resource-constrained environments.
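
To make the "fewer than 1% of parameters" figure concrete, the following is a minimal back-of-the-envelope sketch, not the authors' configuration. It assumes CodeBERT's RoBERTa-base shape (12 layers, hidden size 768, roughly 125 million parameters) and, purely for illustration, a LoRA rank of 8 applied to the query and value projections of each layer, plus a fully trainable three-class classification head. The rank, the choice of adapted matrices, and the head layout are assumptions; the abstract does not state them. All function names are hypothetical.

```python
# Illustrative arithmetic only: rank, target matrices, and head layout are
# assumptions, not values reported in the paper.

def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter.

    LoRA replaces the update to a frozen weight W (d_out x d_in) with B @ A,
    where A is (rank x d_in) and B is (d_out x rank).
    """
    return rank * d_in + d_out * rank


def classifier_head_params(hidden: int, num_labels: int) -> int:
    """Assumed head: one hidden-to-hidden dense layer plus an output layer."""
    dense = hidden * hidden + hidden
    out_proj = hidden * num_labels + num_labels
    return dense + out_proj


def trainable_fraction(hidden: int, layers: int, rank: int,
                       adapted_per_layer: int, num_labels: int,
                       base_params: int) -> tuple[int, float]:
    """Return (trainable parameter count, fraction of all parameters)."""
    adapters = layers * adapted_per_layer * lora_adapter_params(hidden, hidden, rank)
    head = classifier_head_params(hidden, num_labels)
    trainable = adapters + head
    total = base_params + trainable  # frozen encoder + everything that trains
    return trainable, trainable / total


if __name__ == "__main__":
    count, fraction = trainable_fraction(
        hidden=768, layers=12, rank=8, adapted_per_layer=2,
        num_labels=3,             # SQL Injection, XSS, benign
        base_params=125_000_000,  # approximate size of the frozen encoder
    )
    print(f"trainable parameters: {count:,} ({fraction:.2%} of total)")
```

Under these assumptions the trainable share comes out at well under 1%, in line with the abstract's claim. A larger rank or more adapted matrices per layer would raise the share proportionally.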

Index Terms
Computer Science
Information Sciences
Keywords

CodeBERT, LoRA, Fine Tuning, PHP, SQL Injection
