Research Article

An NLP-Driven Intelligent Video Query System for Interactive Video Retrieval

by Shailendra Singh Kathait, Ashish Kumar, Samay Sawal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 81
Published: February 2026
10.5120/ijca2026926403

Shailendra Singh Kathait, Ashish Kumar, Samay Sawal. An NLP-Driven Intelligent Video Query System for Interactive Video Retrieval. International Journal of Computer Applications. 187, 81 (February 2026), 1-6. DOI=10.5120/ijca2026926403

                        @article{ 10.5120/ijca2026926403,
                        author  = { Shailendra Singh Kathait and Ashish Kumar and Samay Sawal },
                        title   = { An NLP-Driven Intelligent Video Query System for Interactive Video Retrieval },
                        journal = { International Journal of Computer Applications },
                        year    = { 2026 },
                        volume  = { 187 },
                        number  = { 81 },
                        pages   = { 1-6 },
                        doi     = { 10.5120/ijca2026926403 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2026
                        %A Shailendra Singh Kathait
                        %A Ashish Kumar
                        %A Samay Sawal
                        %T An NLP-Driven Intelligent Video Query System for Interactive Video Retrieval
                        %J International Journal of Computer Applications
                        %V 187
                        %N 81
                        %P 1-6
                        %R 10.5120/ijca2026926403
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

Efficient analysis of large-scale urban surveillance video remains a critical challenge for traffic management authorities. This paper introduces a unified video query system that enables natural-language-driven retrieval of traffic violation events from continuous CCTV feeds. The approach builds on state-of-the-art deep-learning detectors for a diverse set of infractions (helmet non-compliance, cycle-lane misuse, illegal parking, overspeeding, and wrong-way driving), together with pedestrian tracking via facial recognition, and augments them with fine-grained attribute extraction (vehicle type, color, carried objects, timestamp, and spatial region). Detected events are stored in a multi-attribute database that supports compound filters. An integrated large language model (LLM) translates free-form user queries (e.g., “Show me all red motorcycles speeding above 60 km/h between 6 AM and 8 AM”) into structured query specifications, automatically resolving synonyms, time ranges, and attribute mappings. Retrieved results are presented as ranked frame sequences with annotated bounding boxes and metadata, and can be reviewed via an interactive dashboard. The system demonstrates that natural-language-based video querying, when tightly coupled with a rich, structured violation index, can dramatically accelerate incident investigation and support data-driven traffic enforcement.
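The abstract does not specify the structured query schema or the database layout, so the following Python sketch shows only one plausible shape of the query path, under stated assumptions. The JSON the LLM emits is hypothetical (the model call itself is not shown), and the synonym tables, the `QuerySpec` dataclass, and the SQLite `events` table with its column names are all invented for illustration. The sketch normalizes the parsed query and compiles it into a parameterized compound filter, using the abstract's example query.

```python
import json
import sqlite3
from dataclasses import dataclass
from datetime import time
from typing import List, Optional, Tuple

# Hypothetical synonym tables mapping user vocabulary to canonical index values.
VEHICLE_SYNONYMS = {"motorbike": "motorcycle", "bike": "motorcycle", "scooter": "motorcycle"}
VIOLATION_SYNONYMS = {"speeding": "overspeeding", "wrong way": "wrong_way", "no helmet": "no_helmet"}


@dataclass
class QuerySpec:
    """Structured query specification (field names are illustrative assumptions)."""
    violation: Optional[str] = None
    vehicle_type: Optional[str] = None
    color: Optional[str] = None
    min_speed_kmh: Optional[float] = None
    start: Optional[time] = None
    end: Optional[time] = None


def parse_spec(llm_json: str) -> QuerySpec:
    """Parse the JSON an LLM is assumed to emit and canonicalize its values."""
    raw = json.loads(llm_json)

    def canon(key: str, synonyms: dict) -> Optional[str]:
        value = raw.get(key)
        if value is None:
            return None
        value = value.strip().lower()
        return synonyms.get(value, value)  # fall back to the lowercased value itself

    return QuerySpec(
        violation=canon("violation", VIOLATION_SYNONYMS),
        vehicle_type=canon("vehicle_type", VEHICLE_SYNONYMS),
        color=canon("color", {}),
        min_speed_kmh=raw.get("min_speed_kmh"),
        start=time.fromisoformat(raw["start"]) if raw.get("start") else None,
        end=time.fromisoformat(raw["end"]) if raw.get("end") else None,
    )


def build_sql(spec: QuerySpec) -> Tuple[str, list]:
    """Compile the spec into a parameterized compound filter (AND of all set fields)."""
    clauses, params = [], []
    for column in ("violation", "vehicle_type", "color"):
        value = getattr(spec, column)
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    if spec.min_speed_kmh is not None:
        clauses.append("speed_kmh > ?")
        params.append(spec.min_speed_kmh)
    if spec.start is not None and spec.end is not None:
        # Compare time-of-day only, so the window applies to every date.
        clauses.append("time(ts) BETWEEN ? AND ?")
        params += [spec.start.isoformat(), spec.end.isoformat()]
    where = " AND ".join(clauses) if clauses else "1 = 1"
    # Ranking by speed is an assumption; the abstract does not state the ranking criterion.
    return f"SELECT id FROM events WHERE {where} ORDER BY speed_kmh DESC", params


def run_query(conn: sqlite3.Connection, spec: QuerySpec) -> List[int]:
    sql, params = build_sql(spec)
    return [row[0] for row in conn.execute(sql, params)]


# Toy in-memory event index standing in for the multi-attribute violation database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, violation TEXT, vehicle_type TEXT,"
    " color TEXT, speed_kmh REAL, ts TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "overspeeding", "motorcycle", "red", 72.0, "2026-01-10 06:45:00"),
        (2, "overspeeding", "motorcycle", "red", 65.0, "2026-01-10 07:30:00"),
        (3, "overspeeding", "motorcycle", "blue", 80.0, "2026-01-10 07:00:00"),
        (4, "overspeeding", "motorcycle", "red", 90.0, "2026-01-10 09:15:00"),
        (5, "illegal_parking", "car", "red", 0.0, "2026-01-10 06:30:00"),
    ],
)

# Hypothetical LLM output for "Show me all red motorcycles speeding above 60 km/h between 6 AM and 8 AM".
llm_output = ('{"violation": "speeding", "vehicle_type": "motorbike", "color": "Red",'
              ' "min_speed_kmh": 60, "start": "06:00", "end": "08:00"}')
spec = parse_spec(llm_output)
print(run_query(conn, spec))  # [1, 2]
```

Here "speeding" and "motorbike" resolve to the canonical values "overspeeding" and "motorcycle", and only the two red motorcycles recorded above 60 km/h inside the 06:00 to 08:00 window are returned. Validating the LLM output against a fixed schema before it reaches the database keeps the model from emitting raw SQL. The same-day time window (no wrap past midnight) and speed-based ranking are simplifying assumptions of this sketch.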

Index Terms
Computer Science
Information Sciences
Keywords

Computer Vision, Facial Recognition, Person Tracking, Deep Learning, Object Detection, Multi-Object Tracking, Real-Time Processing, OpenCV, YOLO
