Research Article

Optimizing Prompt Refinement: Algorithmic Strategies for Large Language Model-Based Text Classification

by Ziqiao Ao, Juhi Singh, Sebastian Antinome
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 78
Published: February 2026
Authors: Ziqiao Ao, Juhi Singh, Sebastian Antinome
10.5120/ijca2026926329

Ziqiao Ao, Juhi Singh, Sebastian Antinome. Optimizing Prompt Refinement: Algorithmic Strategies for Large Language Model-Based Text Classification. International Journal of Computer Applications 187, 78 (February 2026), 1-10. DOI=10.5120/ijca2026926329

@article{10.5120/ijca2026926329,
  author    = {Ziqiao Ao and Juhi Singh and Sebastian Antinome},
  title     = {Optimizing Prompt Refinement: Algorithmic Strategies for Large Language Model-Based Text Classification},
  journal   = {International Journal of Computer Applications},
  year      = {2026},
  volume    = {187},
  number    = {78},
  pages     = {1-10},
  doi       = {10.5120/ijca2026926329},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
                        %0 Journal Article
                        %D 2026
                        %A Ziqiao Ao
                        %A Juhi Singh
                        %A Sebastian Antinome
                        %T Optimizing Prompt Refinement: Algorithmic Strategies for Large Language Model-Based Text Classification
                        %J International Journal of Computer Applications
                        %V 187
                        %N 78
                        %P 1-10
                        %R 10.5120/ijca2026926329
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

The performance of Large Language Models (LLMs) for text classification depends on how well prompts are designed and refined. This paper presents a structured framework for improving prompt refinement strategies in LLM-based classification, with a focus on question-type classification for Microsoft technical certification exams. Several prompt optimization techniques were evaluated, including Chain of Thought (CoT), Self-Consistency, Tree of Thought (ToT), and different configurations of Retrieval-Augmented Generation (RAG). A modular prompt structure was also developed to support category-specific evaluation and improve decision consistency. Experiments were conducted in three stages: (1) tuning and comparing prompt refinement techniques, (2) optimizing RAG retrieval parameters, and (3) applying a modular rule-based approach to enhance classification reliability. Experimental results indicate that the proposed framework enhances classification performance, achieving an absolute improvement of approximately 40 percentage points in F1 score over baseline prompting methods. The methodology can be adapted to educational assessment, automated content analysis, and other text classification applications.
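Of the refinement strategies the abstract lists, Self-Consistency is the simplest to illustrate: the model is sampled several times at nonzero temperature and the majority label wins. The sketch below is a minimal illustration of that voting idea, not the paper's implementation; the label set, the prompt wording, and the `sample_fn` LLM call are all hypothetical placeholders.

```python
from collections import Counter
from itertools import cycle

# Candidate question-type labels (illustrative only; the paper's exact
# taxonomy for Microsoft certification exam items is not reproduced here).
LABELS = ["multiple-choice", "drag-and-drop", "case-study", "hot-area"]

def build_prompt(question_text):
    """Modular prompt: task instruction + label set + the item to classify."""
    return (
        "Classify the exam question below into exactly one category.\n"
        f"Categories: {', '.join(LABELS)}\n"
        "Answer with the category name only.\n\n"
        f"Question: {question_text}"
    )

def self_consistency_classify(question_text, sample_fn, n_samples=5):
    """Self-Consistency: sample several completions, then majority-vote.

    `sample_fn(prompt)` stands in for a stochastic LLM call
    (temperature > 0) that returns a single label string.
    """
    prompt = build_prompt(question_text)
    votes = Counter(sample_fn(prompt) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# Deterministic stand-in for an LLM sampler, for demonstration only.
_canned = cycle(["case-study", "multiple-choice", "case-study"])
def fake_sampler(prompt):
    return next(_canned)

label = self_consistency_classify("You manage an Azure subscription ...", fake_sampler)
print(label)  # -> case-study (3 of 5 sampled answers agree)
```

Replacing `fake_sampler` with a real chat-completion call turns the same voting loop into the Self-Consistency baseline the paper compares against.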

Index Terms
Computer Science
Information Sciences
Keywords

Prompt Engineering, Retrieval-Augmented Generation, Large Language Models, Text Classification, Modularized Prompting
