Analysis of Document Retrieval for Online Public Administrative Procedure Services

  • Dinh-Dien La Department of Information and Communications, Ha Giang Province, Viet Nam
  • Van-Hieu Nguyen Institute of Applied Science and Technology - IAST, University of Information and Communication Technology, Thai Nguyen University, Vietnam
  • Trung-Nghia Phung Institute of Applied Science and Technology - IAST, University of Information and Communication Technology, Thai Nguyen University, Vietnam
  • Khanh-Van Tran Institute of Applied Science and Technology - IAST, University of Information and Communication Technology, Thai Nguyen University, Vietnam
Keywords: Document retrieval, ensemble models, public administrative procedures

Abstract

Public Administrative Procedures (APs) are the processes, implementation methods, documents, and requirements or conditions prescribed by government agencies or authorized individuals to address specific tasks related to individuals or organizations. However, organizations and citizens still struggle to access information and public administrative services easily and conveniently. This paper investigates advanced techniques for document retrieval in public administrative services. We implement a hybrid retrieval process that combines traditional retrieval models such as TF-IDF and BM25 with modern models such as SBERT and fine-tuned models. Results demonstrate that combining these models significantly enhances retrieval performance. The ensemble of BM25 and fine-tuned SBERT achieved the highest F2 score, indicating superior effectiveness in information retrieval.
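
The abstract does not state how the lexical and dense scores are combined, so the following is only a minimal sketch of one common option: min-max normalize each model's scores and take a weighted sum, then evaluate the retrieved set with the F2 score (the F-beta measure with beta = 2, which weights recall more heavily than precision). The function names, the weight of 0.5, the document IDs, and the toy scores are all hypothetical and do not come from the paper.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def ensemble_rank(lexical, dense, weight=0.5, top_k=2):
    """Rank documents by a weighted sum of normalized scores.

    The weighted sum is an assumed fusion rule, not the paper's stated method.
    """
    lex, den = min_max(lexical), min_max(dense)
    docs = set(lex) | set(den)
    fused = {
        doc: weight * lex.get(doc, 0.0) + (1 - weight) * den.get(doc, 0.0)
        for doc in docs
    }
    # Sort by descending fused score; break ties by document ID.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))[:top_k]


def f_beta(retrieved, relevant, beta=2.0):
    """F-beta score of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy scores for three hypothetical procedure documents.
bm25_scores = {"ap_01": 7.2, "ap_02": 1.1, "ap_03": 4.0}
sbert_scores = {"ap_01": 0.35, "ap_02": 0.30, "ap_03": 0.80}

top = ensemble_rank(bm25_scores, sbert_scores, weight=0.5, top_k=2)
print(top, f_beta(top, relevant={"ap_03"}))
```

In this toy case the dense scores lift `ap_03` above `ap_01`, even though BM25 alone would rank `ap_01` first, which illustrates how an ensemble can recover documents that a single model under-ranks.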

Author Biographies

Dinh-Dien La, Department of Information and Communications, Ha Giang Province, Viet Nam

Dinh-Dien La is a PhD student in computer science at the University of Information and Communications Technology, Thai Nguyen University. He is currently Deputy Director of the Department of Information and Communications of Ha Giang province, in charge of digital transformation. His research interests are data science, machine learning, and deep learning in the domain of law and public administration.

Van-Hieu Nguyen, Institute of Applied Science and Technology - IAST, University of Information and Communication Technology, Thai Nguyen University, Vietnam

Van-Hieu Nguyen is pursuing an Engineering degree in Information Technology at the University of Information and Communication Technology in Thai Nguyen, Vietnam. He is involved with the Institute of Applied Science and Technology in Thai Nguyen. His research interests include machine learning, deep learning, and natural language processing.

Trung-Nghia Phung, Institute of Applied Science and Technology - IAST, University of Information and Communication Technology, Thai Nguyen University, Vietnam

Assoc. Prof. Trung-Nghia Phung received his Engineering degree in Electronics and Telecommunications from Hanoi University of Science and Technology (HUST) in 2002. He completed his Master of Science degree in Telecommunications at Vietnam National University, Hanoi (VNUH) in 2007 and his PhD degree in Information Science at the Japan Advanced Institute of Science and Technology (JAIST) in 2013. He was Dean of the Faculty of Electronics and Telecommunications and Head of Academic Affairs, and has been Rector of Thai Nguyen University of Information and Communication Technology (ICTU). He has also been a Vice President of the Vietnam Club of Faculties-Institutes-Schools-Universities of ICT (FISU) and President of the FISU Branch in the Northern Midlands, Mountains and Coastal Region of Vietnam. His research interests include machine learning and deep learning.

Khanh-Van Tran, Institute of Applied Science and Technology - IAST, University of Information and Communication Technology, Thai Nguyen University, Vietnam

Van-Khanh Tran received his Ph.D. in Natural Language Processing from the Japan Advanced Institute of Science and Technology (JAIST). He is currently an AI Research Scientist on the NLP team at FPT Smart Cloud’s Generative AI (GenAI) Center, where he focuses on developing large language models and AI assistant ecosystems tailored for Vietnamese users. He also serves as the Deputy Head of the Institute of Applied Science and Technology. His research interests include natural language processing, large language models, and AI applications in the legal, healthcare, and finance domains.

References

No.76/NQ-CP. (2021) Nghi-quyet-76-nq-cp-2021. [Online]. Available: https://datafiles.chinhphu.vn/cpp/files/vbpq/2021/07/76.signed.pdf (in Vietnamese)

F. Ortiz-Rodríguez, R. Palma, and B. Villazón-Terrazas, “Egoir: ontology-based information retrieval intended for egovernment,” in Informatik 2007 – Informatik trifft Logistik – Band 1. Gesellschaft für Informatik e. V., 2007, pp. 237–241.

L. Cheng, Y. Yang, K. Zhao, and Z. Gao, “Research and improvement of tf-idf algorithm based on information theory,” Advances in Intelligent Systems and Computing, 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:198317927

M. Ogbi and M. Aminilari, “Bm25 ranking algorithm development using matching concepts in unstructured text,” 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:9098595

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” 2018.

N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” arXiv preprint arXiv:1908.10084, 2019.

M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,” Information processing & management, vol. 45, no. 4, pp. 427–437, 2009.

Published
2024-11-25