Low-Barrier Object Detection for Mobile Applications

  • Khoi Nguyen The, Department of Information Technology, FPT University HCMC
  • Tuong Ho Vinh, Department of Information Technology, TDT University
  • Anh Hoang, School of Business Information Technology, College of Technology and Design, University of Economics Ho Chi Minh City, https://orcid.org/0000-0003-2754-3710
Keywords: distillation, object detection, Android application

Abstract

This paper investigates innovative applications of image processing and object recognition methods aimed at simplifying the creation and deployment of object detection models. By doing so, we seek to expand access to advanced computer vision technologies for small and medium-sized businesses. Our research leverages a combination of modern technologies, including Flask for web development, Firebase for database management, and Kotlin with Jetpack Compose for mobile application development. We integrate these with the automatic training methods provided by the Autodistill library, using models such as Detic and YOLOv8.

The results demonstrate that this combination of technologies significantly enhances the performance of our object detection models, contributing to practical AI solutions in the digital intelligence era. A notable advantage is Autodistill’s ability to bypass the traditional dataset creation step by automatically generating labeled datasets from unlabeled input data, which markedly improves the efficiency and effectiveness of both data preparation and model training. Overall, our findings underscore the potential of these integrated technologies to democratize access to sophisticated computer vision capabilities for smaller enterprises, fostering greater innovation and competitiveness in the marketplace.
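To make the dataset-free workflow concrete, the sketch below illustrates how such a distillation run is typically wired together with Autodistill: a large Detic base model auto-labels a folder of raw images, and a compact YOLOv8 target model is then trained on the generated annotations. This is a minimal illustration, not the exact pipeline used in the paper; the ontology prompt, folder paths, and training settings are placeholders.

    # Minimal Autodistill sketch (illustrative; prompts, paths, and epoch count are placeholders).
    from autodistill.detection import CaptionOntology
    from autodistill_detic import DETIC      # Detic base-model plugin (assumed installed)
    from autodistill_yolov8 import YOLOv8    # YOLOv8 target-model plugin (assumed installed)

    # 1. Map free-text prompts for the base model to the class names used in the dataset.
    ontology = CaptionOntology({"snail": "snail"})

    # 2. Auto-label the unlabeled images with Detic, producing a YOLO-format dataset
    #    (images, label files, and a data.yaml) in the output folder.
    base_model = DETIC(ontology=ontology)
    base_model.label(input_folder="./unlabeled_images", output_folder="./dataset")

    # 3. Distill into a small target model suitable for mobile deployment.
    target_model = YOLOv8("yolov8n.pt")
    target_model.train("./dataset/data.yaml", epochs=50)

The trained YOLOv8 weights can subsequently be exported to a mobile-friendly format (for example, TFLite via the Ultralytics export utilities) for use in the Android application.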

Author Biographies

Khoi Nguyen The, Department of Information Technology, FPT University HCMC

Khoi Nguyen The received the bachelor’s degree in artificial intelligence from FPT University. His current research interests include computer vision, object detection, deep learning, and self-supervised learning approaches.

Tuong Ho Vinh, Department of Information Technology, TDT University

Tuong Ho Vinh is a final-year Software Engineering student at Ton Duc Thang University, specializing in computer vision with a focus on self-supervised learning and multimodal approaches for open-vocabulary detection tasks. His research explores the integration of textual and visual modalities to improve the adaptability and generalization of computer vision models.

Anh Hoang, School of Business Information Technology, College of Technology and Design, University of Economics Ho Chi Minh City

Anh Hoang received the B.S. degree in telecommunications engineering from the Department of Electrical, Electronic, and Information Engineering, Hanoi University of Transport and Communication, in 2007, and the M.S. degree in computer science (major in wireless network security) from the National Taiwan University of Science and Technology (NTUST), Taiwan, in 2010. He completed the Ph.D. program at the Graduate School of Advanced Science and Technology (major in knowledge science), Japan Advanced Institute of Science and Technology (JAIST), Japan, in September 2021. His research interests include AI/machine learning, data science/data mining/data analytics, cybersecurity, and business intelligence/business analytics. He is currently teaching at the School of Business Information Technology, College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Vietnam.

References

V. K. Kukkala, J. Tunnell, S. Pasricha, and T. Bradley, “Advanced driver-assistance systems: A path toward autonomous vehicles,” IEEE Consumer Electronics Magazine, vol. 7, no. 5, pp. 18–25, 2018.

M. M. Antony and R. Whenish, “Advanced driver assistance systems (ADAS),” in Automotive Embedded Systems: Key Technologies, Innovations, and Applications. Springer, 2021, pp. 165–181.

M. Payal, P. Dixit, T. Sairam, and N. Goyal, “Robotics, AI, and the IoT in defense systems,” AI and IoT-Based Intelligent Automation in Robotics, pp. 109–128, 2021.

A. Khang, V. Abdullayev, E. Litvinova, S. Chumachenko, A. V. Alyar, and P. Anh, “Application of computer vision (CV) in the healthcare ecosystem,” in Computer Vision and AI-Integrated IoT Technologies in the Medical Ecosystem. CRC Press, 2024, pp. 1–16.

A. Zareian, K. D. Rosa, D. H. Hu, and S.-F. Chang, “Open-vocabulary object detection using captions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14393–14402.

H. Bangalath, M. Maaz, M. U. Khattak, S. H. Khan, and F. Shahbaz Khan, “Bridging the gap between object and image-level representations for open-vocabulary detection,” Advances in Neural Information Processing Systems, vol. 35, pp. 33781–33794, 2022.

X. Zhou, R. Girdhar, A. Joulin, P. Krähenbühl, and I. Misra, “Detecting twenty-thousand classes using image-level supervision,” in European Conference on Computer Vision. Springer, 2022, pp. 350–368.

P. Tang, X. Wang, A. Wang, Y. Yan, W. Liu, J. Huang, and A. Yuille, “Weakly supervised region proposal network and object detection,” in Proceedings of the European Conference on Computer Vision (ECCV), September 2018.

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” CoRR, vol. abs/2103.00020, 2021. [Online]. Available: https://arxiv.org/abs/2103.00020

Roboflow, “Autodistill.” [Online]. Available: https://github.com/autodistill/autodistill

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.

P. Jiang, D. Ergu, F. Liu, Y. Cai, and B. Ma, “A review of YOLO algorithm developments,” Procedia Computer Science, vol. 199, pp. 1066–1073, 2022.

J. Du, “Understanding of object detection based on CNN family and YOLO,” in Journal of Physics: Conference Series, vol. 1004. IOP Publishing, 2018, p. 012029.

G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLO,” Jan. 2023. [Online]. Available: https://github.com/ultralytics/ultralytics

C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, “YOLOv9: Learning what you want to learn using programmable gradient information,” arXiv preprint arXiv:2402.13616, 2024.

real BI GmbH, “Fit-Q: AI fitness + gaming.” [Online]. Available: https://play.google.com/store/apps/details?id=com.statletics.bodyweightconnect

Pixocial Technology Pte. Ltd., “Vmake AI fashion model studio.” [Online]. Available: https://play.google.com/store/apps/details?id=com.airbrush.vmake

G. Ngo, “RepDetect,” 2024-07-19. [Online]. Available: https://github.com/giaongo/RepDetect

X. Zhou, R. Girdhar, A. Joulin, P. Krähenbühl, and I. Misra, “Detecting twenty-thousand classes using image-level supervision,” CoRR, vol. abs/2201.02605, 2022. [Online]. Available: https://arxiv.org/abs/2201.02605

T. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” CoRR, vol. abs/1405.0312, 2014. [Online]. Available: http://arxiv.org/abs/1405.0312

NBDuy, “Snail dataset,” Dec. 2021, visited on 2024-07-21. [Online]. Available: https://universe.roboflow.com/nbduy/snail-edpnm

Published
2024-11-25