Low-Barrier Object Detection for Mobile Applications
Abstract
This paper investigates how image processing and object recognition methods can simplify the creation and deployment of object detection models, with the goal of expanding access to advanced computer vision technologies for small and medium-sized businesses. Our approach combines several modern technologies: Flask for web development, Firebase for database management, and Kotlin with Jetpack Compose for mobile application development. We integrate these with the automatic training methods provided by the Autodistill library, using models such as Detic and YOLOv8.
The results indicate that this combination of technologies improves the performance of our object detection models, contributing to AI solutions for the digital intelligence era. A notable advantage is Autodistill's ability to bypass the traditional dataset creation step by automatically generating labeled datasets from unlabeled input data, which makes both data preparation and model training more efficient. Overall, our findings underscore the potential of these integrated technologies to democratize access to sophisticated computer vision capabilities for smaller enterprises, fostering innovation and competitiveness in the marketplace.
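The automatic labeling idea described above can be illustrated with a short sketch. It is not Autodistill's API and not the authors' implementation: the `Detection` type, the `auto_label` helper, the confidence threshold, and the stand-in `fake_base_model` are all hypothetical names introduced for illustration, and the "snail" class is just an example. The sketch only shows the general pattern, in which a large base model proposes boxes on unlabeled images and those boxes are written out as normalized YOLO-style label lines (class id, box center, box size) that a smaller detector could be trained on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One box proposed by a base model, in pixel coordinates (hypothetical type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float


def to_yolo_line(det: Detection, class_ids: Dict[str, int],
                 image_width: int, image_height: int) -> str:
    """Convert a pixel-space box to a 'class cx cy w h' line normalized to [0, 1]."""
    cx = (det.x_min + det.x_max) / 2 / image_width
    cy = (det.y_min + det.y_max) / 2 / image_height
    w = (det.x_max - det.x_min) / image_width
    h = (det.y_max - det.y_min) / image_height
    return f"{class_ids[det.label]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def auto_label(images: Dict[str, Tuple[int, int]],
               base_model: Callable[[str], List[Detection]],
               class_ids: Dict[str, int],
               min_confidence: float = 0.5) -> Dict[str, List[str]]:
    """Label every image with the base model, keeping confident, known-class boxes."""
    labels: Dict[str, List[str]] = {}
    for name, (width, height) in images.items():
        kept = [d for d in base_model(name)
                if d.confidence >= min_confidence and d.label in class_ids]
        labels[name] = [to_yolo_line(d, class_ids, width, height) for d in kept]
    return labels


def fake_base_model(image_name: str) -> List[Detection]:
    """Stand-in for a real foundation model; returns fixed boxes (assumption)."""
    return [
        Detection("snail", 100, 50, 300, 250, 0.9),  # confident, kept
        Detection("snail", 0, 0, 10, 10, 0.2),       # below threshold, dropped
    ]


# Image name mapped to (width, height); no human annotation is involved.
labels = auto_label({"img_001.jpg": (640, 480)}, fake_base_model, {"snail": 0})
print(labels)
```

In a real pipeline the stand-in function would be replaced by an actual base model, and the resulting label lines would be written to one text file per image before training the target detector. The threshold of 0.5 is an arbitrary illustrative value, not one reported in the paper.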
