Vietnamese Scene Text Detection Method Based on Deep Learning
Abstract
This paper proposes an efficient method for detecting Vietnamese text in outdoor scene images. The method uses deep learning network architectures to learn geometric properties of text regions and then reconstructs polygonal representations of those regions from the learned properties. Its effectiveness has been evaluated on four real-world outdoor scene image datasets: ICDAR 2015, Total-Text, VinText, and VnSceneText. Experimental results show that the proposed method can detect text of various shapes and sizes with high and consistent accuracy. Specifically, across the four datasets the method achieved Precision, Recall, and Hmean scores of 87.53%, 86.94%, and 87.23% on the first; 84.32%, 88.17%, and 86.20% on the second; 85.63%, 87.94%, and 86.77% on the third; and 85.14%, 87.23%, and 86.17% on the fourth. These results indicate that the approach is feasible for detecting Vietnamese text in outdoor scene images.
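The abstract does not define Hmean. Assuming it denotes the harmonic mean of Precision and Recall, as is conventional in ICDAR-style evaluation, the reported triples can be checked for internal consistency with a minimal sketch such as the following. The function name `hmean` and the list `reported` are illustrative and do not come from the paper; the triples are listed in the order the abstract gives them, without assigning them to specific datasets.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of Hmean)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall, Hmean) in percent, in the order reported in the abstract.
reported = [
    (87.53, 86.94, 87.23),
    (84.32, 88.17, 86.20),
    (85.63, 87.94, 86.77),
    (85.14, 87.23, 86.17),
]

# Recompute Hmean from Precision and Recall and compare with the reported value.
recomputed = [round(hmean(p, r), 2) for p, r, _ in reported]
for (p, r, h), h_new in zip(reported, recomputed):
    print(f"P={p:.2f} R={r:.2f} reported Hmean={h:.2f} recomputed={h_new:.2f}")
```

Under this assumed definition, each recomputed value agrees with the reported Hmean to two decimal places, which suggests the reported scores are mutually consistent.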