A Human Retrieval System based on Human Attribute Ontology and Deep Multi-task Neural Network

  • Trang T.T. Phung, HCMUS, SGU
  • Ngoc Ly Quoc, Faculty of Information Technology, HCM University of Science – VNUHCM, Vietnam.
  • Fukuzawa Masayuki

Keywords: Image Retrieval, Object Retrieval, Human Attribute Ontology, Attribute Learning, Deep Learning, Deep Feature Learning

Abstract

The goal of this research is to improve how well image retrieval systems understand the images they search. We present a model for retrieving human objects (such as pedestrians or persons) from large-scale image datasets. Our approach is an image retrieval system that combines attribute learning with a Human Attribute Ontology (HAO). The main contributions are: (1) the Human Attribute Ontology (HAO), which stores prior knowledge about human images; because the ontology is hierarchical, this knowledge can be reused, which benefits the subsequent attribute learning and image retrieval stages; (2) a Convolutional Neural Network (CNN) for attribute learning that uses the HAO to improve accuracy; (3) a human image retrieval system built on both attribute learning and the HAO. The system interprets images at the attribute level, and this highlights the advantage of using the ontology to reuse existing knowledge. Experiments on the PETA and PA-100K benchmark datasets validate the method, which achieves state-of-the-art results.
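The abstract does not specify how the HAO is encoded or how attribute predictions are turned into a ranking, so the following is only a minimal, hypothetical Python sketch of how those pieces could fit together. The `Node` class, the group and attribute names in `ONTOLOGY`, and the functions `group_of`, `validate_query`, and `rank` are illustrative assumptions, not the actual HAO or the authors' code. The `predictions` dictionary stands in for the per-attribute probabilities a multi-task CNN would output. Treating each attribute group as mutually exclusive and scoring images by the summed log-probability of the queried attributes are assumptions of this sketch rather than the method described in the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One concept in a hypothetical attribute hierarchy (not the real HAO)."""
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Attribute labels (leaf concepts) under this node, depth-first."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


# Illustrative mini-hierarchy: root -> attribute groups -> attributes.
# All names here are assumptions chosen for the example.
ONTOLOGY = Node("Human", [
    Node("Gender", [Node("Female"), Node("Male")]),
    Node("Age", [Node("Young"), Node("Adult"), Node("Old")]),
    Node("UpperBody", [Node("LongSleeve"), Node("ShortSleeve")]),
])


def group_of(attr, root=ONTOLOGY):
    """Return the name of the top-level group containing `attr`, or None."""
    for group in root.children:
        if attr in group.leaves():
            return group.name
    return None


def validate_query(query):
    """Use the hierarchy to reject unknown or contradictory query attributes.

    Assumption of this sketch: attributes in the same group are mutually
    exclusive, so a query may name at most one attribute per group.
    """
    seen = {}
    for attr in query:
        group = group_of(attr)
        if group is None:
            raise ValueError(f"unknown attribute: {attr}")
        if group in seen:
            raise ValueError(f"{attr} conflicts with {seen[group]} in group {group}")
        seen[group] = attr


def rank(query, predictions, eps=1e-6):
    """Rank image ids by the summed log-probability of the queried attributes.

    `predictions` maps image id -> {attribute: probability}; a missing
    attribute is treated as probability 0 and clamped to `eps`.
    """
    validate_query(query)

    def score(probs):
        return sum(math.log(max(probs.get(a, 0.0), eps)) for a in query)

    return sorted(predictions, key=lambda img: score(predictions[img]), reverse=True)


if __name__ == "__main__":
    # Made-up probabilities standing in for multi-task CNN outputs.
    preds = {
        "img_001": {"Female": 0.9, "Adult": 0.8, "LongSleeve": 0.2},
        "img_002": {"Female": 0.7, "Adult": 0.6, "LongSleeve": 0.9},
        "img_003": {"Male": 0.95, "Adult": 0.9, "LongSleeve": 0.8},
    }
    print(rank(["Female", "LongSleeve"], preds))
```

With these made-up probabilities, a query for `Female` and `LongSleeve` ranks `img_002` first, then `img_001`, then `img_003`. Summing log-probabilities ranks images the same way as multiplying the probabilities, which corresponds to a joint match score under an independence assumption between attributes; a real system could replace this scoring rule with whatever the trained model and ontology actually support.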

Author Biographies

Trang T.T. Phung, HCMUS, SGU

Trang T.T. Phung is currently a lecturer in the Faculty of Information Technology at Saigon University (SGU), Vietnam. Her research interests lie in the field of Computer Vision, especially Image Retrieval and Image Classification.

Ngoc Ly Quoc, Faculty of Information Technology, HCM University of Science – VNUHCM, Vietnam.

Ly Quoc Ngoc is currently an Associate Professor and Head of the Computer Vision and Cognitive Cybernetics Department at the Faculty of Information Technology, University of Science, Ho Chi Minh City. He received his Ph.D. in Computer Science from VNUHCM-HCMUS, Vietnam, in 2009. His research interests lie in the fields of Computer Vision, Digital Image and Video Processing, Computer Graphics, Artificial Intelligence, Multimodal Deep Learning, and Medical Image Processing.

Fukuzawa Masayuki

Masayuki Fukuzawa is currently an associate professor in the Faculty of Information and Human Sciences at Kyoto Institute of Technology, Japan. He specializes in image instrumentation and in processing images as multidimensional signals. His current research interests include image and video processing for clinical diagnosis, optical instrumentation of semiconductor crystals, and intelligent image sensors.


Published
2024-09-15