Evidential Reasoning Combined with Mass-Based Similarity for Imbalanced Classification
Abstract
This paper introduces an integrated approach to imbalanced classification that combines the Dempster-Shafer theory of evidence with mass-based dissimilarity measurement. Traditional distance-based and density-based classifiers often struggle with skewed datasets, leading to two key challenges: (1) misclassification bias, where conventional classifiers treat all instances equally despite the dominance of majority-class samples, and (2) variability in instance densities, which affects similarity assessments. To overcome these limitations, we introduce EMass, a novel classifier that replaces distance- and density-based similarity calculations with mass-based measures. Each neighbor of a query instance is treated as an independent source of prior knowledge, represented through a basic probability assignment (BPA). We then apply Dempster's rule of combination to aggregate these sources of information, producing a comprehensive probability estimate for classification. Experimental evaluations are conducted on 60 imbalanced data sets, comparing 12 algorithms on key performance metrics, including the F1 score, the Brier score, and the area under the curve (AUC). The results demonstrate the effectiveness of the proposed EMass classifier in improving classification performance on imbalanced data sets.
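The aggregation step described above can be illustrated with a minimal sketch of Dempster's rule of combination. This is not the EMass implementation itself; the class labels, the neighbor BPAs, and their mass values below are purely illustrative. Each BPA maps focal elements (sets of class labels) to masses summing to one; the rule multiplies masses of intersecting focal elements and renormalizes by the non-conflicting total.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic probability assignments (BPAs) via Dempster's rule.

    Each BPA is a dict mapping frozenset focal elements to masses.
    Masses of intersecting focal elements are multiplied and pooled;
    mass assigned to conflicting (disjoint) pairs is discarded and the
    result is renormalized by 1 - K, where K is the total conflict.
    """
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # disjoint focal elements: conflicting mass
    if conflict >= 1.0:
        raise ValueError("Totally conflicting BPAs cannot be combined")
    norm = 1.0 - conflict
    return {a: v / norm for a, v in combined.items()}

# Two neighbors as independent evidence sources over classes {pos, neg}
# (illustrative masses, not taken from the paper's experiments).
POS, NEG = frozenset({"pos"}), frozenset({"neg"})
BOTH = POS | NEG  # the full frame: mass here expresses ignorance
m1 = {POS: 0.6, BOTH: 0.4}              # neighbor 1 leans towards "pos"
m2 = {POS: 0.3, NEG: 0.5, BOTH: 0.2}    # neighbor 2 is less decided
m12 = dempster_combine(m1, m2)          # m12[POS] ends up around 0.6
```

In a k-nearest-neighbor setting the rule is applied repeatedly, folding in one neighbor's BPA at a time; since the rule is associative and commutative, the order of combination does not matter.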