A Rich High-Order Mutation Testing Dataset for Software Fortification
Abstract
High-order mutation (HOM) testing is a rigorous technique for evaluating the effectiveness of test suites: it injects mutants that combine multiple simultaneous faults into the source code and checks whether the tests detect them. In this study, we present the development and analysis of a comprehensive dataset tailored for HOM testing. The dataset comprises 2,839,792 instances labeled as either Survived or Killed, denoting mutants that survive and mutants that do not survive the mutation testing process, respectively. We employ four prominent machine learning algorithms (Logistic Regression, Random Forest Classifier, LightGBM, and XGBoost) to classify instances into these two classes. Experimental results show that accuracy, precision, recall, and F1-score vary across the algorithms, with LightGBM and XGBoost exhibiting superior performance. These findings underscore the importance of high-quality datasets in facilitating effective HOM testing and provide insight into the capabilities of machine learning algorithms in this context.
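To make the Survived and Killed labels concrete, the following minimal Python sketch builds second-order mutants of a toy program and labels each one by running a small test suite against it. Everything in it is an assumption made for illustration and is not taken from the dataset or its tooling: the program `order_total`, the three mutation sites, the operator replacements, and the test cases are all hypothetical.

```python
import operator
from itertools import combinations

# Hypothetical program under test, parameterised by its operators so that
# mutants can be built simply by swapping operators.
ORIGINAL_OPS = {
    "combine": operator.mul,   # total = qty * unit_price
    "threshold": operator.ge,  # discount applies if qty >= 10
    "adjust": operator.sub,    # total = total - 5
}

# One illustrative first-order mutation per site.
MUTATIONS = {
    "combine": operator.add,   # qty * unit_price -> qty + unit_price
    "threshold": operator.gt,  # qty >= 10        -> qty > 10
    "adjust": operator.add,    # total - 5        -> total + 5
}


def build_program(ops):
    """Return the toy program wired up with the given operators."""
    def order_total(qty, unit_price):
        total = ops["combine"](qty, unit_price)
        if ops["threshold"](qty, 10):
            total = ops["adjust"](total, 5)
        return total
    return order_total


# A deliberately weak test suite: it never exercises the discount branch.
# Each entry is (arguments, expected result of the original program).
TESTS = [((2, 2), 4), ((3, 1), 3)]


def label_mutant(mutated_sites):
    """Apply all listed mutations at once and label the resulting mutant."""
    ops = dict(ORIGINAL_OPS)
    for site in mutated_sites:
        ops[site] = MUTATIONS[site]
    program = build_program(ops)
    # A mutant is killed as soon as any test observes a wrong result.
    killed = any(program(*args) != expected for args, expected in TESTS)
    return "Killed" if killed else "Survived"


# Second-order mutants: every pair of mutation sites applied together.
labels = {sites: label_mutant(sites) for sites in combinations(MUTATIONS, 2)}
for sites, label in labels.items():
    print(" + ".join(sites), "->", label)
```

In this sketch the mutant that changes only the discount logic survives, because no test reaches that branch, while the two mutants that alter the multiplication are killed. A classifier of the kind described above would be trained to predict such labels from features of the mutant rather than by executing the tests.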
References
R. A. DeMillo, R. J. Lipton, and F. G. Sayward, “Hints on test data selection: Help for the practicing programmer,” Computer, vol. 11, no. 4, pp. 34–41, 1978.
Y. Jia and M. Harman, “An analysis and survey of the development of mutation testing,” IEEE Transactions on Software Engineering, vol. 37, no. 5, pp. 649–678, 2010.
G. Fraser and A. Zeller, “Mutation-driven generation of unit tests and oracles,” in Proceedings of the 19th international symposium on Software testing and analysis, 2010, pp. 147–158.
M. Papadakis, M. Kintis, J. Zhang, Y. Jia, Y. Le Traon, and M. Harman, “Mutation testing advances: an analysis and survey,” in Advances in computers. Elsevier, 2019, vol. 112, pp. 275–378.
X. Dang, D. Gong, X. Yao, T. Tian, and H. Liu, “Enhancement of mutation testing via fuzzy clustering and multi-population genetic algorithm,” IEEE Transactions on Software Engineering, vol. 48, no. 6, pp. 2141–2156, 2021.
A. M. Dakhel, A. Nikanjam, V. Majdinasab, F. Khomh, and M. C. Desmarais, “Effective test generation using pre-trained large language models and mutation testing,” Information and Software Technology, p. 107468, 2024.
Y. Li, W. Shen, T. Wu, L. Chen, D. Wu, Y. Zhou, and B. Xu, “How higher order mutant testing performs for deep learning models: A fine-grained evaluation of test effectiveness and efficiency improved from second-order mutant-classification tuples,” Information and Software Technology, vol. 150, p. 106954, 2022.
T. Swamy, A. Zulfiqar, L. Nardi, M. Shahbaz, and K. Olukotun, “Homunculus: Auto-generating efficient data-plane ml pipelines for datacenter networks,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 2023, pp. 329–342.
N. T. Binh et al., “Optimizing mutant generation for lustre programs with multi-threading,” in 2020 5th international conference on innovative technologies in intelligent systems and industrial applications (CITISIA). IEEE, 2020, pp. 1–5.
J. H. Andrews, L. C. Briand, Y. Labiche, and A. S. Namin, “Using mutation analysis for assessing and comparing testing coverage criteria,” IEEE Transactions on Software Engineering, vol. 32, no. 8, pp. 608–624, 2006.
Q. Zhu, A. Panichella, and A. Zaidman, “A systematic literature review of how mutation testing supports quality assurance processes,” Software Testing, Verification and Reliability, vol. 28, no. 6, p. e1675, 2018.
Y. Gil and D. Ma’ayan, “Better prediction of mutation score,” Authorea Preprints, 2023.
K. Jalbert and J. S. Bradbury, “Predicting mutation score using source code and test suite metrics,” in 2012 First International Workshop on Realizing AI Synergies in Software Engineering (RAISE). IEEE, 2012, pp. 42–46.
Y. Jia and M. Harman, “Higher order mutation testing,” Information and Software Technology, vol. 51, no. 10, pp. 1379–1393, 2009.
J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
L. Breiman, “Random forests,” Machine learning, vol. 45, pp. 5–32, 2001.
G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” Advances in neural information processing systems, vol. 30, 2017.
T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, pp. 785–794.
