UDC 004.048+616-079.4
1 National Technical University “Igor Sikorsky Kyiv Polytechnic Institute,” Kyiv, Ukraine
vbabenko2191@gmail.com
2 National Technical University “Igor Sikorsky Kyiv Polytechnic Institute,” Kyiv, Ukraine
nastenko.e@gmail.com
4 National Technical University “Igor Sikorsky Kyiv Polytechnic Institute,” Kyiv, Ukraine
o.nosovets@gmail.com
5 Institute of Nuclear Medicine and Radiation Diagnostics, National Academy of Medical Sciences of Ukraine, Kyiv, Ukraine
irinadykan@gmail.com
6 Institute of Nuclear Medicine and Radiation Diagnostics, National Academy of Medical Sciences of Ukraine, Kyiv, Ukraine
btarasyuk13@gmail.com
7 Amosov National Institute of Cardiovascular Surgery, National Academy of Medical Sciences of Ukraine, Kyiv, Ukraine
lazorch@ukr.net
PATHOLOGY CLASSIFICATION FROM MEDICAL IMAGES BY THE ALGORITHM
OF RANDOM FOREST OF OPTIMAL-COMPLEXITY TREES
Abstract. The authors propose an approach to constructing classifiers in the class of random forest algorithms. A genetic algorithm is used to determine the optimal combination and composition of feature ensembles when the forest trees are built. The principles of the group method of data handling are used to optimize the structure of the trees. The tree voting procedure in the forest is optimized by the analytic hierarchy process. Examples of applying the proposed algorithm to identify pathologies in medical images are presented, together with classification results compared against those of other known methods.
Keywords: pathology classification, medical images, random forest, genetic algorithm, group method of data handling, analytic hierarchy process.
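The abstract does not specify how the analytic hierarchy process is attached to tree voting, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a pairwise comparison matrix of the trees is available (the values here are hypothetical), approximates the AHP priority vector by normalized row geometric means, and uses the resulting weights in a weighted vote. The names `ahp_weights` and `weighted_vote` are illustrative.

```python
import math


def ahp_weights(pairwise):
    """Approximate the AHP priority vector by normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_vote(predictions, weights):
    """Return the class label whose supporting trees carry the largest total weight."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# Hypothetical pairwise comparisons of three trees (assumption, not from the paper):
# entry [i][j] states how strongly tree i is preferred over tree j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

weights = ahp_weights(pairwise)

# Hypothetical per-tree predictions for a single image.
label = weighted_vote(["pathology", "norm", "norm"], weights)
```

In this toy case the first tree carries more weight than the other two combined, so the weighted vote differs from a simple majority vote. How the comparison matrix is obtained in practice (for example, from validation quality of each tree) is left open here.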