Diabetes Diagnosis Using a Soft Voting Model

Article type: Research article

Authors

Department of Computer Engineering, Faculty of Engineering and Technology, University of Mazandaran, Babolsar, Iran.

Abstract

Diabetes is one of the leading causes of death worldwide, and it contributes substantially to kidney disease, heart disease, and vision loss. Diabetes prediction is an important research area that can help improve treatment of the disease. This paper proposes a new method for diagnosing diabetes. The proposed method is applied to a diabetes dataset. First, in the preprocessing stage, outliers are identified and removed, missing values are imputed, and the data are normalized. After preprocessing, the important features are selected using the Lasso algorithm. Then three classifiers, K-Nearest Neighbor, Extreme Gradient Boosting, and CatBoost, classify the samples into two classes: diabetic and healthy. Finally, to improve the proposed method, a soft voting algorithm is used to combine the three classifiers. The proposed model was evaluated using accuracy, precision, and recall. It achieved 94.4% accuracy, 96.5% precision, and 92.7% recall. The results indicate that the proposed model outperforms other methods by increasing the accuracy of diabetes diagnosis. This model therefore makes it possible to identify people at risk of developing diabetes more accurately and to take preventive measures to control the disease.
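The soft voting step described above can be illustrated with a minimal sketch. It assumes each of the three classifiers has already produced class probabilities for every sample; the probability values, the equal weights, and the function name `soft_vote` are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_sets, weights=None):
    """Average class probabilities across classifiers, then pick the argmax.

    prob_sets: one list per classifier, each holding per-sample
               probability lists such as [P(non-diabetic), P(diabetic)].
    weights:   optional per-classifier weights (equal weights assumed here).
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])

    averaged, labels = [], []
    for i in range(n_samples):
        # Weighted mean of each class probability over all classifiers.
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        averaged.append(avg)
        # Predicted class is the one with the highest averaged probability.
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return averaged, labels


# Hypothetical outputs of the three classifiers for three samples.
knn_probs = [[0.55, 0.45], [0.20, 0.80], [0.55, 0.45]]
xgb_probs = [[0.52, 0.48], [0.10, 0.90], [0.70, 0.30]]
cat_probs = [[0.10, 0.90], [0.25, 0.75], [0.65, 0.35]]

averaged, labels = soft_vote([knn_probs, xgb_probs, cat_probs])
print(labels)  # class 1 = diabetic, class 0 = non-diabetic
```

For the first sample, two of the three classifiers lean slightly toward the non-diabetic class, so a majority (hard) vote would predict class 0. Soft voting predicts class 1 because the third classifier is far more confident, which is the main practical difference between the two schemes.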

Article title [English]

Diabetes Diagnosis Using Soft Voting Model

Authors [English]

  • Sekine Asadi Amiri
  • Hannah Yousefpour
  • Saeide Mohammadpour
Dept. of Computer Engineering, Faculty of Engineering and Technology, University of Mazandaran, Babolsar, Iran.
Abstract [English]

Diabetes is one of the leading causes of death and can result in kidney disease, heart disease, and vision loss. Data mining can help in the diagnosis and treatment of this disease. Predicting diabetes is an important area of research that can help improve the treatment process, and it is necessary to prevent, monitor, and raise awareness about the disease. In this article, a new method for diagnosing diabetes is proposed. The proposed method includes a pre-processing stage in which outliers are removed. Subsequently, the K-Nearest Neighbor, Extreme Gradient Boosting, and CatBoost classifiers are used to classify samples into two classes: diabetic and non-diabetic. Finally, to improve the proposed method, a soft voting algorithm is used to combine the three classifiers. The proposed method was applied to the Pima diabetes dataset, which includes information on age, blood pressure, glucose, and other factors related to diabetes. The method was evaluated using accuracy, precision, and recall. The model achieved 94.4% accuracy, 96.5% precision, and 92.7% recall. The results indicate that the proposed model performs better than the other methods it was compared against by increasing the accuracy of diabetes diagnosis. Therefore, this model makes it possible to identify potential diabetic patients more accurately, so that preventive measures can be taken before they develop the disease.
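For readers unfamiliar with the three reported metrics, the following sketch shows how accuracy, precision, and recall are computed from true and predicted labels. The label vectors are hypothetical toy data, not results from the paper, and the function name `classification_metrics` is an illustrative choice.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the samples predicted diabetic, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly diabetic samples, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, recall is often the metric to watch most closely, because a missed diabetic patient (a false negative) is usually costlier than a false alarm.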

Keywords [English]

  • Machine Learning
  • Feature Selection
  • Data Mining
  • Diabetes
  • Prediction