Comparative Performance of Machine Learning Algorithms for Diabetes Prediction
DOI: https://doi.org/10.37802/joti.v8i1.1195
Keywords: Diabetes Mellitus, Machine Learning, Random Forest, Accuracy, Blood Glucose Levels
Abstract
Early detection of diabetes mellitus is crucial to prevent severe complications. This study uses a quantitative comparative experimental design to evaluate three machine learning algorithms for diabetes prediction: k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Random Forest. These methods were chosen because they represent distinct learning paradigms: k-NN is distance-based, SVM is margin-based, and Random Forest is an ensemble method. The goal is to identify the best-performing model for clinical use. The Pima Indians Diabetes dataset was used; it includes 390 patients and 15 clinical features. Performance was measured by accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy (89.7%) and F1-score, providing the most balanced classification. SVM followed with 84.6% accuracy, and k-NN achieved 76.9%. Although k-NN had the highest recall (0.750), its precision was low (0.375): only 37.5% of the patients it flagged as diabetic actually had diabetes, so most of its positive predictions were false positives. Feature importance analysis identified blood glucose level as the most significant predictor, which is consistent with clinical knowledge. In summary, the ensemble method, Random Forest, gave the most reliable results of the three algorithms tested. This highlights the importance of selecting the right algorithm for early diabetes detection in clinical applications.
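The precision/recall trade-off reported for k-NN can be made concrete with the standard metric definitions. Plugging the reported k-NN precision (0.375) and recall (0.750) into the F1 formula gives F1 = 2(0.375)(0.750) / (0.375 + 0.750) = 0.5625 / 1.125 = 0.5. The sketch below is illustrative only: the function names are hypothetical, and the confusion-matrix counts in the usage are invented values chosen to reproduce that precision and recall, not the study's actual test-set counts.

```python
"""Standard binary-classification metrics from confusion-matrix counts.

Illustrative sketch: function names and example counts are hypothetical.
"""


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return accuracy, precision, recall and F1 for the positive (diabetic) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall (sensitivity): share of actual positives the model catches.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# F1 implied by the k-NN precision and recall reported in the abstract.
print(f1_from(0.375, 0.750))  # 0.5

# Hypothetical counts with the same precision and recall:
# 3 of 4 diabetics are caught, but 5 of 8 positive predictions are false alarms.
print(classification_metrics(tp=3, fp=5, fn=1, tn=11))
```

Because F1 is a harmonic mean, a high recall cannot compensate for a low precision: the k-NN result of 0.750 recall still yields an F1 of only 0.5.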
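To illustrate what "distance-based" means for k-NN, here is a minimal plain-Python sketch. It is not the study's implementation: the value k = 3, Euclidean distance, min-max scaling, and the two-feature toy data (glucose and BMI values) are all assumptions chosen for illustration. Scaling is included because k-NN compares raw distances, so a feature measured in large units (such as glucose in mg/dL) would otherwise dominate the vote.

```python
"""Minimal k-nearest-neighbours classifier (Euclidean distance, majority vote).

Illustrative sketch: k, the distance metric, the scaling and the data are assumptions.
"""
import math
from collections import Counter


def fit_min_max(rows):
    """Learn per-feature minimum and maximum from the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(row, lows, highs):
    """Rescale one row using training-set bounds (training rows map into [0, 1])."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    lows, highs = fit_min_max(train_x)
    scaled_train = [apply_min_max(r, lows, highs) for r in train_x]
    scaled_query = apply_min_max(query, lows, highs)
    # Training-row indices sorted by Euclidean distance to the query.
    order = sorted(range(len(scaled_train)),
                   key=lambda i: math.dist(scaled_train[i], scaled_query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [plasma glucose (mg/dL), BMI]; label 1 = diabetic.
train_x = [[85, 22.0], [90, 24.5], [100, 27.0], [150, 31.0], [165, 33.5], [180, 29.0]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, [160, 30.0]))  # 1
print(knn_predict(train_x, train_y, [95, 25.0]))   # 0
```

Every prediction depends only on which stored patients lie closest to the query, which is the contrast with SVM (a learned separating margin) and Random Forest (a vote over many trees).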