Optimizing Accuracy of Stroke Prediction Using Logistic Regression


  • Mohammed Guhdar, University of Zakho
  • Amera Ismail Melhum, University of Duhok
  • Alaa Luqman Ibrahim, University of Zakho




Keywords: Data Analysis Informatics, Logistic Regression (LR), Stroke Machine Learning, Stroke Prediction


Most strokes are caused by a sudden interruption of the blood supply to part of the brain. As the deprived neurons gradually die, impairment follows, and its nature depends on which area of the brain is affected. Stroke risk is associated with elevated blood pressure, body mass, heart conditions, average blood glucose level, smoking status, prior stroke, and age. Recognizing the warning signs early can reduce stroke severity, help predict stroke, and support a healthier lifestyle. In this research, we present a strategy for predicting the early onset of stroke using a Logistic Regression (LR) model. To improve the model's performance, we applied preprocessing techniques to the dataset, including SMOTE, feature selection, and outlier handling; these steps balanced the class distribution, removed unimportant features, and handled outliers. We then compared our LR model with five other studies that applied LR to the same dataset. Our method achieved the highest F1 score and area under the curve (AUC) among these studies, with an accuracy of 86%, suggesting that it can serve as an effective tool for stroke prediction. Because stroke prediction models have prospective clinical applications, they remain significant for academics and practitioners in medicine and the health sciences.
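
The abstract does not state which outlier-handling rule was used. As one common option, the sketch below assumes Tukey's interquartile-range (IQR) fences and caps values outside them column by column with NumPy, rather than deleting rows. The helper name `cap_outliers_iqr`, the 1.5 × IQR factor, and the example BMI values are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def cap_outliers_iqr(X: np.ndarray, factor: float = 1.5) -> np.ndarray:
    """Clip each column of X to its Tukey fences [Q1 - factor*IQR, Q3 + factor*IQR].

    Hypothetical helper: the abstract does not name the outlier rule used.
    """
    q1, q3 = np.percentile(X, [25, 75], axis=0)  # per-column quartiles
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return np.clip(X, lower, upper)              # cap instead of dropping rows


# Illustrative BMI values (one column); 95.0 is an implausible outlier.
X = np.array([[22.0], [24.0], [25.0], [27.0], [28.0], [30.0], [95.0]])
capped = cap_outliers_iqr(X)
print(capped.ravel())  # last value is capped at the upper fence, 35.75
```

Capping keeps every patient record, which can matter when the stroke-positive class is small; removing the affected rows is the main alternative.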

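SMOTE (Chawla et al., 2002) balances classes by generating synthetic minority samples: each new point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The minimal NumPy sketch below illustrates that idea; the `smote` helper, the default k = 5, and the toy data are assumptions for illustration. A library implementation such as imbalanced-learn's `SMOTE` would normally be used in practice.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration, not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)  # at most n - 1 other minority points can be neighbours

    # Pairwise squared Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                   # random minority seeds
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one of each seed's k-NN
    gap = rng.random((n_new, 1))                            # position on the segment
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_min, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Oversampling of this kind is usually applied to the training split only, so that synthetic points derived from test patients do not leak into the evaluation.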

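To make the model and the reported metrics concrete, the sketch below fits a binary logistic regression, P(y = 1 | x) = σ(w·x + b), by batch gradient descent on the mean cross-entropy loss. It then computes accuracy, F1 = 2TP / (2TP + FP + FN), and ROC AUC (the probability that a randomly chosen positive case scores above a randomly chosen negative one). The synthetic data, learning rate, epoch count, train/test split, and 0.5 decision threshold are all assumptions; the abstract does not describe the authors' training setup, and a library such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean binary cross-entropy loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)        # predicted P(y = 1 | x)
        err = p - y                   # d(loss)/d(logit) for each sample
        w -= lr * (X.T @ err) / n     # gradient step for the weights
        b -= lr * err.mean()          # gradient step for the intercept
    return w, b


def f1_score(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0


def roc_auc(y_true, scores):
    """P(random positive scores higher than random negative); ties count 0.5."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical toy data: 3 standardized features, the third is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (rng.random(400) < sigmoid(X @ np.array([2.0, -1.0, 0.0]) - 0.5)).astype(int)

idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]    # hold out 100 samples for evaluation

w, b = fit_logistic_regression(X[train], y[train])
scores = sigmoid(X[test] @ w + b)
y_pred = (scores >= 0.5).astype(int)  # assumed 0.5 decision threshold

acc = np.mean(y_pred == y[test])
f1 = f1_score(y[test], y_pred)
auc = roc_auc(y[test], scores)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC={auc:.3f}")
```

On imbalanced data such as stroke records, accuracy alone can look high even for a model that rarely flags positives, which is why F1 and AUC are reported alongside it.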