Identification of comorbidities associated with Covid-19 in the state of Hidalgo through grouping methods

Keywords: Cluster, K-Means, DBSCAN, EM, COVID-19

Abstract

Three clustering algorithms, K-Means, DBSCAN, and EM, are applied to an open database of 10,039 records of COVID-19 cases reported in the state of Hidalgo, Mexico. The purpose of the study is to interpret the comorbidities associated with COVID-19 from the clusters these algorithms produce. The resulting clusters were validated with the silhouette index as a measure of clustering quality, and K-Means performed best in this comparison. In addition, a Tukey HSD test was performed to compare the means of the comorbidity groups related to the SARS-CoV-2 virus; it identified a significant difference between the group means. The associated comorbidities identified in this study are diabetes, hypertension, and obesity, in an age range of 45 to 49.87 years.
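To make the validation step concrete, the following is a minimal sketch of how a silhouette index can be computed for a given cluster assignment. It is not the authors' implementation. The function name `silhouette_index`, the use of Euclidean distance, the score of 0 for singleton clusters, and the four toy points are all assumptions made for illustration; none of the values come from the Hidalgo database.

```python
from math import dist


def silhouette_index(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance assumed)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumed convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance from point i to the other members of its own cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance from point i to the members of another cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
compact_labels = [0, 0, 1, 1]  # each pair forms its own cluster
mixed_labels = [0, 1, 0, 1]    # each cluster mixes points from both pairs

compact_score = silhouette_index(toy_points, compact_labels)
mixed_score = silhouette_index(toy_points, mixed_labels)
```

Values close to 1 indicate compact, well-separated clusters, while values below 0 suggest that points sit closer to another cluster than to their own. Comparing this score across the outputs of several algorithms on the same data is one way to rank them, as the study does for K-Means, DBSCAN, and EM.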

Published
2023-07-05
How to Cite
Enríquez-Ramírez, C., Raluy-Herrero, M., & Olvera-Cuellar, M. (2023). Identification of comorbidities associated with Covid-19 in the state of Hidalgo through grouping methods. Pädi Boletín Científico De Ciencias Básicas E Ingenierías Del ICBI, 11(21), 8-14. https://doi.org/10.29057/icbi.v11i21.10539