Pattern identification method using relational arrays in DNA sequences

Victor Ignacio Sobrevilla-Solis; Anilú Franco-Árcega; Luis Heriberto Garcia-Islas; Esteban Rueda-Soriano; Virgilio López-Morales; Joel Suárez-Cansino

doi:10.29057/icbi.v10iEspecial3.8928

Victor Ignacio Sobrevilla-Solis Universidad Autónoma del Estado de Hidalgo https://orcid.org/0000-0003-3920-3430
Anilú Franco-Árcega Universidad Autónoma del Estado de Hidalgo https://orcid.org/0000-0002-9415-8313
Luis Heriberto Garcia-Islas Universidad Autónoma del Estado de Hidalgo https://orcid.org/0000-0002-1483-2428
Esteban Rueda-Soriano Universidad Autónoma del Estado de Hidalgo https://orcid.org/0000-0002-5430-2536
Virgilio López-Morales Universidad Autónoma del Estado de Hidalgo https://orcid.org/0000-0003-2043-8766
Joel Suárez-Cansino Universidad Autónoma del Estado de Hidalgo https://orcid.org/0000-0002-8927-1135

Keywords: Sequential pattern mining, frequent contiguous patterns, biological sequences, DNA sequences, bioinformatics

Abstract

Biological sequences carry a significant amount of the genetic information of living organisms, and analyzing them can help biologists understand those organisms better. Discovering frequent patterns in a group of DNA sequences has become one of the major challenges in applying data mining techniques. Obtaining frequent sequential patterns takes considerable time and effort when the methods used are based on Apriori-style algorithms such as GSP or Key-segment. This paper proposes an algorithm based on sequence mapping, designed to speed up the search for contiguous frequent patterns in a group of DNA sequences. Experiments are reported for several sets of DNA sequences, obtained from a biological database, whose total length ranges from 1000 to 5000 nucleotides. In these experiments, the proposed algorithm mined frequent patterns in DNA sequences faster than other related algorithms.
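To make the task concrete, the sketch below shows a naive baseline for contiguous frequent pattern mining: it counts, for every substring of a fixed length, how many sequences contain it, and keeps those that reach a minimum support. It is not the mapping-based algorithm proposed in the paper, whose details the abstract does not give. The function name, its parameters, the fixed pattern length, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def frequent_contiguous_patterns(sequences, length, min_support):
    """Return {pattern: support} for contiguous patterns of a given length.

    Support is taken here to be the number of sequences that contain the
    pattern at least once (an assumption; other definitions exist).
    """
    support = defaultdict(int)
    for seq in sequences:
        # Collect substrings in a set so each pattern counts once per sequence.
        seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
        for pattern in seen:
            support[pattern] += 1
    # Keep only the patterns that meet the support threshold.
    return {p: s for p, s in support.items() if s >= min_support}


# Toy input, made up for illustration (not data from the paper).
toy = ["ACGTAC", "TACGTT", "GGACGA"]
result = frequent_contiguous_patterns(toy, length=3, min_support=3)
print(result)  # {'ACG': 3}
```

With `min_support=2` the same call would also return `CGT` and `TAC`, each found in two of the three toy sequences. This brute-force enumeration re-scans every sequence for every pattern length.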
