Motif Discovery in DNA Sequences Using Scaled Conjugate Gradient Neural Networks

Authors

  • 1,2Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq
  • 1,2Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq
  • 1,2Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq
  • Department of Computer Communication Engineering, Al-Rafidain University College, Baghdad, Iraq

Keywords:

Bioinformatics, Data Mining, Deoxyribonucleic Acid (DNA), Motif Discovery, Artificial Neural Networks (ANNs), Scaled Conjugate Gradient (SCG)

Abstract

Finding motifs in DNA sequences is an ongoing challenge and an essential step in bioinformatics.
Because of technical advances in the field, the task requires considerable data analysis.
Artificial Neural Networks (ANNs) are increasingly used, particularly for motif identification and
genomic analysis. To find motifs in DNA sequences, this work applies the Scaled Conjugate Gradient
(SCG) algorithm, a supervised learning algorithm for feed-forward neural networks. SCG uses a fully
automated step-size scaling mechanism that avoids the time-consuming line search in each training
iteration. Here it is used to train the network on code patterns by minimizing a global error
function of many variables, namely the network weights. The network is trained on many code patterns
with lengths from 4 to 509 bases and then used to locate them in a database of 2,227,382 bases.
Experiments with different numbers of hidden layers showed that ten hidden layers gave the best
results, with a training accuracy of 100%. Compared with other supervised learning algorithms for
neural networks (One Step Secant, Gradient Descent, Bayesian Regularization, and BFGS Quasi-Newton),
SCG achieved higher accuracy (100%) and took less time in both the training and testing phases.
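
The step-size scaling idea mentioned in the abstract can be illustrated with a minimal sketch. The code below follows the published SCG formulation in outline and applies it to a toy quadratic error function; it is not the authors' network, data, or implementation. The function name `scg_minimize`, its default parameters, and the toy matrix `A` and vector `b` are assumptions made for illustration only.

```python
import numpy as np

def scg_minimize(f, grad, w0, sigma=1e-5, lam=1e-6, tol=1e-8, max_iter=500):
    """Minimize f(w) with a scaled conjugate gradient loop (no line search)."""
    w = np.asarray(w0, dtype=float).copy()
    n = w.size
    r = -grad(w)        # negative gradient (steepest-descent direction)
    p = r.copy()        # current conjugate search direction
    lam_bar = 0.0       # correction applied when curvature was made positive
    success = True      # whether the previous step reduced the error
    delta = 0.0         # curvature estimate along p
    for k in range(1, max_iter + 1):
        if np.linalg.norm(r) < tol:
            break
        p2 = p @ p
        if success:
            # Curvature along p from a finite difference of two gradients.
            sigma_k = sigma / np.sqrt(p2)
            s = (grad(w + sigma_k * p) + r) / sigma_k  # note grad(w) == -r
            delta = p @ s
        # Scale the curvature with the damping parameter lam.
        delta += (lam - lam_bar) * p2
        if delta <= 0:
            # Force positive curvature by raising the damping.
            lam_bar = 2.0 * (lam - delta / p2)
            delta = -delta + lam * p2
            lam = lam_bar
        mu = p @ r
        alpha = mu / delta               # step size, computed without a line search
        w_new = w + alpha * p
        # Compare the actual error reduction with the quadratic prediction.
        ratio = 2.0 * delta * (f(w) - f(w_new)) / mu**2
        if ratio >= 0:
            r_new = -grad(w_new)
            lam_bar = 0.0
            success = True
            if k % n == 0:
                p_next = r_new.copy()    # periodic restart
            else:
                beta = (r_new @ r_new - r_new @ r) / mu
                p_next = r_new + beta * p
            w, r, p = w_new, r_new, p_next
            if ratio >= 0.75:
                lam *= 0.25              # good agreement: relax the damping
        else:
            lam_bar = lam
            success = False              # reject the step and retry with more damping
        if ratio < 0.25:
            lam += delta * (1.0 - ratio) / p2   # poor agreement: increase damping
    return w

# Toy usage (assumed example): minimize 0.5 * w^T A w - b^T w.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_star = scg_minimize(lambda w: 0.5 * w @ A @ w - b @ w,
                      lambda w: A @ w - b,
                      np.zeros(2))
print(w_star)
```

In a network-training setting, `f` would be the global error over the training patterns and `w` the flattened weight vector; the damping parameter `lam` plays the role that a line search plays in other conjugate gradient methods.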

Published

2023-04-10