* Corresponding author : satyasundara123@gmail.com
2
Introduction to Unsupervised Learning in Bioinformatics
Nancy Anurag Parasa¹, Jaya Vinay Namgiri¹, Sachi Nandan Mohanty² and Jatindra Kumar Dash¹*
¹ Department of Computer Science and Engineering, SRM University-AP, Andhra Pradesh, Amaravathi, India
² Department of Computer Science and Engineering, IcfaiTech, ICFAI Foundation for Higher Education, Hyderabad, India
* Corresponding author: jatinkdash@gmail.com
Abstract
Unsupervised learning techniques group data according to shared attributes, similar patterns, or relationships among the data points. Such models are also referred to as self-organizing models, and many of them work by clustering. Each algorithm uses its own strategy to partition the data into clusters. Unsupervised machine learning uncovers previously unknown patterns in data and is applied when labeled data are scarce or unavailable. Typical applications include clustering and anomaly detection. In bioinformatics, clustering algorithms are mostly used to decode the salient information in gene expression data, which helps in understanding the biological processes of an organism. These models also aid drug design through gene expression profiling. Self-organizing maps are used for data reduction, which gives a better understanding of genomics, and various clustering algorithms are deployed in microarray analysis, which supports clinical research by tracking gene expression data. In simpler terms, unsupervised learning works on the input data alone to reveal structure that is hidden or not known in advance. This chapter presents various unsupervised algorithms used for knowledge exploration in the field of bioinformatics and highlights several novel works reported in the recent literature.
Keywords: Clustering, self-organizing maps, microarray
Machine learning can be described as equipping machines (computers) to learn from their environment through experience. The machines are given tasks whose performance can be measured with suitable metrics. This broad field is subdivided into a few areas, described below.
Supervised learning: a learning system in which both the input data and the corresponding outputs (labels) are known, so the output is modeled as depending on the input. After learning from the labeled training data, the model predicts labels for new, unseen data.
Reinforcement learning: a goal-oriented approach in an interactive environment. The learner acts on its environment and receives feedback in the form of rewards and punishments, based on the outcomes of its actions.
Unsupervised learning: because the output is unknown, this approach explores the hidden patterns in the input data itself. The algorithms are applied directly to the dataset, and the groupings or structures they find form the outcome [1].
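A toy one-dimensional dataset shows the difference between the first and last of these settings. The Python sketch below is illustrative only, and the values, labels and function names are hypothetical. The supervised part fits a nearest-centroid classifier from labeled examples. The unsupervised part sees the same values without labels and groups them by cutting the sorted values at the widest gap. In one dimension, that cut gives the same two groups as single-linkage clustering stopped at two clusters.

```python
# Toy contrast between supervised and unsupervised learning on 1-D data.
# All values, labels and function names are hypothetical illustrations.


def nearest_centroid_fit(xs, labels):
    """Supervised: use the known labels to compute one centroid per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def split_at_largest_gap(xs):
    """Unsupervised: no labels; cut the sorted values at the widest gap.

    In one dimension this yields the same two groups as single-linkage
    clustering stopped at two clusters.
    """
    s = sorted(xs)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = gaps.index(max(gaps))
    threshold = (s[cut] + s[cut + 1]) / 2
    return [0 if x < threshold else 1 for x in xs], threshold


xs = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = nearest_centroid_fit(xs, labels)          # needs the labels
print(nearest_centroid_predict(centroids, 4.5))       # high

groups, threshold = split_at_largest_gap(xs)          # never sees the labels
print(groups)                                         # [0, 0, 0, 1, 1, 1]
```

The supervised learner can name its groups ("low", "high") only because labels were supplied. The unsupervised learner recovers the same partition but can only number the groups.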
Biological data are vast because of complex protein structures and genome sequences, so understanding and decoding the function of cells is challenging. To study fundamental biological processes, machine learning approaches pave the way for developing tools, software and algorithms that make this analysis easier. This chapter introduces unsupervised learning approaches and algorithms and their practice in bioinformatics. Bioinformatics is an interdisciplinary field that brings together biology, statistics and computer science to analyse and assess huge amounts of biological data [2].
In the unsupervised learning approach, the machine learns from the unlabeled dataset given as input and labels or groups the data accordingly [1]. This is also referred to as self-organization, because the algorithm structures the data on the basis of the input alone, with minimal human intervention. The approach draws out hidden patterns in the data and also reveals the relationships among those patterns.
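k-means is one widely used instance of this kind of self-organization. The chapter does not prescribe it at this point, so treat the following as an illustrative sketch. It assumes numeric 2-D points, Euclidean distance and a fixed number of clusters k. To keep the result reproducible, it seeds the centroids with a deterministic farthest-point rule, a simplification of the k-means++ idea. The points are made-up values, and the function names are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (deterministic seeding)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members)
                                           for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Nothing tells the algorithm what the clusters mean. Interpreting them, for example as groups of co-regulated genes, is left to the analyst. Plain k-means is also sensitive to its initialization and to the choice of k.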
Unsupervised learning relies on a few common families of algorithms [3]:
Clustering
Association
Anomaly detection
Latent variable
Dimensionality reduction
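Of the families listed, dimensionality reduction is among the easiest to show in a few lines. The sketch below uses principal component analysis (PCA), chosen here as an illustrative stand-in rather than a method prescribed by [3]. It computes the first principal component by power iteration on the sample covariance matrix, which compresses each sample from d features to a single score. The toy matrix is invented: each row is a sample and each column a hypothetical gene. The function name is also hypothetical, and a real analysis would use a linear-algebra library.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (mean, unit direction, scores) for the top principal component.

    rows: list of equal-length numeric lists (samples x features).
    Assumes non-constant data and a start vector not orthogonal to the
    top component (the all-ones start used here can fail in that case).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # d x d sample covariance matrix.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # starting vector for power iteration
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalise to unit length
    # Project each centred sample onto the component: d features -> 1 score.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return mean, v, scores


rows = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 1.5],
    [4.0, 8.0, 2.0],
]
mean, direction, scores = first_principal_component(rows)
print([round(x, 3) for x in direction])  # [0.436, 0.873, 0.218]
```

In this toy matrix all three columns vary together, so a single score per sample keeps essentially all of the variance.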

Figure 2.1 Machine learning in bioinformatics.
Of the approaches above, this chapter explores the algorithmic techniques that are most widely applied in bioinformatics.
Unsupervised learning in bioinformatics: Machine learning in bioinformatics spans six areas [6], as shown in Figure 2.1.
Genomics and proteomics: the complete set of genes in the cells of an organism is called its genome. Genes are segments of DNA. A gene is transcribed into messenger RNA (mRNA), which is in turn translated into protein [7]. Much of every cell is built from proteins, and a cell's protein content is dynamic because different tissues produce different sets of proteins. This variation reflects differences in gene expression. Unlike the proteome, the genome is largely constant. The set of proteins present in a cell therefore provides insight into the cell's structure and function [8]. Gene expression datasets are too large to analyse manually. For this reason, machine learning approaches such as clustering algorithms are applied to them to group genes and tissues with similar functions and structures and to explore hidden information.
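As a small illustration of this last point, the sketch below groups genes whose expression profiles rise and fall together. It uses agglomerative (hierarchical) clustering with average linkage. The distance between two profiles is 1 − r, where r is their Pearson correlation, a common choice for expression data. The chapter does not commit to this particular algorithm here, and the gene names, expression values and function names are hypothetical. The correlation is undefined for a flat profile, which the sketch does not guard against.

```python
import itertools
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)  # division by zero for a constant profile


def average_linkage(profiles, n_clusters):
    """Merge genes bottom-up until n_clusters remain; distance = 1 - r."""
    names = list(profiles)
    dist = {}
    for g, h in itertools.combinations(names, 2):
        dist[g, h] = dist[h, g] = 1.0 - pearson(profiles[g], profiles[h])

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(dist[g, h] for g in c1 for h in c2) / (len(c1) * len(c2))

    clusters = [[g] for g in names]
    while len(clusters) > n_clusters:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i (i < j)
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression levels of five genes across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 7.9],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
    "geneE": [1.5, 2.4, 3.6, 4.4],
}
print(average_linkage(profiles, n_clusters=2))
# [['geneA', 'geneB', 'geneE'], ['geneC', 'geneD']]
```

Stopping at two clusters here separates the genes whose expression rises across the conditions from those whose expression falls. In practice the full merge tree (dendrogram) is often inspected instead of fixing the number of clusters in advance.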