Data Analytics in Bioinformatics

Machine learning techniques are increasingly being used to address problems in computational biology and bioinformatics. Novel machine learning techniques for analyzing high-throughput data in the form of sequences, gene and protein expression, pathways and images are becoming vital for understanding diseases and for future drug discovery. Machine learning techniques such as Markov models, support vector machines, neural networks and graphical models have been successful in analyzing life science data because of their ability to handle randomness, uncertainty and noise in the data, and because of their ability to generalize. Machine Learning in Bioinformatics compiles recent machine learning approaches and their applications to contemporary problems in bioinformatics, including classification and prediction of disease, feature selection, dimensionality reduction, gene selection and classification of microarray data, among others.

Multilayer Perceptron (MLP) Network: An MLP supports multi-class classification. It consists of an input layer, an output layer and at least one hidden layer; more than one hidden layer may be used for complex classification problems. Every node in a layer is connected through weights to every node in the next layer, so the network is fully connected. An MLP uses non-linear activation functions to compute its output units. Information flows in two directions: forward propagation computes the outputs, and backward propagation passes the error back through the network to update the weights [18]. The right panel of Figure 3.3 shows the architecture of an MLP with an input layer, one hidden layer and an output layer.

Figure 3.3 Single layer perceptron (left) and multilayer perceptron with one hidden layer (right) [20].
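
The forward pass of the MLP on the right of Figure 3.3 can be sketched as follows. This is a minimal illustration, assuming one hidden layer with sigmoid activations and a softmax output layer for multi-class prediction; the layer sizes and weight values are hypothetical placeholders, not taken from [18] or [20].

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic activation applied to each hidden unit."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: every input feeds every output unit.

    weights[j][i] is the weight on the connection from input i to output unit j.
    """
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: Vector) -> Vector:
    """Convert output scores into class probabilities (shifted by max for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x: Vector, w_hidden: Matrix, b_hidden: Vector,
                w_out: Matrix, b_out: Vector) -> Vector:
    """Input layer -> one sigmoid hidden layer -> softmax output layer (assumed design)."""
    hidden = [sigmoid(z) for z in dense(x, w_hidden, b_hidden)]
    return softmax(dense(hidden, w_out, b_out))


# Hypothetical network (assumption): 3 inputs, 2 hidden units, 3 output classes.
w_hidden = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [-0.5, 0.5], [0.3, 0.3]]
b_out = [0.0, 0.0, 0.0]

probs = mlp_forward([1.0, 0.5, -1.0], w_hidden, b_hidden, w_out, b_out)
print(probs)  # three class probabilities that sum to 1
```

The predicted class is simply the index of the largest probability; training these weights is the job of backpropagation.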

Backpropagation: A supervised learning algorithm used to train MLP networks. It adjusts the weights to minimize the calculated error, traversing backward from the output layer through the hidden layers to the input layer.
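
To make the backward pass concrete, the sketch below trains a tiny network with one sigmoid hidden layer on the XOR problem, a standard example that a single-layer perceptron cannot solve, using full-batch gradient descent on the mean squared error. It is a minimal sketch under stated assumptions: the hidden-layer size, learning rate, epoch count and random seed are illustrative choices, not values from the chapter.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used by both the hidden and the output units."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward pass: one sigmoid hidden layer and a single sigmoid output."""
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, y


def mse(data, params):
    """Mean squared error over (input, target) pairs."""
    return sum((forward(x, params)[1] - t) ** 2 for x, t in data) / len(data)


def train_step(data, params, lr):
    """One full-batch backpropagation step followed by a gradient-descent update."""
    w1, b1, w2, b2 = params
    n_hidden, n_in = len(w1), len(w1[0])
    gw1 = [[0.0] * n_in for _ in range(n_hidden)]
    gb1 = [0.0] * n_hidden
    gw2 = [0.0] * n_hidden
    gb2 = 0.0
    for x, t in data:
        h, y = forward(x, params)
        # Output-layer error signal dMSE/dz_out, using sigmoid'(z) = y * (1 - y).
        delta_out = 2.0 * (y - t) * y * (1.0 - y) / len(data)
        gb2 += delta_out
        for j in range(n_hidden):
            gw2[j] += delta_out * h[j]
            # Propagate the error backward through weight w2[j] to hidden unit j.
            delta_h = delta_out * w2[j] * h[j] * (1.0 - h[j])
            gb1[j] += delta_h
            for i in range(n_in):
                gw1[j][i] += delta_h * x[i]
    # Move every weight and bias against its gradient.
    new_w1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w1, gw1)]
    new_b1 = [b - lr * g for b, g in zip(b1, gb1)]
    new_w2 = [w - lr * g for w, g in zip(w2, gw2)]
    return new_w1, new_b1, new_w2, b2 - lr * gb2


# XOR: not linearly separable, so a hidden layer is needed.
xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
rng = random.Random(0)
n_hidden = 4  # assumption: a small hidden layer chosen for illustration
params = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
    [0.0] * n_hidden,
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    0.0,
)
initial_loss = mse(xor_data, params)
for _ in range(5000):
    params = train_step(xor_data, params, lr=1.0)
final_loss = mse(xor_data, params)
print(f"MSE before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The gradients are accumulated over all four samples before the update, which keeps the example deterministic; per-sample (stochastic) updates are an equally common variant.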

Unsupervised learning occurs when the target output is unknown. In this kind of learning, the network determines by itself how to group the outputs based on the given input data, which is why it is also called self-organization. Applications of unsupervised learning include image processing, speech recognition and text mining. Several well-established neural-network-based unsupervised learning algorithms are available, such as principal components analysis, Kohonen's self-organizing maps, independent components analysis and Hebbian learning [9]. Although unsupervised learning algorithms have a proven track record in many areas, a comprehensive review of their applications is beyond the scope of this discussion; this chapter focuses only on the application of supervised neural networks. For more insight into unsupervised learning in proteomics and genomics, refer to [21].
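
As one concrete instance of the Hebbian learning mentioned above, the sketch below applies Oja's rule, a normalized Hebbian update, to a single linear neuron. On zero-mean data its weight vector tends toward the first principal component, which links Hebbian learning to principal components analysis. The synthetic two-dimensional data, learning rate and seeds are assumptions for illustration only.

```python
import math
import random


def oja_first_component(samples, lr=0.001, epochs=5, seed=1):
    """Estimate the leading principal direction of zero-mean data with Oja's rule.

    For each sample x the neuron output is y = w . x and the update is
    w <- w + lr * y * (x - y * w). The Hebbian term lr * y * x strengthens weights
    on co-active inputs; the decay term -lr * y**2 * w keeps the length of w near 1.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(len(samples[0]))]
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


# Hypothetical zero-mean 2-D data with much larger spread along the first axis.
data_rng = random.Random(0)
data = [[data_rng.gauss(0.0, 3.0), data_rng.gauss(0.0, 0.5)] for _ in range(4000)]
w = oja_first_component(data)
print(w, math.hypot(*w))  # direction close to (+/-1, 0), length close to 1
```

No target labels are used anywhere: the neuron discovers the direction of greatest variance purely from the inputs, which is the self-organizing behaviour described above.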

3.3.5 Application of ANN in Bioinformatics

Artificial neural networks (ANNs) have been used in many areas of bioinformatics and have proved to be among the most powerful tools for solving bioinformatics problems. Some of the areas of bioinformatics where ANNs are applied are listed below and discussed in detail in later sections.

1. DNA and RNA alignment

2. Image and signal processing

3. Gene identification

4. Recognition of coding regions in genes

5. Feature detection, classification and sequencing

6. Identification and analysis of signals generated by regulatory sites

7. Protein structure prediction from sequences

8. Analysis of genetic and genomic expression data

9. Monitoring the treatment of patients based on DNA sequences
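
Many of the sequence-based applications listed above (alignment, gene identification, coding-region recognition) first need to turn a DNA string into numbers a network can consume. A common choice, assumed here rather than prescribed by this chapter, is one-hot encoding, in which each nucleotide becomes a 4-element indicator vector. The sketch below encodes a sequence and flattens a window of it into a single input vector for an MLP; the helper names are hypothetical.

```python
from typing import List

# Fixed nucleotide order for the indicator vector (an arbitrary but consistent choice).
NUCLEOTIDES = "ACGT"


def one_hot_dna(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as one 4-element indicator row per base.

    Ambiguous bases such as 'N' are encoded as all zeros.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def window_features(seq: str) -> List[int]:
    """Flatten the one-hot rows into a single input vector of length 4 * len(seq)."""
    return [bit for row in one_hot_dna(seq) for bit in row]


print(one_hot_dna("ACGTN"))
print(len(window_features("ATGCGTAA")))  # 32 inputs for an 8-base window
```

One-hot encoding avoids imposing an artificial order on the bases (for example A=1, C=2), which a numeric code would implicitly suggest to the network.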

3.3.6 Broadly Used Supervised Machine Learning Techniques

Apart from ANNs, many other supervised machine learning algorithms, such as Support Vector Machine (SVM) [22], Logistic Regression [23], Decision Tree [24], K-Nearest Neighbors (KNN) [25] and Random Forest [26], are widely used in bioinformatics and achieve high classification accuracy. These models, along with the most popular ANN architectures used in the literature, are discussed further below.
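
Of the classifiers named above, k-nearest neighbors is the simplest to write out in full. The sketch below is a plain Euclidean-distance, majority-vote KNN; the two-gene expression profiles and class labels are hypothetical, and a real study would typically use a tested library implementation together with feature scaling and cross-validation.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort (distance, label) pairs so the closest training samples come first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles for two classes.
train_x = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [5.0, 5.2], [5.3, 4.8], [4.9, 5.1]]
train_y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
print(knn_predict(train_x, train_y, [1.0, 1.0]))  # normal
print(knn_predict(train_x, train_y, [5.1, 5.0]))  # tumor
```

KNN has no training phase at all; every prediction scans the stored samples, which is why it becomes slow on large datasets and sensitive to irrelevant genes in high-dimensional expression data.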

3.4 Literature Review

Over the years, artificial neural networks have been widely used to process gene expression data because of their ability to identify complicated relationships between attributes in large data sets. ANNs have achieved great success owing to their capacity to manage the complexity and nonlinearity of biological datasets. Gene expression analyses have aimed to define more specific biological characteristics in order to improve patient risk stratification and to ensure the greatest benefit and least toxicity from a specific treatment.

Samundeeswari et al. [27] used the Wisconsin Prognostic Breast Cancer (WPBC) dataset to perform an experiment with an ANN model. The ANN was used to predict the status of patients at a particular endpoint and the time of disease occurrence. The dataset consisted of 35 features and 194 instances. A feedforward neural network with two hidden layers of 20 neurons each was used, and the entire experiment was carried out in the MATLAB environment. The model was trained with backpropagation, and the sigmoid activation function was used for both hidden and output nodes. In this study the neural network performed remarkably well, with 96% specificity and 97.68% accuracy.
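
Accuracy and specificity, as reported above, are both computed from the confusion matrix of a binary classifier. The sketch below shows the standard definitions, with sensitivity added for comparison; the counts in the example are hypothetical and are not the results of [27].

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics for a binary classifier from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all cases labelled correctly
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
    }


# Hypothetical counts for a test set of 100 patients (not from any cited study).
print(binary_metrics(tp=40, fp=2, tn=48, fn=10))
# accuracy 0.88, sensitivity 0.8, specificity 0.96
```

Reporting specificity alongside accuracy matters because a model can score well on accuracy while still missing many cases of the minority class.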

Narayanan et al. [28] used an ANN model to identify positive and negative cancer-related genes in a large dataset of 74 patients with 7,129 gene expression values each. Of the 74 samples, 31 were normal bone marrow cases and the remaining 43 patients were diagnosed with multiple myeloma. Several experiments were carried out with a single-layer ANN model. The authors concluded that whether a network needs hidden layers depends on the complexity of the gene expression dataset. For gene expression analysis, the single-layer neural network was very useful because of its simplicity and generalizability, and with suitable architectural modifications it could solve many complex problems.
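
A single-layer network has no hidden layer, so it can only draw a linear decision boundary. The sketch below uses the classic perceptron learning rule as a stand-in for such a model; it is an assumption for illustration, since the exact training procedure of [28] is not described here. The tiny two-gene dataset is hypothetical and linearly separable, so the rule is guaranteed to converge.

```python
from typing import List, Sequence, Tuple


def train_perceptron(xs: List[Sequence[float]], ys: List[int],
                     lr: float = 1.0, max_epochs: int = 100) -> Tuple[List[float], float]:
    """Perceptron rule for labels in {-1, +1}: update weights only on mistakes."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    """Sign of the linear activation."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical expression levels of two genes; +1 = myeloma, -1 = normal.
xs = [[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [0.5, 0.5], [1.0, 0.2], [0.2, 1.0]]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(xs, ys)
print([predict(w, b, x) for x in xs])
```

With thousands of genes and only dozens of samples, as in the study above, the classes are often linearly separable, which helps explain why a network without hidden layers can already perform well.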

Hu et al. [29] designed a model to classify bladder cancer cells into six tumor classes using 467 images, and estimated its accuracy with both supervised and unsupervised learning methods. For supervised learning, an MLP with one hidden layer was trained with the backpropagation algorithm to classify the cells and reduce the error rate. For unsupervised learning, fuzzy and non-fuzzy c-means clustering methods were implemented. Different activation functions, such as Gaussian, sigmoid and sinusoid, were studied for different network configurations. Using all the available data and 5 different features, the neural network classifier captured the information about cancer cells and achieved a 96.9% classification rate, whereas fuzzy c-means achieved only 76.50%.
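
Fuzzy c-means, the unsupervised method compared above, differs from ordinary (non-fuzzy) c-means in that each sample receives a membership degree in every cluster instead of a single hard assignment. The sketch below is a minimal textbook version with fuzzifier m = 2; initializing the centres from evenly spaced data points and the 2-D toy data are simplifying assumptions, not details from [29].

```python
import math
from typing import List, Sequence


def fuzzy_c_means(points: List[Sequence[float]], c: int = 2, m: float = 2.0,
                  iters: int = 100, tol: float = 1e-6):
    """Return (centres, memberships); memberships[k][i] is point k's degree in cluster i."""
    # Simple deterministic initialization (assumption): c evenly spaced data points.
    centres = [list(points[i * len(points) // c]) for i in range(c)]
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centres]  # avoid division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** exponent for j in range(c))
                      for i in range(c)])
        # Centre update: mean of all points weighted by u_ki ** m.
        new_centres = []
        for i in range(c):
            weights = [u[k][i] ** m for k in range(len(points))]
            total = sum(weights)
            new_centres.append([sum(w * p[dim] for w, p in zip(weights, points)) / total
                                for dim in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:  # stop once the centres no longer move
            break
    return centres, u


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centres, u = fuzzy_c_means(points)
for p, memb in zip(points, u):
    print(p, [round(v, 3) for v in memb])
```

Each row of memberships sums to 1; taking the largest membership per point recovers a hard clustering comparable to non-fuzzy c-means.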

In 2003, Won et al. [30] classified a leukemia dataset of 72 samples from two classes, acute lymphoblastic leukemia and acute myeloid leukemia. Each sample had 7,129 gene expression levels, which formed the input to the model. The model was trained on 38 samples, and the remaining 34 samples were used for testing. The researchers used a three-layer MLP with 8 hidden nodes and 2 output nodes for classification. The results showed that the ANN performed best, with an accuracy of 97%.
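
A quick parameter count shows how heavily such gene-expression MLPs are parameterized relative to their training sets. Assuming fully connected layers with one bias per hidden and output unit (an assumption; the exact configuration in [30] may differ), the 7,129-8-2 network described above works out as follows.

```python
def count_mlp_params(layer_sizes):
    """Weights plus biases of a fully connected MLP: sum of (n_in + 1) * n_out per layer."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


n_params = count_mlp_params([7129, 8, 2])
print(n_params)        # 57058 trainable parameters
print(n_params / 38)   # roughly 1,500 parameters per training sample
```

With only 38 training samples, this is about 1,500 parameters per sample, which helps explain why gene selection and dimensionality reduction are so often paired with neural networks on microarray data.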

Thein et al. [31] used a breast cancer dataset with 699 instances and 10 attributes, one of which is the class attribute. The dataset was made available by the University of Wisconsin Hospitals, Madison. Attributes 1 to 9 were used as the input features of the model. Each instance belonged to one of two classes, benign or malignant: 458 instances were benign and 241 were malignant. The dataset was classified with a multilayer perceptron (MLP) trained with backpropagation, which achieved an accuracy of 99.97%. The authors concluded that ANNs have the greatest tolerance of noisy data and a strong ability to classify untrained data patterns.
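
With 458 benign and 241 malignant instances the classes are imbalanced, so a reported accuracy is best read against the trivial majority-class baseline, i.e. the accuracy of a model that always predicts "benign". The arithmetic is short; the helper name is hypothetical.

```python
def majority_baseline(class_counts: dict) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


counts = {"benign": 458, "malignant": 241}
print(sum(counts.values()))                  # 699 instances
print(round(majority_baseline(counts), 4))   # 0.6552
```

Any useful classifier on this dataset therefore has to beat roughly 65.5% accuracy before it has learned anything beyond the class proportions.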

Peterson et al. [32] analyzed a DNA microarray cancer dataset, comparing machine learning algorithms such as ANN, logistic regression, linear discriminant analysis, SVM and k-nearest neighbors for survival analysis of patients. One of the main findings was that ANN performance depends on the statistical significance of the features; despite the large sample size, ANN outperformed all other classifiers, achieving the greatest area under the curve (AUC).
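
The area under the ROC curve used to compare classifiers in [32] equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The sketch below computes it directly from that pairwise (Mann-Whitney) definition, which costs O(n_pos * n_neg) and is fine for small illustrative data; the labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the pairwise (Mann-Whitney) definition; labels are 1 (positive) or 0."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; unlike accuracy, it does not depend on choosing a decision threshold.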
