Data Analytics in Bioinformatics

Machine learning techniques are increasingly being used to address problems in computational biology and bioinformatics. Novel machine learning computational techniques to analyze high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. Machine learning techniques such as Markov models, support vector machines, neural networks, and graphical models have been successful in analyzing life science data because of their capabilities in handling randomness and uncertainty of data noise and in generalization. Machine Learning in Bioinformatics compiles recent approaches in machine learning methods and their applications in addressing contemporary problems in bioinformatics approximating classification and prediction of disease, feature selection, dimensionality reduction, gene selection and classification of microarray data and many more.

Study of Climate Change

Excessive use of fossil fuels for energy contributes to high emissions of carbon dioxide, which is a major cause of global climate change.

Studying microorganisms that use carbon dioxide as their principal carbon source may help to reduce atmospheric carbon dioxide levels.

Biotechnology

In bioinformatics, biotechnology is used to identify organisms and micro-organisms that can be useful to the dairy industry and to food manufacturers. For example, micro-organisms such as Lactococcus lactis are used in the dairy industry to manufacture buttermilk, cheese, yogurt, etc.

Crop Improvement

In crop improvement, bioinformatics involves the study of DNA and RNA sequences and the prediction of protein function and structure in plant genomes.

Genetic knowledge of plants has revealed how plant genes are organised. This knowledge is used to produce improved insect-resistant crops and to make plants more productive, and protein models help to improve plant genes.

Insect Resistance

Soil-borne bacteria such as Bacillus thuringiensis make proteins that are toxic to some insects.

Genes of these soil-borne bacteria have been studied and successfully transferred to cotton, potatoes and maize to control many serious pests [4, 5].

The proteins produced by these bacteria help to repel insect attack, so studying them can reduce the use of insecticides on plants and hence improve the nutritional content of the plants.

Development of Drought-Resistant Varieties

Genetic knowledge of plants helps to develop crop varieties that have greater tolerance of soil alkalinity and iron toxicity and that can grow with reduced water. This also allows crops to be grown in regions with substandard soil, creating more agricultural land and increasing crop production [7].

Comparative Studies

To understand the functions of genes, the mechanisms of inherited diseases and the evolution of species, we need to analyze and compare the genetic material of different species.

Bioinformatics tools are also applied to make comparisons between the numbers, locations and biochemical functions of genes in different organisms [5, 6].

There is a wide range of applications of bioinformatics in diagnosis, medicine, agriculture and biotechnology. Studying and using different bioinformatics tools allows researchers to extend knowledge far more efficiently and effectively through data analysis and experiments, which should make major discoveries both faster and more accurate.

3.1.3 Issues with Bioinformatics

Section 3.1.1 discusses different applications of bioinformatics. These applications come with many challenges, which arise from issues with the data or with the devices used to collect or analyze it. Addressing and analyzing these issues is required for proper execution and effective results. The subsections below discuss the issues faced when a biological study is conducted.

3.1.3.1 Issues Related to Structure

The study of DNA and proteins includes problems such as protein structure prediction. Because structures are represented as 3D data, structure prediction, alignment and analysis become difficult tasks. The prediction of a protein's three-dimensional structure from its sequence can be approached with an artificial neural network (ANN).

Most biological networks, such as protein–protein interaction networks and gene regulatory networks, are difficult to build and interpret because of the complexity of biological systems. Graph-theoretic methods are used to represent these massive networks as graphs, which are very difficult to classify using traditional methods.
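To make the graph representation concrete, the following is a minimal sketch of a protein–protein interaction network stored as an undirected adjacency map. The protein names and interactions are hypothetical and chosen only for illustration; real networks contain thousands of nodes and are usually handled with dedicated graph libraries.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def node_degrees(graph):
    """Return the number of interaction partners of every node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


# Hypothetical interactions, not taken from any real database.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

ppi = build_graph(interactions)
degrees = node_degrees(ppi)

# The most highly connected protein is a candidate "hub" of the network.
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])
```

Simple descriptors such as node degree are one way to turn a graph into numeric features that a classifier can consume.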

3.1.3.2 Sequence Analysis

Classification of RNA, protein and DNA sequences becomes a challenge because of the differences and similarities among many organisms.

Issue with Genome Sequence

A genome is the complete set of chromosomes of an organism, consisting of DNA. Genome sequencing is a way of mapping out or ordering DNA for organizing, processing and interpreting the sequences, which in turn requires improvements in sequencing strategies. Every DNA sequencing effort faces challenges in searching for sequence patterns and in designing, analyzing and interpreting the data.

In gene finding and genome annotation: Gene finding refers to the prediction of nucleotide sequence features such as introns and exons in DNA sequence segments, whereas genome annotation is the process of identifying the gene-coding regions of a sequence in order to analyze the protein sequence [8]. It also involves the study of repetitive DNA within the genome, which is made up of copies of the same or nearly the same sequence.
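As a rough illustration of locating candidate coding regions, the sketch below scans the three forward reading frames of a DNA string for open reading frames (a start codon followed in-frame by a stop codon). This is a deliberately simplified stand-in: it ignores introns, the reverse strand and statistical signals, all of which real gene finders model. The function name, the `min_codons` parameter and the example sequence are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) pairs of forward-strand open reading frames.

    `end` is exclusive and includes the stop codon. `min_codons` is the
    minimum number of codons (counting ATG) required before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Hypothetical toy sequence with one short ORF.
sequence = "CCATGAAATTTTAGGG"
orfs = find_orfs(sequence)
for start, end in orfs:
    print(start, end, sequence[start:end])
```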

In sequence comparison: Sequence comparison is the process of comparing two or more sequences. The large number of sequences available in genomic databases requires proper categorization of DNA and protein sequences. Sequence comparison helps to assign a hypothetical structure and function to a sequence for the identification, design and interpretation of sequences [8].
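One common way to compare two sequences is global alignment by dynamic programming (the Needleman-Wunsch algorithm). The text does not prescribe a particular method, so the following is only a sketch: it computes the optimal alignment score, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity rather than a biologically tuned scheme.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b.

    Only two rows of the dynamic programming table are kept, so memory
    use is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against nothing
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))   # identical sequences
print(global_alignment_score("GATTACA", "GATTTACA"))  # one inserted base
```

A higher score indicates greater similarity, which is the basis for transferring a hypothetical function from a known sequence to an unknown one.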

Analysis of DNA sequencing data is an important task because it helps to detect individual genes that are associated with a disease. When a disease affects an individual, their proteins or genes are altered, which changes the gene sequence. It therefore becomes very important to detect these genes in order to find a cure for the disease. Traditional methods of gene detection were based on trial and error. Advances in data mining and machine learning, such as neural networks (NN), now allow more precise study of genes and their sequences, which simplifies the task [9]. Many machine learning algorithms are used to classify normal and abnormal genes with high accuracy.

Solving the above problems involves the following steps:

Collecting biological data

Building a computational model

Analyzing and solving problems of the computational model

Testing the computational algorithm

Evaluating the performance of the model.
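The steps above can be sketched end to end on a toy problem. Everything in this example is an assumption made for illustration: the sequences and labels are invented, the features are 2-mer frequencies, and the model is a nearest-centroid classifier, which is far simpler than the neural networks mentioned earlier.

```python
from collections import Counter
from itertools import product

K = 2
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_vector(seq, k=K):
    """Represent a sequence by the relative frequency of each k-mer."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts[kmer] / total for kmer in KMERS]


def train_centroids(samples):
    """Build the model: one mean feature vector (centroid) per class."""
    sums, counts = {}, Counter()
    for seq, label in samples:
        vec = kmer_vector(seq)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, seq):
    """Assign the class whose centroid is closest in squared distance."""
    vec = kmer_vector(seq)

    def sq_dist(label):
        return sum((x - y) ** 2 for x, y in zip(vec, centroids[label]))

    return min(centroids, key=sq_dist)


def accuracy(centroids, samples):
    """Evaluate the model as the fraction of correct predictions."""
    hits = sum(predict(centroids, seq) == label for seq, label in samples)
    return hits / len(samples)


# Step 1: collect data (hypothetical toy sequences and labels).
train = [("ATATATATAT", "at_rich"), ("AATTAATTAA", "at_rich"),
         ("GCGCGCGCGC", "gc_rich"), ("GGCCGGCCGG", "gc_rich")]
test = [("TATATTATAA", "at_rich"), ("CGCGGCGCCG", "gc_rich")]

# Steps 2-5: build the model, then test and evaluate it on held-out data.
model = train_centroids(train)
test_accuracy = accuracy(model, test)
print(test_accuracy)
```

Keeping the test sequences separate from the training sequences is what makes the final accuracy a meaningful estimate of performance.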

3.2 Biological Datasets

Bioinformatics deals with various biological datasets collected at different levels of omics data, such as:

Genomic Sequence data

Protein Sequence data

Microarray data

Structure data (Structure of RNA and protein)

Chemical data

Disease data.

Based on the type of data, biological databases can be divided into two categories:

a. Primary Database: These databases are archival in nature because they are created from experimental results submitted directly by researchers. They are populated with protein sequences, nucleotide sequences, macromolecular structures, etc. [10]. Examples: Protein Data Bank (PDB), GenBank, DNA Data Bank of Japan (DDBJ), Gene Expression Omnibus (GEO).

b. Secondary Database: These databases are either manually created or derived from analysis of the results in primary databases, to create more structured records for easy retrieval of data [10]. Example: Swiss-Prot (a protein sequence database maintained by the Swiss Institute of Bioinformatics, Switzerland, and the European Bioinformatics Institute), UniProt Knowledgebase.

3.3 Building Computational Model

Building a computational model involves studying the different behaviors of a complex system to gain new insights that deepen our understanding. This section discusses some prerequisites for building the computational model.

3.3.1 Data Pre-Processing and its Necessity

After data is collected from a database, it goes through several processing steps, because the data in databases is often raw, noisy, incomplete or inconsistent. For these reasons the data cannot be used directly for mining, as doing so may produce unsatisfactory results. To improve the classification result, pre-processing is carried out as an essential step before mining the data. It usually includes methods such as data cleaning, data integration, data transformation and dimensionality reduction [11]. Data pre-processing significantly improves the quality of the data and the performance of the classification model, and minimizes the time required for the actual mining.
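A minimal sketch of three of these pre-processing methods is given below, using only the Python standard library. The choices are assumptions made for illustration: cleaning is done by mean imputation of missing values, transformation by min-max scaling, and dimensionality reduction by dropping zero-variance features (a simple form of feature selection, not a projection method such as PCA). The small matrix stands in for, say, expression measurements and is invented.

```python
from statistics import mean, pvariance


def impute_missing(rows):
    """Data cleaning: replace None with the mean of its column.

    Assumes every column has at least one observed value.
    """
    col_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def min_max_scale(rows):
    """Data transformation: rescale each column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [[(v - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
             for j, v in enumerate(row)]
            for row in rows]


def drop_low_variance(rows, threshold=0.0):
    """Dimensionality reduction: keep columns with variance above threshold."""
    keep = [j for j, col in enumerate(zip(*rows)) if pvariance(col) > threshold]
    return [[row[j] for j in keep] for row in rows], keep


# Hypothetical samples (rows) by features (columns); None marks a missing value.
raw = [
    [2.0, 5.0, 10.0],
    [4.0, 5.0, None],
    [6.0, 5.0, 30.0],
]

cleaned = impute_missing(raw)
scaled = min_max_scale(cleaned)
reduced, kept = drop_low_variance(scaled)
print(kept)
print(reduced)
```

The constant second column carries no information for classification, so removing it shrinks the data without losing signal, which is the kind of saving in mining time the paragraph above refers to.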