Bioinformatics

Bioinformatics: summary, description, and annotation

Praise for the third edition of Bioinformatics:
“This book is a gem to read and use in practice.”
“This volume has a distinctive, special value as it offers an unrivalled level of details and unique expert insights from the leading computational biologists, including the very creators of popular bioinformatics tools.”
“A valuable survey of this fascinating field. . . I found it to be the most useful book on bioinformatics that I have seen and recommend it very highly.”
“This should be on the bookshelf of every molecular biologist.”

The field of bioinformatics is advancing at a remarkable rate. With the development of new analytical techniques that make use of the latest advances in machine learning and data science, today’s biologists are gaining fantastic new insights into the natural world’s most complex systems. These rapidly progressing innovations can, however, be difficult to keep pace with.
The expanded fourth edition of the best-selling Bioinformatics aims to remedy this by providing students and professionals alike with a comprehensive survey of the current field. Revised to reflect recent advances in computational biology, it offers practical instruction on the gathering, analysis, and interpretation of data, as well as explanations of the most powerful algorithms presently used for biological discovery.
Bioinformatics offers the most readable, up-to-date, and thorough introduction to the field for biologists at all levels, covering both key concepts that have stood the test of time and the new and important developments driving this fast-moving discipline forwards.
This new edition features:
- New chapters on metabolomics, population genetics, metagenomics and microbial community analysis, and translational bioinformatics
- A thorough treatment of statistical methods as applied to biological data
- Special topic boxes and appendices highlighting experimental strategies and advanced concepts
- Annotated reference lists, comprehensive lists of relevant web resources, and an extensive glossary of commonly used terms in bioinformatics, genomics, and proteomics
Bioinformatics is an indispensable companion for researchers, instructors, and students of all levels in molecular biology and computational biology, as well as investigators involved in genomics, clinical research, proteomics, and related fields.

Bioinformatics: online excerpt

To provide stability and ensure that old analyses can be reproduced, both genome browsers make available not only the current versions of genome assemblies but also older ones. In addition, annotation tracks, such as the GENCODE gene track and the SNP track, may be based on different versions of the underlying data. Thus, users are encouraged to verify the version of all data (both genome assembly and annotations) when comparing a region of interest between the UCSC and Ensembl Genome Browsers.
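One way to make that check routine in scripts is sketched below, assuming annotation files are labeled either with a UCSC database name (such as hg38) or with a Genome Reference Consortium name (such as GRCh38). The alias table is a small hand-written subset for illustration, not an official or complete mapping, and the helper names are hypothetical. Because UCSC and Ensembl also name chromosomes differently (chr7 versus 7), the sketch strips the "chr" prefix; it deliberately leaves the mitochondrial chromosome alone, since whether chrM and MT are the same sequence depends on the assembly. It checks labels only and cannot detect differing annotation releases built on the same assembly.

```python
# Illustrative subset of UCSC database names and their GRC assembly names.
UCSC_TO_GRC = {
    "hg19": "GRCh37",
    "hg38": "GRCh38",
    "mm10": "GRCm38",
    "mm39": "GRCm39",
}


def canonical_assembly(label: str) -> str:
    """Map a UCSC-style label (e.g. 'hg38') to its GRC name; pass others through."""
    return UCSC_TO_GRC.get(label, label)


def same_assembly(label_a: str, label_b: str) -> bool:
    """True if two labels name the same genome assembly (per the table above)."""
    return canonical_assembly(label_a) == canonical_assembly(label_b)


def normalize_chrom(chrom: str) -> str:
    """Drop a leading 'chr' so UCSC ('chr7') and Ensembl ('7') names compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


print(same_assembly("hg38", "GRCh38"))  # True
print(same_assembly("hg19", "GRCh38"))  # False
print(normalize_chrom("chr7") == normalize_chrom("7"))  # True
```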

This chapter presents general guidelines for accessing the genome sequence and annotations using the UCSC and Ensembl Genome Browsers. Although similar analyses could be carried out with either browser, we have chosen to use different examples at the two sites to illustrate different types of questions that a researcher might want to ask. We finish with a short description of JBrowse (Buels et al. 2016), another web-based genome browser that users can set up on their own servers to share custom genome assemblies and annotations. All of the resources discussed in this chapter are freely available.

The UCSC Genome Browser

After starting in 2000 with just a display of an early draft of the human genome assembly, the UCSC Genome Browser now provides access to assemblies and annotations from over 100 organisms (Haeussler et al. 2019). The majority of assemblies are of mammalian genomes, but other vertebrates, insects, nematodes, deuterostomes, and the Ebola virus are also included. The assemblies from some organisms, including human and mouse, are available in multiple versions. New organisms and assembly versions are added regularly.

The UCSC Browser presents genomic annotation in the form of tracks. Each track provides a different type of feature, from genes to SNPs to predicted gene regulatory regions to expression data. Each organism has its own set of tracks, some created by the UCSC Genome Bioinformatics team and others provided by members of the bioinformatics community. Over 200 tracks are available for the GRCh37 version of the human genome assembly. The newer human genome assembly, GRCh38, has fewer tracks, as not all the data have been remapped from the older assembly. Other genomes are not as well annotated as human; for example, fewer than 20 tracks are available for the sea hare. Some tracks, such as those created from NCBI transcript data, are updated weekly, while others, such as the SNP tracks created from NCBI variant data (Sayers et al. 2019), are updated less frequently, depending on the release schedule of the underlying data. For ease of use, tracks are organized into subsections. For example, depending on the organism, the Genes and Gene Predictions section may include evidence-based gene predictions, ab initio gene predictions, and/or alignment of protein sequences from other species.

The home page of the UCSC Genome Browser provides a stepping-off point for many of the resources developed by the Genome Bioinformatics group at UCSC, including the Genome Browser, BLAT, and the Table Browser, which will be described in detail later in this chapter. The Tools menu provides a link to liftOver, a widely used tool that converts genomic coordinates from one assembly to another. Using this tool, it is possible to update annotation files so that old data can be integrated into a new genome assembly. The Download menu provides an option to download all the sequence and annotation data for each genome assembly hosted by UCSC, as well as some of the source code. The What's New section provides updates on new genome assemblies, as well as new tools and features. Finally, there is an extensive Help menu, with detailed documentation as well as videos. Users may also submit questions to a mailing list, and most queries are answered within a day.
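liftOver works from chain files, which record the ungapped blocks in which two assemblies align. A minimal sketch of the core coordinate mapping follows, assuming a single plus-strand chain in the UCSC chain format: a header line, then lines giving a block size and the gaps that follow it in the old (target) and new (query) assemblies, with a final line holding only the last block size. The chain text is made up for illustration and the function names are hypothetical; the real liftOver tool additionally handles minus-strand chains, many chains per file, intervals that span gaps, and match-ratio thresholds.

```python
from typing import List, Optional, Tuple

# One ungapped aligned block: (old_start, new_start, size), 0-based.
Block = Tuple[int, int, int]


def parse_chain(text: str) -> List[Block]:
    """Parse one plus-strand chain (header + 'size dt dq' lines) into blocks."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    # Header fields: chain score tName tSize tStrand tStart tEnd
    #                qName qSize qStrand qStart qEnd id
    if header[4] != "+" or header[9] != "+":
        raise ValueError("this sketch handles plus-strand chains only")
    t_pos, q_pos = int(header[5]), int(header[10])
    blocks: List[Block] = []
    for fields in rows[1:]:
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # the final line has only the block size
            t_pos += int(fields[1])  # gap in the old (target) assembly
            q_pos += int(fields[2])  # gap in the new (query) assembly
    return blocks


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based old-assembly position to the new one, or None if unaligned."""
    for t_start, q_start, size in blocks:
        if t_start <= pos < t_start + size:
            return q_start + (pos - t_start)
    return None


EXAMPLE_CHAIN = """
chain 1000 chr1 1000 + 100 300 chr1 1000 + 150 360 1
50 10 20
140
"""

blocks = parse_chain(EXAMPLE_CHAIN)
print(lift_position(blocks, 120))  # 170: inside the first block
print(lift_position(blocks, 155))  # None: falls in a gap between blocks
```

Positions that fall in a gap have no counterpart in the new assembly, which is why liftOver can report some features as unmapped.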

The UCSC Genome Browser provides multiple ways for both individual users and larger genome centers to share data with collaborators or even the entire bioinformatics community. These sharing options are available on the My Data link on the home page. Custom Tracks allow users to display their own data as a separate annotation track in the browser. User data must be formatted in a standard data structure in order to be interpreted correctly by the browser. Many commonly used file formats are supported, including Browser Extensible Data (BED), Binary Alignment/Map (BAM), and Variant Call Format (VCF; Box 4.1). Small data files can be uploaded or pasted into the Genome Browser for personal use. Larger files must be saved on the user's web server and accessed by URL through the Genome Browser. As anyone with the URL can access the data, this method can be used to share data with collaborators. Alternatively, Custom Tracks, along with track configurations and settings, can be shared with selected collaborators using a named Session. Some groups choose to make their Sessions available to the world at large in My Data → Public Sessions. Finally, groups with very large datasets can host their data in the form of a Track Hub so that it can be viewed on the UCSC Genome Browser. When a Track Hub is paired with an Assembly Hub, it can be used to create a browser for a genome assembly not already hosted by UCSC.
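Small Custom Tracks can be pasted into the browser as plain text: an optional browser line, a track line with display settings, and then data lines in a supported format such as BED. A minimal sketch of assembling such text in Python is shown below, assuming a simple four-column BED; the helper name, track name, and feature coordinates are hypothetical, and only the common track-line attributes name, description, and visibility are set.

```python
from typing import Iterable, Tuple

# (chrom, start, end, name) with 0-based, half-open coordinates, as in BED
Feature = Tuple[str, int, int, str]


def make_bed_custom_track(
    track_name: str,
    description: str,
    features: Iterable[Feature],
    position: str = "",
) -> str:
    """Return custom-track text: optional browser line, track line, BED lines."""
    lines = []
    if position:
        # Tells the browser which region to open when the track is loaded.
        lines.append(f"browser position {position}")
    lines.append(
        f'track name="{track_name}" description="{description}" visibility=pack'
    )
    for chrom, start, end, name in features:
        # Reject negative or reversed intervals before they reach the browser.
        if start < 0 or end < start:
            raise ValueError(f"invalid interval for {name}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(lines) + "\n"


track_text = make_bed_custom_track(
    "myPeaks",  # hypothetical track name
    "Example peaks",
    [("chr1", 1000, 1500, "peak1"), ("chr1", 2000, 2600, "peak2")],
    position="chr1:1-5000",
)
print(track_text)
```

The same text could be saved to a file on a web server and loaded by URL, which is the route the browser expects for larger datasets.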

Box 4.1 Common File Types for Genomic Data

Both the UCSC and Ensembl Genome Browsers allow users to upload their own data so that they can be viewed in context with other genome-scale data. User data must be formatted in a commonly used data structure in order to be interpreted correctly by the browser.

Browser Extensible Data (BED) format is a tab-delimited format that is flexible enough to display many types of data. It can be used to display fairly simple features like the locations of transcription factor binding sites, as well as more complex ones like transcripts and their exons.
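In the 12-column form of BED used for transcripts, coordinates are 0-based and half-open, and the last three columns (blockCount, blockSizes, blockStarts) describe the exons, with each block start measured from chromStart. A minimal parsing sketch, assuming a well-formed tab-separated BED12 line; the function name and the transcript are made up for illustration:

```python
from typing import List, Tuple


def bed12_exons(line: str) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for each exon block in a BED12 line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated columns")
    chrom, chrom_start = fields[0], int(fields[1])
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # Block starts are relative to chromStart, so shift them to genome coordinates.
    return [
        (chrom, chrom_start + rel_start, chrom_start + rel_start + size)
        for rel_start, size in zip(starts, sizes)
    ]


# chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd,
# itemRgb, blockCount, blockSizes, blockStarts
example = "\t".join([
    "chr1", "1000", "5000", "txA", "0", "+",
    "1200", "4800", "0", "3", "500,300,400,", "0,2000,3600,",
])
print(bed12_exons(example))
# [('chr1', 1000, 1500), ('chr1', 3000, 3300), ('chr1', 4600, 5000)]
```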

Binary Alignment/Map (BAM) format is the compressed binary version of the Sequence Alignment/Map (SAM) format. It is a compact format designed for use with very large files of nucleotide sequence alignments. Because it can be indexed, only the portion of the file that is needed for display is transferred to the browser. Many tools for next-generation sequence analysis use BAM format as output or input.
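Indexed access works because a BAM index (.bai) groups alignments into a hierarchy of genomic bins. The functions below are Python translations of the reg2bin and reg2bins routines given in the SAM/BAM format specification. A reader looking up a region only needs to examine alignments stored in the candidate bins; real indexes also keep a linear index and file offsets, which this sketch omits, and the alignment list is hypothetical.

```python
from typing import List

# (first bin id at this level, log2 of the bin width), coarsest to finest
LEVELS = ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14))


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based half-open interval [beg, end)."""
    end -= 1
    for offset, shift in reversed(LEVELS):  # try the finest (16 kb) bins first
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # bin 0 spans the whole 512 Mb range


def reg2bins(beg: int, end: int) -> List[int]:
    """All bins that may hold alignments overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for offset, shift in LEVELS:
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


# Hypothetical alignments as (start, end); each would be filed under reg2bin().
alignments = [(100, 200), (20_000, 20_150), (5_000_000, 5_000_100)]
query_bins = set(reg2bins(0, 1_000))
candidates = [a for a in alignments if reg2bin(*a) in query_bins]
print(candidates)  # [(100, 200)]: the other two bins are never read
```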

Variant Call Format (VCF) is a flexible format for large files of variation data including single-nucleotide variants, insertions/deletions, copy number variants, and structural variants. Like BAM format, it is compressed and indexed, and only the portion of the file that is needed for display is transferred to the browser. Many tools for variant analysis use VCF format as output or input.
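Each VCF data line starts with the fixed columns CHROM, POS, ID, REF, and ALT (followed by QUAL, FILTER, INFO, and optional genotype columns), and POS is 1-based. A minimal sketch that labels each alternate allele by comparing allele lengths and converts the record to a 0-based, BED-style interval is shown below. The labeling rules are a simple heuristic for illustration, not a full reading of the VCF specification: symbolic alleles in angle brackets (such as <DEL> or <CNV>) are simply called structural, and breakend notation is not handled. The function names and the record are hypothetical.

```python
from typing import List, Tuple


def classify_alleles(ref: str, alt_field: str) -> List[str]:
    """Label each comma-separated ALT allele relative to REF (simple heuristic)."""
    labels = []
    for alt in alt_field.split(","):
        if alt.startswith("<"):
            labels.append("structural/symbolic")
        elif alt in ("*", "."):
            labels.append("other")  # overlapping-deletion or missing allele
        elif len(ref) == 1 and len(alt) == 1:
            labels.append("SNV")
        elif len(alt) > len(ref):
            labels.append("insertion")
        elif len(alt) < len(ref):
            labels.append("deletion")
        else:
            labels.append("multi-nucleotide substitution")
    return labels


def vcf_record_to_interval(line: str) -> Tuple[str, int, int, List[str]]:
    """Return (chrom, start0, end0, labels) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    start = pos - 1  # VCF POS is 1-based; BED-style starts are 0-based
    end = start + len(ref)  # the interval spans the REF allele
    return chrom, start, end, classify_alleles(ref, alt)


record = "chr1\t1001\t.\tA\tG,ATT\t50\tPASS\t."
print(vcf_record_to_interval(record))
# ('chr1', 1000, 1001, ['SNV', 'insertion'])
```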

The UCSC Genome Browser home page lists commonly accessed tools, as well as a frequently updated news section that highlights major data and software updates. To reach the Genome Browser Gateway, the main entry point for text-based searches, click on the Gateway link on the home page (Figure 4.1). The default assembly is the most recent human assembly, GRCh38, from December 2013. The genomes of other species can be selected from the phylogenetic tree on the left side of the Gateway page, or by typing their name in the selection box. On the human Gateway page, there is also the option to select one of four older human genome assemblies. Details about the GRCh38 assembly and instructions for searching are available on the Gateway page.

To perform a search, enter text into the Position/Search Term box. If the query maps to a unique position in the genome, such as a search for a particular chromosome and position, the Go button links directly to the Genome Browser. However, if there is more than one hit for the query, such as a search for the term metalloprotease, the resulting page will contain a list of results that all contain that term. For some species, the terms have been indexed, and typing a gene symbol into the search box will bring up a list of possible matches. In this example, we will search for the human hypoxia inducible factor 1 alpha subunit (HIF1A) gene (Figure 4.1), which produces a single hit on GRCh38.
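The same search can be launched by URL, which is convenient for scripts or for sharing a view. A minimal sketch, assuming the hgTracks page of the public UCSC site accepts db and position query parameters (as ordinary browser links do); hg38 is UCSC's database name for GRCh38, HIF1A is the gene from the example above, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db: str, position: str) -> str:
    """Build a Genome Browser link for an assembly and a search term or region."""
    # urlencode escapes characters such as ':' in coordinate ranges.
    return f"{UCSC_HGTRACKS}?{urlencode({'db': db, 'position': position})}"


print(browser_url("hg38", "HIF1A"))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=HIF1A
```

A coordinate range such as chr14:1-100 can be passed as the position in the same way; a term with several hits would lead to the list of results rather than directly to a browser view.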