Bioinformatics

Title:
Bioinformatics
Author:
Unknown Author
Genre:
unrecognised / in English
Year:
unknown
ISBN:
not available

Bioinformatics: summary, description, and annotation

Praise for the third edition of Bioinformatics
“This book is a gem to read and use in practice.”
“This volume has a distinctive, special value as it offers an unrivalled level of details and unique expert insights from the leading computational biologists, including the very creators of popular bioinformatics tools.”
“A valuable survey of this fascinating field. . . I found it to be the most useful book on bioinformatics that I have seen and recommend it very highly.”
“This should be on the bookshelf of every molecular biologist.”

The field of bioinformatics is advancing at a remarkable rate. With the development of new analytical techniques that make use of the latest advances in machine learning and data science, today’s biologists are gaining fantastic new insights into the natural world’s most complex systems. These rapidly progressing innovations can, however, be difficult to keep pace with.
The expanded fourth edition of the best-selling Bioinformatics aims to remedy this by providing students and professionals alike with a comprehensive survey of the current field. Revised to reflect recent advances in computational biology, it offers practical instruction on the gathering, analysis, and interpretation of data, as well as explanations of the most powerful algorithms presently used for biological discovery.
Bioinformatics offers the most readable, up-to-date, and thorough introduction to the field for biologists at all levels, covering both key concepts that have stood the test of time and the new and important developments driving this fast-moving discipline forward.
This new edition features:
- New chapters on metabolomics, population genetics, metagenomics and microbial community analysis, and translational bioinformatics
- A thorough treatment of statistical methods as applied to biological data
- Special topic boxes and appendices highlighting experimental strategies and advanced concepts
- Annotated reference lists, comprehensive lists of relevant web resources, and an extensive glossary of commonly used terms in bioinformatics, genomics, and proteomics
Bioinformatics is an indispensable companion for researchers, instructors, and students of all levels in molecular biology and computational biology, as well as investigators involved in genomics, clinical research, proteomics, and related fields.

Bioinformatics: excerpt


The next part of the header contains the definition lines, providing a succinct description of the kinds of biological information contained within the record. The definition line (DE in ENA, DEFINITION in DDBJ/GenBank) takes the following form.

DE   Drosophila melanogaster eukaryotic initiation factor 4E (eIF4E) gene,
DE   complete cds, alternatively spliced.
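Because a definition can wrap over several DE lines, a program reading the record has to stitch the continuation lines back together. The Python sketch below assumes the ENA layout, in which each line starts with a two-letter line code and the data begin in column 6; `group_by_line_code` and `definition` are hypothetical helper names. For real records, a tested parser such as Biopython's `SeqIO` (format name `"embl"`) is the safer choice.

```python
from collections import defaultdict

# The two DE lines from the example above (assumed ENA layout: data from column 6).
RECORD = """\
DE   Drosophila melanogaster eukaryotic initiation factor 4E (eIF4E) gene,
DE   complete cds, alternatively spliced.
"""


def group_by_line_code(text: str) -> dict[str, list[str]]:
    """Collect the data portion of every line under its two-letter line code."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2:
            continue  # skip blank lines
        groups[line[:2]].append(line[5:].strip())
    return dict(groups)


def definition(text: str) -> str:
    """Join all DE continuation lines into a single definition string."""
    return " ".join(group_by_line_code(text).get("DE", []))


print(definition(RECORD))
# Drosophila melanogaster eukaryotic initiation factor 4E (eIF4E) gene, complete cds, alternatively spliced.
```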

Much care is taken in the generation of these definition lines and, although many of them can be generated automatically from other parts of the record, they are reviewed to ensure that consistency and richness of information are maintained. Obviously, it is quite impossible to capture all of the biology underlying a sequence in a single line of text, but that wealth of information will follow soon enough in downstream parts of the same record.

Continuing down the flatfile record, one finds the full taxonomic information on the sequence of interest. The OS line (or SOURCE line in DDBJ/GenBank) provides the preferred scientific name of the organism from which the sequence was derived, followed by the common name of the organism in parentheses. The OC lines (or ORGANISM lines in DDBJ/GenBank) contain the complete taxonomic classification of the source organism. The classification is listed top-down, as nodes in a taxonomic tree, with the most general grouping (Eukaryota) given first.

OS   Drosophila melanogaster (fruit fly)
OC   Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota;
OC   Neoptera; Holometabola; Diptera; Brachycera; Muscomorpha; Ephydroidea;
OC   Drosophilidae; Drosophila; Sophophora.
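Since the classification is written top-down and separated by semicolons, it maps naturally onto an ordered list. A minimal Python sketch, assuming the ENA layout shown above (data from column 6, a final period ending the last OC line); `split_os` and `lineage` are hypothetical names:

```python
import re
from typing import Optional

OS_LINE = "OS   Drosophila melanogaster (fruit fly)"
OC_LINES = """\
OC   Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota;
OC   Neoptera; Holometabola; Diptera; Brachycera; Muscomorpha; Ephydroidea;
OC   Drosophilidae; Drosophila; Sophophora.
"""


def split_os(line: str) -> tuple[str, Optional[str]]:
    """Split an OS line into (scientific name, common name or None)."""
    match = re.fullmatch(r"OS   (.+?)(?: \((.+)\))?", line.rstrip())
    if match is None:
        raise ValueError(f"not an OS line: {line!r}")
    return match.group(1), match.group(2)


def lineage(text: str) -> list[str]:
    """Return the OC classification as a list, most general node first."""
    data = " ".join(line[5:].strip() for line in text.splitlines() if line.startswith("OC"))
    data = data.rstrip(".")  # the classification ends with a period
    return [taxon.strip() for taxon in data.split(";") if taxon.strip()]


print(split_os(OS_LINE))  # ('Drosophila melanogaster', 'fruit fly')
taxa = lineage(OC_LINES)
print(taxa[0], taxa[-1], len(taxa))  # Eukaryota Sophophora 16
```

Keeping the list order means `taxa[0]` is always the most general grouping and `taxa[-1]` the most specific one recorded.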

Each record must have at least one reference or citation, noted within what are called reference blocks. These reference blocks offer scientific credit and set a context explaining why this particular sequence was determined. The reference blocks take the following form.

RN   [1]
RP   1-2881
RX   DOI; .1074/jbc.271.27.16393.
RX   PUBMED; 8663200.
RA   Lavoie C.A., Lachance P.E., Sonenberg N., Lasko P.;
RT   "Alternatively spliced transcripts from the Drosophila eIF4E gene produce
RT   two different Cap-binding proteins";
RL   J Biol Chem 271(27):16393-16398(1996).
XX
RN   [2]
RP   1-2881
RA   Lasko P.F.;
RT   ;
RL   Submitted (09-APR-1996) to the INSDC.
RL   Paul F. Lasko, Biology, McGill University, 1205 Avenue Docteur Penfield,
RL   Montreal, QC H3A 1B1, Canada

In this case, two references are shown, one referring to a published paper and the other referring to the submission of the sequence record itself. In the example above, the second block provides information on the senior author of the paper listed in the first block, as well as the author's postal address. The date shown in the second block indicates when the sequence (and accompanying information) was submitted to the database, not when the record was first made public, so this date cannot support any inference or claim about first public release. Additional submitter blocks may be added to the record each time the sequence is updated.
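Each RN line opens a new reference block, and the lines that follow belong to it until the next RN. The Python sketch below groups an abridged copy of the two blocks above (RX lines left out) that way; `parse_references` and `submission_date` are hypothetical helpers. In line with the caveat about dates, the extracted date is treated only as a submission date.

```python
import re
from collections import defaultdict
from typing import Optional

# Abridged copy of the two reference blocks shown above (RX lines left out).
REFERENCES = """\
RN   [1]
RP   1-2881
RA   Lavoie C.A., Lachance P.E., Sonenberg N., Lasko P.;
RT   "Alternatively spliced transcripts from the Drosophila eIF4E gene produce
RT   two different Cap-binding proteins";
RL   J Biol Chem 271(27):16393-16398(1996).
XX
RN   [2]
RP   1-2881
RA   Lasko P.F.;
RT   ;
RL   Submitted (09-APR-1996) to the INSDC.
RL   Paul F. Lasko, Biology, McGill University, 1205 Avenue Docteur Penfield,
RL   Montreal, QC H3A 1B1, Canada
"""


def parse_references(text: str) -> list[dict[str, str]]:
    """Start a new block at each RN line; join continuation lines per line code."""
    blocks: list[defaultdict] = []
    for line in text.splitlines():
        code, data = line[:2], line[5:].strip()
        if code == "RN":
            blocks.append(defaultdict(list))  # each RN line opens a new block
        if blocks and code.startswith("R"):  # XX spacer lines are skipped
            blocks[-1][code].append(data)
    return [{code: " ".join(parts) for code, parts in block.items()} for block in blocks]


def submission_date(ref: dict[str, str]) -> Optional[str]:
    """Date from a 'Submitted (DD-MMM-YYYY)' RL line. This is when the data were
    submitted; it says nothing about when the record was first made public."""
    match = re.search(r"Submitted \((\d{2}-[A-Z]{3}-\d{4})\)", ref.get("RL", ""))
    return match.group(1) if match else None


refs = parse_references(REFERENCES)
print(len(refs))                 # 2
print(refs[0]["RL"])             # J Biol Chem 271(27):16393-16398(1996).
print(submission_date(refs[1]))  # 09-APR-1996
```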

Some headers may contain COMMENT (DDBJ/GenBank) or CC (ENA) lines. These lines can include a great variety of notes and comments (descriptors) that refer to the entire record. Genome centers often use these lines to provide contact information and to confer acknowledgments. Comments may also include the history of the sequence. If the sequence of a particular record is updated, the comment will contain a pointer to the previous versions of the record. Conversely, if an earlier version of the record is retrieved, the comment will point forward to the newer version, as well as backward if a still earlier version exists. Finally, there are database cross-reference lines (marked DR) that provide links to allied databases containing information related to the sequence of interest. Here, a cross-reference to FlyBase can be seen in the complete header for this record in Appendix 1.1. Note that the corresponding DDBJ/GenBank header in Appendix 1.2 does not contain these cross-references.
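A DR line is short and semicolon-separated, so it can be split into named fields. The Python sketch below assumes the ENA form `DR   database; primary identifier; secondary identifier.`, where the secondary identifier may be absent. The actual FlyBase cross-reference appears only in Appendix 1.1, so the identifier in the usage line is a made-up placeholder; `CrossRef` and `parse_dr` are hypothetical names.

```python
from typing import NamedTuple, Optional


class CrossRef(NamedTuple):
    database: str
    primary_id: str
    secondary_id: Optional[str]


def parse_dr(line: str) -> CrossRef:
    """Split a 'DR   database; primary; secondary.' line into its fields."""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    fields = [f.strip() for f in line[5:].strip().rstrip(".").split(";")]
    secondary = fields[2] if len(fields) > 2 else None  # third field is optional
    return CrossRef(fields[0], fields[1], secondary)


# Hypothetical example: "FBgnXXXXXXX" is a placeholder, not the record's real FlyBase ID.
print(parse_dr("DR   FlyBase; FBgnXXXXXXX; eIF4E."))
```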

The Feature Table

Early on in the collaboration between INSDC partner organizations, an effort was made to come up with a common way to represent the biological information found within a given database record. This common representation is called the feature table, consisting of feature keys (a single word or abbreviation indicating the described biological property), location information denoting where the feature is located within the sequence, and qualifiers providing further descriptive information about the feature. The online INSDC feature table documentation is extensive and describes in great detail which features are allowed and which qualifiers can be used with each individual feature. Wording within the feature table uses common biological research terminology wherever possible and is consistent between DDBJ, ENA, and GenBank entries.
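To make the key, location, and qualifier structure concrete, here is a minimal Python sketch that writes one feature in the column layout used by ENA feature-table lines. It assumes feature keys begin in column 6 and that locations and qualifiers begin in column 22; `format_feature` is a hypothetical helper, not part of any INSDC tool.

```python
def format_feature(key: str, location: str, qualifiers: list[tuple[str, str]]) -> list[str]:
    """Render one feature in ENA feature-table columns (no 80-column wrapping)."""
    # Feature key starts in column 6; the location starts in column 22.
    lines = [f"FT   {key:<16}{location}"]
    for name, value in qualifiers:
        # Qualifiers also start in column 22. Every value is quoted here, which
        # suits text qualifiers but not, e.g., numeric ones such as /codon_start.
        lines.append("FT" + " " * 19 + f'/{name}="{value}"')
    return lines


for line in format_feature("gene", "80..2881", [("gene", "eIF4E")]):
    print(line)
```

Real files also wrap long locations and values onto continuation lines; this sketch leaves that out to keep the three-part structure visible.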

Here, we will dissect the feature table for the eukaryotic initiation factor 4E (eIF4E) gene from Drosophila melanogaster, shown in its entirety in both Appendices 1.3 (in ENA format) and 1.4 (in DDBJ/GenBank format). This particular sequence is alternatively spliced, producing two distinct gene products, 4E-I and 4E-II. The first block of information in the feature table is always the source feature, indicating the biological source of the sequence and additional information relating to the entire sequence. This feature must be present in all INSDC entries, as every DNA or RNA sequence derives from some specific biological source, even when that DNA is synthetic.

FT   source          1..2881
FT                   /organism="Drosophila melanogaster"
FT                   /chromosome="3"
FT                   /map="67A8-B2"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:7227"
FT   gene            80..2881
FT                   /gene="eIF4E"

In the first line of the source key, notice that the numbering scheme shows the range of positions covered by this feature key as two numbers separated by two dots (1..2881). As the source key pertains to the entire sequence, we can infer that the sequence described in this entry is 2881 nucleotides in length. The various ways in which the location of any given feature can be indicated are shown in Table 1.1, accounting for a wide range of biological scenarios. The qualifiers then follow, each preceded by a slash. The full scientific name of the organism is provided, as are specific mapping coordinates, indicating that this sequence is at map location 67A8-B2 on chromosome 3. Also indicated is the type of molecule that was sequenced (genomic DNA). Finally, the last qualifier of the source block is a database cross-reference (abbreviated as db_xref) to the NCBI taxonomy database, where taxon 7227 corresponds to D. melanogaster. In general, these cross-references are controlled qualifiers that allow entries to be connected to an external database, using an identifier that is unique to that external database. Following the source block above is the gene feature, indicating that the gene itself is a subset of the entire sequence in this entry, starting at position 80 and ending at position 2881.
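Reading a feature table back into a program means telling three kinds of FT lines apart: a line that opens a new feature (key in column 6), a qualifier line (starting with a slash), and a continuation line that wraps a long location or value. The Python sketch below handles only those cases. Its sample text copies the source and gene features above plus the first mRNA feature of the same table, whose `join` wraps onto a second line. `Feature` and `parse_feature_table` are hypothetical names, and the sketch does no validation against the INSDC rules. It then recovers the sequence length from the source location, as the text does.

```python
from dataclasses import dataclass, field
from typing import Optional

FEATURE_TABLE = """\
FT   source          1..2881
FT                   /organism="Drosophila melanogaster"
FT                   /chromosome="3"
FT                   /map="67A8-B2"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:7227"
FT   gene            80..2881
FT                   /gene="eIF4E"
FT   mRNA            join(80..224,892..1458,1550..1920,1986..2085,2317..2404,
FT                   2466..2881)
FT                   /gene="eIF4E"
FT                   /product="eukaryotic initiation factor 4E-I"
"""


@dataclass
class Feature:
    key: str
    location: str
    # A list per qualifier name, because some qualifiers (e.g. /db_xref) may repeat.
    qualifiers: dict[str, list[Optional[str]]] = field(default_factory=dict)


def parse_feature_table(text: str) -> list[Feature]:
    """Parse FT lines into Feature objects (a sketch, not a validating parser)."""
    features: list[Feature] = []
    last_qualifier: Optional[str] = None
    for line in text.splitlines():
        if not line.startswith("FT") or len(line) < 6:
            continue
        body = line[5:].strip()
        if line[5] != " ":
            # New feature: key in column 6, location after it.
            key, location = body.split(None, 1)
            features.append(Feature(key, location))
            last_qualifier = None
        elif body.startswith("/"):
            # New qualifier; a flag such as /pseudo has no value.
            name, sep, value = body[1:].partition("=")
            features[-1].qualifiers.setdefault(name, []).append(value if sep else None)
            last_qualifier = name
        elif last_qualifier is None:
            # Wrapped location: concatenate without a space.
            features[-1].location += body
        else:
            # Wrapped qualifier value: join with a space.
            values = features[-1].qualifiers[last_qualifier]
            values[-1] = (values[-1] or "") + " " + body
    # Remove the surrounding double quotes from text values.
    for feature in features:
        for name, values in feature.qualifiers.items():
            feature.qualifiers[name] = [v.strip('"') if v is not None else None for v in values]
    return features


features = parse_feature_table(FEATURE_TABLE)
source = features[0]  # the source feature always comes first
start, end = (int(n) for n in source.location.split(".."))
print(source.key, end - start + 1)   # source 2881
print(source.qualifiers["db_xref"])  # ['taxon:7227']
print(features[2].location)
```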

FT   mRNA            join(80..224,892..1458,1550..1920,1986..2085,2317..2404,
FT                   2466..2881)
FT                   /gene="eIF4E"
FT                   /product="eukaryotic initiation factor 4E-I"
FT   mRNA            join(80..224,1550..1920,1986..2085,2317..2404,2466..2881)
FT                   /gene="eIF4E"
FT                   /product="eukaryotic initiation factor 4E-II"

Table 1.1 Indicating locations within the feature table.

`345`	Single position within the sequence
`345..500`	A continuous range of positions bounded by and including the indicated positions
`<345..500`	A continuous range of positions, where the exact lower boundary is not known; the feature begins somewhere prior to position 345 but ends at position 500
`345..>500`	A continuous range of positions, where the exact upper boundary is not known; the feature begins at position 345 but ends somewhere after position 500
`<1..888`	The feature starts before the first sequenced base and continues to position 888
`(102.110)`	Indicates that the exact location is unknown, but that it is one of the positions between 102 and 110, inclusive
`123^124`	Points to a site between positions 123 and 124
`123^177`	Points to a site between two adjacent nucleotides or amino acids anywhere between positions 123 and 177
`join(12..78,134..202)`	Regions 12–78 and 134–202 are joined to form one contiguous sequence
`complement(4918..5126)`	The sequence complementary to that found from 4918 to 5126 in the sequence record
`J00194:100..202`	Positions 100–202, inclusive, in the entry in this database having accession number J00194
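Several of the simpler forms in Table 1.1 can be parsed with a short recursive function. The Python sketch below uses hypothetical names and handles single positions, ranges with `<` or `>` partial markers, `join(...)`, and `complement(...)`. It deliberately rejects the `(102.110)`, `^`, and remote-entry (`J00194:100..202`) forms rather than guessing at them. Returning spans in reverse order inside `complement(...)`, so that they follow the reading direction of the opposite strand, is a convention chosen for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    start: int
    end: int
    strand: int = 1               # -1 inside complement(...)
    partial_start: bool = False   # "<": begins before `start`
    partial_end: bool = False     # ">": ends after `end`


_SIMPLE = re.compile(r"(<?)(\d+)(?:\.\.(>?)(\d+))?")


def _split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_location(loc: str, strand: int = 1) -> list[Span]:
    """Parse n, a..b, <a..b, a..>b, join(...) and complement(...)."""
    loc = loc.replace(" ", "")
    if loc.startswith("complement(") and loc.endswith(")"):
        # Read from the other strand, so the spans come back in reverse order.
        return parse_location(loc[len("complement("):-1], -strand)[::-1]
    if loc.startswith("join(") and loc.endswith(")"):
        inner = loc[len("join("):-1]
        return [span for part in _split_top_level(inner) for span in parse_location(part, strand)]
    match = _SIMPLE.fullmatch(loc)
    if match is None:
        raise ValueError(f"unsupported location: {loc!r}")
    lt, first, gt, last = match.groups()
    start = int(first)
    end = int(last) if last else start
    return [Span(start, end, strand, lt == "<", gt == ">")]


def length(spans: list[Span]) -> int:
    """Total number of positions covered, counting both ends of each span."""
    return sum(s.end - s.start + 1 for s in spans)


print(length(parse_location("join(12..78,134..202)")))     # 136
print(parse_location("complement(4918..5126)")[0].strand)  # -1
```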

The next feature in this example indicates which regions form the two mRNA transcripts for this gene, the first for eukaryotic initiation factor 4E-I and the second for eukaryotic initiation factor 4E-II. In the first case (shown above), the join line indicates that six distinct DNA segments are transcribed to form the mature RNA transcript while, in the second case, the second region is missing, with only five distinct DNA segments transcribed into the mature RNA transcript, hence the two splice variants that are ultimately encoded by this molecule.
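One way to see what distinguishes the two transcripts is to compare their segment lists directly. The Python sketch below assumes each `join(...)` contains only plain `start..end` ranges (no `<`, `>`, `complement`, or remote entries, which holds for the two mRNA features above). `segments` and `spliced_length` are hypothetical helpers. It sums the inclusive segment lengths to get the spliced length of each annotated mRNA and lists the segment present only in 4E-I.

```python
# The two mRNA locations from the feature table above.
MRNA_4E_I = "join(80..224,892..1458,1550..1920,1986..2085,2317..2404,2466..2881)"
MRNA_4E_II = "join(80..224,1550..1920,1986..2085,2317..2404,2466..2881)"


def segments(location: str) -> list[tuple[int, int]]:
    """Turn 'join(a..b,c..d,...)' with plain ranges only into (start, end) pairs."""
    if not (location.startswith("join(") and location.endswith(")")):
        raise ValueError(f"expected a join(...) of plain ranges: {location!r}")
    pairs = []
    for part in location[len("join("):-1].split(","):
        start, end = part.split("..")
        pairs.append((int(start), int(end)))
    return pairs


def spliced_length(pairs: list[tuple[int, int]]) -> int:
    """Sum of segment lengths; both ends of each range are inclusive."""
    return sum(end - start + 1 for start, end in pairs)


variant_i, variant_ii = segments(MRNA_4E_I), segments(MRNA_4E_II)
print(len(variant_i), len(variant_ii))                        # 6 5
print(spliced_length(variant_i), spliced_length(variant_ii))  # 1687 1120
print(sorted(set(variant_i) - set(variant_ii)))               # [(892, 1458)]
```

The 567-nucleotide difference between the two totals is exactly the length of the 892..1458 segment that only 4E-I includes.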