Biomedical Data Mining for Information Retrieval

This book comprehensively covers the mining of biomedical text, images and visual features for information retrieval. Biomedical and health informatics is an emerging field of research at the intersection of information science, computer science and health care. It brings tremendous opportunities and challenges because abundant biomedical data are now easily available for further analysis. The aim of healthcare informatics is to ensure high-quality, efficient healthcare, better treatment and better quality of life by analyzing biomedical and healthcare data, including patient data, electronic health records (EHRs) and lifestyle data. Previously, a domain expert was commonly required to develop a model for biomedical or healthcare applications; however, recent advancements in representation learning algorithms allow such models to be developed automatically. Biomedical image mining is a novel research area, driven by the large number of biomedical images that are increasingly generated and stored digitally. These images are mainly in the form of computed tomography (CT), X-ray, nuclear medicine imaging (PET, SPECT), magnetic resonance imaging (MRI) and ultrasound. Patients' biomedical images can be digitized and, using data mining techniques, may help in answering several important and critical questions related to health care. Image mining in medicine can help to uncover new relationships between data and reveal new useful information that can be helpful for doctors in treating their patients.

In Ref. [25], the main goal is to improve mortality prediction for ICU patients using the PhysioNet Challenge 2012 dataset. Three objectives are accomplished: (i) reduction of dimensions, (ii) reduction of uncontrolled variance and (iii) less dependency on the training set. Feature reduction techniques such as Principal Component Analysis, Spectral Clustering, Factor Analysis and Tukey’s HSD Test are used. Classification is done using an SVM, which achieves an accuracy of 0.73, better than the previous work.

The authors of Ref. [26] extracted 61,533 records from MIMIC-III v1.4 and excluded patients younger than 16, patients who stayed less than 4 h and patients whose data is not present in the flow sheet. The final cohort of 50,488 ICU stays is used for the experiments. Features are extracted using a fixed-length window. The machine learning models used are Logistic Regression (LR), LR with an L1 regularization penalty using the Least Absolute Shrinkage and Selection Operator (LASSO), LR with an L2 regularization penalty and Gradient Boosting Decision Trees. Severity of illness is calculated using scores such as APS III, SOFA, SAPS, LODS, SAPS II and OASIS. Two types of experiments are conducted: a benchmarking experiment and a real-time experiment. When the models are compared, the Gradient Boosting algorithm obtains a high AUROC of 0.920.

In Ref. [27], different data mining techniques are used for early prediction of hospital mortality, during admission, through time series analysis of intensive care unit patients. Traditional scoring systems such as APACHE, SAPS and SOFA are used to obtain scores. A total of 4,000 ICU patients are selected from the MIMIC database, and 37 time series variables are taken from the first 48 h of admission. The datasets are modified in several ways: the Synthetic Minority Oversampling Technique (SMOTE) is applied to the original data (original and smote); missing data are replaced with the mean (rep1), after which SMOTE is applied (rep1 and smote); and, after missing data are replaced, the EM-Imputation algorithm (rep2) is applied. Finally, results are obtained using three classifiers: Random Forest (RF), Partial Decision Tree (PART) and Bayesian Network (BN). Of these, Random Forest obtains the best results, with an AUROC of 0.83 ± 0.03 at 48 h on rep1, 0.82 ± 0.03 at 40 h on original, rep1 and smote, and 0.82 ± 0.03 at 48 h on rep2 and smote.
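
The mean replacement (rep1) and SMOTE steps described for Ref. [27] can be illustrated with a small sketch. It is not the implementation used in that work: the function names, the toy data, the neighbour count `k` and the use of `None` to mark a missing value are all assumptions made for illustration.

```python
import random
from statistics import mean


def replace_missing_with_mean(rows):
    """Replace None entries in each column with that column's observed mean."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples, each interpolated between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (for example, non-survivors) with one missing value.
minority_raw = [[1.0, 10.0], [2.0, None], [3.0, 14.0]]
minority = replace_missing_with_mean(minority_raw)  # mean replacement
new_samples = smote(minority, n_new=4)              # oversampling afterwards
print(minority)
print(len(new_samples))
```

Each synthetic sample lies on a line segment between two real minority samples, so oversampling adds no values outside the range already observed in that class.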

Sepsis is one of the causes of high mortality, and patients should recover from it quickly, because sepsis increases the risk of death after discharge from hospital [28]. The objective of that paper is to develop a model for one-year mortality prediction. A total of 5,650 admitted patients with sepsis were selected from the MIMIC-III database and divided into 70% for training and 30% for testing. The Stochastic Gradient Boosting method is used to develop the one-year mortality prediction model. Variables are selected using the Least Absolute Shrinkage and Selection Operator (LASSO), and the AUROC is calculated. An AUROC of 0.8039 (95% confidence interval: [0.8033–0.8045]) is obtained on the testing set. Finally, it is observed that the Stochastic Gradient Boosting ensemble algorithm is more accurate for one-year mortality prediction than traditional scoring systems such as SAPS, OASIS, MPM or SOFA.
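
Most of the results quoted in this review are reported as AUROC. As a minimal sketch of what that number measures, the function below computes it directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name, the toy labels and scores, and the convention that label 1 marks a death are assumptions for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve from the pairwise ranking definition."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]              # 1 = died, 0 = survived (assumed coding)
scores = [0.1, 0.4, 0.35, 0.8]     # predicted risk of death
print(auroc(labels, scores))       # 0.75
```

The double loop is quadratic in the number of patients, which is fine for illustration; rank-based formulations give the same value more efficiently on large cohorts.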

Deep learning has been successfully applied to various large and complex datasets. It is one of the newer techniques and has outperformed traditional techniques. A multi-scale deep convolutional neural network (ConvNet) model for mortality prediction is proposed in Ref. [29]. The dataset is taken from the MIMIC-III database, and 22 different variables are extracted from the measurements of the first 48 h for each patient. A ConvNet is a multilayer neural network in which the discrete convolution operation is applied. The convolutional neural network models are developed using Python packages, namely Keras and TensorFlow, as the backend. The proposed model obtains a ROC AUC of 0.8735 ± 0.0025, which is comparable to state-of-the-art deep learning models.
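
The discrete convolution operation at the core of a ConvNet can be sketched in a few lines for the one-dimensional case. This is an illustration only, not the model of Ref. [29]: the function name, the toy heart-rate series and the kernels are assumptions, and the kernels are fixed by hand here whereas a network would learn them from data.

```python
def conv1d_valid(signal, kernel):
    """Discrete 1-D convolution, keeping only the positions where the
    kernel fully overlaps the signal ("valid" mode)."""
    n, k = len(signal), len(kernel)
    if k > n:
        raise ValueError("kernel must not be longer than the signal")
    flipped = kernel[::-1]  # true convolution flips the kernel
    return [sum(signal[i + j] * flipped[j] for j in range(k))
            for i in range(n - k + 1)]


heart_rate = [80, 82, 90, 95, 93]   # toy hourly measurements

# Averaging kernel: smooths neighbouring measurements.
print(conv1d_valid(heart_rate, [0.5, 0.5]))   # [81.0, 86.0, 92.5, 94.0]

# Difference kernel: responds to changes between consecutive measurements.
print(conv1d_valid(heart_rate, [1, -1]))      # [2, 8, 5, -2]
```

Deep learning libraries generally skip the kernel flip and compute a cross-correlation instead; because the kernel weights are learned, the two forms are equivalent in practice.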

1.3 Materials and Methods

1.3.1 Dataset

The dataset is taken from the PhysioNet Challenge 2012, which consists of three sets, A, B and C [6]. A total of 12,000 patient records are available, 4,000 in each set. Only the 4,000 records of set A are used for simulation in this chapter. The dataset records 41 variables: five (age, gender, height, ICU type and initial weight) are general descriptors, and 36 are time series variables, as described in Table 1.1.

From the above 36 variables, only 15 variables are selected for mortality prediction. These variables are represented below in Table 1.2.

For nine of these 15 variables, the first, last, highest, lowest and median values are calculated and taken as features. For four variables, only the first and last values are taken. For set A, five outcome-related descriptors (SAPS score, SOFA score, length of stay, length of survival and in-hospital death) are available, of which in-hospital death (0 denotes a survivor and 1 denotes a patient who died in hospital) is taken as the target value.
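
The feature construction described above can be sketched as follows. Which variables receive all five statistics and which receive only the first and last values is not stated in this section, so the two sets below are hypothetical placeholders, as are the function name and the toy record.

```python
from statistics import median

# Hypothetical assignment, for illustration only.
FIVE_STATS = {"HR"}          # first, last, highest, lowest, median
FIRST_LAST_ONLY = {"GCS"}    # first and last only


def extract_features(record):
    """Turn one patient's time series (values in time order) into a flat
    dictionary of summary features."""
    features = {}
    for name, series in record.items():
        if not series or name not in FIVE_STATS | FIRST_LAST_ONLY:
            continue  # skip empty or unselected variables
        features[f"{name}_first"] = series[0]
        features[f"{name}_last"] = series[-1]
        if name in FIVE_STATS:
            features[f"{name}_highest"] = max(series)
            features[f"{name}_lowest"] = min(series)
            features[f"{name}_median"] = median(series)
    return features


record = {"HR": [88, 92, 110, 79, 85], "GCS": [15, 14, 15], "Temp": [37.1]}
features = extract_features(record)
print(features)
```

Summarizing each series this way gives every patient a feature vector of the same length, regardless of how many measurements were recorded.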

1.3.2 Data Pre-Processing

Data pre-processing is a technique for filtering and removing noisy data. The dataset provides 41 variables, of which 15 are selected. Some of the selected variables were not collected carefully and have missing values. In this chapter, missing data are replaced by zeros.
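
A minimal sketch of the zero replacement, assuming that a missing feature is represented as `None` or NaN after feature extraction (the representation, the function names and the toy values are assumptions):

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_with_zero(features):
    """Return a copy of the feature dictionary with missing values set to 0."""
    return {name: 0.0 if is_missing(value) else value
            for name, value in features.items()}


raw = {"HR_first": 88.0, "Lactate_first": None, "pH_last": float("nan")}
filled = fill_missing_with_zero(raw)
print(filled)
```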

1.3.3 Normalization

The variables in the dataset have different ranges and scales, so their raw values cannot be used directly for classification. Classifiers work better when all variables share comparable ranges and scales. A standard approach, z-score normalization, is used to normalize the variables.
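
Z-score normalization rescales each variable to zero mean and unit standard deviation, z = (x − μ) / σ. The sketch below applies it to one feature column. Whether the chapter uses the population or the sample standard deviation is not stated, so the population form is an assumption here, as are the function name and the toy values.

```python
from statistics import mean, pstdev


def z_score(column):
    """Normalize one feature column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (assumed)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


ages = [50, 60, 70]
normalized = z_score(ages)
print(normalized)
```

In a train/test setting, μ and σ would normally be computed on the training records only and then reused for the test records.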

Table 1.1 Time series variables with description and physical units recorded in the ICU [6].

S. no.	Variables	Description	Physical units
1.	Albumin	Albumin	g/dL
2.	ALP	Alkaline phosphatase	IU/L
3.	ALT	Alanine transaminase	IU/L
4.	AST	Aspartate transaminase	IU/L
5.	Bilirubin	Bilirubin	mg/dL
6.	BUN	Blood urea nitrogen	mg/dL
7.	Cholesterol	Cholesterol	mg/dL
8.	Creatinine	Creatinine	mg/dL
9.	DiasABP	Invasive diastolic arterial blood pressure	mmHg
10.	FiO2	Fractional inspired oxygen	[0–1]
11.	GCS	Glasgow Coma Score	[3–15]
12.	Glucose	Serum Glucose	mg/dL
13.	HCO3	Serum Bicarbonate	mmol/L
14.	HCT	Hematocrit	%
15.	HR	Heart Rate	bpm
16.	K	Serum Potassium	mEq/L
17.	Lactate	Lactate	mmol/L
18.	Mg	Serum Magnesium	mmol/L
19.	MAP	Invasive mean arterial blood pressure	mmHg
20.	MechVent	Mechanical Respiration Ventilation	0/1 (false/true)
21.	Na	Serum Sodium	mEq/L
22.	NIDiasABP	Non-invasive diastolic arterial blood pressure	mmHg
23.	NIMAP	Non-invasive mean arterial blood pressure	mmHg
24.	NISysABP	Non-invasive systolic arterial blood pressure	mmHg
25.	PaCO2	Partial pressure of arterial carbon dioxide	mmHg
26.	PaO2	Partial pressure of arterial oxygen	mmHg
27.	pH	Arterial pH	[0–14]
28.	Platelets	Platelets	cells/nL
29.	RespRate	Respiration Rate	bpm
30.	SaO2	O2 saturation in hemoglobin	%
31.	SysABP	Invasive systolic arterial blood pressure	mmHg
32.	Temp	Temperature	°C
33.	TropI	Troponin-I	µg/L
34.	TropT	Troponin-T	µg/L
35.	Urine	Urine Output	mL
36.	WBC	White Blood Cells Count	cells/nL