Biomedical Data Mining for Information Retrieval

This book comprehensively covers the mining of biomedical text, images and visual features for information retrieval. Biomedical and health informatics is an emerging field of research at the intersection of information science, computer science and health care, and it brings tremendous opportunities and challenges because abundant biomedical data are readily available for further analysis. The aim of healthcare informatics is to ensure high-quality, efficient healthcare, better treatment and better quality of life by analyzing biomedical and healthcare data, including patient data, electronic health records (EHRs) and lifestyle information. Previously, a domain expert was commonly required to develop a model for a biomedical or healthcare task; however, recent advances in representation learning algorithms allow such models to be developed automatically. Biomedical image mining is a novel research area driven by the large and growing number of biomedical images that are generated and stored digitally. These images are mainly in the form of computed tomography (CT), X-ray, nuclear medicine imaging (PET, SPECT), magnetic resonance imaging (MRI) and ultrasound. Applying data mining techniques to patients' digitized biomedical images may help answer several important and critical questions related to health care. Image mining in medicine can uncover new relationships between data and reveal useful information that helps doctors treat their patients.

Over recent decades, several severity scoring systems and machine learning mortality prediction models have been developed [4]. Traditional scoring systems such as Acute Physiology and Chronic Health Evaluation (APACHE) [4], Simplified Acute Physiology Score (SAPS) [4], Sequential Organ Failure Assessment (SOFA) [4] and Mortality Probability Model (MPM) [4], and data mining techniques such as Artificial Neural Network (ANN) [5], Support Vector Machine (SVM) [5], Decision Tree (DT) [5] and Logistic Regression (LR) [5], have been used in previous research. Mortality prediction in the Intensive Care Unit (ICU) nevertheless remains an open challenge.

The objective of this chapter is to develop a model that predicts whether a patient in an ICU will survive the hospital stay, and to compare several models for this task: Discriminant Analysis (DA), Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayesian, Support Vector Machine (SVM) and Functional Link Artificial Neural Network (FLANN), a low-complexity neural network. The dataset was collected from the PhysioNet Challenge 2012 [6] and consists of 4,000 records of patients admitted to the ICU. Each record holds 41 variables recorded during the first 48 h after admission to the ICU: 5 general descriptors (age, gender, height, ICU type and initial weight) and 36 time-series variables, of which 15 (Temp, HR, Urine, pH, RespRate, GCS, FiO2, PaCO2, MAP, SysABP, DiasABP, NIMAP, NIDiasABP, MechVent, NISysABP) are taken as input. The records are accompanied by 5 outcome descriptors used for predicting patient survival: SAPS-I score, SOFA score, length of stay in days (LOS), length of survival and in-hospital death (0 for survival and 1 for death in hospital).
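
As an illustration of how the 15 selected time-series variables might be turned into model inputs, the sketch below averages each variable over the first 48 h of a single record. It assumes each record is a text file with a `Time,Parameter,Value` header and `HH:MM` time offsets; the sample record is invented, and the helper names `to_minutes` and `record_features` are hypothetical. Averaging is only one possible summary, not necessarily the one used in this chapter.

```python
import csv
import io
from collections import defaultdict

# The 15 time-series variables the chapter lists as model inputs.
SELECTED = ["Temp", "HR", "Urine", "pH", "RespRate", "GCS", "FiO2", "PaCO2",
            "MAP", "SysABP", "DiasABP", "NIMAP", "NIDiasABP", "MechVent", "NISysABP"]


def to_minutes(stamp):
    """Convert an 'HH:MM' offset from ICU admission into minutes."""
    hours, minutes = stamp.split(":")
    return int(hours) * 60 + int(minutes)


def record_features(text, window_hours=48):
    """Return the mean of each selected variable, or None if it was never measured."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        in_window = to_minutes(row["Time"]) <= window_hours * 60
        if row["Parameter"] in SELECTED and in_window:
            values[row["Parameter"]].append(float(row["Value"]))
    return {name: (sum(values[name]) / len(values[name]) if values[name] else None)
            for name in SELECTED}


# Hypothetical record, invented for illustration only (assumed file layout).
sample = """Time,Parameter,Value
00:00,Age,54
00:07,HR,73
00:37,HR,77
01:37,Temp,35.1
02:37,GCS,15
"""

features = record_features(sample)
```

With this sample, `features["HR"]` is the mean of 73 and 77, and variables that never appear (such as `pH`) stay `None`, which leaves the choice of imputation strategy to a later step.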

The rest of the chapter is organized as follows. Section 1.2 describes previous studies of mortality prediction. Materials and methods are presented in Section 1.3, which describes data collection, data preprocessing and the models. Section 1.4 presents the results obtained. Section 1.5 briefly discusses the work and concludes, and Section 1.6 outlines future work.

1.2 Review of Literature

Many researchers have applied different models to the PhysioNet Challenge 2012 dataset and reported different accuracy results.

Silva et al. [7] addressed the prediction of in-hospital mortality (0 for a survivor and 1 for a patient who died in hospital). They used the data from the PhysioNet website for the challenge. The dataset consists of three sets, A, B and C, each with 4,000 records. The challenge comprised two events: event I measured the performance of a binary classifier and event II the performance of a risk estimator. Event I was scored using sensitivity and positive predictive value, and event II using the Hosmer–Lemeshow statistic [8]. A baseline algorithm (SAPS-I) obtained scores of 0.3125 and 68.58 for events I and II respectively, and the final scores obtained for events I and II were 0.5353 and 17.58.

In Ref. [9], Johnson et al. described a novel Bayesian ensemble algorithm for mortality prediction. Artifacts and erroneous recordings were removed during data preprocessing. The model was trained using the 4,000 records of training set A and also with datasets B and C. Jackknifing was performed to estimate the performance of the model. The model obtained values of 0.5310 and 0.5353 as score 1 on the hidden datasets, and the Hosmer–Lemeshow statistic gave 26.44 and 29.86 as score 2. The model was then redeveloped and obtained 0.5374 and 18.20 for scores 1 and 2 on dataset C. Overall, the proposed model performed better than the traditional SAPS model and offers advantages such as the handling of missing data.

An improved model for estimating in-hospital mortality in the ICU using 37 time-series variables is presented in Ref. [10]. The performance of various models was estimated using 10-fold cross-validation. Missing values are common in clinical data; here they were imputed using the mean value for the patient's age and gender. A logistic regression model was trained on the dataset.
The performance of the model was evaluated on the two events: event 1 for accuracy, using the lower of sensitivity and positive predictive value, and event 2 for calibration, using the Hosmer–Lemeshow H statistic. The model scored 0.516 and 14.4 for events 1 and 2 on test set B, and 0.482 and 51.7 on test set C. Its performance was better than that of the existing SAPS model.

Ref. [11] developed an algorithm to predict the in-hospital death of ICU patients for event 1 and to estimate its probability for event 2. Missing values were imputed with zero and the data were normalized. Six support vector machine (SVM) classifiers were trained, each on the positive examples and one sixth of the negative examples. The scores obtained for events 1 and 2 were 0.5345 and 17.88 respectively.

An artificial neural network model was developed to predict in-hospital death of ICU patients from the first 48 h of observations after admission [12]. Missing values were replaced with an artificial value based on assumptions. Of all the features, 26 were selected for further processing. For classification, a two-layer neural network with 15 neurons in the hidden layers was used. The model used 100 voting classifiers, and its output was the average of the 100 outputs. The model was trained and tested using 5-fold cross-validation. A fuzzy threshold was used to determine the output of the neural network. The model scored 0.5088 for event 1 and 82.211 for event 2 on the test dataset.

Ref. [13] presented an approach that identifies time-series motifs to predict in-hospital mortality of ICU patients, segmenting the variables into low, medium and high measurements. The method outperformed the existing scoring systems SAPS-II, APACHE-II and SOFA, and obtained a score of 0.46 for event 1 and 56.45 for event 2.
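
The training scheme reported for Ref. [11], in which each of six classifiers sees all positive examples but only one sixth of the negatives, can be sketched as a data-partitioning step. The source does not say whether the six negative subsets are disjoint or how the six outputs are combined; the sketch assumes disjoint subsets and simple averaging (the combination rule reported for Ref. [12]). The function names are hypothetical, and the classifiers themselves are left out.

```python
import random


def balanced_subsets(labels, n_parts=6, seed=0):
    """Pair all positive indices with one disjoint chunk of the negatives.

    Returns n_parts index lists, one training subset per classifier.
    Disjoint chunks are an assumption; the source only says 'one sixth'.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    random.Random(seed).shuffle(negatives)
    chunks = [negatives[k::n_parts] for k in range(n_parts)]
    return [sorted(positives + chunk) for chunk in chunks]


def average_vote(scores):
    """Combine per-classifier risk estimates by simple averaging (assumed rule)."""
    return sum(scores) / len(scores)


# Toy labels: 4 deaths (1) and 24 survivors (0), invented for illustration.
labels = [1] * 4 + [0] * 24
subsets = balanced_subsets(labels)
```

Each subset here holds the 4 positives plus 4 of the 24 negatives, so every classifier trains on a balanced sample even though deaths are the minority class in the full set.
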
An improved mortality prediction model using logistic regression and a hidden Markov model was developed for in-hospital death in Ref. [14]. The model was trained on the 4,000 patient records of set A and validated on other sets of unseen data of 4,000 records. Event 1 was scored by the minimum of sensitivity and positive predictive value, and event 2 by the Hosmer–Lemeshow H statistic. The model gave 0.50 and 0.50 for event 1 and 15.18 and 78.9 for event 2, compared with SAPS-I, whose event 1 scores were 0.3170 and 0.312 and whose event 2 scores were 66.03 and 68.58.

An effective framework for predicting in-hospital mortality during the ICU stay was suggested in Ref. [15]. Feature extraction was done by data interpolation and histogram analysis. To reduce the complexity of feature extraction, the feature vector was reduced by evaluating the measurement values of each variable. Finally, a cascaded AdaBoost learning model was applied as the mortality classifier; it obtained a score of 0.806 for event 1 and 24.00 for event 2 on dataset A. On dataset B the model obtained scores of 0.379 and 5331.15 for events 1 and 2.

A decision support application for mortality risk prediction was reported in Ref. [16]. The authors used fuzzy rule-based systems for the clinical rules. An optimizer based on a genetic algorithm generates the coefficients of the final solution. The FIS model achieved a score of 0.39 for event 1 and 94 for event 2.

A new method for predicting mortality in the ICU was proposed in Ref. [17]. The method, Simple Correspondence Analysis (SCA), is based on both clinical and laboratory data together with the two earlier models APACHE-II and SAPS-II. It uses the PhysioNet Challenge 2012 data, a total of 12,000 records from sets A, B and C, in which 37 time-series variables are recorded. SCA is applied to select variables and combines them using the traditional methods APACHE and SAPS.
This method predicts whether the patient will survive or not. Finally, the model obtained a score 1 of 43.50% for set A, 42.25% for set B and 42.73% for set C.

The Naive Bayesian classifier was used in [18] to predict mortality in the ICU, aiming for a high S1 and a small S2. S1 is defined in terms of sensitivity and positive predictive value, and S2 is the Hosmer–Lemeshow H statistic. Missing values are replaced by NaN (Not-a-Number) if a variable is not measured. On set B, the model achieved 0.475 for S1, the eighth-best solution, and 12.820 for S2, the best solution. On set C, the model achieved a score of 0.4928 for event 1 (fourth-best solution) and 0.247 for event 2 (third-best solution).

Di Marco et al. [19] proposed a new algorithm for mortality prediction with better accuracy, using data collected during the first 48 h after ICU admission. A binary classifier was applied to obtain the result for event 1. Set A, which contains 41 variables for 4,000 patients, was selected. Forward sequential feature selection with a logistic cost function was used, followed by a logistic regression classifier, which obtained a score of 54.9% on set A and 44.0% on test set B.

Ref. [20] developed a mortality prediction model based on the Support Vector Machine, a machine learning algorithm that tries to minimize error by finding the maximum-margin hyperplane. The two classes are 0 for a survivor and 1 for a patient who died in hospital. The authors used 3,000 records for training and 1,000 for testing. They observed over-fitting of the SVM on set A, where it obtained a score of 0.8158 for event 1 and 0.3045 for event 2. In phase 2 they improved the SVM training strategy to reduce the over-fitting. The final score obtained for event 1 was 0.530; for set B it was 0.350 and for set C 0.333.

An algorithm based on an artificial neural network was employed to predict in-hospital patient mortality in Ref. [21].
Features were extracted from the PhysioNet data using a method for detecting solar 'nanoflares', chosen because of the similarity between solar data and these time series. Data preprocessing was done to remove outliers. Missing values were replaced by the mean value for each patient. The trained model yielded a score of 22.83 for event 2 on set B and 38.23 on set C.

A logistic regression model was suggested for the same purpose in Ref. [22]. It has three phases. Phase 1 selects derived variables on set A by calculating each variable's first value, average, minimum, maximum, total time, first difference and last value. Phase 2 applies a logistic regression model to predict in-hospital death (0 for survivor, 1 for died) on set A. Phase 3 applies the logistic regression model to obtain the event 1 and event 2 scores. The results obtained were 0.4116 for score 1 and 8.843 for score 2.

Ref. [23] also reported a logistic regression model for mortality prediction. The experiment used 4,000 ICU patients in set A for training and 4,000 patients in set B for testing. The filtering process identified 30 variables for building the model. The scores obtained were 0.451 for event 1 and 45.010 for event 2.

A novel cluster analysis technique was used in Ref. [24] to test the similarity between time series for mortality prediction. For data preprocessing, a segmentation-based approach divides each variable into several segments, and the maximal and minimal values are used to preserve its statistical features. Weighted Euclidean distance-based clustering and rule-based classification were used. The average result obtained was 22.77 to 33.08% for death prediction and 75 to 86% for survival prediction.
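
The two scores quoted throughout this review can be made concrete with a short sketch. Event 1 is taken as the minimum of sensitivity and positive predictive value, as stated above. Event 2 is a Hosmer–Lemeshow H statistic computed over ten equal-sized groups sorted by predicted risk. The grouping details and the small `eps` term that guards the denominator are assumptions, so values from this sketch need not match the official challenge scorer exactly.

```python
def event1_score(y_true, y_pred):
    """Minimum of sensitivity and positive predictive value for 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return min(sensitivity, ppv)


def hosmer_lemeshow_h(y_true, risk, groups=10, eps=1e-3):
    """Hosmer-Lemeshow H statistic; lower values indicate better calibration."""
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    h = 0.0
    for g in range(groups):
        start = g * len(order) // groups
        stop = (g + 1) * len(order) // groups
        idx = order[start:stop]
        if not idx:
            continue
        n = len(idx)
        observed = sum(y_true[i] for i in idx)       # deaths seen in the group
        mean_risk = sum(risk[i] for i in idx) / n    # average predicted risk
        expected = n * mean_risk                     # deaths predicted in the group
        # eps is an assumed guard that keeps the denominator away from zero.
        h += (observed - expected) ** 2 / (n * mean_risk * (1 - mean_risk) + eps)
    return h
```

Because event 1 takes the minimum of the two quantities, a classifier cannot score well by predicting death for everyone (high sensitivity, low positive predictive value) or for almost no one. For event 2, a smaller H means the predicted risks track the observed death rates more closely.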