
Yong Chen - Industrial Data Analytics for Diagnosis and Prognosis

Discover data analytics methodologies for the diagnosis and prognosis of industrial systems under a unified random effects model

In Industrial Data Analytics for Diagnosis and Prognosis, distinguished engineers Shiyu Zhou and Yong Chen deliver a rigorous and practical introduction to the random effects modeling approach for industrial system diagnosis and prognosis. The book has two parts. The first describes and explains general statistical concepts and useful theory, and the second covers industrial diagnosis and prognosis methods. The authors describe and model fixed effects, random effects, and variation in univariate and multivariate datasets, and they cover the application of the random effects approach to diagnosing variation sources in industrial processes. They offer a detailed performance comparison of different diagnosis methods before moving on to the application of the random effects approach to failure prognosis in industrial processes and systems.
In addition to presenting the joint prognosis model, which integrates the survival regression model with the mixed effects regression model, the book also offers readers:
- A thorough introduction to describing variation of industrial data, including univariate and multivariate random variables and probability distributions
- Rigorous treatments of the diagnosis of variation sources using PCA pattern matching and the random effects model
- An exploration of the extended mixed effects model, including the mixture prior and Kalman filtering approaches, for real-time prognosis
- A detailed presentation of the Gaussian process model as a flexible approach for the prediction of temporal degradation signals

Ideal for senior-year undergraduate students and postgraduate students in industrial, manufacturing, mechanical, and electrical engineering, Industrial Data Analytics for Diagnosis and Prognosis is also an indispensable guide for researchers and engineers interested in data analytics methods for system diagnosis and prognosis.

Bibliographic Notes

Some examples of good books on system diagnosis and prognosis in the engineering area are Lewis et al. [2011], Niu [2017], Wu et al. [2006], Talebi et al. [2009], Gertler [1998], Chen and Patton [2012], Witczak [2007], Isermann [2011], Ding [2008], and Si et al. [2017]. Many good textbooks on industrial statistics are available. For example, Montgomery [2009], DeVor et al. [2007], Colosimo and Del Castillo [2006], and Wu and Hamada [2011] cover statistical monitoring and design. On failure event analysis and prognosis, Meeker and Escobar [2014], Rausand et al. [2004], and Elsayed [2012] are commonly cited references.

Part I Statistical Methods and Foundation for Industrial Data Analytics

2 Introduction to Data Visualization and Characterization

Before making a chess move, an experienced chess player surveys the positions of the pieces on the board for noticeable patterns, such as the opponent's threats, special relationships between pieces, and the strengths and weaknesses of both sides, before digging into in-depth calculation of move sequences to find the optimal move. Similarly, a data scientist should start by exploring the data set for noticeable patterns before conducting any in-depth analysis, whether by building a sophisticated mathematical model or by running a computationally intensive algorithm. Simple data exploration methods can help us understand the basic structure of the data, such as its dimensions and the types of variables; discover initial patterns, such as relationships among variables; and identify missing values, outliers, and skewed distributions that call for data pre-processing and transformation. This chapter focuses on basic graphical and numerical methods for data description and exploration. We first look at a data set in the following example.

Example 2.1 (auto_spec data) The data set in auto_spec.csv, which is from the UCI Machine Learning Repository [Dua and Graff, 2017], contains the specifications of a sample of cars. The following R code can be used to read the data file and obtain information on the basic characteristics and structure of the data set.

# load data; stringsAsFactors = TRUE stores categorical columns as factors
# (the default became FALSE in R 4.0.0), so that summary() reports category counts
auto.spec.df <- read.csv("auto_spec.csv", header = TRUE, stringsAsFactors = TRUE)

# show basic information of the data set
dim(auto.spec.df)
names(auto.spec.df)
head(auto.spec.df)
summary(auto.spec.df)

> dim(auto.spec.df)
[1] 205  23
> names(auto.spec.df)
 [1] "make"              "fuel.type"         "aspiration"
 [4] "num.of.doors"      "body.style"        "drive.wheels"
 [7] "engine.location"   "wheel.base"        "length"
[10] "width"             "height"            "curb.weight"
[13] "engine.type"       "num.of.cylinders"  "engine.size"
[16] "fuel.system"       "bore"              "stroke"
[19] "compression.ratio" "horsepower"        "peak.rpm"
[22] "city.mpg"          "highway.mpg"
> head(auto.spec.df)
        make fuel.type aspiration num.of.doors  body.style drive.wheels
1 alfa-romeo       gas        std          two convertible          rwd
2 alfa-romeo       gas        std          two convertible          rwd
3 alfa-romeo       gas        std          two   hatchback          rwd
4       audi       gas        std         four       sedan          fwd
5       audi       gas        std         four       sedan          fwd
6       audi       gas        std          two       sedan          fwd
....
  horsepower peak.rpm city.mpg highway.mpg
1        111     5000       21          27
2        111     5000       21          27
3        154     5000       19          26
4        102     5500       24          30
5        115     5500       18          22
6        110     5500       19          25
> summary(auto.spec.df)
         make     fuel.type   aspiration num.of.doors       body.style
 toyota    : 32   diesel: 20   std  :168   four:114    convertible: 6
 nissan    : 18   gas   :185   turbo: 37   two : 89    hardtop    : 8
 mazda     : 17                            NA's:  2    hatchback  :70
 honda     : 13                                        sedan      :96
 mitsubishi: 13                                        wagon      :25
 subaru    : 12
 (Other)   :100
....
    city.mpg      highway.mpg
 Min.   :13.00   Min.   :16.00
 1st Qu.:19.00   1st Qu.:25.00
 Median :24.00   Median :30.00
 Mean   :25.22   Mean   :30.75
 3rd Qu.:30.00   3rd Qu.:34.00
 Max.   :49.00   Max.   :54.00

From the R outputs, we see that this data set contains 205 observations on 23 variables, including the manufacturer, fuel type, body style, dimensions, horsepower, miles per gallon, and other specifications of a car. In the statistics and data mining literature, an observation is also called a record, a data point, a case, a sample, an entity, an instance, or a subject. The variables associated with an observation are also called attributes, fields, characteristics, or features. The summary() function shows basic summary information for each variable, such as the mean, median, and range of values. From the summary information, it is clear that there are two types of variables. A variable such as fuel.type or body.style has a finite number of possible values, and there is no numerical relationship among the values. Such a variable is referred to as a categorical variable. On the other hand, a variable such as highway.mpg or horsepower takes continuous numerical values and is referred to as a numerical variable. Beyond the basic data summary, graphical methods can reveal more patterns in both types of variables, as discussed in the following section.
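The split between categorical and numerical variables can also be checked programmatically by testing the class of each column. The sketch below is only an illustration: it uses a small hypothetical data frame, toy.df, with made-up values rather than auto_spec.csv, and it assumes categorical columns are stored as factors or character vectors (as they are when read.csv() is called with stringsAsFactors = TRUE). The names toy.df, is.categorical, and is.numerical are introduced here for illustration only.

```r
# Hypothetical toy data frame standing in for auto.spec.df (illustrative values only)
toy.df <- data.frame(
  fuel.type   = factor(c("gas", "diesel", "gas")),
  body.style  = factor(c("sedan", "wagon", "hatchback")),
  horsepower  = c(111, 102, 154),
  highway.mpg = c(27, 30, 26)
)

# TRUE for columns assumed to hold categorical data (factor or character)
is.categorical <- sapply(toy.df, function(x) is.factor(x) || is.character(x))
# TRUE for columns holding numerical data
is.numerical <- sapply(toy.df, is.numeric)

cat.vars <- names(toy.df)[is.categorical]
num.vars <- names(toy.df)[is.numerical]
print(cat.vars)
print(num.vars)

# The finite set of possible values (levels) of each categorical variable
print(lapply(toy.df[cat.vars], levels))
```

On the real data, replacing toy.df with auto.spec.df would list the categorical and numerical columns of the auto_spec data set.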

Note from the results of summary() that several variables in the auto_spec data set have missing values, which are represented by NA. Missing values are common in real-world data sets, and there are various ways to handle them. If the number of observations with missing values is small, those observations might simply be omitted, which can be done with the R function na.omit(). From the following R code, we can see that 205 − 197 = 8 observations in this data set have missing values. Simply removing these observations is therefore a reasonable way to handle the missing values in this data set.

> dim(na.omit(auto.spec.df))
[1] 197  23
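Before dropping observations, it can help to see which variables contain the missing values and how many observations are affected. The sketch below uses the base R functions is.na(), colSums(), complete.cases(), and na.omit() on a small hypothetical data frame, toy.df, with made-up values; on the real data one would substitute auto.spec.df.

```r
# Hypothetical toy data frame with a few NA entries (illustrative values only)
toy.df <- data.frame(
  num.of.doors = factor(c("two", NA, "four", "four")),
  horsepower   = c(111, 154, NA, 102),
  city.mpg     = c(21, 19, 24, 30)
)

# Number of missing values in each variable
na.per.var <- colSums(is.na(toy.df))
print(na.per.var)

# Number of observations with at least one missing value
n.incomplete <- sum(!complete.cases(toy.df))
print(n.incomplete)

# The same count obtained as in the text, by comparing row counts before and after na.omit()
print(nrow(toy.df) - nrow(na.omit(toy.df)))
```

Both counts agree because na.omit() drops exactly the rows that complete.cases() flags as incomplete; the per-variable counts show where those gaps are concentrated.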

If a significant number of observations in a data set have missing values, an alternative to removing them is imputation, the process of replacing missing values with substituted values. A simple imputation method replaces the missing values of a variable with the mean or median of its observed values. More sophisticated procedures, such as regression-based imputation, also exist. These methods play important roles mainly in medical and scientific studies, where collecting data from patients or subjects is often costly. In most industrial data analytics applications, where data are typically abundant, simpler methods of handling missing values are usually sufficient.
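Mean or median imputation of a single numerical variable can be sketched in a few lines of base R. The helper impute.with() and the vector horsepower below are hypothetical, with made-up values; they are not taken from the book or from auto_spec.csv.

```r
# Hypothetical numeric variable with missing values (illustrative values only)
horsepower <- c(111, NA, 154, 102, NA, 115)

# Hypothetical helper: replace NA entries with a summary statistic
# (e.g. mean or median) computed from the observed values only
impute.with <- function(x, stat = median) {
  x[is.na(x)] <- stat(x, na.rm = TRUE)
  x
}

hp.mean.imputed   <- impute.with(horsepower, mean)
hp.median.imputed <- impute.with(horsepower, median)
print(hp.mean.imputed)    # NAs replaced by the mean of 111, 154, 102, 115
print(hp.median.imputed)  # NAs replaced by the median of 111, 154, 102, 115
```

The median is less sensitive than the mean to outliers, so it is often the safer choice for a skewed variable; either way, the imputed entries carry no new information and simply fill the gaps.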

2.1 Data Visualization

Data visualization represents data using graphical methods. It is one of the most effective and intuitive ways to explore important patterns in the data, such as data distributions, relationships among variables, surprising clusters, and outliers. Data visualization is a fast-growing area, and a large number and variety of tools have been developed. This section discusses some of the most basic and useful types of graphical methods, or data plots, for industrial data analytics applications.

2.1.1 Distribution Plots for a Single Variable

Bar charts can be used to display the distribution of a categorical variable, while histograms and box plots are useful tools to display the distribution of a numerical variable.

Distribution of a Categorical Variable – Bar Chart

In a bar chart, the horizontal axis corresponds to all possible values (categories) of a categorical variable, and the vertical axis shows the number of observations in each category. To draw a bar chart for a categorical variable in R, we first use the table() function to count the number of observations in each category. Then the barplot() function can be used to plot the calculated counts. For example, the following R code plots the distribution of the body.style variable in the auto_spec data set.