
Samprit Chatterjee - Handbook of Regression Analysis With Applications in R


Handbook and reference guide for students and practitioners of statistical regression-based analyses in R

Handbook of Regression Analysis with Applications in R, Second Edition

The book further pays particular attention to methods that have become prominent in the last few decades as increasingly large data sets have made new techniques and applications possible. These include regularization methods, smoothing methods, and tree-based methods.

In the new edition of the Handbook, the data analyst’s toolkit is explored and expanded. Examples are drawn from a wide variety of real-life applications and data sets. All the R code and data used are available via an author-maintained website.

Of interest to undergraduate and graduate students taking courses in statistics and regression, the Handbook will also be invaluable to practicing data scientists and statisticians.


Note that model comparisons are only sensible when based on the same data set. Most statistical packages drop any observations that have missing data in any of the variables in the model. If a data set has missing values scattered over different predictors, the set of observations with complete data will change depending on which variables are in the model being examined, and model comparison measures will not be comparable. One way around this is to only use observations with complete data for all variables under consideration, but this can result in discarding a good deal of available information for any particular model.
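A minimal Python sketch of the problem (the book itself works in R; the toy rows and the helper complete_cases are hypothetical): the set of rows that survive deletion of incomplete observations depends on which variables a model uses.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"Bedrooms": 3, "Bathrooms": 2.0, "Lot.size": None},
    {"Bedrooms": None, "Bathrooms": 1.5, "Lot.size": 6000},
    {"Bedrooms": 4, "Bathrooms": None, "Lot.size": 8000},
    {"Bedrooms": 2, "Bathrooms": 1.0, "Lot.size": 5000},
]


def complete_cases(rows, variables):
    """Indices of rows with no missing value among `variables`.

    In practice the response variable belongs in `variables` as well.
    """
    return [i for i, row in enumerate(rows)
            if all(row[v] is not None for v in variables)]


model_a = ["Bedrooms", "Bathrooms"]
model_b = ["Bathrooms", "Lot.size"]
print(complete_cases(rows, model_a))            # [0, 3]
print(complete_cases(rows, model_b))            # [1, 3]
print(complete_cases(rows, model_a + model_b))  # [3]
```

The two models end up fitted to different rows ([0, 3] versus [1, 3]), so their fit measures are not comparable; restricting both to the common complete cases ([3]) restores comparability, but here at the cost of half the rows each model could otherwise use.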

Tools like best subsets are, by their very nature, likely to be more effective when there is a relatively small number of useful predictors with strong effects than when there is a large number of predictors with weak effects. The strict present/absent choice for a predictor is consistent with true relationships whose slopes are either zero or distinctly nonzero, as opposed to many slopes that are each nonzero but not far from zero.
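Best subsets regression fits every subset of the candidate predictors and keeps the best-fitting few of each size. Below is a minimal pure-Python sketch of that exhaustive enumeration, assuming R² as the ranking criterion within each size; the helper names solve, r_squared and best_subsets and the toy data are hypothetical, and the book's own output comes from R.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of the least squares fit (with intercept) of y on the given columns."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    xc = []
    for col in columns:  # centering the predictors absorbs the intercept
        mean = sum(col) / n
        xc.append([v - mean for v in col])
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in xc] for ci in xc]
    xty = [sum(a * b for a, b in zip(ci, yc)) for ci in xc]
    beta = solve(xtx, xty)
    ss_reg = sum(b * v for b, v in zip(beta, xty))  # regression sum of squares
    ss_tot = sum(v * v for v in yc)
    return ss_reg / ss_tot


def best_subsets(data, y, keep=3):
    """For each subset size, the `keep` subsets with the highest R^2."""
    names = list(data)
    results = {}
    for k in range(1, len(names) + 1):
        fits = [(r_squared([data[v] for v in subset], y), subset)
                for subset in combinations(names, k)]
        fits.sort(key=lambda t: t[0], reverse=True)
        results[k] = fits[:keep]
    return results


# Hypothetical toy data: y is an exact linear function of x1 and x2
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [3, 1, 2, 3, 1, 2],
}
y = [1, 4, 3, 6, 5, 8]  # y = 1 + 2*x1 - x2
for k, fits in best_subsets(data, y).items():
    for r2, subset in fits:
        print(k, round(100 * r2, 1), subset)
```

With m candidate predictors there are 2^m - 1 nonempty subsets, so refitting every one from scratch is practical only for modest m; dedicated implementations (for example, the leaps package in R) use smarter search strategies.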

2.3.2 EXAMPLE — ESTIMATING HOME PRICES (CONTINUED)

Consider again the home price data examined in Section 1.4. We repeat the regression output from the model based on all of the predictors below:

Coefficients:
                Estimate  Std.Error  t value  Pr(>|t|)    VIF
(Intercept)   -7.149e+06  3.820e+06   -1.871  0.065043         .
Bedrooms      -1.229e+04  9.347e+03   -1.315  0.192361  1.262
Bathrooms      5.170e+04  1.309e+04    3.948  0.000171  1.420  ***
Living.area    6.590e+01  1.598e+01    4.124  9.22e-05  1.661  ***
Lot.size      -8.971e-01  4.194e+00   -0.214  0.831197  1.074
Year.built     3.761e+03  1.963e+03    1.916  0.058981  1.242  .
Property.tax   1.476e+00  2.832e+00    0.521  0.603734  1.300
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 47380 on 78 degrees of freedom
Multiple R-squared:  0.5065,  Adjusted R-squared:  0.4685
F-statistic: 13.34 on 6 and 78 DF,  p-value: 2.416e-10

This is identical to the output given earlier, except that variance inflation factor (VIF) values are given for each predictor. It is apparent that there is virtually no collinearity among these predictors (recall that 1 is the minimum possible value of the VIF), which should make model selection more straightforward. The following output summarizes a best subsets fitting:
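The VIF for predictor j is 1/(1 - R_j²), where R_j² is the R² from regressing predictor j on the other predictors; equivalently, the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. A minimal pure-Python sketch of that second route follows (helper names and data are hypothetical; in R a helper such as vif() from the car package is commonly used, though the book's exact tooling is not shown here).

```python
import math


def correlation_matrix(columns):
    """Pairwise Pearson correlations between the predictor columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])  # centered, unit length
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2): the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical example: two predictors with correlation 0.8
x = [1, 2, 3, 4]
z = [1, 3, 2, 4]
print(vif([x, z]))  # both equal 1 / (1 - 0.8**2), about 2.78

# Uncorrelated predictors: both VIFs approximately 1, the minimum possible
w = [1, -1, -1, 1]
print(vif([x, w]))
```

Values between roughly 1.07 and 1.66, as in the output above, are therefore close to the no-collinearity baseline of 1.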

Candidate predictors: Bedrooms, Bathrooms, Living.area, Lot.size, Year.built, Property.tax

                       Mallows
Vars  R-Sq  R-Sq(adj)       Cp    AICc      S  Predictors
   1  35.3       34.6     21.2  1849.9  52576  X
   1  29.4       28.6     30.6  1857.3  54932  X
   1  10.6        9.5     60.3  1877.4  61828  X
   2  46.6       45.2      5.5  1835.7  48091  X X
   2  38.9       37.5     17.5  1847.0  51397  X X
   2  37.8       36.3     19.3  1848.6  51870  X X
   3  49.4       47.5      3.0  1833.1  47092  X X X
   3  48.2       46.3      4.9  1835.0  47635  X X X
   3  46.6       44.7      7.3  1837.5  48346  X X X
   4  50.4       48.0      3.3  1833.3  46885  X X X X
   4  49.5       47.0      4.7  1834.8  47304  X X X X
   4  49.4       46.9      5.0  1835.1  47380  X X X X
   5  50.6       47.5      5.0  1835.0  47094  X X X X X
   5  50.5       47.3      5.3  1835.2  47162  X X X X X
   5  49.6       46.4      6.7  1836.8  47599  X X X X X
   6  50.6       46.9      7.0  1836.9  47381  X X X X X X
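Apart from S and AICc, the columns of this table are functions of R-Sq, the sample size and the full-model fit, so they can be sanity-checked. The Python sketch below assumes n = 85 (78 residual degrees of freedom plus 6 slopes plus the intercept, from the full-model output), the full-model R² of 0.5065, the adjusted R² formula of equation (1.7) in the form R² - p(1 - R²)/(n - p - 1), and Mallows' Cp in the form Residual SS_p / σ̂²_full - n + 2p + 2, the criterion described later in this section. The function names are illustrative.

```python
# Assumed inputs taken from the full-model output above
n, p_full, r2_full = 85, 6, 0.5065


def adjusted_r2(r2, n, p):
    """Equation (1.7): R^2 less a complexity penalty growing with p."""
    return r2 - p / (n - p - 1) * (1 - r2)


def mallows_cp(r2, n, p, r2_full, p_full):
    """Mallows' Cp = Residual SS_p / sigma^2_full - n + 2p + 2, written via R^2.

    Residual SS_p = (1 - r2) * TSS and sigma^2_full = (1 - r2_full) * TSS /
    (n - p_full - 1), so the total sum of squares TSS cancels out.
    """
    scaled_rss = (n - p_full - 1) * (1 - r2) / (1 - r2_full)
    return scaled_rss - n + 2 * p + 2


# Best model of each size: (number of predictors, R^2) read from the table
best = [(1, 0.353), (2, 0.466), (3, 0.494), (4, 0.504), (5, 0.506), (6, r2_full)]
for p, r2 in best:
    print(p, round(100 * adjusted_r2(r2, n, p), 1),
          round(mallows_cp(r2, n, p, r2_full, p_full), 1))
```

Each recomputed value lands within about 0.1 of the printed one; the small discrepancies come from feeding in R-Sq values already rounded to one decimal place. For the full model, Cp reduces to p + 1 = 7 exactly.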

Output of this type provides the tools to choose among candidate models. The output gives summary statistics for the three models with the strongest fit for each number of predictors. So, for example, the best one-predictor model is based on Bathrooms, while the second best is based on Living.area; the best two-predictor model is based on Bathrooms and Living.area; and so on. The principle of parsimony noted earlier implies moving down the table as long as the gain in fit is big enough, but no further, thereby encouraging simplicity. A reasonable model selection strategy would not rely on a single measure but would look at all of the measures together, using various guidelines to focus on a few models (or only one) that best trade off strength of fit with simplicity, for example as follows:

1 Increase the number of predictors $p$ until the $R^2$ value levels off. Clearly, the highest $R^2$ for a given $p$ cannot be smaller than that for a smaller value of $p$. If $R^2$ levels off, that implies that additional variables are not providing much additional fit. In this case, the largest $R^2$ values go from roughly 35% to 47% from $p = 1$ to $p = 2$, which is clearly a large gain in fit, but beyond that more complex models do not provide much additional fit (particularly past $p = 4$). Thus, this guideline suggests choosing either $p = 3$ or $p = 4$.

2 Choose the model that maximizes the adjusted $R^2$. Recall from equation (1.7) that the adjusted $R^2$ equals

$$R_a^2 = R^2 - \frac{p}{n-p-1}\,(1 - R^2).$$

It is apparent that $R_a^2$ explicitly trades off strength of fit ($R^2$) versus simplicity [the multiplier $p/(n-p-1)$], and can decrease if predictors that do not add any predictive power are added to a model. Thus, it is reasonable not to complicate a model beyond the point where its adjusted $R^2$ stops increasing. For these data, $R_a^2$ is maximized at $p = 4$.
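Applied to the best model of each size, the first two rules reduce to simple arithmetic on the table. A small Python sketch with values transcribed from the best-subsets output (the dictionary names are illustrative):

```python
# Best R-Sq and R-Sq(adj) (in %) for each number of predictors p, from the table
best_r2 = {1: 35.3, 2: 46.6, 3: 49.4, 4: 50.4, 5: 50.6, 6: 50.6}
best_adj = {1: 34.6, 2: 45.2, 3: 47.5, 4: 48.0, 5: 47.5, 6: 46.9}

# Rule 1: how much does the best R^2 improve with each added predictor?
gains = {p: round(best_r2[p] - best_r2[p - 1], 1) for p in range(2, 7)}
print(gains)  # {2: 11.3, 3: 2.8, 4: 1.0, 5: 0.2, 6: 0.0}

# Rule 2: number of predictors that maximizes the adjusted R^2
p_star = max(best_adj, key=best_adj.get)
print(p_star)  # 4
```

The gain from a second predictor (11.3 points) dwarfs every later one, the gains are negligible past p = 4, and the adjusted R² peaks at p = 4.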

The fourth column in the output refers to a criterion called Mallows' $C_p$ (Mallows, 1973). This criterion equals

$$C_p = \frac{\text{Residual SS}_p}{\hat{\sigma}^2_{\text{full}}} - n + 2p + 2,$$

where $\text{Residual SS}_p$ is the residual sum of squares for the model being examined, $p$ is the number of predictors in that model, and $\hat{\sigma}^2_{\text{full}}$ is the residual mean square based on using all of the candidate predicting variables. $C_p$ is designed to estimate the expected squared prediction error of a model. Like $R_a^2$, $C_p$ explicitly trades off strength of fit versus simplicity, with two differences: it is now small values that are desirable, and the penalty for complexity is stronger, in that the penalty term now multiplies the number of predictors in the model by 2, rather than by 1 (which means that using $R_a^2$ will tend to lead to more complex models than using $C_p$ will). This suggests another model selection rule: