
Iain Pardoe - Applied Regression Modeling

Applied Regression Modeling: book description


Master the fundamentals of regression without learning calculus with this one-stop resource. The newly and thoroughly revised 3rd Edition of Applied Regression Modeling delivers a concise but comprehensive treatment of the application of statistical regression analysis for those with little or no background in calculus. Accomplished instructor and author Dr. Iain Pardoe has reworked many of the more challenging topics, included learning outcomes and additional end-of-chapter exercises, and added coverage of several brand-new topics, including multiple linear regression using matrices.
The methods described in the text are clearly illustrated with multi-format datasets available on the book's supplementary website. In addition to a thorough explanation of foundational regression techniques, the book introduces modeling extensions that illustrate advanced regression strategies, including model building, logistic regression, Poisson regression, discrete choice models, multilevel models, Bayesian modeling, and time series forecasting. Illustrations, graphs, and computer software output appear throughout the book to help readers understand and retain the more complex content.
Applied Regression Modeling covers a wide variety of topics, including:

- Simple linear regression models, including the least squares criterion, how to evaluate model fit, and estimation/prediction
- Multiple linear regression, including testing regression parameters, checking model assumptions graphically, and testing model assumptions numerically
- Regression model building, including predictor and response variable transformations, qualitative predictors, and regression pitfalls
- Three fully described case studies, one each on home prices, vehicle fuel efficiency, and pharmaceutical patches

Perfect for students of any undergraduate statistics course in which regression analysis is a main focus, Applied Regression Modeling also belongs on the bookshelves of non-statistics graduate students, including MBAs, and of students in vocational, professional, and applied courses such as data science and machine learning.

Applied Regression Modeling: preview excerpt


After reading this chapter you should be able to:

Summarize univariate data graphically and numerically.

Calculate and interpret a confidence interval for a univariate population mean.

Conduct and draw conclusions from a hypothesis test for a univariate population mean using both the rejection region and p‐value methods.

Calculate and interpret a prediction interval for an individual univariate value.

1.1 Identifying and Summarizing Data

One way to think about statistics is as a collection of methods for using data to understand a problem quantitatively—we saw many examples of this in the introduction. This book is concerned primarily with analyzing data to obtain information that can be used to help make decisions in real‐world contexts.

The process of framing a problem in such a way that it is amenable to quantitative analysis is clearly an important step in the decision‐making process, but this lies outside the scope of this book. Similarly, while data collection is also a necessary task—often the most time‐consuming part of any analysis—we assume from this point on that we have already obtained data relevant to the problem at hand. We will return to the issue of the manner in which these data have been collected—namely, whether we can consider the sample data to be representative of some larger population that we wish to make statistical inferences for—in Section 1.3.

For now, we consider identifying and summarizing the data at hand. For example, suppose that we have moved to a new city and wish to buy a home. In deciding on a suitable home, we would probably consider a variety of factors, such as size, location, amenities, and price. For the sake of illustration, we focus on price and, in particular, see if we can understand the way in which sale prices vary in a specific housing market. This example will run through the rest of the chapter, and, while no one would probably ever obsess over this problem to this degree in real life, it provides a useful, intuitive application for the statistical ideas that we use in the rest of the book in more complex problems.

For this example, identifying the data is straightforward. The units of observation are a random sample of size n = 30 single‐family homes in our particular housing market. We have a single measurement for each observation, the sale price in thousands of dollars ($1000s), represented using the notation Y = Price. Here, Y is the generic letter used for any univariate data variable, while Price is the specific variable name for this dataset. These data were obtained from Victoria Whitman, a realtor in Eugene, Oregon, and are available in the HOMES1 data file on the book website. They represent sale prices of 30 homes in south Eugene during 2005. This is a subset of a larger file containing more extensive information on 76 homes, which is analyzed as a case study in Chapter 6 (refer to www.wiley.com/go/pardoe/AppliedRegressionModeling3e).

The particular sample in the HOMES1 data file is random because the 30 homes have been selected at random from the population of all single‐family homes in this housing market. For example, consider a list of homes currently for sale, which are considered to be representative of this population. A random number generator (commonly available in spreadsheet or statistical software) can be used to pick out 30 of these. Alternative selection methods may or may not lead to a random sample. For example, picking the first 30 homes on the list would not lead to a random sample if the list were ordered by sale price.
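The selection step just described can be sketched with Python's standard-library `random.sample`, which draws items without replacement. This is a minimal illustration. The listing of 200 home IDs and the seed are hypothetical stand-ins for a real list of homes for sale, and this is not the procedure actually used to collect the HOMES1 data.

```python
import random

# Hypothetical listing of homes currently for sale; only IDs are needed
# to draw the sample. The listing size (200) is an assumption.
listing = [f"home-{i:03d}" for i in range(1, 201)]

# A seeded generator makes the draw reproducible; any seed would do.
rng = random.Random(2005)

# Draw 30 distinct homes without replacement: every subset of 30 homes
# has the same chance of being the one selected.
sample = rng.sample(listing, 30)

# By contrast, the first 30 entries are not a random sample: if the
# listing were sorted by price, these would be the 30 cheapest homes.
first_30 = listing[:30]

print(len(sample), "homes sampled, e.g.", sorted(sample)[:3])
```

Fixing the seed only makes the illustration repeatable. In practice, any unseeded draw is equally valid.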

We can simply list small datasets such as this. The values of Price in this case are as follows:

155.5	195.0	197.0	207.0	214.9	230.0	239.5	242.0	252.5	255.0
259.9	259.9	269.9	270.0	274.9	283.0	285.0	285.0	299.0	299.9
319.0	319.9	324.5	330.0	336.0	339.0	340.0	355.0	359.9	359.9

However, even for these data, it can be helpful to summarize the numbers with a small number of sample statistics (such as the sample mean and standard deviation), or with a graph that can effectively convey the manner in which the numbers vary. A particularly effective graph is a stem‐and‐leaf plot, which places the numbers along the vertical axis of the plot, ordered in adjoining data intervals (called “bins”) from the lowest at the top to the highest at the bottom. For example, a stem‐and‐leaf plot for the 30 sample prices looks like the following:

1 | 6
2 | 0011344
2 | 5666777899
3 | 002223444
3 | 666

In this plot, the decimal point is two digits to the right of the stem. So the “1” in the stem together with the “6” in the leaf represents 160 or, because of rounding, any number between 155 and 164.9. In particular, it represents the lowest price in the dataset, 155.5 (thousand dollars). The next part of the graph shows two prices between 195 and 204.9, two prices between 205 and 214.9, one price between 225 and 234.9, two prices between 235 and 244.9, and so on. A stem‐and‐leaf plot can easily be constructed by hand for small datasets such as this, or it can be constructed automatically using statistical software. The appearance of the plot can depend on the type of statistical software used. This particular plot was constructed using R statistical software, as were all the plots in this book. Instructions for constructing stem‐and‐leaf plots are available as computer help #13 in the software information files on the book website at www.wiley.com/go/pardoe/AppliedRegressionModeling3e.
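The rounding and binning behind this plot can be reproduced in a short Python sketch. It assumes two things. First, each price is rounded half up to the nearest 10 (thousand dollars). Second, each stem is split into one row for leaves 0 to 4 and another for leaves 5 to 9. These choices reproduce the plot shown here. However, this is not R's `stem()` implementation, which follows its own rules for choosing the scale and may lay out other datasets differently.

```python
import math
from collections import defaultdict

# The 30 sale prices ($1000s) listed above.
prices = [155.5, 195.0, 197.0, 207.0, 214.9, 230.0, 239.5, 242.0, 252.5, 255.0,
          259.9, 259.9, 269.9, 270.0, 274.9, 283.0, 285.0, 285.0, 299.0, 299.9,
          319.0, 319.9, 324.5, 330.0, 336.0, 339.0, 340.0, 355.0, 359.9, 359.9]


def stem_and_leaf(values, unit=10):
    """Return the rows of a split-stem stem-and-leaf plot as strings.

    Each value is rounded half up to the nearest `unit` (10 here, i.e.
    $10,000), so 252.5 -> 25 and 255.0 -> 26. The hundreds digit of the
    rounded value is the stem and its tens digit is the leaf; each stem is
    split into a row for leaves 0-4 and a row for leaves 5-9.
    """
    rounded = sorted(math.floor(v / unit + 0.5) for v in values)
    rows = defaultdict(list)
    for r in rounded:
        stem, leaf = divmod(r, 10)
        rows[(stem, leaf >= 5)].append(str(leaf))

    # Index each half-row as 2*stem + (1 for the upper half), then walk from
    # the lowest occupied half-row to the highest, keeping any empty ones.
    half_rows = [2 * stem + upper for stem, upper in rows]
    lines = []
    for k in range(min(half_rows), max(half_rows) + 1):
        stem, upper = divmod(k, 2)
        leaves = "".join(rows.get((stem, bool(upper)), []))
        lines.append(f"{stem} | {leaves}")
    return lines


for line in stem_and_leaf(prices):
    print(line)
```

Rounding half up matters here. For example, 285.0 becomes a leaf of 9 on stem 2, which puts the two homes priced at 285.0 in the 285 to 294.9 bin, as in the plot above.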

The overall impression from this graph is that the sample prices range from the mid‐150s to the mid‐350s, with some suggestion of clustering around the high 200s. Perhaps the sample represents quite a range of moderately priced homes, but with no very cheap or very expensive homes. This type of observation often arises throughout a data analysis: the data begin to tell a story and suggest possible explanations. A good analysis is usually not the end of the story, since it will frequently lead to other analyses and investigations. For example, in this case we might surmise that we would be unlikely to find a home priced at much less than $150,000 in this market, but perhaps a realtor might know of a nearby market with more affordable housing.
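The range read off the plot can be checked numerically, together with the sample mean and standard deviation mentioned earlier. Here is a minimal Python sketch using the standard-library `statistics` module. Its `stdev` function computes the sample standard deviation, which divides by n - 1.

```python
import statistics

# The 30 sale prices ($1000s) from the HOMES1 data, as listed above.
prices = [155.5, 195.0, 197.0, 207.0, 214.9, 230.0, 239.5, 242.0, 252.5, 255.0,
          259.9, 259.9, 269.9, 270.0, 274.9, 283.0, 285.0, 285.0, 299.0, 299.9,
          319.0, 319.9, 324.5, 330.0, 336.0, 339.0, 340.0, 355.0, 359.9, 359.9]

n = len(prices)
low, high = min(prices), max(prices)
mean = statistics.mean(prices)  # sample mean
sd = statistics.stdev(prices)   # sample standard deviation (divisor n - 1)

print(f"n = {n}, range = {low} to {high}")
print(f"sample mean = {mean:.1f}, sample standard deviation = {sd:.1f}")
```

The prices sum to 8358.1, so the sample mean is about 278.6 ($1000s), which fits the impression of clustering in the high 200s.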

A few modifications to a stem‐and‐leaf plot produce a histogram. The value axis is now horizontal rather than vertical, and the counts of observations within the bins are displayed as bars (with the counts, or frequency, shown on the vertical axis) rather than as individual values written with digits. Figure 1.1 shows a histogram for the home prices data generated by statistical software (see computer help #14).
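Behind a histogram, the binning step simply counts the observations in each interval. Below is a minimal Python sketch that assumes bins $50,000 wide starting at $150,000. This bin choice is made here purely for illustration, and the bins in Figure 1.1 may differ. Each bar is drawn sideways as a row of asterisks.

```python
# The 30 sale prices ($1000s), as listed above.
prices = [155.5, 195.0, 197.0, 207.0, 214.9, 230.0, 239.5, 242.0, 252.5, 255.0,
          259.9, 259.9, 269.9, 270.0, 274.9, 283.0, 285.0, 285.0, 299.0, 299.9,
          319.0, 319.9, 324.5, 330.0, 336.0, 339.0, 340.0, 355.0, 359.9, 359.9]


def histogram_counts(values, start, width, nbins):
    """Count values in left-closed bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * nbins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < nbins:  # values outside the chosen range are ignored
            counts[i] += 1
    return counts


# Assumed bins: 50 ($1000s) wide, from 150 up to 400.
start, width, nbins = 150, 50, 5
counts = histogram_counts(prices, start, width, nbins)
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo}-{lo + width}: {'*' * c} ({c})")
```

With these bins, the tallest bar is the 250 to 300 interval, which echoes the clustering in the high 200s seen in the stem-and-leaf plot.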