The book contains the following stylistic conventions:
When displaying calculated values, the general approach is to be as accurate as possible when it matters (such as in intermediate calculations for problems with many steps), but to round appropriately when convenient or when reporting final results for real‐world questions. Displayed results from statistical software use the default rounding employed in R throughout.
In the author's experience, many students find some traditional approaches to notation and terminology a barrier to learning and understanding. Thus, some traditions have been altered to improve ease of understanding. These include: using familiar Roman letters in place of unfamiliar Greek letters (e.g., $E(Y)$ rather than $\mu$ and $b$ rather than $\beta$); replacing the nonintuitive $\bar{Y}$ for the sample mean of $Y$ with $m_Y$; and using NH and AH for null hypothesis and alternative hypothesis, respectively, rather than the usual $H_0$ and $H_a$.
Major changes for the third edition
The second edition of this book was used in the regression analysis course run by Statistics.com from 2012 to 2020. The lively discussion boards provided an invaluable source of suggestions for changes to the book. This edition clarifies and expands on concepts that students found challenging and addresses every question posed in those discussions.
There is expanded material on assessing model assumptions, analysis of variance, sums of squares, lack of fit testing, hierarchical models, influential observations, weighted least squares, multicollinearity, and logistic regression.
A new appendix provides an informal overview of matrices in the context of multiple linear regression.
I've added learning objectives to the beginning of each chapter and text boxes at the end of each section that summarize the important concepts.
As in the first two editions, this edition uses mathematics to explain methods and techniques only where necessary, and formulas are used within the text only when they are instructive. However, the book also includes additional formulas in optional sections to aid those students who can benefit from more mathematical detail.
I've added many more end‐of‐chapter problems. In total, the number of problems has increased by nearly 70%.
I've updated and added new references.
The book website has been expanded to include instructional videos and practice quizzes.
Iain Pardoe
Nelson, British Columbia
January, 2020
I am grateful to a number of people who helped to make this book a reality. Dennis Cook and Sandy Weisberg first gave me the textbook‐writing bug when they approached me to work with them on their classic applied regression book [Cook and Weisberg, 1999], and Dennis subsequently motivated me to transform my teaching class notes into my own applied regression book. People who provided data for examples used throughout the book include: Victoria Whitman for the house price examples; Wolfgang Jank for the autocorrelation example on beverage sales; Craig Allen for the case study on pharmaceutical patches; Cathy Durham for the Poisson regression example in the chapter on extensions. The multilevel and Bayesian modeling sections of the chapter on extensions are based on work by Andrew Gelman and Hal Stern. A variety of anonymous reviewers provided extremely useful feedback on the second edition of the book, as did many of my students at the University of Oregon and Statistics.com. Finally, I'd like to thank colleagues at Thompson Rivers University and the Pennsylvania State University, as well as Kathleen Santoloci and Mindy Okura‐Marszycki at Wiley.
Iain Pardoe
INTRODUCTION
I.1 STATISTICS IN PRACTICE
Statistics is used in many fields of application since it provides an effective way to analyze quantitative information. Some examples include:
A pharmaceutical company is developing a new drug for treating a particular disease more effectively. How might statistics help you decide whether the drug will be safe and effective if brought to market? Clinical trials involve large‐scale statistical studies of people (usually both patients with the disease and healthy volunteers) who are assessed for their response to the drug. Determining that the drug is both safe and effective requires careful statistical analysis of the trial results, which can involve controlling for the personal characteristics of the people (e.g., age, gender, health history) and possible placebo effects, comparisons with alternative treatments, and so on.
A manufacturing firm is not getting paid by its customers in a timely manner, and this costs the firm money in lost interest. You've collected recent data for the customer accounts on amount owed, number of days since the customer was billed, and size of the customer (small, medium, large). How might statistics help you improve the on‐time payment rate? You can use statistics to find out whether there is an association between the amount owed and the number of days and/or size. For example, there may be a positive association between amount owed and number of days for small and medium‐sized customers but not for large‐sized customers; thus it may be more profitable to focus collection efforts on small and medium‐sized customers billed some time ago, rather than on large‐sized customers or customers billed more recently.
A firm makes scientific instruments and has been invited to make a sealed bid on a large government contract. You have cost estimates for preparing the bid and fulfilling the contract, as well as historical information on similar previous contracts on which the firm has bid (some successful, others not). How might statistics help you decide how to price the bid? You can use statistics to model the association between the success/failure of past bids and variables such as bid cost, contract cost, bid price, and so on. If your model proves useful for predicting bid success, you could use it to set a maximum price at which the bid is likely to be successful.
As an auditor, you'd like to determine the number of price errors in all of a company's invoices, which will help you detect whether there might be systematic fraud at the company. It is too time‐consuming and costly to examine all of the company's invoices, so how might statistics help you determine an upper bound for the proportion of invoices with errors? Statistics allows you to make inferences about a population from a relatively small random sample of that population. In this case, you could take a sample of 100 invoices, say, to find a proportion, p, such that you could be 95% confident that the population error rate is less than p.
A firm manufactures automobile parts, and the factory manager wants to get a better understanding of overhead costs. You believe two variables in particular might contribute to cost variation: machine hours used per month and separate production runs per month. How might statistics help you to quantify this information? You can use statistics to build a multiple linear regression model that estimates an equation relating the variables to one another. Among other things, you can use the model to determine how much cost variation can be attributed to the two cost drivers, their individual effects on cost, and predicted costs for particular values of the cost drivers.
You work for a computer chip manufacturing firm and are responsible for forecasting future sales. How might statistics be used to improve the accuracy of your forecasts? Statistics can be used to fit a number of different forecasting models to a time series of sales figures. Some models might just use past sales values and extrapolate into the future, while others might control for external variables such as economic indices. You can use statistics to assess the fit of the various models, and then base your forecasts on the best‐fitting model, or perhaps on an average of the few best‐fitting models.
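For the clinical trial example above, one small piece of the analysis is comparing the proportion of patients who respond in the drug arm with the proportion who respond in the placebo arm. The sketch below uses a standard pooled two-proportion z-test (normal approximation). It deliberately leaves out the adjustment for age, gender, and health history that the text mentions; that adjustment would call for a regression model. The function name, its arguments, and the counts in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare response rates in two trial arms (e.g., drug vs. placebo).

    Returns (difference in proportions, z statistic, two-sided p-value)
    using the pooled-proportion normal approximation.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null hypothesis of equal rates, pool both arms to estimate the common rate.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all or none of the patients responded; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Hypothetical counts: 60 of 100 patients respond on the drug, 40 of 100 on placebo.
diff, z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"difference = {diff:.2f}, z = {z:.3f}, p-value = {p:.4f}")
```

The normal approximation is reasonable only when both arms have enough responders and non-responders. With small counts, an exact test would be the safer choice.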
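For the late-payment example above, a quick first look at the association is a correlation between days since billing and amount owed, computed separately within each customer size. This is a minimal sketch that assumes each account arrives as a tuple `(amount_owed, days_since_billed, size)`; that record layout and the function names are hypothetical. A regression model with a days-by-size interaction would be the more complete tool.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_size(accounts):
    """Correlation between days since billed and amount owed, per customer size.

    accounts: iterable of (amount_owed, days_since_billed, size) tuples.
    Each size group needs at least two accounts with varying values.
    """
    groups = defaultdict(lambda: ([], []))  # size -> (days list, amounts list)
    for amount, days, size in accounts:
        groups[size][0].append(days)
        groups[size][1].append(amount)
    return {size: pearson_r(days, amounts) for size, (days, amounts) in groups.items()}
```

A clearly positive value for small and medium customers and a value near zero for large customers would match the pattern described in the text.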
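For the sealed-bid example above, one natural choice of model is logistic regression of bid success (1) or failure (0) on bid price. The text does not name a model, so this choice is an assumption. The sketch fits the two coefficients by Newton-Raphson using only the standard library. It then solves for the highest price whose predicted success probability is still at least a chosen target; with a negative price coefficient, any lower price has a higher predicted probability. For brevity it uses a single predictor, whereas a real analysis would also include bid cost, contract cost, and so on. All names are hypothetical.

```python
import math


def fit_logistic(prices, outcomes, iterations=25):
    """Fit P(success) = 1 / (1 + exp(-(b0 + b1 * price))) by Newton-Raphson.

    outcomes are 1 (bid won) or 0 (bid lost). Assumes wins and losses overlap
    in price (no price perfectly separates them); otherwise no finite fit exists.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix (negative Hessian)
        for x, y in zip(prices, outcomes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times gradient, via the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def max_price(b0, b1, target=0.5):
    """Highest price whose predicted success probability is at least `target`."""
    if b1 >= 0:
        raise ValueError("expected success probability to fall as price rises (b1 < 0)")
    # Solve b0 + b1 * price = logit(target) for price.
    return (math.log(target / (1 - target)) - b0) / b1
```

Rescaling prices before fitting (for example, to thousands of dollars) helps keep `math.exp` from overflowing. A larger `target`, such as 0.8, gives a more conservative maximum price.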
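For the auditing example above, the text does not say how the upper bound p is computed. One standard choice, assumed here, is the exact (Clopper-Pearson) one-sided binomial bound. This is the error rate at which observing as few errors as were found in the sample would have probability exactly 5%; any higher rate would make such a sample less likely than that. The sketch finds the bound by bisection on the binomial cumulative distribution function. The function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound_error_rate(errors, n, confidence=0.95, tol=1e-10):
    """Exact one-sided upper confidence bound for a population proportion."""
    alpha = 1 - confidence
    if errors >= n:
        return 1.0
    # binom_cdf(errors, n, p) falls as p rises; bisect for the p where it equals alpha.
    lo, hi = errors / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# With 0 errors in a sample of 100 invoices, the bound is 1 - 0.05 ** (1 / 100), about 3%.
print(round(upper_bound_error_rate(0, 100), 4))
```

With zero errors, the bound has the closed form $1 - 0.05^{1/n}$, which is why the "rule of three" approximation of roughly $3/n$ is often quoted.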
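For the overhead-cost example above, the sketch below fits cost = b0 + b1 × (machine hours) + b2 × (production runs) by least squares. It solves the normal equations with Gaussian elimination so that only the standard library is needed. It also returns R², the proportion of cost variation explained by the two cost drivers, and provides a function to predict cost for given driver values. The function names are hypothetical. Statistical software such as R's `lm()` uses a QR decomposition rather than the normal equations, which is numerically more stable.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cost_model(machine_hours, production_runs, cost):
    """Least-squares fit of cost = b0 + b1*machine_hours + b2*production_runs.

    Returns ([b0, b1, b2], r_squared).
    """
    rows = [[1.0, h, r] for h, r in zip(machine_hours, production_runs)]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, cost)) for i in range(3)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    mean_cost = sum(cost) / len(cost)
    sse = sum((y - f) ** 2 for y, f in zip(cost, fitted))
    sst = sum((y - mean_cost) ** 2 for y in cost)
    return coef, 1 - sse / sst


def predict_cost(coef, machine_hours, production_runs):
    """Predicted cost for particular values of the two cost drivers."""
    return coef[0] + coef[1] * machine_hours + coef[2] * production_runs
```

In this fit, b1 and b2 estimate the individual effects of each cost driver on cost, holding the other driver fixed.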
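For the sales-forecasting example above, the sketch below compares a few deliberately simple candidate models: last value, historical mean, and straight-line trend. These stand-ins are assumptions, not models the text specifies. The sketch scores each model by its one-step-ahead squared errors over the most recent `holdout` periods. It then forecasts the next period with the best model, or with the average of the `top_k` best. Models that control for external variables such as economic indices are omitted here. All names are hypothetical.

```python
def last_value(history):
    """Naive forecast: next value equals the most recent value."""
    return history[-1]


def historical_mean(history):
    """Forecast the next value as the mean of all past values."""
    return sum(history) / len(history)


def linear_trend(history):
    """Fit a least-squares line against time index 0..n-1 and extend it one step."""
    n = len(history)
    if n < 2:
        return history[-1]
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return y_mean + (sxy / sxx) * (n - t_mean)


def one_step_mse(model, series, holdout):
    """Mean squared one-step-ahead error over the last `holdout` periods.

    Requires holdout < len(series) so every forecast has some history.
    """
    start = len(series) - holdout
    errors = [(series[t] - model(series[:t])) ** 2 for t in range(start, len(series))]
    return sum(errors) / holdout


def forecast_next(models, series, holdout, top_k=1):
    """Average the next-period forecasts of the top_k models with the lowest holdout MSE."""
    ranked = sorted(models, key=lambda m: one_step_mse(m, series, holdout))
    best = ranked[:top_k]
    return sum(m(series) for m in best) / len(best), [m.__name__ for m in best]
```

Scoring on the most recent periods, rather than on in-sample fit, favors models that actually predict well out of sample. That is the practical meaning of "assess the fit" for forecasting.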