David W. Scott - Statistics

Statistics: summary and description

Statistics: A Concise Mathematical Introduction for Students and Scientists

The book focuses early on continuous measurements as well as discrete random variables. By invoking simple and intuitive models and geometric probability, it discusses discrete and continuous experiments and probabilities throughout in a natural way. Classical probability, random variables, and inference are covered, along with material on understanding data and topics of special interest. Topics discussed include:
• Classical equally likely outcomes
• Variety of models of discrete and continuous probability laws
• Likelihood function and ratio
• Inference
• Bayesian statistics
With the volume of data generated in many disciplines driving the growth of data science, companies now demand statistically literate scientists. This textbook answers that demand and is suited to undergraduates studying science or engineering, including computer science, economics, life sciences, environmental science, and business, among many others. Basic knowledge of bivariate calculus, the R language, Mathematica, and JMP is useful; an accompanying website includes sample R and Mathematica code to help instructors and students.

Statistics: preview excerpt

There are two basic tasks for the statistician. The first is to characterize the distribution of possible outcomes using a batch of representative data. An actuary may be asked to find a dollar loss for car accidents that is not exceeded 99.999% of the time. An economist may be asked to provide useful summaries of a collection of income data. The histogram is our primary tool here, an idea that did not appear until the 17th century; see Graunt (1662), who analyzed death records during the height of the plague outbreak in Europe.

The second task is that of prediction. A bank may wish to understand how credit risk is related to other information that may be available. A mechanical engineer may wish to understand the risk inherent in a new design under extreme conditions. Methods for performing this task underlie many algorithms today, for example, those for translating foreign languages or recognizing images.

The mathematical backbone of all of our statistical methods is probability theory. Thus we study the basics of probability theory and random variables in the first part of this course. Statistical methods and the basics of statistical decision theory form the core of the middle third of this course. Specific tests and data analysis approaches finish our study.

1.1 Exploring the Distribution of Data

Tukey (1977) introduced a number of data summaries in his book Exploratory Data Analysis. Many are based on quantiles or percentiles of the data vector. Percentiles are particular choices of the sorted data. The middlemost is the median, or the 50th percentile. As a measure of spread, Tukey focused on the distance from the 25th to the 75th percentiles, the so‐called interquartile range (IQR). A three‐point summary would list these three percentiles (the 25th, 50th, and 75th). Instead Tukey popularized the box‐and‐whiskers plot, which is a five‐point summary. The additional two points are intended to capture 99% of the data. These are drawn at a distance of $1.5 \times \mathrm{IQR}$ from the two quartiles. Any points outside these whiskers are plotted as potential outliers.
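As a rough illustration of these summaries, the Python sketch below computes the quartiles, the IQR, and the whisker limits for a small made-up sample (not Pearson's data). The function name `tukey_summary` and the multiplier `k = 1.5` are assumptions of this sketch: 1.5 is the conventional choice for Tukey's box plot, and quartile conventions differ slightly between software packages.

```python
import statistics


def tukey_summary(data, k=1.5):
    """Quartiles, IQR, fences, whisker ends, and potential outliers.

    k = 1.5 is the conventional multiplier (an assumption of this sketch).
    """
    xs = sorted(data)
    # Quartiles by linear interpolation; other conventions give slightly
    # different values on small samples.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "quartiles": (q1, median, q3),
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        # Whiskers end at the most extreme points still inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical sample with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = tukey_summary(sample)
print(summary["quartiles"])  # (3.25, 5.5, 7.75)
print(summary["outliers"])   # [30]
```

Here the fences fall at -3.5 and 14.5, so the whiskers stop at the data values 1 and 9 and the value 30 is flagged as a potential outlier.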

1.1.1 Pearson's Father–Son Height Data

We illustrate these ideas on a set of data collected by Karl Pearson over a century ago. He recorded the heights of $n = 1078$ fathers, each paired with an adult son. In the left frame in Figure 1.1, we display a box‐and‐whiskers plot of these data. We see that the sons are taller than their fathers by about an inch. There are also more potential outliers among the sons for some reason.

In the middle frame of Figure 1.1, we show Tukey's stem‐and‐leaf plot of the 1078 differences of the heights of each son and his father. The range of the data is […] and the first seven sorted values rounded to one decimal place are […]. Each data point is decomposed into a stem and a leaf digit. Thus […] has a stem of […] and a leaf of 0. The top line is actually […], although it is too small to see. With so much data, each stem is broken into two lines to provide more detail. Thus the next two lines show a stem of […] but no leaves, […] twice. The fourth line shows […] and the fifth line reads […] and so on. This figure was generated using the R command […]; see R Core Team (2018). (The default […] has half as many stems.) Thus the stem‐and‐leaf plot shows the frequency count of points for each stem as character strings.
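The decomposition into stems and leaves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `stem_and_leaf` and the sample values are hypothetical, the data are restricted to nonnegative numbers to keep the sign handling out of the way (the height differences in the text include negative values), and the layout is not meant to reproduce R's output exactly.

```python
def stem_and_leaf(values, split=False):
    """Return the lines of a simple stem-and-leaf display.

    Each value is rounded to one decimal place; the integer part is the stem
    and the first decimal digit is the leaf. With split=True every stem gets
    two lines: leaves 0-4 first, then leaves 5-9.
    Assumption of this sketch: all values are nonnegative.
    """
    if any(x < 0 for x in values):
        raise ValueError("this sketch handles nonnegative values only")
    parts = 2 if split else 1
    rows = {}
    for x in sorted(values):
        stem, leaf = divmod(round(x * 10), 10)
        half = leaf // 5 if split else 0
        rows.setdefault(stem * parts + half, []).append(str(leaf))
    lines = []
    # Print every line between the first and last, including empty ones.
    for key in range(min(rows), max(rows) + 1):
        leaves = "".join(rows.get(key, []))
        lines.append(f"{key // parts:>3} | {leaves}")
    return lines


# Hypothetical data, already given to one decimal place.
data = [1.2, 1.7, 2.0, 2.3, 2.3, 2.8, 3.1]
print("\n".join(stem_and_leaf(data)))
print("\n".join(stem_and_leaf(data, split=True)))
```

The length of each line is the frequency count for that stem, which is why the display doubles as a crude histogram; splitting the stems halves the effective bin width.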

In the right frame of Figure 1.1, we show the frequency counts in a histogram. The histogram uses a parameter called the bin width $h$ to construct an equally spaced mesh. Then we count the number of points in each interval. These counts are displayed as a bar chart. (The histogram can use any anchor point, although 0 is a common choice.) For the histogram shown, the anchor point selected was 0, and $h$ was chosen using Scott's rule […]; see Scott (1979). This rule is discussed in Section 9.1.4.1. The default choice in the R function hist is Sturges' rule, discussed in Section 9.1.4.3, which chooses 11 bins with […] (not shown).
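The construction just described (an anchor point, a bin width, and a count per interval) can be sketched as follows. The function names and the sample are hypothetical. The bin-width rule is written in the form in which Scott's rule is commonly quoted, $h = 3.49\, s\, n^{-1/3}$ with $s$ the sample standard deviation; the exact expression printed in the text was lost, so treat the constant as an assumption of this sketch.

```python
import math
import statistics
from collections import Counter


def scott_bin_width(data):
    """Bin width h = 3.49 * s * n**(-1/3), the commonly quoted form of
    Scott's rule (the constant is an assumption of this sketch)."""
    n = len(data)
    return 3.49 * statistics.stdev(data) * n ** (-1 / 3)


def histogram_counts(data, h, anchor=0.0):
    """Map each bin's left edge to its count.

    Bin k covers the half-open interval [anchor + k*h, anchor + (k+1)*h).
    Empty bins between the first and last occupied bins are included.
    """
    counts = Counter(math.floor((x - anchor) / h) for x in data)
    return {anchor + k * h: counts[k]
            for k in range(min(counts), max(counts) + 1)}


# Hypothetical sample (not Pearson's data), anchor 0 and bin width 1.
sample = [-1.5, -0.2, 0.1, 0.4, 0.9, 1.1, 2.6]
print(histogram_counts(sample, h=1.0))
print(scott_bin_width(sample))
```

Changing `anchor` shifts the whole mesh without changing the bin width, which is why the anchor is a separate parameter here.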

The choice of $h$ is often considered a matter of convenience. The stem‐and‐leaf plot using one‐digit integer stems limits its choices. By way of contrast, any positive real number $h$ can be used in a histogram. In Figure 1.2, we show the histograms using […] by Scott's rule, as well as […] and […]. Loosely speaking, the histograms using […] are missing useful information, while the histograms using […] display spurious detail. We discuss strategies for finding the best choice of $h$ in Section 9.1. In any case, the histogram is a powerful tool for understanding the full distribution of data.