Jane M. Horgan - Probability with R

Provides a comprehensive introduction to probability with an emphasis on computing-related applications.

This self-contained new and extended edition outlines a first course in probability applied to computer-related disciplines. As in the first edition, experimentation and simulation are favoured over mathematical proofs. The freely downloadable statistical programming language R is used throughout the text, not only as a tool for calculation and data analysis, but also to illustrate concepts of probability and to simulate distributions. The examples in Probability with R cover a wide range of computer science applications, including: testing program performance; measuring response time and CPU time; estimating the reliability of components and systems; evaluating algorithms and queuing systems.

Chapters cover: The R language; summarizing statistical data; graphical displays; the fundamentals of probability; reliability; discrete and continuous distributions; and more.

This second edition includes: improved R code throughout the text, as well as new procedures, packages and interfaces; updated and additional examples, exercises and projects covering recent developments of computing; an introduction to bivariate discrete distributions together with the R functions used to handle large matrices of conditional probabilities, which are often needed in machine translation; an introduction to linear regression with particular emphasis on its application to machine learning using testing and training data; a new section on spam filtering using Bayes theorem to develop the filters; an extended range of Poisson applications such as network failures, website hits, virus attacks and accessing the cloud; use of new allocation functions in R to deal with hash table collision, server overload and the general allocation problem.

The book is supplemented with a Wiley Book Companion Site featuring data and solutions to exercises within the book.

Primarily addressed to students of computer science and related areas, Probability with R is also an excellent text for students of engineering and the general sciences. Computing professionals who need to understand the relevance of probability in their areas of practice will find it useful.

Standard deviation: The standard deviation (sd) measures how much the data values deviate from their average. It is the square root of the average squared deviations from the mean. A small standard deviation implies most values are near the mean. A large standard deviation indicates that values are widely spread above and below the mean.

In R

sd(downtime)

yields

[1] 14.27164

Recall that we calculated the mean to be 25.04 minutes. We might loosely describe the downtime as being “25 minutes on average give or take 14 minutes.”
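As a rough sketch of what sd computes, the calculation can be reproduced by hand. The vector x below is hypothetical illustration data, not the downtime data, and the names deviations and manual_sd are introduced only for this example. Note that sd divides the sum of squared deviations by n - 1 rather than n.

```r
# Hypothetical data for illustration only (not the book's downtime vector)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
deviations <- x - mean(x)                        # deviation of each value from the mean
manual_sd <- sqrt(sum(deviations^2) / (n - 1))   # divisor n - 1, as sd() uses

manual_sd
sd(x)                                            # should agree with manual_sd
```

Because R works on whole vectors, no loop is needed to square and sum the deviations.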

For the data in results,

sapply(results[2:5], sd, na.rm = TRUE)

gives the standard deviation of each examination subject in results:

   arch1    prog1    arch2    prog2
24.37469 23.24012 21.99061 27.08082
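The effect of na.rm = TRUE can be seen in a small sketch. The data frame marks and its columns subj1 and subj2 are hypothetical stand-ins, assumed here only to show how sapply passes extra arguments on to sd.

```r
# Hypothetical marks for two subjects; NA denotes a missing result
marks <- data.frame(
  subj1 = c(50, 60, NA, 80),
  subj2 = c(40, NA, 70, 90)
)

with_na    <- sapply(marks, sd)                 # NA for any column with a missing value
without_na <- sapply(marks, sd, na.rm = TRUE)   # missing values dropped column by column

with_na
without_na
```

Any argument given after the function name in sapply is handed to that function for every column.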

Quantiles: The quantiles divide the data into proportions, usually into quarters called quartiles, tenths called deciles, and hundredths called percentiles. The default calculation in R is quartiles.

quantile(downtime)

gives

  0%  25%  50%  75% 100%
 0.0 16.0 25.0 31.5 51.0

The first quartile (16.0) is the value that breaks the data so that 25% is below this value and 75% is above.

The second quartile (25.0) is the value that breaks the data so that 50% is below and 50% is above (notice that the 2nd quartile is the median).

The third quartile (31.5) is the value that breaks the data so that 75% is below and 25% is above.

We could say that 25% of the computer systems in the laboratory experienced less than 16 minutes of downtime, another 25% of them were down for between 16 and 25 minutes, and so on.

Interquartile range: The difference between the first and third quartiles is called the interquartile range and is sometimes used as a rough estimate of the standard deviation. In downtime it is $31.5 - 16.0 = 15.5$, not too far away from 14.27, which we calculated to be the standard deviation.
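The interquartile range can be read off the quantile output or obtained with the built-in IQR function. The sketch below uses a hypothetical vector x, not the downtime data, and the name iqr_manual is introduced only for illustration.

```r
# Hypothetical data for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

q <- quantile(x)                            # default: 0%, 25%, 50%, 75%, 100%
iqr_manual <- unname(q["75%"] - q["25%"])   # third quartile minus first quartile

iqr_manual
IQR(x)                                      # built-in equivalent
```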

Deciles: Deciles divide the data into tenths. To get the deciles in R, first define the required break points

deciles <- seq(0, 1, 0.1)

The function seq creates a vector consisting of an equidistant series of numbers. In this case, seq assigns values in [0, 1] in intervals of 0.1 to the vector called deciles. Writing in R

deciles

shows what the vector contains

[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
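As a brief aside, seq can build the same break points either from a step size or from a required number of points. The names by_step and by_length below are assumptions made for this sketch.

```r
by_step   <- seq(0, 1, by = 0.1)          # fixed step size
by_length <- seq(0, 1, length.out = 11)   # fixed number of points

length(by_step)
all.equal(by_step, by_length)             # compare with a tolerance rather than ==
```

The comparison uses all.equal because decimal steps such as 0.1 are not stored exactly in floating point.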

Adding this extra argument to the quantile function

quantile(downtime, deciles)

yields

  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
 0.0  4.0 12.8 19.8 22.6 25.0 29.2 30.0 34.8 44.8 51.0

Interpreting this output, we could say that 90% of the computer systems in the laboratory experienced less than 45 minutes of downtime.

Similarly, for the percentiles, use

percentiles <- seq(0, 1, 0.01)

as an argument in the quantile function, and write

quantile(downtime, percentiles)
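The following sketch puts the decile and percentile break points together and shows that a single quantile can also be requested directly. The vector x (the integers 1 to 100) is hypothetical data chosen for easy checking, and p90 is a name assumed for this example.

```r
# Hypothetical data: the integers 1 to 100, not the book's downtime vector
x <- 1:100

deciles     <- seq(0, 1, 0.1)
percentiles <- seq(0, 1, 0.01)

length(quantile(x, deciles))       # one value per decile break point
length(quantile(x, percentiles))   # one value per percentile break point

p90 <- quantile(x, probs = 0.9)    # a single quantile can be requested directly
p90
mean(x < p90)                      # proportion of values below the 90th percentile
```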

2.3 Overall Summary Statistics

A quicker way of summarizing the data is to use the summary function.

summary(downtime)

returns

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00   16.00   25.00   25.04   31.50   51.00

which are the minimum, the first quartile, the median, the mean, the third quartile, and the maximum, respectively.
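To see how the entries of summary relate to the individual functions, the sketch below compares them on a hypothetical vector x (not the downtime data). The name s is assumed for illustration.

```r
# Hypothetical data for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

s <- summary(x)
names(s)            # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

s[["Median"]]       # individual entries can be extracted by name
median(x)
s[["Mean"]]
mean(x)
```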

For arch1, we might write

summary(arch1)

which gives

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   3.00   46.75   68.50   63.57   83.25  100.00    3.00

An entire data frame may be summarized by using the summary command. Let us do this for the data frame results. First, it is wise to make a declaration about the categorical variable gender.

gender <- factor(gender)

designates the variable gender as a factor, and ensures that it is treated as such in the summary function.

summary(results)

 gender      arch1            prog1           arch2           prog2
 f: 19   Min.   :  3.00   Min.   :12.00   Min.   : 6.00   Min.   : 5.00
 m:100   1st Qu.: 46.75   1st Qu.:40.00   1st Qu.:40.00   1st Qu.:30.00
         Median : 68.50   Median :64.00   Median :48.00   Median :57.00
         Mean   : 63.57   Mean   :59.02   Mean   :51.97   Mean   :53.78
         3rd Qu.: 83.25   3rd Qu.:78.00   3rd Qu.:61.00   3rd Qu.:76.50
         Max.   :100.00   Max.   :98.00   Max.   :98.00   Max.   :97.00
         NA's   :  3.00   NA's   : 2.00   NA's   : 4.00   NA's   : 8.00

Notice how the display for gender is different from that for the other variables; we are simply given the frequency for each gender.
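When a categorical variable exists only as a column of a data frame, one option is to convert that column in place before calling summary. The sketch below assumes a small hypothetical data frame df with columns gender and mark; it is not the results data.

```r
# Hypothetical data frame standing in for a data set like results
df <- data.frame(
  gender = c("f", "m", "m", "f", "m"),
  mark   = c(55, 62, NA, 71, 48),
  stringsAsFactors = FALSE
)

summary(df$gender)               # character column: length, class and mode only
df$gender <- factor(df$gender)   # convert the column inside the data frame
summary(df$gender)               # factor column: a count for each level
summary(df)                      # gender now shows frequencies; mark shows quartiles
```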

2.4 Programming in R

One of the great benefits of R is that it is possible to write your own programs and use them as functions in your analysis. Programming is extremely simple in R because of the way it handles vectors and data frames. To illustrate, let us write a program to calculate the mean of downtime. The formula for the mean of a variable $x$ with $n$ values $x_1, x_2, \ldots, x_n$ is given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

In standard programming languages, implementing this formula would necessitate initialization and loops, but with R, statistical calculations such as these are much easier to implement. For example,

sum(downtime)

gives

[1] 576

which is the sum of the elements in downtime.

length(downtime)

gives

[1] 23

which is the number of elements in downtime.

To calculate the mean, write

meandown <- sum(downtime)/length(downtime)
meandown
[1] 25.04348
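The same calculation can be wrapped in a function so that it can be reused on any numeric vector. This is a minimal sketch: my_mean is a hypothetical name, the vector x is illustration data rather than downtime, and dropping missing values is a design choice assumed here, not something the text prescribes.

```r
# my_mean is a hypothetical name used only for this sketch
my_mean <- function(x) {
  x <- x[!is.na(x)]      # drop missing values before summing
  sum(x) / length(x)     # sum of the values divided by how many there are
}

# Hypothetical data for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

my_mean(x)
mean(x)                  # built-in function, for comparison
```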

Let us also look at how to calculate the standard deviation of the data in downtime.

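The formula for the standard deviation of $n$ data points stored in an $x$ vector is

$$sd = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the mean of the data. We illustrate step by step how this is calculated for downtime.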