Jane M. Horgan - Probability with R


Provides a comprehensive introduction to probability with an emphasis on computing-related applications.

This self-contained new and extended edition outlines a first course in probability applied to computer-related disciplines. As in the first edition, experimentation and simulation are favoured over mathematical proofs. The freely downloadable statistical programming language R is used throughout the text, not only as a tool for calculation and data analysis, but also to illustrate concepts of probability and to simulate distributions. The examples in Probability with R cover a wide range of computer science applications, including: testing program performance; measuring response time and CPU time; estimating the reliability of components and systems; evaluating algorithms and queuing systems.
Chapters cover: The R language; summarizing statistical data; graphical displays; the fundamentals of probability; reliability; discrete and continuous distributions; and more.
This second edition includes:
- improved R code throughout the text, as well as new procedures, packages and interfaces;
- updated and additional examples, exercises and projects covering recent developments in computing;
- an introduction to bivariate discrete distributions, together with the R functions used to handle large matrices of conditional probabilities, which are often needed in machine translation;
- an introduction to linear regression with particular emphasis on its application to machine learning using testing and training data;
- a new section on spam filtering using Bayes' theorem to develop the filters;
- an extended range of Poisson applications such as network failures, website hits, virus attacks and accessing the cloud;
- use of new allocation functions in R to deal with hash table collision, server overload and the general allocation problem.

The book is supplemented with a Wiley Book Companion Site featuring data and solutions to exercises within the book.
Primarily addressed to students of computer science and related areas, Probability with R is also an excellent text for students of engineering and the general sciences. Computing professionals who need to understand the relevance of probability in their areas of practice will also find it useful.


gives Fig. 3.9

Figure 3.9 A Histogram with Five Breaks of Equal Width

Recall that par can be used to represent all the subjects in one diagram. Type

par(mfrow = c(2, 2))
hist(arch1, xlab = "Architecture", main = "Semester 1", ylim = c(0, 35))
hist(arch2, xlab = "Architecture", main = "Semester 2", ylim = c(0, 35))
hist(prog1, xlab = "Programming", main = " ", ylim = c(0, 35))
hist(prog2, xlab = "Programming", main = " ", ylim = c(0, 35))

to get Fig. 3.10. The argument ylim = c(0, 35) ensures that the y-axis has the same scale for all four subjects.

Figure 3.10 Histogram of Each Subject in Each Semester
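The text fixes ylim = c(0, 35) by hand. A minimal sketch of one alternative, assuming we want the shared limit computed from the data so that no bar is cut off: the four vectors below are simulated, hypothetical stand-ins for arch1, arch2, prog1 and prog2 (which are not reproduced here), and the names subjects and ymax are illustrative only. It also saves and restores the old par() settings so that later plots go back to a single panel.

```r
# Simulated, hypothetical marks standing in for arch1, arch2, prog1 and prog2
set.seed(1)
subjects <- list(
  "Architecture Semester 1" = round(runif(100, 10, 100)),
  "Architecture Semester 2" = round(runif(100, 10, 100)),
  "Programming Semester 1"  = round(runif(100, 10, 100)),
  "Programming Semester 2"  = round(runif(100, 10, 100))
)

# Tallest bar over all four histograms, using hist()'s default breaks
ymax <- max(sapply(subjects, function(v) max(hist(v, plot = FALSE)$counts)))

op <- par(mfrow = c(2, 2))  # par() invisibly returns the settings it replaced
for (nm in names(subjects)) {
  hist(subjects[[nm]], main = nm, xlab = "Marks (%)", ylim = c(0, ymax))
}
par(op)                     # restore the previous single-panel layout
```

Because ymax is taken over the same default breaks that hist() uses when drawing, every panel shares one scale and the tallest bar just fits.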

Up until now, we have used the default parameters of the histogram: notably, the bins are of equal width and the frequency in each bin is calculated. These parameters may be changed as appropriate. For example, you may want to specify the bin break points to represent the failures and the various classes of passes and honors.

bins <- c(0, 40, 60, 80, 100)
hist(prog1, xlab = "Marks (%)", main = "Programming Semester 1", breaks = bins)

yields Fig. 3.11.

Figure 3.11 A Histogram with Breaks of a Specified Width

In Fig. 3.11, observe that the y-axis now represents the density. When the bins are not of equal width, R returns a normalized histogram, so that its total area is equal to one.
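For unequal bins, hist() sets each bar's height to count / (n × bin width), so the bar areas add up to one. The sketch below checks this on a vector reconstructed from the leaves of the stem and leaf diagram in Fig. 3.13, assuming the marks are whole numbers; prog1_approx is a hypothetical name, and the vector is a stand-in, not the book's data file. It also shows a detail worth knowing when a break marks the pass at 40: by default (right = TRUE) R's bins are closed on the right, so a mark of exactly 40 is counted in the first bin together with the failures, while right = FALSE moves it into [40, 60).

```r
# Marks reconstructed from the leaves of Fig. 3.13 (assumes whole-number marks);
# a hypothetical stand-in for prog1, not the book's data file
stems  <- rep(1:9, each = 2)
leaves <- c("2344", "59", "11", "5556777889999", "0113", "6", "00000000", "6779",
            "12223344", "56679", "0011123444", "566777888999", "0112344",
            "5666666899", "001112222334", "5678899", "0122", "7778")
prog1_approx <- unlist(Map(function(s, l) s * 10 + as.integer(strsplit(l, "")[[1]]),
                           stems, leaves))

bins <- c(0, 40, 60, 80, 100)
h <- hist(prog1_approx, breaks = bins, plot = FALSE)

# Bar heights are count / (n * bin width), so the total area is 1
all.equal(h$density, h$counts / (length(prog1_approx) * diff(bins)))  # TRUE
sum(h$density * diff(h$breaks))                  # 1, up to rounding
h$counts   # 34 19 39 25: with right = TRUE the eight marks of 40 fall in [0, 40]

# Left-closed bins [0, 40), [40, 60), ... count a mark of 40 as a pass
hist(prog1_approx, breaks = bins, right = FALSE, plot = FALSE)$counts  # 26 25 39 27
```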

To get a histogram of percentages, write in R

h <- hist(prog1, plot = FALSE, breaks = 5) # this postpones the plot display
h$density <- h$counts/sum(h$counts)*100 # this calculates percentages
plot(h, xlab = "Marks (%)", freq = FALSE, ylab = "Percentage", main = "Programming Semester 1")

The output is given in Fig. 3.12. The # symbol introduces a comment: anything written after # on a line is ignored by R.

Figure 3.12 Histogram with Percentages
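To see what the percentage version computes, the sketch below applies the same first two steps, without the final plot, to marks reconstructed from the leaves of Fig. 3.13 (a hypothetical stand-in for prog1, assuming whole-number marks). Note that hist() treats breaks = 5 as a suggestion only: the actual break points come from pretty(), here 0, 20, ..., 100. Because h$density is overwritten with percentages, plot(h, freq = FALSE) then draws bars whose heights add to 100.

```r
# Marks reconstructed from the leaves of Fig. 3.13 (assumes whole-number marks);
# a hypothetical stand-in for prog1, not the book's data file
stems  <- rep(1:9, each = 2)
leaves <- c("2344", "59", "11", "5556777889999", "0113", "6", "00000000", "6779",
            "12223344", "56679", "0011123444", "566777888999", "0112344",
            "5666666899", "001112222334", "5678899", "0122", "7778")
prog1_approx <- unlist(Map(function(s, l) s * 10 + as.integer(strsplit(l, "")[[1]]),
                           stems, leaves))

h <- hist(prog1_approx, plot = FALSE, breaks = 5)  # compute only, do not draw
h$breaks                                           # 0 20 40 60 80 100, chosen by pretty()
h$density <- h$counts / sum(h$counts) * 100        # replace densities by percentages
round(h$density, 1)                                # 5.1 23.9 16.2 33.3 21.4
sum(h$density)                                     # 100
```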

3.3 STEM AND LEAF

The stem and leaf diagram is a more modern way of displaying data than the histogram. It depicts the shape of the data using the actual numbers observed. Like the histogram, it gives the frequencies of the categories of the variable, but it goes further and shows the actual values in each category.
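The idea behind the display can be sketched in a few lines. Each whole-number mark is split into a stem (the tens digit, x %/% 10) and a leaf (the units digit, x %% 10), and, as in Fig. 3.13, each stem is spread over two lines, one for leaves 0 to 4 and one for leaves 5 to 9. The helper stem_leaf_lines() is hypothetical and far simpler than R's own stem(), which chooses the scale automatically and handles cases this sketch ignores.

```r
# Hypothetical helper: stem and leaf lines for whole-number marks, with each
# stem split into a 0-4 line and a 5-9 line; half-lines with no leaves are skipped
stem_leaf_lines <- function(x) {
  x      <- sort(x)
  stems  <- x %/% 10                    # tens digit
  leaves <- x %% 10                     # units digit
  key    <- 2 * stems + (leaves >= 5)   # one key per half-line of a stem
  groups <- split(leaves, key)          # leaves grouped by half-line, in key order
  keys   <- as.numeric(names(groups))
  sprintf("%d | %s", keys %/% 2,
          vapply(groups, paste, character(1), collapse = ""))
}

# The eight smallest marks of Fig. 3.13, in arbitrary order
stem_leaf_lines(c(14, 12, 13, 14, 19, 15, 21, 21))
```

Sorting first guarantees that the leaves on each line appear in increasing order, exactly as they do in the diagram.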

The marks obtained in Programming in Semester 1 are depicted as a stem and leaf diagram using

stem(prog1)

which yields Fig. 3.13.

The decimal point is 1 digit(s) to the right of the |

  1 | 2344
  1 | 59
  2 | 11
  2 | 5556777889999
  3 | 0113
  3 | 6
  4 | 00000000
  4 | 6779
  5 | 12223344
  5 | 56679
  6 | 0011123444
  6 | 566777888999
  7 | 0112344
  7 | 5666666899
  8 | 001112222334
  8 | 5678899
  9 | 0122
  9 | 7778

Figure 3.13 A Stem and Leaf Diagram

From Fig. 3.13, we are able to see the individual observations, as well as the shape of the data as a whole. Notice that there are many marks of exactly 40 (eight of them), whereas just one student obtained a mark between 35 and 39. One wonders whether this has anything to do with the fact that 40 is a pass, and that the examiner has been generous to borderline students. This point would go unnoticed in a histogram, which shows only how many marks fall in each bin, not the marks themselves.
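The observation about the marks of 40 can be checked numerically. The sketch below again uses marks reconstructed from the leaves of Fig. 3.13 (a hypothetical stand-in for prog1, assuming whole-number marks) and shows how a histogram with bins of width 10 pools the eight 40s with the few marks from 31 to 39, hiding the pile-up at the pass mark.

```r
# Marks reconstructed from the leaves of Fig. 3.13 (assumes whole-number marks);
# a hypothetical stand-in for prog1, not the book's data file
stems  <- rep(1:9, each = 2)
leaves <- c("2344", "59", "11", "5556777889999", "0113", "6", "00000000", "6779",
            "12223344", "56679", "0011123444", "566777888999", "0112344",
            "5666666899", "001112222334", "5678899", "0122", "7778")
prog1_approx <- unlist(Map(function(s, l) s * 10 + as.integer(strsplit(l, "")[[1]]),
                           stems, leaves))

sum(prog1_approx == 40)                       # 8 marks of exactly 40
sum(prog1_approx >= 35 & prog1_approx < 40)   # 1 mark from 35 to 39

# Bins of width 10: the 40s are pooled with the marks above 30
h <- hist(prog1_approx, breaks = seq(0, 100, by = 10), plot = FALSE)
h$counts[4]                                   # 12 marks in the bin (30, 40]
table(prog1_approx[prog1_approx > 30 & prog1_approx <= 40])
```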

3.4 SCATTER PLOTS

Plots of data are useful to investigate relationships between variables. To examine, for example, the relationship between the performance of students in Programming in Semesters 1 and 2, we could write

plot(prog1, prog2, xlab = "Programming Semester 1", ylab = "Programming Semester 2")

to obtain Fig. 3.14.

Figure 3.14 A Scatter Plot

When more than two variables are involved, R provides a facility for producing scatter plots of all possible pairs.

To do this, first create a data frame of all the variables that you want to compare.

courses <- results[2:5]

This creates a data frame, courses, containing the second to the fifth variables in results, that is, the marks for the four subjects. Writing

pairs(courses)

or equivalently

pairs(results[2:5])

will generate Fig. 3.15, which, as you can see, gives scatter plots for all possible pairs.

Figure 3.15 Use of the pairs Function
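Single-bracket indexing with one argument treats a data frame as a list of columns, so results[2:5] is itself a data frame, which pairs() accepts directly. A minimal sketch on a small simulated data frame; its columns (a non-mark column first, then four sets of marks) and their order are assumptions standing in for results, which is not reproduced here.

```r
# Simulated, hypothetical stand-in for results: one non-mark column, then four sets of marks
set.seed(2)
n <- 20
results <- data.frame(
  gender = sample(c("f", "m"), n, replace = TRUE),
  arch1  = round(runif(n, 20, 95)),
  prog1  = round(runif(n, 20, 95)),
  arch2  = round(runif(n, 20, 95)),
  prog2  = round(runif(n, 20, 95))
)

courses <- results[2:5]   # columns 2 to 5, still a data frame
class(courses)            # "data.frame"
names(courses)            # "arch1" "prog1" "arch2" "prog2"
pairs(courses)            # a grid of scatter plots, one for each pair of columns
```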

3.5 THE LINE OF BEST FIT

Returning to Fig. 3.14, we can see that there is a linear trend in these data. One variable increases with the other; not surprisingly, students doing well in Programming in Semester 1 are likely to do well also in Programming in Semester 2, and those doing badly in Semester 1 will tend to do badly in Semester 2. We might ask whether it is possible to estimate the Semester 2 results from those obtained in Semester 1.
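One standard way to make this estimate precise is the least-squares line y = a + bx, whose slope and intercept have closed forms: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b x̄. The sketch below computes them directly on simulated, hypothetical marks (not the book's prog1 and prog2) and checks that R's lm() returns the same coefficients.

```r
# Simulated, hypothetical marks with a linear trend plus noise
set.seed(3)
x <- round(runif(30, 20, 95))                  # stand-in for Semester 1 marks
y <- round(10 + 0.8 * x + rnorm(30, sd = 8))   # stand-in for Semester 2 marks

# Least-squares slope and intercept from the closed-form expressions
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a <- mean(y) - b * mean(x)

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(a, b))        # TRUE: lm() fits the same line
a + b * 70                                   # estimated Semester 2 mark for a Semester 1 mark of 70
predict(fit, newdata = data.frame(x = 70))   # the same estimate via predict()
```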

In the case of the Programming subjects, we have a set of points, one (Semester 1 mark, Semester 2 mark) pair for each student, and having established from the scatter plot that a linear trend exists, we attempt to find the line that best fits the data. In R

lm(prog2 ~ prog1)

calculates what is referred to as the linear model (lm) of prog2 on prog1, or simply the line