
Philippe J. S. De Brouwer - The Big R-Book


Introduces professionals and scientists to statistics and machine learning using the programming language R

Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.

The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R:

- Provides a practical guide for non-experts with a focus on business users
- Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting
- Uses a practical tone and integrates multiple topics in a coherent framework
- Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R
- Shows readers how to visualize results in static and interactive reports
- Supplementary materials include PDF slides based on the book's content, as well as all the extracted R-code, and are available to everyone on a Wiley Book Companion Site

The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.

4.3.7.2 Ordering Factors

In the example about creating a factor object for feedback, one will have noticed that the plot function shows the labels in alphabetical order and not in an order that would be logical for us humans. It is possible to impose a certain order on the labels by providing the levels, in the correct order, while creating the factor object.

feedback <- c("Good", "Good", "Bad", "Average", "Bad", "Good")
factor_feedback <- factor(feedback, levels = c("Bad", "Average", "Good"))
plot(factor_feedback)

In Figure 4.2 we notice that the order is now as desired (it is the order that we have provided via the argument levels in the function factor()).
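As a minimal sketch of what the levels argument changes, the snippet below compares the default (alphabetical) level order with the explicit one. It reuses the feedback vector from the example above; the names f_default and f_custom are introduced here for illustration only.

```r
feedback <- c("Good", "Good", "Bad", "Average", "Bad", "Good")

# Without levels, R sorts the levels alphabetically:
f_default <- factor(feedback)
levels(f_default)
## [1] "Average" "Bad"     "Good"

# With explicit levels, the order we provide is kept:
f_custom <- factor(feedback, levels = c("Bad", "Average", "Good"))
levels(f_custom)
## [1] "Bad"     "Average" "Good"

# The underlying integer codes follow the level order:
as.integer(f_custom)
## [1] 3 3 1 2 1 3

# table() reports the counts in level order, which is what plot() draws:
table(f_custom)
```

Note that providing levels only fixes the display order. To make comparisons such as < meaningful, the factor would additionally have to be created with ordered = TRUE.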

Generate Factors with the Function gl()

Function use for gl()

gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE) with

n: The number of levels

k: The number of replications (for each level)

length (optional): An integer giving the length of the result

labels (optional): A vector with the labels

ordered (optional): A logical value indicating whether the result should be an ordered factor.


gl(3, 2, , c("bad", "average", "good"), TRUE)
## [1] bad     bad     average average good    good
## Levels: bad < average < good
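The positional call above skips the length argument with an empty slot, which is easy to misread. The following sketch shows the same call with named arguments, plus one use of length. The variable names f and g and the labels "off" and "on" are chosen here for illustration only.

```r
# Named arguments make the call easier to read than positional ones:
f <- gl(n = 3, k = 2, labels = c("bad", "average", "good"), ordered = TRUE)
is.ordered(f)   # TRUE
f[1] < f[5]     # TRUE, because "bad" < "good" in an ordered factor

# With length, the pattern of k repetitions per level is recycled
# until the requested number of elements is reached:
g <- gl(n = 2, k = 1, length = 6, labels = c("off", "on"))
as.character(g)
## [1] "off" "on"  "off" "on"  "off" "on"
```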

Figure 4.2: The factor levels now appear in a logical order.

Question #4

Use the dataset mtcars (from the package datasets, which R loads by default) and explore the distribution of the number of gears. Then explore the correlation between gears and transmission.

Question #5

Then focus on the transmission and create a factor-object with the words “automatic” and “manual” instead of the numbers 0 and 1.

Use ?mtcars to find out the exact definition of the data.

Question #6

Use the dataset mtcars (from the package datasets) and explore the distribution of the horsepower (hp). How would you proceed to make a factor (e.g. Low, Medium, High) for this attribute? Hint: Use the function cut().


4.3.8 Data Frames

4.3.8.1 Introduction to Data Frames

Data frames are the prototype of all two-dimensional data (also known as “rectangular data”). For statistical analysis this is obviously an important data-type.


Data frames are very useful for statistical modelling; they are objects that contain data in a tabular way. Unlike in a matrix, each column of a data frame can contain a different type of data. For example, the first column can be a factor, the second logical, and the third numeric. A data frame is a composite data type consisting of a list of vectors of equal length.
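The statement that a data frame is a list of equal-length vectors can be checked directly. The sketch below builds a small data frame with made-up column names (grp, flag, val) and contrasts it with a matrix.

```r
df <- data.frame(
  grp  = factor(c("a", "b", "a")),  # factor column
  flag = c(TRUE, FALSE, TRUE),      # logical column
  val  = c(1.5, 2.5, 3.5)           # numeric column
)

is.list(df)        # TRUE: a data frame is a list of columns
sapply(df, class)  # one class per column: factor, logical, numeric
length(df)         # 3: the number of columns (list elements)
dim(df)            # 3 rows and 3 columns

# A matrix, in contrast, holds a single type, so mixing types
# coerces everything to the most general one:
m <- cbind(c(TRUE, FALSE), c(1.5, 2.5))
typeof(m)          # "double": the logical values became numbers
```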

Data frames are created using the data.frame() function.

# Create the data frame.
data_test <- data.frame(
  Name   = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  Score  = c(78, 88, 92, 89, 84),
  Age    = c(42, 38, 26, 30, 35)
)
print(data_test)
##    Name Gender Score Age
## 1 Piotr   Male    78  42
## 2 Pawel   Male    88  38
## 3 Paula Female    92  26
## 4  Lisa Female    89  30
## 5 Laura Female    84  35

# The standard plot function on a data-frame (Figure 4.3)
# with the pairs() function:
plot(data_test)


Figure 4.3: The standard plot for a data frame in R shows each column plotted as a function of each other column. This is useful to see correlations or how the data is structured in general.

4.3.8.2 Accessing Information from a Data Frame

Most data is rectangular, and in almost any analysis we will encounter data that is structured in a data frame. The following functions can be helpful to extract information from the data frame, investigate its structure and study the content.

summary()

head()

tail()

# Get the structure of the data frame:
str(data_test)
## 'data.frame': 5 obs. of 4 variables:
##  $ Name  : Factor w/ 5 levels "Laura","Lisa",..: 5 4 3 2 1
##  $ Gender: Factor w/ 2 levels "Female","Male": 2 2 1 1 1
##  $ Score : num 78 88 92 89 84
##  $ Age   : num 42 38 26 30 35
# Note that the names became factors (see warning below)

# Get the summary of the data frame:
summary(data_test)
##     Name      Gender      Score          Age
##  Laura:1   Female:3   Min.   :78.0   Min.   :26.0
##  Lisa :1   Male  :2   1st Qu.:84.0   1st Qu.:30.0
##  Paula:1              Median :88.0   Median :35.0
##  Pawel:1              Mean   :86.2   Mean   :34.2
##  Piotr:1              3rd Qu.:89.0   3rd Qu.:38.0
##                       Max.   :92.0   Max.   :42.0

# Get the first rows:
head(data_test)
##    Name Gender Score Age
## 1 Piotr   Male    78  42
## 2 Pawel   Male    88  38
## 3 Paula Female    92  26
## 4  Lisa Female    89  30
## 5 Laura Female    84  35

# Get the last rows:
tail(data_test)
##    Name Gender Score Age
## 1 Piotr   Male    78  42
## 2 Pawel   Male    88  38
## 3 Paula Female    92  26
## 4  Lisa Female    89  30
## 5 Laura Female    84  35

# Extract columns 2 and 4 and keep all rows
data_test.1 <- data_test[, c(2, 4)]
print(data_test.1)
##   Gender Age
## 1   Male  42
## 2   Male  38
## 3 Female  26
## 4 Female  30
## 5 Female  35

# Extract columns 2 and 4 and keep only rows 2 to 4
data_test[c(2:4), c(2, 4)]
##   Gender Age
## 2   Male  38
## 3 Female  26
## 4 Female  30
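The extractions above address columns by position. As a complementary sketch, the snippet below selects columns by name and rows by a condition. It recreates data_test so that it runs on its own; the threshold of 85 and the variable name high are arbitrary choices for illustration.

```r
data_test <- data.frame(
  Name   = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  Score  = c(78, 88, 92, 89, 84),
  Age    = c(42, 38, 26, 30, 35)
)

# Select columns by name instead of by position:
data_test[, c("Gender", "Age")]

# A single column as a vector, via $ or [[ ]]:
data_test$Score
data_test[["Score"]]

# Keep only the rows that satisfy a condition:
high <- data_test[data_test$Score > 85, c("Name", "Score")]
high

# The same selection written with subset():
subset(data_test, Score > 85, select = c(Name, Score))
```

Selecting by name is usually more robust than selecting by position, because the code keeps working when columns are added or reordered.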

Warning – Avoiding conversion to factors

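In R versions before 4.0.0, the default behaviour is to convert strings to factors when a data frame is created with data.frame(). Decades ago this was useful for performance reasons. Now, this is usually unwanted behaviour. To avoid this, put stringsAsFactors = FALSE in the data.frame() function. Since R 4.0.0, stringsAsFactors = FALSE is the default, so the conversion only happens when it is requested explicitly.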

d <- data.frame(
  Name   = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  Score  = c(78, 88, 92, 89, 84),
  Age    = c(42, 38, 26, 30, 35),
  stringsAsFactors = FALSE
)
d$Gender <- factor(d$Gender)  # manually factorize gender
str(d)
## 'data.frame': 5 obs. of 4 variables:
##  $ Name  : chr "Piotr" "Pawel" "Paula" "Lisa" ...
##  $ Gender: Factor w/ 2 levels "Female","Male": 2 2 1 1 1
##  $ Score : num 78 88 92 89 84
##  $ Age   : num 42 38 26 30 35
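When a data frame already contains unwanted factor columns (for example because it was created elsewhere), the columns can be converted back afterwards. The following is one possible sketch, not the only approach. It sets stringsAsFactors = TRUE explicitly so that it behaves the same in any R version, and the helper variable is_fct is introduced here for illustration.

```r
# Force the factor conversion so the example is version-independent:
d <- data.frame(
  Name  = c("Piotr", "Pawel", "Paula"),
  Score = c(78, 88, 92),
  stringsAsFactors = TRUE
)
class(d$Name)    # "factor"

# Find the factor columns and convert each of them back to character:
is_fct <- sapply(d, is.factor)
d[is_fct] <- lapply(d[is_fct], as.character)

class(d$Name)    # "character"
class(d$Score)   # "numeric" (left untouched)
```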