Philippe J. S. De Brouwer - The Big R-Book


The Big R-Book: summary, description and abstract


Introduces professionals and scientists to statistics and machine learning using the programming language R. Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who wants to quickly learn several practical technologies to enter, or transition to, the growing field of data science.
The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R:
- Provides a practical guide for non-experts with a focus on business users
- Contains a unique combination of topics, including an introduction to R, machine learning, mathematical models, data wrangling, and reporting
- Uses a practical tone and integrates multiple topics in a coherent framework
- Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R
- Shows readers how to visualize results in static and interactive reports
- Supplementary materials include PDF slides based on the book's content, as well as all the extracted R code, and are available to everyone on a Wiley Book Companion Site
The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional one. It will also appeal to young data scientists, quantitative analysts, and analytics professionals, as well as to those who build mathematical models.

The Big R-Book: excerpt



2 A tibble reports more errors instead of silently doing something (data type conversions, import, etc.), which makes tibbles safer to use.

3 The specific print function for the tibble, print.tibble(), will not flood your screen with thousands of lines: for larger tibbles it only shows the first ten rows. The traditional head(tibble) still works. If you need to see more rows or columns, you can change the behaviour of the print function via the function options().

4 The name of the class itself is not confusing. The function print.data.frame() can be read as the print method for a data.frame. It could equally be read as the print.data method for an object of class frame. The name of the class tibble does not use the dot and hence cannot be confusing.
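To see why a dot in a name can be ambiguous, the following sketch defines two hypothetical generics, my() and my.show(), and a single function called my.show.frame(). All of these names are invented for illustration. S3 dispatch uses that one function in two roles. It acts as the my.show method for class "frame" and also as the my method for class "show.frame".

```r
# Hypothetical generics, invented only to illustrate the ambiguity
my      <- function(x, ...) UseMethod("my")
my.show <- function(x, ...) UseMethod("my.show")

# One function whose name can be parsed in two different ways
my.show.frame <- function(x, ...) "called my.show.frame()"

obj_frame      <- structure(list(), class = "frame")       # class "frame"
obj_show_frame <- structure(list(), class = "show.frame")  # class "show.frame"

r1 <- my.show(obj_frame)    # read as: generic my.show, class frame
r2 <- my(obj_show_frame)    # read as: generic my, class show.frame
print(r1)
print(r2)
```

Both calls end up in the same function. Note also that class(tibble(x = 1)) returns c("tbl_df", "tbl", "data.frame"), so the class names used by tibbles contain no dot.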

To illustrate some of these differences, consider the following code:

library(tibble)  # needed for tibble(); also attached by library(tidyverse)

# -- data frame --
df <- data.frame("value" = pi, "name" = "pi")
df$na                      # partial matching of column names
## [1] pi
## Levels: pi
# automatic conversion to factor (the default before R 4.0.0; newer
# versions keep "pi" as a character string), plus data frame accepts strings:
df[, "name"]
## [1] pi
## Levels: pi
df[, c("name", "value")]
##   name    value
## 1   pi 3.141593

# -- tibble --
df <- tibble("value" = pi, "name" = "pi")
df$name                    # column name
## [1] "pi"
df$nam                     # no partial matching, but a warning
## Warning: Unknown or uninitialised column: ‘nam’.
## NULL
df[, "name"]               # this returns a tibble (no simplification)
## # A tibble: 1 x 1
##   name
##   <chr>
## 1 pi
df[, c("name", "value")]   # no conversion to factor
## # A tibble: 1 x 2
##   name  value
##   <chr> <dbl>
## 1 pi     3.14

This partial matching is one of the nicer features of R, and it certainly was an advantage for interactive use. However, when using R in batch mode, it can be dangerous. Partial matching is especially dangerous in a corporate environment: datasets can have hundreds of columns and many names look alike, e.g. BAL180801, BAL180802, and BAL180803. Up to a certain point it is safe to use partial matching, since it only works when R can identify the variable uniquely. Sooner or later, though, someone adds new columns and suddenly someone else's code stops working (because the abbreviated name has become ambiguous).
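The failure mode described above can be reproduced in a few lines. The column names follow the BAL180801 example in the text, but the data are made up. As long as the prefix BAL1808 is unique, $ silently returns the matching column. Once a similar column is added, the same expression silently returns NULL.

```r
# Column names modelled on the example in the text; the data are made up
df <- data.frame(BAL180801 = c(10, 20, 30))

x1 <- df$BAL1808            # unique prefix: silently matches BAL180801
df$BAL180802 <- c(1, 2, 3)  # later, someone adds a similar column
x2 <- df$BAL1808            # prefix is now ambiguous: silently NULL

print(x1)   # [1] 10 20 30
print(x2)   # NULL
```

No error or warning is raised in either case with the default options. That silence is what makes the problem hard to spot in batch code. With a tibble, df$BAL1808 would not match at all and would warn instead.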

Digression – Changing how a tibble is printed

To adjust the default behaviour of print on a tibble, run the function options() as follows:

n <- 20; m <- 10; l <- Inf   # example values (20 and 10 are the defaults)
options(
  tibble.print_max = n,      # if there are more than n rows,
  tibble.print_min = m,      # only print the first m rows
                             # (set n to Inf to show all rows)
  tibble.width = l           # maximum output width in characters
                             # (set to Inf to show all columns)
)
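Because options() returns the previous values, you can also change these settings temporarily and restore them afterwards. Alternatively, you can override them for a single call through the n argument of print(). The following is a minimal sketch. It assumes a version of tibble/pillar that still honours the tibble.* option names used above, and the thresholds 5 and 3 are arbitrary.

```r
library(tibble)

tb <- tibble(x = 1:50)

# Temporarily lower the thresholds; options() returns the old values
old <- options(tibble.print_max = 5, tibble.print_min = 3)
short <- capture.output(print(tb))          # more than 5 rows: only 3 shown
full  <- capture.output(print(tb, n = 50))  # per-call override with n
options(old)                                # restore the previous settings

cat(short, sep = "\n")
```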


Tibbles are also data frames, and most older functions (those that are unaware of tibbles) will work just fine. However, a function may occasionally fail. If that happens, you can coerce the tibble back into a data frame with the function as.data.frame().

tb <- tibble(c("a", "b", "c"), c(1,2,3), 9L, 9)
is.data.frame(tb)
## [1] TRUE
# Note also that tibble did no conversion to factors, and
# note that the tibble also recycles the scalars:
tb
## # A tibble: 3 x 4
##   `c("a", "b", "c")` `c(1, 2, 3)`  `9L`   `9`
##   <chr>                     <dbl> <int> <dbl>
## 1 a                             1     9     9
## 2 b                             2     9     9
## 3 c                             3     9     9
# Coerce the tibble to a data frame:
as.data.frame(tb)
##   c("a", "b", "c") c(1, 2, 3) 9L 9
## 1                a          1  9 9
## 2                b          2  9 9
## 3                c          3  9 9
# A tibble does not recycle shorter vectors, so this fails:
fail <- tibble(c("a", "b", "c"), c(1,2))
## Error: Tibble columns must have consistent lengths, only values of length one are recycled:
## * Length 2: Column ‘c(1, 2)’
## * Length 3: Column ‘c("a", "b", "c")’
# That is a major advantage and will save many programming errors.
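A typical reason why an older function fails on a tibble is the "no simplification" rule. On a data.frame, df[, 1] drops to a vector, but on a tibble it stays a one-column tibble. The sketch below uses a hypothetical helper, first_col_mean(), that was written with data frames in mind. It shows how coercion with as.data.frame() restores the expected behaviour.

```r
library(tibble)

# Hypothetical "old" helper that expects df[, 1] to drop to a vector,
# as it does for a data.frame
first_col_mean <- function(df) mean(df[, 1])

tb <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"))

# On a tibble, tb[, 1] is still a tibble, so mean() warns and returns NA
m_tb <- suppressWarnings(first_col_mean(tb))
# Coercing back to a data.frame gives the helper the vector it expects
m_df <- first_col_mean(as.data.frame(tb))

print(m_tb)  # [1] NA
print(m_df)  # [1] 2
```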

Hint – Viewing the content of a tibble

The function view(tibble) works as expected and is most useful in RStudio, where it opens the tibble in a separate tab.

While on the surface a tibble does the same as a data.frame, it has some crucial advantages, and we warmly recommend using tibbles.

7.3.2 Piping with R

This section is not about creating beautiful music; it explains an argument-passing system in R. Similar to the pipe operator | in Linux, the operator %>% from the package magrittr passes the output of one line to the first argument of the function on the next line.


When writing code, it is common to work on one object for a while. For example, we import data, then clean it, add columns, delete some, summarize the data, etc.

To start, consider a simple example:

t <- tibble("x" = runif(10))
t <- within(t, y <- 2 * x + 4 + rnorm(10, mean = 0, sd = 0.5))

This can also be written with the pipe operator from magrittr:

library(magrittr)  # provides %>%; also attached by library(tidyverse)
t <- tibble("x" = runif(10)) %>%
  within(y <- 2 * x + 4 + rnorm(10, mean = 0, sd = 0.5))
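The same idea scales to the longer workflows described above, where one object passes through several steps. The following sketch uses the built-in mtcars data set. The particular steps are chosen only for illustration, and 0.425 is an approximate factor to convert miles per US gallon to kilometres per litre.

```r
library(magrittr)

# Take mtcars, then filter, then add a column, then select, then peek
result <- mtcars %>%
  subset(cyl == 4) %>%                  # keep only 4-cylinder cars
  transform(kpl = mpg * 0.425) %>%      # add km per litre (approximate)
  subset(select = c(mpg, kpl, hp)) %>%  # keep a few columns
  head(3)                               # look at the first rows

print(result)
```

Each line does one thing, and the intermediate results never need a name. Written with nesting, the same code would read inside-out.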

Behind the scenes, R feeds the output on the left of the pipe operator as the first argument to the function on its right. This means that the following are equivalent:

# 1. pipe:
a %>% f()
# 2. pipe with shortened function:
a %>% f
# 3. is equivalent to:
f(a)
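The equivalence also holds when the function takes further arguments. The left-hand side is inserted as the first argument, and the remaining arguments are passed on as written. Here is a short sketch with mean() and its na.rm argument, using made-up data.

```r
library(magrittr)

x <- c(4, NA, 16)

# The piped value becomes the FIRST argument; na.rm is passed on unchanged
p1 <- x %>% mean(na.rm = TRUE)
p2 <- mean(x, na.rm = TRUE)
identical(p1, p2)   # TRUE: both equal 10
```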

Example – Pipe operator

a <- c(1:10)
a %>% mean()
## [1] 5.5
a %>% mean
## [1] 5.5
mean(a)
## [1] 5.5

Hint – Pronouncing the pipe

It might be useful to pronounce the pipe operator %>% as “then” to understand what it does.

Note – Equivalence of piping and nesting

# The following code
c <- a %>%
  f()
# is equivalent to:
c <- f(a)
# Also, it is easy to see that
x <- a %>% f(y) %>% g(z)
# is the same as:
x <- g(f(a, y), z)
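Here is a concrete instance of the pattern x <- a %>% f(y) %>% g(z). In this example f is round() with y = 1, and g is rep() with z = 2. The numbers are arbitrary.

```r
library(magrittr)

a <- c(1.234, 5.678)

# f = round with y = 1, g = rep with z = 2
x_pipe <- a %>% round(1) %>% rep(2)
x_nest <- rep(round(a, 1), 2)

identical(x_pipe, x_nest)   # [1] TRUE
x_pipe                      # [1] 1.2 5.7 1.2 5.7
```

The piped version reads left to right in the order in which the operations happen. The nested version has to be read from the inside out.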

7.3.3 Attention Points When Using the Pipe

This construct runs into problems with functions that use lazy evaluation. Lazy evaluation is a feature that was introduced in R to make it faster in interactive mode: such functions only calculate their arguments when they are really needed. There is of course a good reason why those functions use lazy evaluation, and the reader will not be surprised that they cannot be used in a pipe. Many functions use lazy evaluation, but the most notable ones are the error handlers. These functions try to do something and, when an error is thrown or a warning message is generated, hand it over to the relevant handler. Examples are try(), tryCatch(), etc. (Since version 2.0, magrittr evaluates the left-hand side of the pipe lazily, which relaxes this restriction for such functions.) We do not really discuss error handling in any other part of this book, so here is a quick primer.
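The two ideas in this paragraph can be illustrated with base R only. A function that never uses its argument (ignore_arg() is a made-up name) shows lazy evaluation: the stop() call passed to it is never run. tryCatch() relies on the same mechanism. Its expression arrives unevaluated and is only run once the handlers are in place. This is a minimal sketch rather than a full treatment of error handling.

```r
# Hypothetical function that never uses its argument
ignore_arg <- function(x) "x was never evaluated"

# Lazy evaluation: stop() is never run, so no error is raised
r1 <- ignore_arg(stop("this error is never raised"))

# tryCatch() evaluates its expression only after installing the handlers
r2 <- tryCatch(
  stop("something went wrong"),
  error = function(e) paste("caught:", conditionMessage(e))
)
r3 <- tryCatch(
  as.integer("x"),                  # warns: NAs introduced by coercion
  warning = function(w) "caught a warning"
)
```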
