2 A tibble reports more errors instead of doing something silently (data type conversions, import, etc.), so it is safer to use.
3 The print method for a tibble will not overrun your screen with thousands of lines: it shows only the first ten rows. If you only want the first rows, the traditional head(tibble) still works, and you can change how many rows and columns are printed via the function options().
4 The name of the class itself is not confusing. The function print.data.frame() could be read as the print method for a data.frame object, but equally as the print.data method for a frame object. The class name tibble contains no dot and hence cannot be misread in this way.
To illustrate some of these differences, consider the following code:
library(tibble)

# -- data frame --
df <- data.frame("value" = pi, "name" = "pi")
df$na      # partial matching of column names
## [1] pi
## Levels: pi

# automatic conversion to factor (the default before R 4.0.0),
# plus data frame accepts strings:
df[,"name"]
## [1] pi
## Levels: pi

df[, c("name", "value")]
##   name    value
## 1   pi 3.141593

# -- tibble --
df <- tibble("value" = pi, "name" = "pi")
df$name    # column name
## [1] "pi"

df$nam     # no partial matching, but a warning
## Warning: Unknown or uninitialised column: 'nam'.
## NULL

df[,"name"]   # this returns a tibble (no simplification)
## # A tibble: 1 x 1
##   name
##   <chr>
## 1 pi

df[, c("name", "value")]   # no conversion to factor
## # A tibble: 1 x 2
##   name  value
##   <chr> <dbl>
## 1 pi     3.14
Partial matching is one of the nicer features of R, and it certainly was an advantage for interactive use. When using R in batch mode, however, it can be dangerous. Partial matching is especially dangerous in a corporate environment: datasets can have hundreds of columns and many names look alike, e.g. BAL180801, BAL180802, and BAL180803. Up to a certain point it is safe to use partial matching, since it only works when R can identify the variable uniquely. But it is bound to happen that you create new columns and suddenly someone else's code stops working (because R can no longer tell which column is meant).
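The failure mode described above can be reproduced with a base R data frame. This is a minimal sketch: the column names follow the example in the text, and the balances are made-up values.

```r
# Made-up balances; column names follow the BAL... example in the text
df <- data.frame(BAL180801 = 100)

# Only one column starts with "BAL", so the partial match is unique
before <- df$BAL

# A colleague adds a look-alike column
df$BAL180802 <- 200

# "BAL" is now ambiguous, and R quietly returns NULL instead of a column
after <- df$BAL

print(before)
print(is.null(after))
```

Note that base R raises neither an error nor a warning in the ambiguous case by default, which is why such breakage can go unnoticed in batch jobs.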
Digression – Changing how a tibble is printed
To adjust the default behaviour of print on a tibble, run the function options() as follows:
options(
  tibble.print_max = n,  # If there are more than n rows,
  tibble.print_min = m,  # only print the first m
                         # (set n to Inf to show all rows)
  tibble.width = l       # output width in characters
                         # (set to Inf to show all columns)
)
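As a concrete illustration, the placeholders n, m, and l can be filled in as below. The values 20, 5, and Inf are arbitrary choices for this sketch, not recommendations from the text. Because options() returns the previous settings, they can be restored afterwards.

```r
# Set the tibble printing options and remember the previous values
old <- options(
  tibble.print_max = 20,   # more than 20 rows? then ...
  tibble.print_min = 5,    # ... print only the first 5
  tibble.width     = Inf   # never hide columns
)

# Check that the new setting is active
current_min <- getOption("tibble.print_min")
print(current_min)

# Restore the settings that were in place before
options(old)
```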
Tibbles are also data frames, and most older functions that are unaware of tibbles will work just fine. However, an occasional function may not work with a tibble. If that happens, the tibble can be coerced back into a data frame with the function as.data.frame().
tb <- tibble(c("a", "b", "c"), c(1,2,3), 9L, 9)
is.data.frame(tb)
## [1] TRUE

# Note also that tibble did no conversion to factors, and
# note that the tibble also recycles the scalars:
tb
## # A tibble: 3 x 4
##   `c("a", "b", "c")` `c(1, 2, 3)`  `9L`   `9`
##   <chr>                     <dbl> <int> <dbl>
## 1 a                             1     9     9
## 2 b                             2     9     9
## 3 c                             3     9     9

# Coerce the tibble to data-frame:
as.data.frame(tb)
##   c("a", "b", "c") c(1, 2, 3) 9L 9
## 1                a          1  9 9
## 2                b          2  9 9
## 3                c          3  9 9

# A tibble does not recycle shorter vectors, so this fails:
fail <- tibble(c("a", "b", "c"), c(1,2))
## Error: Tibble columns must have consistent lengths, only values of length one are recycled:
## * Length 2: Column 'c(1, 2)'
## * Length 3: Column 'c("a", "b", "c")'

# That is a major advantage and will save many programming errors.
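For contrast, a base R data.frame() does recycle a shorter vector when the longest column length is an exact multiple of it, which is the behaviour a tibble refuses. A minimal sketch with made-up values:

```r
# Lengths 4 and 2: the shorter vector is silently recycled
df <- data.frame(id = c("a", "b", "c", "d"), value = c(1, 2))
print(df$value)

# Lengths 3 and 2: 3 is not a multiple of 2, so base R stops with an error
res <- try(
  data.frame(id = c("a", "b", "c"), value = c(1, 2)),
  silent = TRUE
)
print(inherits(res, "try-error"))
```

The first call succeeds without any message, so a column that is accidentally too short can go unnoticed in a data frame.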
Hint – Viewing the content of a tibble
The function view(tibble) works as expected and is most useful when working with RStudio, where it will open the tibble in a special tab.
While on the surface a tibble does the same as a data.frame, tibbles have some crucial advantages and we warmly recommend using them.
This section is not about creating beautiful music; it explains an argument-passing system in R. Similar to the pipe operator | in Linux, the operator %>% from the package magrittr allows you to pass the output of one line to the first argument of the function on the next line.
When writing code, it is common to work on one object for a while. For example, we import data, then clean it, add columns, delete some, summarize the data, etc.
To start, consider a simple example:
t <- tibble("x" = runif(10))
t <- within(t, y <- 2 * x + 4 + rnorm(10, mean = 0, sd = 0.5))
This can also be written with the piping operator from magrittr:
t <- tibble("x" = runif(10)) %>%
  within(y <- 2 * x + 4 + rnorm(10, mean = 0, sd = 0.5))
Behind the scenes, R feeds the value on the left of the pipe operator as the first argument to the function on its right. This means that the following are equivalent:
# 1. pipe:
a %>% f()
# 2. pipe with shortened function:
a %>% f
# 3. is equivalent with:
f(a)
a <- c(1:10)
a %>% mean()
## [1] 5.5
a %>% mean
## [1] 5.5
mean(a)
## [1] 5.5
Hint – Pronouncing the pipe
It might be useful to pronounce the pipe operator, %>%, as “then” to understand what it does.
Note – Equivalence of piping and nesting
# The following line
c <- a %>%
f()
# is equivalent with:
c <- f(a)
# Also, it is easy to see that
x <- a %>% f(y) %>% g(z)
# is the same as:
x <- g(f(a, y), z)
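The equivalence in this note can be checked with concrete functions. In the sketch below, round() plays the role of f and paste() the role of g. The input values and the "kg" suffix are made up for illustration, and the magrittr package is assumed to be installed.

```r
library(magrittr)

a <- c(1.234, 5.678)

# Piped version: read it as "take a, then round, then paste"
piped <- a %>% round(1) %>% paste("kg")

# Nested version: the same calls written inside out
nested <- paste(round(a, 1), "kg")

print(piped)
print(identical(piped, nested))
```

The piped form lists the steps in the order they are executed, whereas the nested form has to be read from the innermost call outwards.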
7.3.3 Attention Points When Using the Pipe
This construct runs into problems with functions that rely on lazy evaluation. Lazy evaluation is a feature of R, introduced to make it faster in interactive mode: a function evaluates its arguments only when they are really needed. There is of course a good reason why those functions rely on lazy evaluation, and the reader will not be surprised that they cannot be used in a pipe. Many functions use lazy evaluation, but the most notable are the error handlers. These are functions that try to do something and, when an error is thrown or a warning is generated, hand it over to the relevant handler. Examples are try(), tryCatch(), etc. We do not really discuss error handling in any other part of this book, so here is a quick primer.
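As a rough illustration of the two ideas in this paragraph (not the book's own primer), the base R sketch below first shows that an unused argument is never evaluated, and then shows a handler catching a warning. The function name first_only and the handler messages are hypothetical, chosen only for this example.

```r
# Lazy evaluation: y is never used, so stop() is never called
first_only <- function(x, y) x
lazy_result <- first_only(1, stop("this argument is never evaluated"))
print(lazy_result)

# Error handling: log(-1) generates a warning (NaNs produced),
# which tryCatch() hands over to the warning handler
handled <- tryCatch(
  log(-1),
  warning = function(w) "warning handled",
  error   = function(e) "error handled"
)
print(handled)
```

tryCatch() can only intercept the warning because its first argument is evaluated inside the function, after the handlers are in place. This is why such functions depend on lazy evaluation.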