version$version.string
[1] "R version 4.4.1 (2024-06-14 ucrt)"
Descriptive Statistics
Kyiv School of Economics
You can also use:
RStudio.Version()$version
[1] ‘2023.12.1.402’
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.5432 -2.3647 -0.1252 1.4096 6.8727
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
wt -5.3445 0.5591 -9.559 1.29e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
library(gganimate)
library(gapminder)
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
geom_point(alpha = 0.7, show.legend = FALSE) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
scale_x_log10() +
facet_wrap(~continent) +
# Here comes the gganimate specific bits
labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
transition_time(year) +
ease_aes('linear')
library(kableExtra)
mpg_list <- split(mtcars$mpg, mtcars$cyl)
disp_list <- split(mtcars$disp, mtcars$cyl)
inline_plot <- data.frame(cyl = c(4, 6, 8), mpg_box = "", mpg_hist = "",
mpg_line1 = "", mpg_line2 = "",
mpg_points1 = "", mpg_points2 = "", mpg_poly = "")
inline_plot %>%
kbl(booktabs = TRUE) %>%
kable_paper(full_width = FALSE) %>%
column_spec(2, image = spec_boxplot(mpg_list)) %>%
column_spec(3, image = spec_hist(mpg_list)) %>%
column_spec(4, image = spec_plot(mpg_list, same_lim = TRUE)) %>%
column_spec(5, image = spec_plot(mpg_list, same_lim = FALSE)) %>%
column_spec(6, image = spec_plot(mpg_list, type = "p")) %>%
column_spec(7, image = spec_plot(mpg_list, disp_list, type = "p")) %>%
column_spec(8, image = spec_plot(mpg_list, polymin = 5))
🔗Source: Movie explorer
library(leaflet)
content <- paste(sep = "<br/>",
"<b><a href='https://kse.ua/ua/'>Kyiv School of Economics</a></b>",
"Mykoly Shpaka St, 3",
"Kyiv, Ukraine"
)
leaflet() %>%
addTiles() %>%
addMarkers(lng = 30.4298435, lat = 50.4584603, popup = content)
[1] 3
[1] -1
[1] 2.5
[1] 8
[1] 6
[1] 1
[1] 40
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE
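The expressions behind the outputs above are not shown. As a rough sketch, the following arithmetic and logical statements would print the same values; the specific expressions are assumptions chosen to match the output, not necessarily the ones originally used.

```r
# Assumed expressions, chosen only so that they reproduce the printed values
arith <- c(
  1 + 2,           # addition: 3
  6 - 7,           # subtraction: -1
  5 / 2,           # division: 2.5
  2^3,             # exponentiation: 8
  2 + 4 * 1^3,     # standard order of precedence: 6
  100 %/% 60,      # integer division: 1
  100 %% 60        # modulo (remainder): 40
)
arith

logic <- c(
  1 > 2,               # FALSE
  1 > 2 & 1 > 0.5,     # "and": FALSE
  1 > 2 | 1 > 0.5,     # "or": TRUE
  isTRUE(1 < 2)        # TRUE
)
logic
```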
You can read more about logical operators and types here and here.
Much like standard arithmetic, logic statements follow a strict order of precedence. Logical operators (>, ==, etc.) are evaluated before Boolean operators (& and |). Failure to recognise this can lead to unexpected behaviour…
What’s happening here is that R is evaluating two separate “logical” statements:
- 1 > 0.5, which is obviously TRUE.
- 2, which is TRUE(!) because R is “helpfully” converting it to as.logical(2).
Solution: Be explicit about each component of your logic statement(s).
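A minimal sketch of the pitfall described above. The first expression is an assumed example (the statement being discussed is not shown); the second spells out both comparisons.

```r
# Assumed example: > is evaluated before &, so this is read as (1 > 0.5) & 2
implicit <- 1 > 0.5 & 2
implicit                      # TRUE, because 2 is coerced via as.logical(2)

# Being explicit about each component gives the intended comparison
explicit <- 1 > 0.5 & 1 > 2
explicit                      # FALSE
```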
Negation: !
We use ! as a shorthand for negation. This will come in very handy when we start filtering data objects based on non-missing (i.e. non-NA) observations.
Value matching: %in%
To see whether an object is contained within (i.e. matches one of) a list of items, use %in%.
There’s no equivalent “not in” command, but how might we go about creating one?
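One possible answer, as a sketch: wrap %in% with base R's Negate(). The operator name %ni% is a hypothetical choice; any name between percent signs would do.

```r
# Is 4 one of the values 1 through 10?
4 %in% 1:10                  # TRUE

# A hand-rolled "not in" operator; the name %ni% is hypothetical
`%ni%` <- Negate(`%in%`)
4 %ni% 5:10                  # TRUE
4 %ni% 1:10                  # FALSE
```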
Evaluation
We’ll get to assignment shortly. However, to preempt it somewhat, we always use two equal signs for logical evaluation.
Evaluation caveat: Floating-point numbers
What do you think will happen if we evaluate 0.1 + 0.2 == 0.3?
Problem: Computers represent numbers as binary (i.e. base 2) floating-point values. More here.
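A short sketch of the problem and one common workaround: all.equal() from base R compares numbers within a small tolerance instead of bit for bit.

```r
# Exact comparison fails because 0.1 and 0.2 have no exact binary representation
exact <- 0.1 + 0.2 == 0.3
exact                                        # FALSE

# Tolerance-based comparison; wrap in isTRUE() to get a clean logical
near <- isTRUE(all.equal(0.1 + 0.2, 0.3))
near                                         # TRUE
```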
In R, we can use either <- or = to handle assignment.
Assignment with <-
<- is normally read aloud as “gets”. You can think of it as a (left-facing) arrow saying assign in this direction.
Assignment with =
You can also use = for assignment.
Which assignment operator to use?
Most R users seem to prefer <- for assignment, since = also has a specific role for evaluation within functions.
Bottom line: Use whichever you prefer. Just be consistent.
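A small sketch of both assignment styles. The object names (a, a2, b, m) are illustrative only.

```r
a <- 10 + 5      # "a gets 15"
10 + 5 -> a2     # the arrow can also point right, though this is rarely used

b = 10 + 10      # = works for assignment too

# Inside a function call, = names an argument rather than creating an object
m <- mean(x = c(a, b))
m                # 17.5
```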
For more information on a (named) function or object in R, consult the “help” documentation. For example:
Or, more simply, just use ?:
Or, just press F1.
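As a sketch, here are the two help lookups described above, using plot() as an arbitrary example function.

```r
# Full form: open the help page for plot()
help(plot)

# Shorthand for the same lookup
?plot
```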
Aside 1: Comments in R are demarcated by #.
- Use Ctrl+Shift+c in RStudio to (un)comment whole sections of highlighted code.
Aside 2: See the Examples section at the bottom of the help file?
- You can run them with the example() function. Try it: example(plot).
Vignettes
For many packages, you can also try the vignette() function, which will provide an introduction to a package and its purpose through a series of helpful examples.
- Try running vignette("dplyr") in your console now.
I highly encourage reading package vignettes if they are available.
One complication is that you need to know the exact name of the package vignette(s).
- E.g. The dplyr package actually has several vignettes associated with it: “dplyr”, “window-functions”, “programming”, etc.
- You can run vignette() (i.e. without any arguments) to list the available vignettes of every package installed on your system.
- Or, run vignette(all = FALSE) if you only want to see the vignettes of any loaded packages.
Similar to vignettes, many packages come with built-in, interactive demos.
To list all available demos on your system:
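One way to do this with base R, as a sketch: demo() accepts a vector of package names, and .packages(all.available = TRUE) returns every installed package. The object name all_demos is illustrative.

```r
# Collect the demos shipped with every installed package
all_demos <- demo(package = .packages(all.available = TRUE))

# The listing is stored as a matrix; peek at the package and demo names
head(all_demos$results[, c("Package", "Item")])
```

Typing all_demos on its own in an interactive session opens the full listing.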
We’ve seen that we can assign objects to different names. However, there are a number of special words that are “reserved” in R.
See here for a full list, including (but not limited to):
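As a sketch, you can look up the list with ?Reserved and confirm that R refuses to overwrite a reserved word. The eval(parse()) wrapper below is only there so that the error can be captured rather than stopping the script.

```r
# ?Reserved documents the reserved words: if, else, repeat, while, function,
# for, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, ...

# Attempting to assign to a reserved word fails with an error
msg <- tryCatch(
  eval(parse(text = "TRUE <- 1")),
  error = function(e) conditionMessage(e)
)
msg   # the error message, returned as a character string
```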
In addition to the list of strictly reserved words, there is a class of words and strings that I am going to call “semi-reserved”.
- These are names (e.g. pi) that you can re-assign if you really wanted to… but that already come with important meanings from base R.
Arguably the most important semi-reserved character is c(), which we use for concatenation; i.e. creating vectors and binding different objects together.
Vectors are very important in R, because the language has been optimised for them. Don’t worry about this now; later you’ll learn what I mean by “vectorising” a function.
In this case, thankfully nothing. R is “smart” enough to distinguish between the variable c = 4 that we created and the built-in function c() that calls for concatenation.
However, this is still extremely sloppy coding. R won’t always be able to distinguish between conflicting definitions. And neither will you. For example:
Bottom line: Don’t use (semi-)reserved characters!
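A sketch of both situations. The values are illustrative assumptions; pi is used as an example of a name where R can no longer tell the two definitions apart.

```r
# Shadowing c with a variable: the function call still works
c <- 4
v <- c(1, 2, 5)     # R looks for a *function* named c here
v

# Shadowing pi: now every later use silently gets the wrong value
pi <- 2
area <- pi * 1^2    # 2, not 3.14159...
area

# Remove our objects to get the base R definitions back
rm(c, pi)
pi
```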
A similar issue crops up when we load two packages that have functions sharing the same name. E.g. Look what happens when we load the dplyr package.
The messages that you see about some object being masked from ‘package:X’ are warning you about a namespace conflict.
- E.g. Both dplyr and the stats package (which gets loaded automatically when you start R) have functions named “filter” and “lag”.
The potential for namespace conflicts is a result of the OOP approach.
Whenever a namespace conflict arises, the most recently loaded package will gain preference. So the filter() function now refers specifically to the dplyr variant.
But what if we want the stats variant? Well, we have two options:
1. stats::filter()
2. filter = stats::filter
package::function()
We can explicitly call a conflicted function from a particular package using the package::function() syntax. For example:
Time Series:
Start = 1
End = 10
Frequency = 1
[1] 3 5 7 9 11 13 15 17 19 NA
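The call that produced the time series above is not shown. As a sketch, a two-term moving sum over 1:10 prints the same values; the arguments here are an assumption chosen to match that output.

```r
# Explicitly request the stats version of filter(), whatever else is loaded
smoothed <- stats::filter(1:10, rep(1, 2))
smoothed
```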
We can also use :: for more than just conflicted cases.
dplyr::starwars ## Print the starwars data frame from the dplyr package
scales::comma(c(1000, 1000000)) ## Use the comma function, which comes from the scales package
The :: syntax also means that we can call functions without loading the package first. E.g. As long as dplyr is installed on our system, then dplyr::filter(iris, Species=="virginica") will work.
function <- package::function
A more permanent solution is to assign a conflicted function name to a particular package. This will hold for the remainder of your current R session, or until you change it back. E.g.
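A minimal sketch of this approach, pinning filter to the stats version. The object name pinned is illustrative.

```r
# Pin the bare name filter to the stats version for this session
filter <- stats::filter
pinned <- filter(1:10, rep(1, 2))   # now calls stats::filter()
pinned

# "Change it back" by removing our copy, so normal lookup rules apply again
rm(filter)
```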
conflict_prefer()
A final thing to say about namespace conflicts is that they don’t only arise from loading packages. They can arise when users create their own functions with a conflicting name.
In a similar vein, one of the most common and confusing errors that even experienced R programmers run into is related to the habit of calling objects “df” or “data”… both of which are functions in base R!
See for yourself by typing ?df or ?data.
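A sketch of how this bites in practice: if you forget to create your own df object, R finds the function instead and the error message says nothing about a missing data frame. The column name mpg is an arbitrary example.

```r
is.function(df)      # TRUE: df() is the density of the F distribution
is.function(data)    # TRUE: data() loads built-in datasets

# Subsetting the *function* df, because no data frame called df exists
err <- tryCatch(df$mpg, error = function(e) conditionMessage(e))
err                  # an error about subsetting a closure
```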
numeric
integer
double
character
logical
factor
Date
numeric is the most common data type in R. It can be an integer or a double (i.e. a floating-point number).
character is used for text data.
logical is used for binary data.
factor is used for categorical data.
race <- factor(
is used for categorical data.race <- factor(
c("istari", "human", "human",
"elf", "dwarf", "hobbit",
"hobbit", "hobbit", "hobbit"),
levels = c("istari", "human", "elf", "dwarf", "hobbit")
)
race
[1] istari human human elf dwarf hobbit hobbit hobbit hobbit
Levels: istari human elf dwarf hobbit
lotr_books <- factor(c("The Fellowship of the Ring",
"The Return of the King",
"The Two Towers"),
levels = c("The Fellowship of the Ring",
"The Two Towers",
"The Return of the King"),
ordered = TRUE)
lotr_books
[1] The Fellowship of the Ring The Return of the King
[3] The Two Towers
3 Levels: The Fellowship of the Ring < ... < The Return of the King
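To check which of the data types listed above an object has, typeof() and class() are the usual tools. A short sketch with illustrative values; the sizes factor is a made-up example.

```r
typeof(3L)                       # "integer"
typeof(3.14)                     # "double"
is.numeric(3L)                   # TRUE: integers and doubles are both numeric
class("Gandalf")                 # "character"
class(TRUE)                      # "logical"
class(as.Date("2024-06-14"))     # "Date"

# Ordered factors support comparisons based on their level order
sizes <- factor(c("small", "large"),
                levels = c("small", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]              # TRUE
```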
c()
Type of coercion:
NULL < raw < logical < integer < double < complex < character < list < expression
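A quick sketch of the coercion hierarchy in action: when c() combines values of different types, everything is converted to the type furthest to the right in the ordering. The values are illustrative.

```r
class(c(TRUE, 2L))          # "integer": logical is promoted to integer
class(c(1L, 2.5))           # "numeric": integer is promoted to double
class(c(1, "a", TRUE))      # "character": everything becomes text
c(1, "a", TRUE)             # "1" "a" "TRUE"
class(c(1, list("a")))      # "list"
```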
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
Popular functions and operators:
rbind() and cbind()
dim()
rownames() and colnames()
t()
%*%
det()
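A short sketch that exercises each of the functions listed above. The 4 x 4 matrix assumes the printed output came from matrix(1:16, nrow = 4); all object names and the small 2 x 2 matrix are illustrative.

```r
m <- matrix(1:16, nrow = 4)              # filled column by column
dim(m)                                   # 4 4

rownames(m) <- paste0("r", 1:4)
colnames(m) <- paste0("c", 1:4)

tm <- t(m)                               # transpose: rows become columns
wide <- cbind(m, total = rowSums(m))     # add a column
tall <- rbind(m, colSums(m))             # add a row

a <- matrix(c(2, 1, 1, 3), nrow = 2)
a %*% a                                  # matrix multiplication
det(a)                                   # determinant: 2 * 3 - 1 * 1 = 5
```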