February 13, 2020
This presentation is based on ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
# install.packages("ggplot2")
library(ggplot2)
mpg <- mpg
mpg is a dataset with fuel economy data from 1999 and 2008 for 38 popular models of car
All plots are composed of:
You will always need to specify:
Data and aesthetic mappings are supplied in ggplot()
Then layers are added on with +
ggplot(mpg, aes(displ, hwy)) + geom_point()
Almost every plot maps a variable to x and y, so the first two unnamed arguments to aes() are mapped to x and y; you don’t need to name them
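To make the shorthand concrete, here is a small sketch comparing the positional form with a fully named call. The object names `p_short` and `p_named` are only illustrative; both calls should describe the same plot.

```r
library(ggplot2)

# Positional form: the first two arguments of aes() are taken as x and y
p_short <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Same plot with every argument named explicitly
p_named <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_point()

p_named
```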
Remember: Layers are made up of:
Each geom has a set of aesthetics and stats that it understands.
ggplot(mpg, aes(displ)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(mpg, aes(displ)) + geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
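The `stat_bin()` message is a prompt to choose the bin width yourself. A minimal sketch follows; the value 0.5 (in litres of displacement) is an arbitrary illustrative choice, not a recommendation.

```r
library(ggplot2)

# binwidth is given in the units of the x variable (litres here)
p_hist <- ggplot(mpg, aes(displ)) + geom_histogram(binwidth = 0.5)

# The same argument works for the frequency polygon
p_freq <- ggplot(mpg, aes(displ)) + geom_freqpoly(binwidth = 0.5)

p_hist
```

With `binwidth` set explicitly, the message about `bins = 30` should no longer appear.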
ggplot(mpg, aes(displ)) + geom_density()
ggplot(mpg, aes(drv)) + geom_bar()
ggplot(mpg, aes(displ, cty)) + geom_point()
ggplot(mpg, aes(displ, cty)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(economics[1:100,], aes(date, unemploy)) + geom_area()
ggplot(economics[1:100,], aes(date, unemploy)) + geom_line()
ggplot(economics[1:100,], aes(date, unemploy)) + geom_step()
ggplot(mpg, aes(drv, cty)) + geom_bar(stat = "identity")
ggplot(mpg, aes(drv, cty)) + geom_boxplot()
ggplot(mpg, aes(drv, cty)) + geom_violin()
ggplot(mpg, aes(displ, cty)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
You can add geoms on top of each other
This becomes helpful when adding information about error.
See geom_crossbar(), geom_errorbar(), geom_linerange(), and geom_pointrange()
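As one possible illustration of showing error, the sketch below summarises cty per drive type and draws the mean with a range of one standard deviation either side. The summary table `cty_summary` and the choice of mean ± sd are assumptions made for this example only.

```r
library(ggplot2)

# Mean and standard deviation of city mileage per drive type (base R only)
cty_summary <- aggregate(cty ~ drv, data = mpg, FUN = mean)
cty_summary$sd <- aggregate(cty ~ drv, data = mpg, FUN = sd)$cty

# geom_pointrange() needs ymin and ymax in addition to x and y
p_range <- ggplot(cty_summary, aes(drv, cty)) +
  geom_pointrange(aes(ymin = cty - sd, ymax = cty + sd))

p_range
```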
ggplot(mpg, aes(drv, cty)) + geom_point() + stat_summary(fun = "median", color = "red", size = 6, geom = "point")
Stats can be used when you need to do a statistical transformation of the data that a geom can’t already do
stat_summary() is the most common
Stat functions and geom functions both combine a stat with a geom to make a layer
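A short sketch of that equivalence: the same median layer written once starting from the stat and once starting from the geom. It assumes ggplot2 3.3.0 or later, where the summary function is passed as `fun`; the names `p_stat` and `p_geom` are illustrative.

```r
library(ggplot2)

base <- ggplot(mpg, aes(drv, cty))

# Start from the stat and name the geom that draws the result
p_stat <- base + stat_summary(fun = median, geom = "point", color = "red", size = 6)

# Start from the geom and name the stat that transforms the data;
# the extra `fun` argument is handed on to the stat
p_geom <- base + geom_point(stat = "summary", fun = median, color = "red", size = 6)

p_geom
```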
Remember: Aesthetic mappings describe how variables in the data are mapped to aesthetic attributes
An aesthetic can be mapped to a variable or set to a constant:
Set to a constant as an argument to the geom_*() function:
ggplot(mpg, aes(displ, cty)) + geom_point(color = "blue")
Mapped to a variable inside aes() in ggplot():
ggplot(mpg, aes(displ, cty, color = class)) + geom_point()
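A common slip is to put a constant inside aes(). The sketch below contrasts the two cases; `p_set` and `p_mapped` are illustrative names.

```r
library(ggplot2)

# Setting: the constant sits outside aes(), so every point is drawn in blue
p_set <- ggplot(mpg, aes(displ, cty)) + geom_point(color = "blue")

# Mapping by mistake: inside aes(), "blue" is treated as data (a one-level
# variable), so the colour scale picks its own colour and adds a legend
p_mapped <- ggplot(mpg, aes(displ, cty)) + geom_point(aes(color = "blue"))

p_mapped
```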
Color: good for continuous and categorical variables
The example above with color = class used a categorical variable; here is one with a continuous variable (hwy):
ggplot(mpg, aes(displ, cty, color = hwy)) + geom_point()
Fill example:
ggplot(mpg, aes(displ, cty, fill = drv)) + geom_hex()
Shape: good for categorical variables
ggplot(mpg, aes(displ, cty, shape = drv)) + geom_point(size = 4)
Size: good for continuous variables
ggplot(mpg, aes(displ, cty, size = hwy)) + geom_point()
Linetype: good for categorical variables
ggplot(mpg, aes(displ, cty, linetype = drv)) + geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Label: good for categorical variables; use with geom_text()
ggplot(mpg, aes(displ, cty, label = drv)) + geom_text()
Any of the aesthetics above can also be set to a constant; alpha (transparency) is another aesthetic that is commonly set to a constant
Alpha is good for overlapping data
ggplot(mpg, aes(displ, fill = drv)) + geom_density(alpha = 0.4)
An alternative to using aesthetics to map properties of the data is to use facetting
Remember: facetting describes how to break up the data into subsets and how to display those subsets
facet_wrap(): “wraps” a 1d ribbon of panels into 2d
facet_grid(): produces a 2d grid of panels defined by variables which form the rows and columns
ggplot(mpg, aes(displ, cty)) + geom_point() + facet_wrap(~class)
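As a small variation on the plot above, facet_wrap() also accepts layout arguments such as ncol and dir. The sketch below is illustrative; two columns is an arbitrary choice.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, cty)) + geom_point()

# ncol fixes the number of columns; the row count follows from the 7 classes
p_cols <- base + facet_wrap(~class, ncol = 2)

# dir = "v" fills the grid column by column instead of row by row
p_vert <- base + facet_wrap(~class, ncol = 2, dir = "v")

p_vert
```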
You can control how the ribbon is wrapped into a grid with the following arguments:
ncol and nrow control how many columns and rows (you only need to set one)
as.table controls how the facets are laid out: TRUE puts the highest values at the bottom-right, FALSE puts the highest values at the top-right
dir controls the direction of wrap: horizontal or vertical
facet_grid() lays out plots in a 2d grid, as defined by a formula:
. ~ a spreads the values of a across the columns. This direction facilitates comparisons of y position, because the vertical scales are aligned.
b ~ . spreads the values of b down the rows. This direction facilitates comparison of x position because the horizontal scales are aligned. This makes it particularly useful for comparing distributions.
b ~ a spreads a across columns and b down rows.
ggplot(mpg, aes(displ, cty)) + geom_point() + facet_grid(. ~ cyl)
ggplot(mpg, aes(displ, cty)) + geom_point() + facet_grid(drv ~ .)
ggplot(mpg, aes(displ, cty)) + geom_point() + facet_grid(drv ~ cyl)
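One way to check which side of the formula ends up where is to inspect the layout of the built plot. In facet_grid() the formula reads rows ~ columns, so with drv ~ cyl the three drive types should form the rows and the four cylinder counts the columns. The sketch relies on the layout table returned by ggplot_build(), which is an implementation detail rather than a documented interface.

```r
library(ggplot2)

p_grid <- ggplot(mpg, aes(displ, cty)) + geom_point() + facet_grid(drv ~ cyl)

# Panel layout of the built plot: one row per panel, with ROW and COL indices
grid_layout <- ggplot_build(p_grid)$layout$layout

# Number of panel rows and panel columns
c(rows = max(grid_layout$ROW), cols = max(grid_layout$COL))
```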
Remember: Scales :
Use scale_*() functions to adjust:
See: scale_x_continuous(), scale_x_discrete(), scale_fill_gradient()
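A minimal sketch of adjusting scales. The axis title, break positions, and colours are arbitrary illustrative choices, and scale_colour_gradient() is used as the colour counterpart of scale_fill_gradient() because points are coloured rather than filled.

```r
library(ggplot2)

p_scales <- ggplot(mpg, aes(displ, cty, color = hwy)) +
  geom_point() +
  # Position scale: axis title and where the breaks (tick marks) go
  scale_x_continuous(name = "Engine displacement (l)", breaks = 2:7) +
  # Colour scale: the two end points of the gradient
  scale_colour_gradient(low = "lightblue", high = "darkblue")

p_scales
```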
There are around 40 unique theme elements that control the appearance of the plot
They can be roughly grouped into five categories: plot, axis, legend, panel, and facet
Some elements affect the plot as a whole:
plot.background (set with element_rect())
plot.title (set with element_text())
plot.margin (set with margin())
axis.line and axis.ticks are set with element_line()
axis.ticks.length is set with unit()
axis.text, axis.text.x, axis.text.y, axis.title, axis.title.x, and axis.title.y are set with element_text()
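The plot and axis elements listed above are all passed to theme(). A sketch with arbitrary illustrative values (the title text, colours, and sizes are assumptions):

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(displ, cty)) +
  geom_point() +
  ggtitle("City mileage by engine size") +
  theme(
    plot.background   = element_rect(fill = "grey95", colour = NA),
    plot.title        = element_text(face = "bold", size = 14),
    plot.margin       = margin(10, 10, 10, 10),  # top, right, bottom, left (pt)
    axis.line         = element_line(colour = "grey40"),
    axis.ticks.length = unit(4, "pt"),
    axis.text.x       = element_text(angle = 45, hjust = 1)
  )

p_theme
```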
The legend elements control the appearance of all legends. You can also modify the appearance of individual legends by modifying the same elements in guide_legend() or guide_colourbar()
legend.text.align and legend.title.align are set with a number from 0 to 1
legend.text and legend.title are set with element_text()
legend.background and legend.key are set with element_rect()
legend.key.size, legend.key.height, and legend.key.width are set with unit(); legend.margin is set with margin()
There are four other properties that control how legends are laid out in the context of the plot (legend.position, legend.direction, legend.justification, and legend.box).
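A sketch that combines a few legend elements with the layout properties; all values are arbitrary and only meant to show where each setting goes.

```r
library(ggplot2)

p_legend <- ggplot(mpg, aes(displ, cty, color = drv)) +
  geom_point() +
  theme(
    legend.position   = "bottom",       # layout of the legend within the plot
    legend.direction  = "horizontal",
    legend.title      = element_text(face = "bold"),
    legend.key        = element_rect(fill = "white", colour = "grey80"),
    legend.key.size   = unit(1.2, "lines"),
    legend.background = element_rect(fill = "grey95")
  )

p_legend
```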
aspect.ratio is set with a numeric value
panel.background and panel.border are set with element_rect()
panel.grid.major, panel.grid.major.x, panel.grid.major.y, panel.grid.minor, panel.grid.minor.x, and panel.grid.minor.y are set with element_line()
The main difference between panel.background and panel.border is that the background is drawn underneath the data, and the border is drawn on top of it. For that reason, you’ll always need to assign fill = NA when overriding panel.border
Note that aspect ratio controls the aspect ratio of the panel, not the overall plot
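A sketch of the panel elements, including the fill = NA requirement for panel.border. Colours are arbitrary; element_blank(), which removes an element entirely, is used here for the minor grid lines.

```r
library(ggplot2)

p_panel <- ggplot(mpg, aes(displ, cty)) +
  geom_point() +
  theme(
    panel.background = element_rect(fill = "white"),
    # The border is drawn on top of the data, so it must not be filled
    panel.border     = element_rect(fill = NA, colour = "black"),
    panel.grid.major = element_line(colour = "grey85"),
    panel.grid.minor = element_blank(),
    aspect.ratio     = 1  # square panel; the overall plot may still be wider
  )

p_panel
```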
panel.spacing, panel.spacing.x, and panel.spacing.y (called panel.margin before ggplot2 2.2.0) are set with unit()
strip.background is set with element_rect()
strip.text, strip.text.x, and strip.text.y are set with element_text()
strip.text.x affects both facet_wrap() and facet_grid(); strip.text.y only affects facet_grid()
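A sketch of the facet elements on a facet_grid() plot, where both the column strips (strip.text.x) and the row strips (strip.text.y) are present. It uses panel.spacing, the name current ggplot2 releases use for the gap between panels; all values are illustrative.

```r
library(ggplot2)

p_strips <- ggplot(mpg, aes(displ, cty)) +
  geom_point() +
  facet_grid(drv ~ cyl) +
  theme(
    panel.spacing    = unit(0.5, "lines"),  # gap between neighbouring panels
    strip.background = element_rect(fill = "grey20"),
    strip.text.x     = element_text(colour = "white", face = "bold"),
    strip.text.y     = element_text(colour = "white", angle = 0)
  )

p_strips
```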
Coordinate systems (e.g., maps - see coord_map(), coord_polar(), and coord_trans())
Position adjustments (e.g., jittering points, bars on top of each other or side-by-side - see the position argument of geom_bar() and geom_point())
Many smaller details - See ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
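For a quick taste of the first two topics, here is a sketch of one coordinate system and two position adjustments. The specific plots are illustrative choices, not examples from the book.

```r
library(ggplot2)

# Coordinate system: a single stacked bar becomes a pie chart in polar coordinates
p_bar <- ggplot(mpg, aes(x = factor(1), fill = drv)) + geom_bar(width = 1)
p_pie <- p_bar + coord_polar(theta = "y")

# Position adjustment for bars: stacked (the default) versus side by side
p_stack <- ggplot(mpg, aes(drv, fill = class)) + geom_bar()
p_dodge <- ggplot(mpg, aes(drv, fill = class)) + geom_bar(position = "dodge")

# Position adjustment for points: a little random noise reduces overplotting
p_jitter <- ggplot(mpg, aes(displ, cty)) + geom_point(position = "jitter")

p_pie
```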