Chapter 6 Data Visualization in R

The tidyverse can look a bit scary when you are just getting into it, but it is very quick to learn, I promise. For me personally, the best thing about the tidyverse, and perhaps about all of R, is the figures you can make with ggplot2. It does not take much to produce beautiful figures that are close to publication-ready, and they look far better than anything I could ever make with Excel or Statica.

6.1 ggplot2: The Grammar of Graphics

ggplot2 is the tidyverse package for data visualization, based on Leland Wilkinson’s “Grammar of Graphics.” Unlike base R plotting, ggplot2 builds plots layer by layer using a consistent grammar that makes complex visualizations intuitive and customizable.

The grammar of graphics breaks down any plot into fundamental components:

  • Data: The dataset you want to visualize
  • Aesthetics (aes): How variables map to visual properties (x, y, color, size, etc.)
  • Geometries (geom): The visual elements that represent data (points, lines, bars, etc.)
  • Statistics (stat): Statistical transformations of the data (counts, means, etc.)
  • Scales: How aesthetic mappings translate to visual values
  • Coordinate systems: How data maps to the plot area
  • Themes: Overall visual appearance
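
To make the list above concrete, here is a small sketch in which every component shows up as one line of a single plot. The object names (gm_2007, p_components) and the specific choices (a linear fit, the 40 to 85 y-range) are only illustrative assumptions, not a recommended recipe.

```r
library(ggplot2)
library(gapminder)

# Keep a single year so that each country appears once
gm_2007 <- subset(gapminder, year == 2007)

p_components <- ggplot(data = gm_2007,                        # Data
                       mapping = aes(x = gdpPercap,           # Aesthetics
                                     y = lifeExp)) +
  geom_point(aes(color = continent)) +                        # Geometry
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # Statistic (fits a linear model)
  scale_x_log10() +                                           # Scale
  coord_cartesian(ylim = c(40, 85)) +                         # Coordinate system
  theme_minimal()                                             # Theme

p_components
```

Removing any line after the first geom still leaves a valid plot, which is a nice way to see what each layer contributes.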

6.1.1 Basic ggplot2 Structure

Every ggplot follows this basic template:

# Basic template
ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
  <OTHER_LAYERS>

Let’s start with the gapminder dataset to explore these concepts:

library(ggplot2)
library(gapminder)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Preview the data
head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

6.2 Basic Plot Types

6.2.1 Scatter Plots with geom_point()

# Basic scatter plot
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# Add color mapping
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

# Add size mapping
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, 
                           color = continent, size = pop)) +
  geom_point(alpha = 0.7)  # alpha controls transparency

6.2.2 Line Plots with geom_line()

# Line plot showing trends over time
gapminder %>%
  filter(country %in% c("United States", "China", "India", "Germany")) %>%
  ggplot(aes(x = year, y = lifeExp, color = country)) +
  geom_line(linewidth = 1.2)  # for lines, linewidth replaces size since ggplot2 3.4.0

# Multiple lines with points
gapminder %>%
  filter(continent == "Oceania") %>%
  ggplot(aes(x = year, y = gdpPercap, color = country)) +
  geom_line() +
  geom_point()

6.2.3 Bar Plots with geom_col() and geom_bar()

# geom_col: heights of bars represent values in data
gapminder %>%
  filter(year == 2007, continent == "Americas") %>%
  ggplot(aes(x = country, y = pop)) +
  geom_col() +
  coord_flip()  # Flip coordinates for better readability

# geom_bar: counts observations
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent)) +
  geom_bar()

# Stacked bar chart
gapminder %>%
  filter(year %in% c(1997, 2007)) %>%
  mutate(year = as.factor(year)) %>%
  ggplot(aes(x = continent, fill = year)) +
  geom_bar(position = "stack")

6.2.4 Histograms with geom_histogram()

# Basic histogram
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = lifeExp)) +
  geom_histogram(bins = 20, fill = "skyblue", color = "black")

# Histogram with different binwidths
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap)) +
  geom_histogram(binwidth = 5000, fill = "lightgreen", alpha = 0.7)

6.2.5 Box Plots with geom_boxplot()

# Basic box plot
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent, y = lifeExp)) +
  geom_boxplot()

# Box plot with jittered points
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.2, alpha = 0.5)

6.3 Aesthetic Mappings

6.3.1 Understanding aes()

Aesthetics map variables to visual properties. You can set them globally (in ggplot()) or locally (in individual geom functions):

# Global aesthetic mappings
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha = 0.7)

# Local aesthetic mappings
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop), alpha = 0.7)

6.3.2 Fixed vs. Mapped Aesthetics

# WRONG: color inside aes() when you want a fixed color
# aes() treats "blue" as a data value, so the points get the default
# palette color and the legend shows a single entry labeled "blue"
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = "blue")) +
  geom_point()

# CORRECT: fixed color outside aes()
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "blue")

# CORRECT: mapped color inside aes()
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

6.4 Faceting: Small Multiples

6.4.1 facet_wrap()

# Facet by one variable
gapminder %>%
  filter(year %in% c(1977, 1987, 1997, 2007)) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  facet_wrap(~ year)

# Control facet layout
gapminder %>%
  ggplot(aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.3) +
  facet_wrap(~ continent, nrow = 2)

6.4.2 facet_grid()

# Facet by two variables
gapminder %>%
  filter(year %in% c(1997, 2007), 
         continent %in% c("Americas", "Europe", "Asia")) %>%
  mutate(pop_category = ifelse(pop > 50000000, "Large", "Small")) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  facet_grid(pop_category ~ year)

6.5 Statistical Transformations

6.5.1 Smooth Lines with geom_smooth()

# Add trend line
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "black")
## `geom_smooth()` using formula = 'y ~ x'

# Separate trend lines by group
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

6.5.2 Statistical Summaries

# stat_summary for custom statistics
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent, y = lifeExp)) +
  stat_summary(fun = mean, geom = "point", size = 3, color = "red") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  geom_jitter(alpha = 0.3, width = 0.2)

6.6 Scales: Controlling Aesthetic Mappings

6.6.1 Color Scales

# Manual color scale
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("Africa" = "red", "Americas" = "blue", 
                               "Asia" = "green", "Europe" = "orange", 
                               "Oceania" = "purple"))

# Brewer color palette
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  scale_color_brewer(type = "qual", palette = "Set2")

# Viridis color palette
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = pop)) +
  geom_point(size = 2) +
  scale_color_viridis_c()

6.6.2 Axis Scales

# Log scale for x-axis
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  scale_x_log10()

# Custom breaks and labels
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = seq(0, 50000, 10000),
                     labels = paste0("$", seq(0, 50, 10), "K"))

# Limit axis ranges
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  xlim(0, 30000) +
  ylim(40, 85)
## Warning: Removed 21 rows containing missing values or values outside the scale range
## (`geom_point()`).
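
The warning above appears because xlim() and ylim() set the limits of the scale, so rows outside the range are dropped before anything is drawn. If the goal is only to zoom in, coord_cartesian() is one alternative that keeps all rows. A minimal sketch, where gm_2007, n_outside and p_zoom are illustrative names:

```r
library(ggplot2)
library(gapminder)

gm_2007 <- subset(gapminder, year == 2007)

# Count the rows that lie outside the limits used above
# (these are the rows that xlim()/ylim() would drop)
n_outside <- sum(gm_2007$gdpPercap > 30000 |
                   gm_2007$lifeExp < 40 |
                   gm_2007$lifeExp > 85)
n_outside

# coord_cartesian() zooms the view without removing any data
p_zoom <- ggplot(gm_2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  coord_cartesian(xlim = c(0, 30000), ylim = c(40, 85))

p_zoom
```

The difference matters most once a statistic is involved: a geom_smooth() fitted after xlim() only sees the remaining rows, while with coord_cartesian() it is still fitted on the full data.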

6.7 Coordinates and Transformations

# Flip coordinates
gapminder %>%
  filter(year == 2007, continent == "Europe") %>%
  ggplot(aes(x = reorder(country, lifeExp), y = lifeExp)) +
  geom_col() +
  coord_flip()

# Polar coordinates (for pie charts)
gapminder %>%
  filter(year == 2007) %>%
  count(continent) %>%
  ggplot(aes(x = "", y = n, fill = continent)) +
  geom_col() +
  coord_polar(theta = "y")

# Fixed aspect ratio
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  coord_fixed(ratio = 300)  # 300 GDP per capita = 1 year life expectancy

6.8 Labels and Themes

6.8.1 Adding Labels

# Basic labels
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  labs(
    title = "Life Expectancy vs GDP Per Capita (2007)",
    subtitle = "Each point represents a country",
    x = "GDP per capita (USD)",
    y = "Life expectancy (years)",
    color = "Continent",
    caption = "Data source: Gapminder"
  )

6.8.2 Theme Customization

# Built-in themes
p <- gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2)

# Different theme options
p + theme_minimal()

p + theme_classic()

p + theme_dark()

p + theme_void()

# Custom theme modifications
p +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.text = element_text(size = 12),
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )

6.9 Heatmaps

Coming soon …

6.10 Combining Multiple Geoms

# Combining points and lines
gapminder %>%
  filter(continent == "Americas") %>%
  group_by(year) %>%
  summarise(avg_life_exp = mean(lifeExp), .groups = 'drop') %>%
  ggplot(aes(x = year, y = avg_life_exp)) +
  geom_line(linewidth = 1.2, color = "blue") +
  geom_point(size = 3, color = "red") +
  labs(title = "Average Life Expectancy in Americas Over Time",
       y = "Average Life Expectancy")

# Error bars with points
gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    mean_life = mean(lifeExp),
    se_life = sd(lifeExp) / sqrt(n()),
    .groups = 'drop'
  ) %>%
  ggplot(aes(x = continent, y = mean_life)) +
  geom_col(fill = "lightblue", alpha = 0.7) +
  geom_errorbar(aes(ymin = mean_life - se_life, 
                   ymax = mean_life + se_life),
                width = 0.2)

6.11 Text and Annotations

# Add text labels to points
gapminder %>%
  filter(year == 2007, continent == "Europe") %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_text(aes(label = country), size = 3, hjust = -0.1)

# Better: use geom_text_repel to avoid overlapping
library(ggrepel)
gapminder %>%
  filter(year == 2007, continent == "Europe") %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_text_repel(aes(label = country), size = 3)

6.12 Practical Example: Complex Multi-layered Plot

# Complex visualization combining multiple elements
complex_plot <- gapminder %>%
  filter(year %in% c(1977, 1987, 1997, 2007)) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  
  # Add points with continent color and population size
  geom_point(aes(color = continent, size = pop), alpha = 0.7) +
  
  # Add trend line
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 0.5) +
  
  # Facet by year
  facet_wrap(~ year, nrow = 2) +
  
  # Use log scale for x-axis
  scale_x_log10(breaks = c(500, 2000, 8000, 32000),
                labels = c("$500", "$2,000", "$8,000", "$32,000")) +
  
  # Customize colors
  scale_color_brewer(type = "qual", palette = "Set2") +
  
  # Adjust size scale
  scale_size_continuous(range = c(1, 8), guide = "none") +
  
  # Add labels
  labs(
    title = "The Relationship Between Wealth and Health Over Time",
    subtitle = "GDP per capita vs Life Expectancy (1977-2007)",
    x = "GDP per capita (log scale)",
    y = "Life expectancy (years)",
    color = "Continent",
    caption = "Point size represents population size. Data: Gapminder"
  ) +
  
  # Customize theme
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 11),
    strip.text = element_text(size = 10, face = "bold"),
    legend.position = "bottom"
  )

complex_plot
## `geom_smooth()` using formula = 'y ~ x'

6.13 ggplot2 Function Summary Table

| Function | Category | Description | Example |
| --- | --- | --- | --- |
| ggplot() | Core | Initialize a plot with data and aesthetics | ggplot(data, aes(x = var1, y = var2)) |
| aes() | Core | Define aesthetic mappings | aes(x = gdp, y = life, color = continent) |
| geom_point() | Geoms | Add points (scatter plot) | geom_point(size = 2, alpha = 0.7) |
| geom_line() | Geoms | Add lines | geom_line(linewidth = 1.2) |
| geom_col() | Geoms | Add bars with heights from data | geom_col(fill = "blue") |
| geom_bar() | Geoms | Add bars that count observations | geom_bar() |
| geom_histogram() | Geoms | Add histogram | geom_histogram(bins = 20) |
| geom_boxplot() | Geoms | Add box plot | geom_boxplot() |
| geom_density() | Geoms | Add density curve | geom_density(alpha = 0.5) |
| geom_violin() | Geoms | Add violin plot | geom_violin() |
| geom_tile() | Geoms | Add tiles (heatmap) | geom_tile() |
| geom_smooth() | Geoms | Add smoothed trend line | geom_smooth(method = "lm") |
| geom_text() | Geoms | Add text labels | geom_text(aes(label = country)) |
| geom_jitter() | Geoms | Add jittered points | geom_jitter(width = 0.2) |
| facet_wrap() | Facets | Create subplots in a wrapped layout | facet_wrap(~ variable) |
| facet_grid() | Facets | Create subplots in a grid | facet_grid(var1 ~ var2) |
| scale_x_continuous() | Scales | Customize continuous x-axis | scale_x_continuous(breaks = c(1,2,3)) |
| scale_x_log10() | Scales | Use log10 scale for x-axis | scale_x_log10() |
| scale_color_manual() | Scales | Manually set colors | scale_color_manual(values = c("red", "blue")) |
| scale_color_brewer() | Scales | Use ColorBrewer palette | scale_color_brewer(palette = "Set2") |
| scale_color_viridis_c() | Scales | Use Viridis color scale (continuous) | scale_color_viridis_c() |
| xlim() / ylim() | Scales | Set axis limits | xlim(0, 100) |
| coord_flip() | Coordinates | Flip x and y coordinates | coord_flip() |
| coord_polar() | Coordinates | Use polar coordinates | coord_polar(theta = "y") |
| coord_fixed() | Coordinates | Fix aspect ratio | coord_fixed(ratio = 1) |
| labs() | Labels | Add/modify plot labels | labs(title = "My Plot", x = "X axis") |
| ggtitle() | Labels | Add plot title | ggtitle("My Plot Title") |
| xlab() / ylab() | Labels | Add axis labels | xlab("X axis label") |
| theme() | Themes | Customize plot appearance | theme(legend.position = "bottom") |
| theme_minimal() | Themes | Apply minimal theme | theme_minimal() |
| theme_classic() | Themes | Apply classic theme | theme_classic() |
| theme_dark() | Themes | Apply dark theme | theme_dark() |
| annotate() | Annotations | Add annotations to plot | annotate("text", x = 1, y = 2, label = "Note") |

6.14 Tips and Tricks

Colors A good color scale can make an enormous difference. For discrete data I like to use the Paired and Dark2 palettes, and for continuous data I like the inferno, magma and coolwarm palettes. Make sure the scale you choose is intuitive and easy to interpret. When preparing figures for shared or published work, make sure they are colorblind-friendly (the viridis color palettes were designed with this in mind). And when preparing for a presentation, always check that your figure is still readable at low resolution (for example because of an old projector). Be prepared! For more information on color scales, have a look at this website.
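
As a rough sketch of how two of these palettes can be applied: Dark2 comes from ColorBrewer and inferno is one of the viridis options, both available through ggplot2's own scale functions. The object names and the choice to color by log10(pop) are only assumptions made for this example.

```r
library(ggplot2)
library(gapminder)

gm_2007 <- subset(gapminder, year == 2007)

# Discrete variable: the ColorBrewer "Dark2" palette
p_discrete <- ggplot(gm_2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 2) +
  scale_x_log10() +
  scale_color_brewer(palette = "Dark2")

# Continuous variable: the "inferno" option of the viridis family
p_continuous <- ggplot(gm_2007, aes(x = gdpPercap, y = lifeExp, color = log10(pop))) +
  geom_point(size = 2) +
  scale_x_log10() +
  scale_color_viridis_c(option = "inferno")

p_discrete
p_continuous
```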

Scales and Axes Always label your axes clearly and use units when relevant. If one variable spans many orders of magnitude, consider using a log scale. Also check that your axis limits don’t exaggerate or minimize effects (avoid “truncating” axes unless you have a good reason).
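
For instance, GDP per capita spans several orders of magnitude, so a log axis with explicit units might look like the sketch below. It assumes the scales package (installed together with ggplot2) for the dollar labels, and p_axes is just an illustrative name.

```r
library(ggplot2)
library(gapminder)

gm_2007 <- subset(gapminder, year == 2007)

p_axes <- ggplot(gm_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # Log10 axis; tick labels are formatted as dollar amounts
  scale_x_log10(labels = scales::label_dollar()) +
  # State the units and the transformation in the axis titles
  labs(x = "GDP per capita (USD, log scale)",
       y = "Life expectancy (years)")

p_axes
```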

Faceting When comparing groups or categories, faceting (facet_wrap or facet_grid) is often clearer than putting everything into one crowded plot. It makes patterns easier to see across groups.

Annotations Don’t be afraid to annotate! Adding text labels, arrows, or lines (like geom_text, geom_label, geom_vline) can help guide the reader to the most important takeaways.
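
A small sketch of what that can look like: a dashed reference line at the median GDP per capita plus a short text note next to it. The label text, its position (y = 42) and the object names are arbitrary choices for this example.

```r
library(ggplot2)
library(gapminder)

gm_2007 <- subset(gapminder, year == 2007)
median_gdp <- median(gm_2007$gdpPercap)

p_annotated <- ggplot(gm_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  # Reference line, given in data units (the log scale is applied automatically)
  geom_vline(xintercept = median_gdp, linetype = "dashed") +
  # A single text note placed just to the right of the line
  annotate("text", x = median_gdp, y = 42,
           label = "Median GDP per capita", hjust = -0.05)

p_annotated
```

annotate() is handy for one-off notes like this because it takes fixed positions instead of a data frame.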

Themes Use themes (theme_minimal(), theme_classic(), etc.) to quickly adjust the overall look. If you’re preparing figures for teaching or publication, setting a consistent theme across all your figures improves readability. The cool thing is that some well-known journals and platforms have their own ggplot2 themes that you can use in R. For this, have a look at the ggthemes package, which includes theme_economist() and theme_wsj() (Wall Street Journal), among others.
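
One way to get a consistent look without adding the same theme to every plot is theme_set(), which changes the default theme for the session. A minimal sketch (the plots and object names are only placeholders):

```r
library(ggplot2)
library(gapminder)

gm_2007 <- subset(gapminder, year == 2007)

# theme_set() changes the session default and returns the previous default
old_theme <- theme_set(theme_minimal(base_size = 12))

# Neither plot names a theme, so both are drawn with the new default
p_scatter <- ggplot(gm_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()
p_box <- ggplot(gm_2007, aes(x = continent, y = lifeExp)) +
  geom_boxplot()

print(p_scatter)
print(p_box)

# Restore the earlier default when done
theme_set(old_theme)
```

Note that the default theme is looked up when a plot is drawn, not when it is created, which is why the plots are printed before the old theme is restored.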

Clarity in Legends Place legends where they don’t overlap with the data, and make sure the labels are descriptive. Sometimes it’s even better to replace the legend entirely with direct labeling inside the plot.
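
Direct labeling could be sketched roughly as follows: the legend is switched off and each line gets its country name next to its last data point. The country selection mirrors the line plot from earlier in this chapter; the nudge and expansion values are guesses that may need tuning for other data.

```r
library(ggplot2)
library(gapminder)

countries <- c("United States", "China", "India", "Germany")
gm_sel  <- subset(gapminder, country %in% countries)
# One row per country (the most recent year), used only for the labels
gm_last <- subset(gm_sel, year == max(year))

p_direct <- ggplot(gm_sel, aes(x = year, y = lifeExp, color = country)) +
  geom_line(linewidth = 1) +
  # Left-aligned labels placed slightly to the right of each line's end
  geom_text(data = gm_last, aes(label = country), hjust = 0, nudge_x = 0.5) +
  # Extra space on the right so the labels are not cut off
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.3))) +
  theme(legend.position = "none")

p_direct
```

When lines end close together the labels can overlap; in that case geom_text_repel() from ggrepel, used earlier in this chapter, is worth trying instead of geom_text().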

6.15 Conclusion

ggplot2’s grammar of graphics provides a powerful and consistent framework for creating visualizations. The key principles to remember are:

  1. Build plots layer by layer - start with data and aesthetics, then add geoms, scales, and themes
  2. Map variables to aesthetics - use aes() to connect your data to visual properties
  3. Choose appropriate geoms - different geoms for different types of data and relationships
  4. Use faceting - create small multiples to show patterns across groups
  5. Customize with scales and themes - fine-tune appearance and mapping of aesthetic properties

By mastering these concepts and functions, you can create publication-ready visualizations that effectively communicate insights from your data.

6.16 My favorite figures I ever made

During my internship, I was able to work with a huge dataset on enteroviruses detected in clinical settings all over Europe. I collected, harmonized, analyzed and visualized all that data, which even got published! The first figure displays the number of detections per month throughout the study period, with summer months in red (fig A), and the relative intensity of specific virus types (fig B). In fig B, a light color means that that specific enterovirus type was very common in that specific month. The second figure shows a classic epicurve per virus type (in red), with the total number of detected enteroviruses per month in grey (Schrijver et al., 2025).
