Session 1
Introduction to R & Data Visualization
Session Contents
  • Install R and RStudio
  • RStudio interface overview
  • Basic operations in R
  • Data visualization with ggplot2
10 Basic R Functions

# 1. summary(): Summary statistics
summary(mtcars)

# 2. head(): First rows of a dataset
head(mtcars)

# 3. tail(): Last rows of a dataset
tail(mtcars)

# 4. mean(): Compute the mean
mean(mtcars$mpg)

# 5. median(): Compute the median
median(mtcars$mpg)

# 6. sd(): Standard deviation
sd(mtcars$mpg)

# 7. table(): Frequency table
table(mtcars$cyl)

# 8. length(): Count elements
length(mtcars$mpg)

# 9. str(): Structure of an object
str(mtcars)

# 10. class(): Class of an object (mtcars is a "data.frame")
class(mtcars)
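
To see what these functions return, it can help to run a few of them on a vector small enough to check by hand. The sketch below uses a made-up vector `x` (an assumption for illustration, not part of `mtcars`).

```r
# Hypothetical toy vector, chosen so the results are easy to verify by hand
x <- c(2, 4, 4, 6, 9)

x_mean   <- mean(x)    # (2 + 4 + 4 + 6 + 9) / 5 = 5
x_median <- median(x)  # middle value of the sorted vector = 4
x_sd     <- sd(x)      # sample standard deviation = sqrt(28 / 4) = sqrt(7)
x_counts <- table(x)   # the value 4 appears twice, every other value once
x_length <- length(x)  # 5 elements

print(c(mean = x_mean, median = x_median, sd = x_sd, n = x_length))
print(x_counts)
```

Note that `sd()` divides by `n - 1`, which is why the result is `sqrt(7)` rather than `sqrt(28 / 5)`.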
    
`filter()`: Select rows based on conditions

# Select cars with 6 cylinders
library(dplyr)
mtcars %>% filter(cyl == 6)
    

✅ Returns only rows where `cyl` is 6.

`select()`: Choose specific columns

# Select only mpg, cyl, and hp columns
library(dplyr)
mtcars %>% select(mpg, cyl, hp)
    

✅ Returns a table with only the selected columns.

`mutate()`: Create or modify columns

# Add a new column: horsepower per cylinder
library(dplyr)
mtcars %>% mutate(hp_per_cyl = hp / cyl)
    

✅ Adds a new column `hp_per_cyl` with computed values.

`group_by()` + `summarise()`: Aggregate data

# Calculate mean mpg per cylinder group
library(dplyr)
mtcars %>% 
  group_by(cyl) %>% 
  summarise(mean_mpg = mean(mpg))
    

✅ Returns a summary table with the mean `mpg` for each `cyl` group.
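
The four verbs above can also be chained into a single pipeline. The following is a minimal sketch on `mtcars`; the `hp > 100` threshold and the output column names (`n_cars`, `mean_hp_per_cyl`) are illustrative choices, not part of the original examples.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                  # keep cars with more than 100 horsepower
  select(mpg, cyl, hp) %>%              # keep only the columns needed below
  mutate(hp_per_cyl = hp / cyl) %>%     # add horsepower per cylinder
  group_by(cyl) %>%                     # one group per cylinder count
  summarise(
    n_cars = n(),                       # number of cars in each group
    mean_mpg = mean(mpg),               # average fuel economy per group
    mean_hp_per_cyl = mean(hp_per_cyl)  # average of the new column per group
  )

print(result)
```

Each step receives the output of the previous one, so the order matters: `mutate()` must come before `summarise()` for `hp_per_cyl` to be available.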

Basic Plots in Base R

# Note: x, y, and df below are placeholders; substitute your own vectors and data frame.
# 1. Scatter plot (plot)
plot(x, y, main="Title", xlab="X Label", ylab="Y Label")

# 2. Boxplot (boxplot)
boxplot(y ~ x, data=df, main="Title", xlab="X Label", ylab="Y Label")

# 3. Histogram (hist)
hist(x, main="Title", xlab="X Label", breaks=10, col="lightblue")

# 4. Barplot (barplot)
barplot(table(x), main="Title", xlab="X Label", ylab="Count")

# 5. Pie chart (pie)
pie(table(x), main="Title")
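
As a concrete illustration, the same five plot types can be drawn from the built-in `mtcars` dataset. The titles, labels, and the variable names `mpg_hist` and `cyl_counts` are illustrative assumptions.

```r
# Scatter plot: fuel economy against weight
plot(mtcars$wt, mtcars$mpg,
     main = "MPG vs. Weight", xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Boxplot: distribution of mpg within each cylinder group
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by Cylinders", xlab = "Cylinders", ylab = "Miles per gallon")

# Histogram: hist() also returns the bins it used, stored here for inspection
mpg_hist <- hist(mtcars$mpg, main = "Distribution of MPG",
                 xlab = "Miles per gallon", breaks = 10, col = "lightblue")

# Bar plot and pie chart both start from a frequency table
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, main = "Cars per Cylinder Count", xlab = "Cylinders", ylab = "Count")
pie(cyl_counts, main = "Share of Cars by Cylinders")
```

In `hist()`, `breaks = 10` is treated as a suggestion, so the number of bins actually drawn may differ slightly.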
    
What is ggplot2?
  • Part of the tidyverse
  • Based on the Grammar of Graphics
What is the Grammar of Graphics?

Proposed by Leland Wilkinson, the Grammar of Graphics is a system for describing and constructing statistical graphics. It provides a structured approach to visualizing data by defining components like data, scales, geoms, and aesthetics.

Read more in Wilkinson's book, The Grammar of Graphics.

The seven grammatical elements
| Element | Description |
|---|---|
| Data | The dataset being plotted. |
| Aesthetics | The scales onto which we map our data. |
| Geometries | The visual elements used for our data. |
| Themes | All non-data ink. |
| Statistics | Representations of our data to aid understanding. |
| Coordinates | The space on which the data will be plotted. |
| Facets | Plotting small multiples. |
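
One way to see how the seven elements fit together is to write a single plot that uses each of them once. The sketch below is only one possible mapping onto `ggplot2` functions; the specific choices (`stat_smooth()` with a linear model, the `ylim` range, faceting by `cyl`, `theme_minimal()`) and the name `p_elements` are assumptions made for illustration.

```r
library(ggplot2)

p_elements <- ggplot(data = mtcars,              # Data
                     aes(x = wt, y = mpg)) +     # Aesthetics
  geom_point() +                                 # Geometries
  stat_smooth(method = "lm", formula = y ~ x) +  # Statistics
  coord_cartesian(ylim = c(10, 35)) +            # Coordinates
  facet_wrap(~ cyl) +                            # Facets
  theme_minimal()                                # Themes

print(p_elements)
```

Only data, aesthetics, and a geometry are required to draw a plot; the remaining elements fall back to defaults when omitted.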
Basic Structure of `ggplot2`

`ggplot2` follows a layered approach to building plots.


# Basic structure of a ggplot:
library(ggplot2)

ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()
    

✅ This code creates a scatter plot of `mpg` (y-axis) against `wt` (x-axis) from `mtcars`.
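
The layered approach means a plot can be stored in a variable and extended one `+` at a time. A minimal sketch, assuming the variable names `base_plot` and `layered_plot` and the label text, all of which are illustrative:

```r
library(ggplot2)

# Start from the data and aesthetic mappings only; nothing is drawn yet
base_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# Each `+` adds one more layer or setting on top of the previous ones
layered_plot <- base_plot +
  geom_point(aes(colour = factor(cyl))) +        # points, coloured by cylinder count
  geom_smooth(method = "lm", formula = y ~ x) +  # one linear trend line over all cars
  labs(title = "Fuel Economy vs. Weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders")

print(layered_plot)
```

Because `colour` is mapped inside `geom_point()` rather than in `ggplot()`, it applies to the points only, and the trend line is fitted to all cars together.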

Additional Resources