Introduction
Note
Email me at montserrat_ibarra@berkeley.edu
Note
Email me at clara_reckhorn@berkeley.edu
Note
Email me at jocelyneperez@berkeley.edu





Before course logistics, we would like to get to know a bit about you.
05:00
What factors influence the amount of sleep a Berkeley student gets on weekdays?
03:00
Assume our data is already cleaned and ready to go; we will save the actual data cleaning for later units!
library(tidyverse)
# toy dataset
sleep_data <- tibble(
  student_id = 1:10,
  sleep_hours = c(6.5, 7, 5.5, 8, 6, 7.5, 6, 5, 8.5, 7),
  transfer = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
  grad_student = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
  commute_min = c(45, 10, 60, 15, 20, 50, 5, 70, 25, 30),
  favorite_drink = c("Coffee", "Water", "Coffee", "Tea", "Tea",
                     "Coffee", "Water", "Coffee", "Tea", "Water"),
  major = c("Econ", "Stats", "CS", "Psych", "Public Health",
            "Econ", "Stats", "CS", "Bio", "Psych"),
  units = c(16, 13, 18, 14, 12, 17, 15, 19, 11, 16)
)
# print the tibble
sleep_data

That code will result in:
# A tibble: 10 × 8
   student_id sleep_hours transfer grad_student commute_min favorite_drink major
        <int>       <dbl> <lgl>    <lgl>              <dbl> <chr>          <chr>
 1          1         6.5 TRUE     FALSE                 45 Coffee         Econ
 2          2         7   FALSE    FALSE                 10 Water          Stats
 3          3         5.5 TRUE     FALSE                 60 Coffee         CS
 4          4         8   FALSE    FALSE                 15 Tea            Psych
 5          5         6   FALSE    TRUE                  20 Tea            Publ…
 6          6         7.5 TRUE     FALSE                 50 Coffee         Econ
 7          7         6   FALSE    FALSE                  5 Water          Stats
 8          8         5   TRUE     FALSE                 70 Coffee         CS
 9          9         8.5 FALSE    TRUE                  25 Tea            Bio
10         10         7   FALSE    FALSE                 30 Water          Psych
# ℹ 1 more variable: units <dbl>
05:00
10:00
# graph 1 - sleep vs commute time (scatter plot with regression line)
graph1 <- ggplot(sleep_data, aes(x = commute_min, y = sleep_hours)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Sleep vs Commute Time",
    x = "Commute Time (minutes)",
    y = "Average Weekday Sleep (hours)"
  ) +
  theme_minimal()
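The line in graph 1 comes from `geom_smooth(method = "lm")`, which fits an ordinary least-squares regression of sleep on commute time. A minimal sketch of fitting that same line directly with base R's `lm()` is below, so you can read off the slope as a number. It re-creates only the two columns it needs (copied from the toy `sleep_data`) so it runs on its own; the names `fit` and `slope` are just illustrative.

```r
# Re-create the two columns graph 1 uses (copied from the toy sleep_data)
commute_min <- c(45, 10, 60, 15, 20, 50, 5, 70, 25, 30)
sleep_hours <- c(6.5, 7, 5.5, 8, 6, 7.5, 6, 5, 8.5, 7)

# Least-squares fit of sleep on commute time: the same model geom_smooth(method = "lm") draws
fit <- lm(sleep_hours ~ commute_min)
print(coef(fit))

# Slope = change in average weekday sleep (hours) per extra minute of commute
slope <- unname(coef(fit)["commute_min"])

# For simple regression the slope also equals cov(x, y) / var(x)
slope_by_hand <- cov(commute_min, sleep_hours) / var(commute_min)
```

For this toy data the slope is negative (about -0.023 hours per minute). With only 10 made-up students, it describes this sample and says nothing about cause.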
# graph 2 - sleep vs favorite drink (bar chart)
graph2 <- sleep_data %>%
  group_by(favorite_drink) %>%
  summarise(avg_sleep = mean(sleep_hours)) %>%
  ggplot(aes(x = favorite_drink, y = avg_sleep)) +
  geom_col() +
  labs(
    title = "Average Sleep by Favorite Drink",
    x = "Favorite Drink",
    y = "Average Sleep (hours)"
  ) +
  theme_minimal()
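Each bar in graph 2 is a single group mean, which hides how many students are behind it. A minimal sketch in base R (assumed here so the block runs without the tidyverse) computes the same bar heights with `tapply()` and the group sizes with `table()`; the column values are copied from the toy `sleep_data`.

```r
# Re-create the two columns graph 2 uses (copied from the toy sleep_data)
sleep_hours <- c(6.5, 7, 5.5, 8, 6, 7.5, 6, 5, 8.5, 7)
favorite_drink <- c("Coffee", "Water", "Coffee", "Tea", "Tea",
                    "Coffee", "Water", "Coffee", "Tea", "Water")

# Bar heights: mean sleep per drink, the same numbers summarise(mean(sleep_hours)) produces
avg_sleep <- tapply(sleep_hours, favorite_drink, mean)
print(avg_sleep)

# Group sizes: how many students each bar summarizes
n_students <- table(favorite_drink)
print(n_students)
```

Here each bar rests on only 3 or 4 students, which is worth keeping in mind before reading much into the differences between bars.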
# graph 3 - sleep vs units (scatter plot colored by transfer status)
graph3 <- ggplot(sleep_data, aes(x = units, y = sleep_hours, color = transfer)) +
  geom_point(size = 3) +
  labs(
    title = "Sleep vs Units by Transfer Status",
    x = "Units Enrolled",
    y = "Average Weekday Sleep (hours)",
    color = "Transfer Student"
  ) +
  theme_minimal()
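Graph 3 colors the points by transfer status, which invites comparing the units-sleep relationship within each color. One way to put a number on that, sketched here as an assumption rather than as part of the original analysis, is the Pearson correlation computed separately for each group with base R's `split()` and `cor()`. The columns are copied from the toy `sleep_data`; `idx_by_group` and `cor_by_group` are illustrative names.

```r
# Re-create the columns graph 3 uses (copied from the toy sleep_data)
sleep_hours <- c(6.5, 7, 5.5, 8, 6, 7.5, 6, 5, 8.5, 7)
units <- c(16, 13, 18, 14, 12, 17, 15, 19, 11, 16)
transfer <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Row indices for each color in graph 3 (list named "FALSE" and "TRUE")
idx_by_group <- split(seq_along(units), transfer)

# Correlation between units and sleep within each transfer group
cor_by_group <- sapply(idx_by_group, function(i) cor(units[i], sleep_hours[i]))
print(cor_by_group)
```

With 4 transfer and 6 non-transfer students, these correlations are very noisy; they summarize the toy sample rather than Berkeley students in general.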
# graph 4 - sleep vs major (boxplot colored by grad student status)
graph4 <- ggplot(sleep_data, aes(x = major, y = sleep_hours, fill = grad_student)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Sleep Hours by Major",
    x = "Major",
    y = "Average Weekday Sleep (hours)",
    fill = "Grad Student"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
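A boxplot is meant to summarize a distribution, so it helps to check how many observations feed each box in graph 4. A minimal sketch using base R's `table()` cross-tabulates major by grad-student status (the two variables that define the boxes); the values are copied from the toy `sleep_data`, and `cell_sizes` is an illustrative name.

```r
# Re-create the columns that define the boxes in graph 4 (copied from the toy sleep_data)
major <- c("Econ", "Stats", "CS", "Psych", "Public Health",
           "Econ", "Stats", "CS", "Bio", "Psych")
grad_student <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

# One cell per major x grad-status combination; cells with 0 get no box
cell_sizes <- table(major, grad_student)
print(cell_sizes)

# Largest number of students behind any single box
largest_box <- max(cell_sizes)
```

In this toy data no box has more than 2 students, so each "box" is really one or two points; plotting the raw points (for example with `geom_point()`) would show the same information more honestly.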



and subscribe to NYT!
On a piece of paper, please write your name and answer the following:
- One thing you are excited to learn in this course
- One thing you are nervous about in this course
- If you have any questions, feel free to ask them here
