In this lab we will practice using methods from the tidyverse package.
See the “Instructions” section of the Introduction to Lab Assignments page for more information about the labs. That page also gives descriptions for the datasets we will be using.
Required reading:
Optional resources:
gather function documentation (see examples)
ex1. How many female and male senators from the Democratic party are older than 60? For this and all assignments, you may approximate age as age = current year - birthyear.
# this solution uses pipes (see required reading)
# older than 60 in 2022 means born before 1962
congress %>%
  filter(type=='sen', party=='Democrat', birthyear < 1962) %>%
  count(gender)
# this is old-school: store the filtered rows, then count them
tmp <- filter(congress, type=='sen', party=='Democrat', birthyear < 1962)
count(tmp, gender)
ex2. Give the names of the 2 oldest senators.
# this solution uses pipes
congress %>%
  filter(type=='sen') %>%
  arrange(birthyear) %>%
  select(full_name) %>%
  head(2)
## full_name
## 1 Dianne Feinstein
## 2 Chuck Grassley
ex3. How many valid Facebook and YouTube account handles are there in congress_contact?
# this solution uses gather
congress_contact %>%
  select(facebook, youtube) %>%
  gather(key='platform', value='handle', facebook, youtube) %>%
  count(are_valid = handle!='') %>%
  filter(are_valid)
## are_valid n
## 1 TRUE 638
# the same count using pivot_longer
congress_contact %>%
  select(facebook, youtube) %>%
  pivot_longer(facebook:youtube, names_to='platform', values_to='handle') %>%
  count(are_valid = handle!='') %>%
  filter(are_valid)
## # A tibble: 1 × 2
## are_valid n
## <lgl> <int>
## 1 TRUE 638
1. Describe what the following tidyverse functions do. Also describe the pipe operator “%>%”. You may need to look up the official documentation for each of these.
filter:
select:
mutate:
count:
arrange:
gather:
pivot_longer:
pipe operator ("%>%"):
2. How many male and female members are representatives and senators? Your output should appear as a single dataframe with four rows corresponding to female representatives, female senators, male representatives, and male senators.
NOTE: you can identify representatives using type=="rep" and senators using type=="sen".
# your answer here
3. Create a dataframe showing the oldest and youngest female Democratic senators using only R code.
HINT: check out the slice() function.
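As a minimal sketch of how slice() picks rows by position, the example below uses a small made-up dataframe. The names toy, name, and birthyear are hypothetical stand-ins, not the lab's congress data. Once the rows are sorted, row 1 and row n() are the two extremes.

```r
library(dplyr)

# hypothetical toy data, not the lab's congress dataframe
toy <- tibble(
  name = c("A", "B", "C", "D"),
  birthyear = c(1950, 1971, 1940, 1965)
)

# sort by birth year, then keep the first and last rows by position
extremes <- toy %>%
  arrange(birthyear) %>%
  slice(c(1, n()))

extremes
```

Here n() is evaluated inside slice() and gives the number of rows, so the same call works regardless of how many rows survive any earlier filtering.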
# your answer here
4. Using mutate, create a new variable called age which represents the approximate age of each member of congress. How many Democratic senators are over 60 years old?
Note: For this and all future questions this semester, you can approximate age using the formula age = 2022-birthyear (or replace 2022 with the current year).
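To illustrate the age formula, here is a small sketch of mutate() on made-up data. The dataframe toy and the variable reference_year are assumptions for illustration only; the lab's congress dataframe is not used.

```r
library(dplyr)

# hypothetical toy data with a birthyear column
toy <- tibble(
  name = c("A", "B", "C"),
  birthyear = c(1950, 1980, 1962)
)

# assumption: 2022 as in the note above; swap in the current year if preferred
reference_year <- 2022

# mutate() adds new columns; later columns can refer to earlier ones
toy_with_age <- toy %>%
  mutate(
    age = reference_year - birthyear,
    over_60 = age > 60
  )

toy_with_age
```

Note that someone who is exactly 60 under this approximation is not counted as over 60, because the comparison is strict.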
# your answer here
5. Create a new column that indicates whether or not the member of congress is more than 55 years old, and create a single dataframe showing the number of male and female members of congress that are over and under 55.
Note: the dataframe should have four rows: number of females over 55, number of males over 55, number of females 55 and under, number of males 55 and under.
# your answer here
6. For this problem, use the congress_contact dataframe that includes the contact information for each member of congress. Using gather, create a new dataframe where each row corresponds to a valid twitter, facebook, or youtube social media account username (for youtube you can ignore youtube_id), and compute the total number of valid accounts across all congress members. Then accomplish the same task using pivot_longer.
HINT: see the link to the gather function documentation in the optional resources to get a better sense of how it works. You can use the example data they show to try it for yourself.
HINT: the account name will be an empty string in cases where the member of congress does not have a valid user account, so you can filter using the conditional column_name != ''. This compares each value in the column called column_name to an empty string to determine whether it is a valid account.
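The two hints can be tried out on a tiny made-up table before touching the real data. The sketch below assumes a hypothetical dataframe toy_contact with columns member, twitter, and facebook; it reshapes the table to one row per handle with both gather and pivot_longer, then drops the empty strings.

```r
library(dplyr)
library(tidyr)

# hypothetical toy data: an empty string means "no account"
toy_contact <- tibble(
  member = c("A", "B", "C"),
  twitter = c("a_tw", "", "c_tw"),
  facebook = c("", "", "c_fb")
)

# gather: stack the listed columns into platform/handle pairs
long_gather <- toy_contact %>%
  gather(key = "platform", value = "handle", twitter, facebook) %>%
  filter(handle != "")

# pivot_longer: the newer equivalent of gather
long_pivot <- toy_contact %>%
  pivot_longer(c(twitter, facebook), names_to = "platform", values_to = "handle") %>%
  filter(handle != "")

nrow(long_gather)
nrow(long_pivot)
```

Both versions keep the same three non-empty handles. The rows come out in a different order because gather stacks one column at a time while pivot_longer works row by row.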
# your answer here
Below, I define two vectors corresponding to policies that US states adopted in response to COVID-19: restrictions on travel (recorded May 20, 2020) and requirements that citizens wear masks in public (recorded August 17, 2020).
travel_restrictions <- c("WA", "OR", "NV", "CA", "NM", "MN", "IL", "OH", "MI", "PA", "VA", "NY", "MA", "VH", "ME", "DE", "MD", "NJ")
require_masks <- c("HI", "WA", "OR", "NV", "CA", "MT", "CO", "NM", "KS", "TX", "MN", "AR", "LA", "WI", "IL", "AL", "MI", "IN", "OH", "KY", "WV", "NC", "VA", "DC", "DE", "PA", "NY", "VT", "NH", "MA", "RI", "CT", "ME")
7. Write code to print only the states that implemented both travel restrictions and mask requirements:
# your answer here
8. Write code to print the states that had implemented mask requirements but not travel restrictions:
# your answer here
9. Describe two broad topics you might be interested in exploring for your final project. How would you use data science to gain insight about these topics? We won’t require you to stick with these topics - we just want to see you brainstorming about what you might be interested in.
your written response here