In this lab we will be practicing the fundamentals of programming in R.
See the “Instructions” section of the Introduction to Lab Assignments page for more information about the labs. That page also gives descriptions for the datasets we will be using.
Required reading:
Optional resources:
ex1. Write a function that accepts a numeric vector and returns a logical (boolean) vector indicating whether or not each number is even.
is_even <- function(nums) {
  return((nums %% 2) == 0)
}
my_numbers <- c(1, 4, 23, 34, 23, 5, 4, 39)
head(is_even(my_numbers))
## [1] FALSE TRUE FALSE TRUE FALSE FALSE
my_numbers %>% is_even() %>% head()  # equivalent to the call above, provided dplyr (or magrittr) is loaded
## [1] FALSE TRUE FALSE TRUE FALSE FALSE
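The function above needs no loop because the modulo operator (%%) and the comparison operator (==) are vectorized in R: each works element by element and returns a vector of the same length as its input. A minimal sketch of the two intermediate steps, using a short made-up vector (nums, remainders, and evens are illustrative names, not part of the lab data):

```r
# Hypothetical input vector, chosen only for illustration
nums <- c(1, 4, 23, 34)

# Step 1: remainder after dividing each element by 2
remainders <- nums %% 2

# Step 2: compare every remainder to 0 at once
evens <- remainders == 0

print(remainders)
## [1] 1 0 1 0
print(evens)
## [1] FALSE  TRUE FALSE  TRUE
```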
ex2. Write a function to compute the birth decade from the birthyear of each politician. Then write a function to generate a label for each decade. Use these functions within the mutate function to create new columns in the congress dataframe.
# floor() (not round()) keeps years such as 1958 in the 1950s
get_decade <- function(byear) {
  return(floor(byear / 10) * 10)
}
get_decade_label <- function(byear) {
  return(paste0(floor(byear / 10), "0's"))
}
# using functions without mutate
get_decade(congress$birthyear) %>% head()
## [1] 1950 1950 1940 1940 1960 1930
get_decade_label(congress$birthyear) %>% head()
## [1] "1950's" "1950's" "1940's" "1940's" "1960's" "1930's"
# using functions within mutate to create new columns
congress %>%
  mutate(birthdec = get_decade(birthyear), birthdec_label = get_decade_label(birthyear)) %>%
  select(full_name, birthdec, birthdec_label) %>%
  head()
##              full_name birthdec birthdec_label
## 1        Sherrod Brown     1950         1950's
## 2       Maria Cantwell     1950         1950's
## 3   Benjamin L. Cardin     1940         1940's
## 4     Thomas R. Carper     1940         1940's
## 5 Robert P. Casey, Jr.     1960         1960's
## 6     Dianne Feinstein     1930         1930's
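Decade bucketing is easy to get subtly wrong near the end of a decade, so it is worth checking a function like get_decade on boundary years. The sketch below compares two candidate formulas on made-up years (years, by_floor, and by_round are illustrative names): flooring keeps 1958 in the 1950s, while rounding to the nearest ten moves it to 1960.

```r
# Hypothetical birth years, chosen to sit near decade boundaries
years <- c(1950, 1952, 1958, 1960)

# Candidate 1: drop the last digit by flooring
by_floor <- floor(years / 10) * 10

# Candidate 2: round to the nearest multiple of ten
by_round <- round(years / 10) * 10

print(by_floor)
## [1] 1950 1950 1950 1960
print(by_round)
## [1] 1950 1950 1960 1960
```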
1. In your own words, describe what a function is and provide one example of how you might use it in a data science project.
write your answer here
2. Packages in R can contain many useful functions/commands. If you didn’t know what a certain function did, or how it worked, where within RStudio would you look to learn more / see example code? Where would you look outside RStudio?
write your answer here
3. Write a function that takes a character vector as an argument and returns a character vector containing the first 2 letters of each element in the original vector. To show that it works, test it on the character vector sentence defined below.
sentence <- c('you', 'only', 'understand', 'data', 'if', 'data', 'is', 'tidy')
# your answer here
4. Create your own function which accepts a birthyear vector and returns an approximate current age, then use it on the birthyear column of the congress dataframe to create a new age column with mutate. Then compute the average age of Male and Female congress members.
Note: functions used inside mutate accept single columns from the original dataframe and should return a column or vector of the same size. This is a valuable tool for developing your data pipelines.
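As a minimal sketch of the note above, the example below applies a vectorized helper inside mutate. It assumes dplyr is installed; the helper to_celsius and the toy data frame weather are hypothetical and unrelated to the lab datasets. The helper receives the whole temp_f column as a vector and returns a vector of the same length, which mutate stores as a new column.

```r
library(dplyr)

# Hypothetical helper: converts Fahrenheit to Celsius, element by element
to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# Toy data frame, made up for illustration only
weather <- data.frame(city = c("A", "B", "C"), temp_f = c(32, 50, 212))

# mutate passes the temp_f column to the helper and adds the result as temp_c
weather_c <- weather %>%
  mutate(temp_c = to_celsius(temp_f))

print(weather_c)
```

Because the helper returns one value per input row, it would also work unchanged on a data frame with thousands of rows.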
# your answer here
5. Create your own function which accepts a string vector of phone numbers like the phone column of congress_contact and returns the area code (the first three digits of a phone number), then use it on the phone column of the congress_contact dataframe to create a new area column with mutate.
# your answer here
6. Write a function that accepts a dataframe with the columns birthdate and full_name, and prints (using the print function) the names and ages of the k oldest representatives in congress (i.e. not including senators) using a “for loop”. Here, k is an arbitrary number that should be given as an argument to the function; set its default value to 5. If you use the dataframe as the first argument, you can use the pipe operator (%>%) to pass the dataframe directly to the function. Define your function such that you can use it like this: congress %>% print_oldest(3).
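The mechanics described above (a data frame as the first argument, a default value for a second argument, and a for loop that prints) can be sketched on toy data. This is only an illustration of the calling pattern, assuming the magrittr package is available for %>%; the function print_first and the data frame pets are hypothetical and do not solve the exercise.

```r
library(magrittr)  # provides %>%; dplyr also makes it available

# Hypothetical function: prints the first k values of the name column.
# The data frame comes first so the pipe can supply it;
# k has a default value, so callers may omit it.
print_first <- function(df, k = 2) {
  n <- min(k, nrow(df))  # never loop past the last row
  for (i in seq_len(n)) {
    print(df$name[i])
  }
  invisible(df)  # return the input quietly so the pipeline could continue
}

# Toy data frame, made up for illustration only
pets <- data.frame(name = c("Rex", "Tom", "Ada"), stringsAsFactors = FALSE)

pets %>% print_first()   # uses the default k = 2
pets %>% print_first(3)  # overrides the default
```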
# your answer here
7. Starting with the function from the previous question, change it such that if k > 5, it only prints the first 5 names and ages. Test using this code: congress %>% print_oldest(100) (it should print ONLY the first 5 names and ages).
# your answer here
# you will define print_oldest
#congress %>% print_oldest(100)
8. Last week you were asked to come up with two interesting social science research questions you could address with your final project. This week, I’d like you to find at least one potential data source you could analyze (in theory) to answer each of those questions. If you can’t find a potential data source, feel free to change your question (but make sure you state it explicitly). In research that uses data science, there is often a tension between the questions you would like to ask and the data that is available. You can formulate a research question by going back and forth between your question and available data.
# your answer here