Lab #4 Markdown File

Lab Instructions

In this lab we will be practicing the fundamentals of programming in R.

See the “Instructions” section of the Introduction to Lab Assignments page for more information about the labs. That page also gives descriptions for the datasets we will be using.

Required reading:

Optional resources:


Example Questions


ex1. Write a function that accepts a numeric vector and returns a logical vector indicating whether or not each number is even.

is_even <- function(nums) {
  return((nums %% 2)==0)
}

my_numbers <- c(1, 4, 23, 34, 23, 5, 4, 39)

head(is_even(my_numbers))
## [1] FALSE  TRUE FALSE  TRUE FALSE FALSE
my_numbers %>% is_even() %>% head()  # note that this is equivalent if you've loaded dplyr
## [1] FALSE  TRUE FALSE  TRUE FALSE FALSE
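As a quick check of the claim that the two calls above are equivalent, the following minimal sketch compares the nested form with the piped form directly. It repeats the definitions from ex1 so that it runs on its own, and it assumes dplyr is installed (the pipe itself comes from magrittr, which dplyr re-exports). The names nested and piped are introduced here only for illustration.

```r
library(dplyr)  # provides the %>% pipe (re-exported from magrittr)

# Same function and data as in ex1, repeated so this block is self-contained
is_even <- function(nums) {
  (nums %% 2) == 0
}

my_numbers <- c(1, 4, 23, 34, 23, 5, 4, 39)

# Nested call: read from the inside out
nested <- head(is_even(my_numbers))

# Piped call: read from left to right; x %>% f() is evaluated as f(x)
piped <- my_numbers %>% is_even() %>% head()

# TRUE when both forms produce exactly the same logical vector
identical(nested, piped)
```

The piped form is usually easier to read once more than two steps are chained together.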


ex2. Write a function to compute the birth decade from the birthyear of each politician. Then write a function to generate a label for each decade. Use these functions within the mutate function to create new columns in the congress dataframe.

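# Note: round() maps each year to the NEAREST decade, so 1958 becomes 1960.
# To get the decade a year falls in (1958 -> 1950), use floor(byear / 10) * 10.
get_decade <- function(byear) {
  return(round(byear / 10) * 10)
}
# Builds a text label such as "1950's" from the same rounded decade
get_decade_label <- function(byear) {
  return(paste0(round(byear / 10), "0's"))
}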

# using functions without mutate
get_decade(congress$birthyear) %>% head()
## [1] 1950 1960 1940 1950 1960 1930
get_decade_label(congress$birthyear) %>% head()
## [1] "1950's" "1960's" "1940's" "1950's" "1960's" "1930's"
congress %>% 
  mutate(birthdec=get_decade(birthyear), birthdec_label=get_decade_label(birthyear)) %>% 
  select(full_name, birthdec, birthdec_label) %>% 
  head()
##              full_name birthdec birthdec_label
## 1        Sherrod Brown     1950         1950's
## 2       Maria Cantwell     1960         1960's
## 3   Benjamin L. Cardin     1940         1940's
## 4     Thomas R. Carper     1950         1950's
## 5 Robert P. Casey, Jr.     1960         1960's
## 6     Dianne Feinstein     1930         1930's
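One detail worth noticing in ex2 is that the helpers use round(), which maps each year to the nearest decade rather than the decade the year falls in. The sketch below compares round() with floor() on a few made-up years; the vector years and the two result names are hypothetical and are not taken from the congress dataframe.

```r
# Hypothetical birth years chosen to show where the two approaches differ
years <- c(1952, 1958, 1947, 1960)

# Nearest decade: 1958 and 1947 are pushed UP to the next decade
nearest_decade <- round(years / 10) * 10

# Containing decade: every year is truncated DOWN to the start of its decade
containing_decade <- floor(years / 10) * 10

# Side-by-side comparison
data.frame(years, nearest_decade, containing_decade)
```

Which version is appropriate depends on what "birth decade" is meant to capture; floor() matches the everyday meaning of "born in the 1950s".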


Questions


1. In your own words, describe what a function is and provide one example of how you might use it in a data science project.

write your answer here


2. Packages in R can contain many useful functions/commands. If you didn’t know what a certain function did, or how it worked, where within RStudio would you look to learn more / see example code? Where would you look outside RStudio?

write your answer here


3. Write a function that takes a character vector as an argument and returns a character vector containing the first 2 letters of each element in the original vector. To show that it works, test it on the character vector sentence defined below.

sentence <- c('you', 'only', 'understand', 'data', 'if', 'data', 'is', 'tidy')
# your answer here


4. Create your own function which accepts a birthyear vector and returns an approximate current age, then use it on the birthyear column of the congress dataframe to create a new age column with mutate. Then compute the average age of Male and Female congress members.

Note: a function used inside mutate receives entire columns of the original dataframe as vectors and should return a vector of the same length. This is a valuable tool for developing your data pipelines.
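To illustrate the note above without using the lab data, here is a minimal sketch with a made-up dataframe. The dataframe toy, its columns, and the helper cm_to_m are all hypothetical; the only point is that the helper receives a whole column and returns a vector of the same length.

```r
library(dplyr)

# Hypothetical toy data, not one of the lab datasets
toy <- data.frame(name = c("a", "b", "c"), height_cm = c(150, 175, 190))

# Vectorized helper: takes a whole column, returns a vector of equal length
cm_to_m <- function(cm) {
  cm / 100
}

# mutate passes the height_cm column to cm_to_m and stores the result
toy_with_m <- toy %>% mutate(height_m = cm_to_m(height_cm))
toy_with_m
```

Because arithmetic in R is vectorized, the helper needs no loop: dividing the column by 100 converts every element at once.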

# your answer here


5. Create your own function which accepts a string vector of phone numbers like the phone column of congress_contact and returns the area code (the first three digits of each phone number), then use it on the phone column of the congress_contact dataframe to create a new area column with mutate.

# your answer here


6. Write a function that accepts a dataframe with the columns birthdate and full_name, and prints (using the print function) the names and ages of the k oldest representatives in congress (i.e. not including senators) using a “for loop”. Here, k is an arbitrary number that should be given as an argument to the function, with a default value of 5. If the dataframe is the first argument of your function, you can use the pipe operator (%>%) to pass the dataframe directly to it. Define your function such that you can use it like this: congress %>% print_oldest(3).
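The mechanics described above (a dataframe as the first argument, a default value for a second argument, and a for loop that prints) can be sketched on a simpler, unrelated task. The function print_first, the dataframe toy, and its name column are hypothetical and deliberately leave out the sorting, age calculation, and filtering that the question asks for.

```r
library(dplyr)

# Hypothetical helper: print the first k values of a `name` column.
# df comes first so the function can sit at the end of a pipe; k defaults to 2.
print_first <- function(df, k = 2) {
  # Never loop past the last row of the dataframe
  n <- min(k, nrow(df))
  for (i in seq_len(n)) {
    print(df$name[i])
  }
  # Return the input invisibly so the function can be chained further
  invisible(df)
}

# Hypothetical toy data
toy <- data.frame(name = c("Ana", "Ben", "Cy"), stringsAsFactors = FALSE)

toy %>% print_first()   # k is omitted, so the default of 2 is used
toy %>% print_first(3)  # the piped dataframe fills df, and 3 fills k
```

Using seq_len(n) rather than 1:n keeps the loop from running when n is 0.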

# your answer here


7. Starting with the function from the previous question, modify it so that if k > 5, it prints only the first 5 names and ages. Test using this code: congress %>% print_oldest(100) (it should print ONLY the first 5 names and ages).

# your answer here
# you will define print_oldest
#congress %>% print_oldest(100)


8. Last week you were asked to come up with two interesting social science research questions you could address with your final project. This week, I’d like you to find at least one potential data source you could analyze (in theory) to answer each of those questions. If you can’t find a potential data source, feel free to change your question (but make sure you state it explicitly). In research that uses data science, there is often a tension between the questions you would like to ask and the data that is available. You can formulate a research question by going back and forth between your question and available data.

# your answer here