Lab #4 Markdown File

In this lab we will be practicing the fundamentals of programming in R.

Be sure to do the required readings first. Many of the problems can be solved with approaches from the lecture videos, lab videos, or required readings, but for some of them you may need to search the internet. Learning to find answers this way is a valuable part of developing your data science skills.

This lab should be submitted in two formats: the R Markdown file (.Rmd) and the knitted HTML web page (.html). Start with the lab markdown file (download here) and replace the author name with your own. The R Markdown file should include all of the code you used to generate your solutions, but the knitted HTML file should display ONLY the solutions to the assignment, NOT the code you used to produce them. Imagine this is a document you will submit to a supervisor or professor: they should be able to reproduce the code and analysis if needed, but otherwise only need to see the results and your write-up. The TA should be able to run your R Markdown file to generate exactly the same HTML file you submitted. Submit both files to your TA via direct message on Slack by the deadline indicated on the course website.
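One common way to meet the "code in the .Rmd, results only in the HTML" requirement (a sketch, not a course-mandated method) is to turn off code echoing for every chunk from a setup chunk at the top of the .Rmd. The line below uses the knitr package's opts_chunk$set(); knitr is installed whenever you can knit R Markdown. Putting the line in a chunk whose header includes include = FALSE keeps the setup line itself out of the HTML.

```r
# Place inside an R chunk near the top of the .Rmd, e.g. one whose header
# includes include = FALSE so this line itself is hidden.
# echo = FALSE hides the source code of all later chunks in the knitted HTML;
# the code still runs, and its results (printed output, tables, plots) still appear.
knitr::opts_chunk$set(echo = FALSE)
```

If a particular chunk's code should be visible, setting echo = TRUE in that chunk's header overrides this global default.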

Required reading:

Optional resources:


Questions


1. In your own words, describe what a function is and provide one example of how you might use it in a data science project.

write your answer here


2. Packages in R can contain many useful functions/commands. If you didn’t know what a certain function did or how it worked, where in RStudio would you look to learn more or to see example code? Where would you look outside RStudio?

write your answer here


3. Write a function that takes a character vector as an argument and returns a character vector containing the first letter of each element of the original vector. To show that it works, test it on the character vector named sentence, defined below.
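The general pattern this question asks for, a function that takes a character vector and returns one result per element, can be sketched with a different task so that the exercise itself is left to you. The helper word_lengths() below is hypothetical (not part of the lab) and relies on base R's nchar(), which is vectorized.

```r
# Hypothetical helper (not part of the lab): returns the number of characters
# in each element of a character vector. nchar() works element-wise, so the
# result has one element per input element and no loop is needed.
word_lengths <- function(words) {
  stopifnot(is.character(words))  # fail early on non-character input
  nchar(words)
}

words <- c("tidy", "data", "is", "nice")
word_lengths(words)
#> [1] 4 4 2 4
```

Many base R string functions behave the same element-wise way, which is what makes functions like this short.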

sentence <- c('you', 'only', 'understand', 'data', 'if', 'data', 'is', 'tidy')
# your answer here


4. Create your own function that accepts a vector of birth years and returns approximate current ages, then use it with mutate on the birthyear column of the congress dataframe to create a new age column.

Note: a function used inside mutate receives entire columns of the original dataframe as vectors and should return a vector of the same length. Writing small functions that work this way is a valuable tool for building your data-processing workflow.
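A minimal sketch of this idea, assuming the dplyr package is installed, uses a hypothetical temperature-conversion function on made-up data rather than the congress dataframe. Because f_to_c() does arithmetic on the whole vector at once, it returns one value per row, which is what mutate needs.

```r
library(dplyr)

# Hypothetical example data (not the congress dataframe).
temps <- data.frame(city = c("A", "B", "C"), temp_f = c(32, 50, 212))

# A vectorized function: it takes a whole numeric vector and returns
# a vector of the same length, so it can be used inside mutate().
f_to_c <- function(fahrenheit) {
  (fahrenheit - 32) * 5 / 9
}

# mutate() passes the entire temp_f column to f_to_c() in one call.
temps <- temps %>% mutate(temp_c = f_to_c(temp_f))
temps  # the new temp_c column holds 0, 10, 100
```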

# your answer here


5. Write a function that accepts a date string and returns the day of the week it falls on, then use it with mutate to add a new column to congress giving the weekday on which each politician was born.
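Working with date strings usually starts by converting them to R's Date class. The sketch below assumes the strings are in ISO year-month-day form (for example "1947-03-12"); check the actual format of the birthday column in congress before relying on that. The helper birth_year_from_string() is hypothetical and extracts the year rather than the weekday; base R's weekdays() also accepts Date objects, though the names it returns depend on your locale.

```r
# Hypothetical helper (not part of the lab): parse "YYYY-MM-DD" strings into
# Date objects and return the year of each date as a number.
# as.Date() expects the "%Y-%m-%d" (or "%Y/%m/%d") form unless you pass format =.
birth_year_from_string <- function(date_string) {
  dates <- as.Date(date_string)
  as.numeric(format(dates, "%Y"))  # "%Y" = four-digit year, locale-independent
}

birth_year_from_string(c("1947-03-12", "1960-11-30"))
#> [1] 1947 1960
```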

# your answer here


6. Write a function that accepts a dataframe with the columns birthday and full_name and, using a “for loop”, prints the names and ages of the k oldest representatives in congress (i.e., excluding senators). Here k is an arbitrary number that should be passed to the function as an argument; set its default value to 5. If the dataframe is the function's first argument, you can use the pipe operator (“%>%”) to pass the dataframe directly to the function. Define your function so that it can be used like this: congress %>% print_oldest(3).
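The pipe pattern described above can be sketched with a hypothetical function, print_first_names(), which is not the lab's print_oldest(): it takes a dataframe as its first argument, has a default of k = 5, and prints rows with a for loop. It assumes dplyr is installed (for %>%) and uses made-up data.

```r
library(dplyr)

# Hypothetical example (not the lab's print_oldest): the FIRST argument is a
# dataframe, so the function works on the right-hand side of %>%, and the
# second argument k defaults to 5.
print_first_names <- function(df, k = 5) {
  k <- min(k, nrow(df))                     # don't loop past the last row
  for (i in seq_len(k)) {                   # seq_len(0) is empty: prints nothing
    cat(i, ". ", df$name[i], "\n", sep = "")
  }
  invisible(df)                             # return the input without printing it
}

people <- data.frame(name = c("Ada", "Grace", "Alan"), stringsAsFactors = FALSE)
people %>% print_first_names(2)  # same as print_first_names(people, 2)
people %>% print_first_names()   # default k = 5, but only 3 rows exist
```

Capping k with min(k, nrow(df)) avoids printing NA when k exceeds the number of rows; returning the input invisibly is a common convention for functions called mainly for their printed output.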

# your answer here


7. Starting with the function from the previous question, modify it so that if k > 5, it prints only the first 5. Test it using this code: congress %>% print_oldest(100).

# your answer here


8. Last week you were asked to come up with two interesting social science research questions you could address with your final project. This week, I’d like you to find at least one potential data source you could (in theory) analyze to answer each of those questions. If you can’t find a suitable data source, feel free to change your question, but be sure to state the new question explicitly. In research that uses data science, there is often a tension between the questions you would like to ask and the data that are available; you can refine a research question by going back and forth between the question and the available data.

# your answer here