Lab #3 Markdown File

In this lab we will practice using the ggplot2 package to create visualizations of our data. Our standard for visualizations is that every plot must have axis labels, all labels must be readable by someone unfamiliar with your data (e.g., Female and Male instead of F and M; Senator and Representative instead of rep and sen), and it should be easy to tell what each figure shows. Failure to do this will result in point deductions.
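As a minimal sketch of this standard, the example below uses hypothetical toy data with a `gender` column coded F/M (an assumption, not the lab's actual dataset). `factor()` turns the coded values into readable labels before plotting, and `labs()` adds a title and axis labels:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; the `gender` column and its F/M codes are assumptions.
members <- tibble(gender = c("F", "M", "M", "F", "M"))

# Replace terse codes with labels a reader unfamiliar with the data can understand.
members <- members %>%
  mutate(gender = factor(gender, levels = c("F", "M"), labels = c("Female", "Male")))

# geom_bar() counts rows per category; labs() supplies the title and axis labels.
p <- ggplot(members, aes(x = gender)) +
  geom_bar() +
  labs(title = "Number of members by gender", x = "Gender", y = "Number of members")
```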

Be sure to do the required readings first. Many of the problems can be solved using approaches from the lecture videos, lab videos, or required readings, but for some you may need to search the internet. Learning to find solutions this way is a valuable part of developing as a data scientist.

This lab should be submitted in two formats: an R Markdown file (.Rmd) and a knitted HTML web page (.html). Start from the Lab markdown file (download here) and replace the author name with your own. The R Markdown file should include all of the code you used to generate your solutions, while the knitted HTML file should display ONLY the solutions to the assignment, NOT the code you used to produce them. Imagine this is a document you will submit to a supervisor or professor: they should be able to reproduce the code and analysis if needed, but otherwise only need to see the results and your write-up. The TA should be able to run your R Markdown file to generate exactly the same HTML file you submitted. Submit both files to your TA via direct message on Slack by the deadline indicated on the course website.
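One common way to keep code out of the knitted HTML while leaving it in the .Rmd is to set knitr's global chunk options in the setup chunk at the top of the file. This is a sketch: `echo = FALSE` is the setting that hides code, while the `message` and `warning` settings are optional extras, not course requirements.

```r
# Place this in the setup chunk (the one marked include=FALSE) at the top of the .Rmd.
knitr::opts_chunk$set(
  echo    = FALSE,  # hide code in the knitted output; results and plots still appear
  message = FALSE,  # suppress messages such as package start-up notes
  warning = FALSE   # suppress warnings
)
```

Because the options are set in the .Rmd itself, the TA knitting your file will get the same code-free HTML that you submitted.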

Required reading:

Optional resources:


Questions


1. Describe the functionality of each of the following functions:

group_by: 
summarise: 
inner_join: 
left_join: 


2. Create a bar plot showing the average age of Democratic and Republican members of Congress. Then do the same for male (M) and female (F) members (this second part should include members of all parties).

# your answer here


3. Create two bar charts: one showing the total number of social media accounts (Twitter, Facebook, and YouTube) held by Democrats and by Republicans, and one showing the average number of accounts per politician for each party. Which political party has more social media accounts in total? Which party has a higher per-politician average?

Note: there are several ways to accomplish this. You could use gather again and then use group_by and summarise, first within politician and then within party, or you could use mutate to count each politician's accounts and then average by party. Any other approach is also fine.
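A minimal sketch of the mutate-then-average approach, using hypothetical toy data in which each row is one politician and a missing value (NA) in an account column means the politician has no account of that type. The column names `party`, `twitter`, `facebook`, and `youtube` are assumptions, not necessarily those in the lab's data:

```r
library(dplyr)

# Hypothetical toy data: one row per politician; NA means no account of that type.
pols <- tibble(
  party    = c("Democrat", "Democrat", "Republican"),
  twitter  = c("@a", NA, "@c"),
  facebook = c("a.page", "b.page", NA),
  youtube  = c(NA, NA, "c.channel")
)

by_party <- pols %>%
  # Count each politician's accounts (TRUE counts as 1 when logicals are added).
  mutate(n_accounts = !is.na(twitter) + !is.na(facebook) + !is.na(youtube)) %>%
  group_by(party) %>%
  # Total accounts per party and the average per politician.
  summarise(total_accounts = sum(n_accounts),
            mean_accounts  = mean(n_accounts))

by_party
```

Each summary column can then be plotted with `ggplot()` and `geom_col()`, one chart per column.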

# your answer here
Your answers here: Which political party has more social media accounts? Which party has a higher per-politician average?


4. Use an inner join to combine the columns of the committees dataframe with those of the congress dataframe, then create a plot showing the average number of committees that Democrats and Republicans belong to. Next, create a plot showing the averages by gender (note: this second part should include members of other parties as well).

# your answer here


5. Create a bar plot showing the number of members in each of the 10 largest congressional committees (i.e., the committees with the most members). The bars should be sorted by committee size.

Note: Our standard for visualizations is that each plot should have axis labels, all labels must be readable, and we should easily be able to tell what your figure is showing. Failure to do this will result in point deductions.
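Here is a minimal sketch of the sorting step. It uses hypothetical toy data with one row per (member, committee) membership, and the `member_id` and `committee` column names are assumptions. `count()` tallies members per committee, `slice_max()` keeps the largest committees, and `reorder()` orders the bars by size rather than alphabetically. The sketch keeps only the top 2 because the toy data is tiny:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per (member, committee) membership.
memberships <- tibble(
  member_id = c(1, 2, 3, 1, 2, 1),
  committee = c("Budget", "Budget", "Budget", "Finance", "Finance", "Ethics")
)

sizes <- memberships %>%
  count(committee, name = "n_members") %>%  # number of members per committee
  slice_max(n_members, n = 2)               # keep the largest committees

# reorder() sets the factor levels by committee size, so the bars are sorted.
p <- ggplot(sizes, aes(x = reorder(committee, n_members), y = n_members)) +
  geom_col() +
  coord_flip() +  # horizontal bars keep long committee names readable
  labs(title = "Largest committees by membership",
       x = "Committee", y = "Number of members")
```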

# your answer here


6. Create a single bar plot showing the average age of the 5 committees with the highest average ages and the 5 committees with the lowest. The bars should be sorted by average committee age. Which committees have the highest and lowest average ages?

# your answer here


7. Create a line graph showing the total number of politician births in each decade since the 1930’s, with separate lines for Senate and House members (see the type column). The labels on your x-axis should look like “1930’s”, “1940’s”, and so on, and your legend should show the values “Senator” and “Representative” (i.e., not rep and sen).

Note: The plotted lines may not be continuous if there were no births in some decades.
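A minimal sketch of building decade labels and readable chamber names. It uses hypothetical toy data in which `birthday` and `type` (coded "rep"/"sen") are assumed column names. Integer division by 10 gives the decade, and `paste0()` appends the "'s" suffix:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; `birthday` and `type` ("rep"/"sen") are assumed column names.
births <- tibble(
  birthday = as.Date(c("1934-05-01", "1938-11-12", "1947-02-03", "1952-07-20")),
  type     = c("rep", "sen", "rep", "rep")
)

decades <- births %>%
  mutate(
    year    = as.integer(format(birthday, "%Y")),
    decade  = paste0((year %/% 10) * 10, "'s"),  # e.g. 1934 -> "1930's"
    chamber = factor(type, levels = c("rep", "sen"),
                     labels = c("Representative", "Senator"))
  ) %>%
  filter(year >= 1930) %>%
  count(decade, chamber, name = "births")

# With a discrete x-axis, geom_line() needs `group` to know which points to connect.
p <- ggplot(decades, aes(x = decade, y = births, color = chamber, group = chamber)) +
  geom_line() +
  geom_point() +
  labs(title = "Politician births by decade", x = "Decade of birth",
       y = "Number of births", color = "Chamber")
```

Adding `geom_point()` keeps each decade visible even where a chamber has too few points to draw a line segment.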

# your answer here


8. Create a bar chart showing the frequency of politician births by month and another bar chart showing politician births by weekday. The x-axis labels should be month names and weekday names, respectively, and should appear in chronological order.

Note: you can use functions from the lubridate package to get weekday and month names.
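As a sketch using hypothetical birth dates, lubridate's `month()` and `wday()` with `label = TRUE` return ordered factors whose levels follow the calendar. Plots therefore list them chronologically rather than alphabetically. The exact names depend on the system locale.

```r
library(lubridate)

# Hypothetical birth dates for illustration.
dates <- as.Date(c("1950-03-15", "1962-12-01", "1971-07-04"))

# label = TRUE returns ordered factors in calendar order; abbr = FALSE gives full names.
birth_month   <- month(dates, label = TRUE, abbr = FALSE)
birth_weekday <- wday(dates, label = TRUE, abbr = FALSE)  # week starts on Sunday by default
```

When mapped to x in `ggplot()` with `geom_bar()`, these factors keep the bars in chronological order. Months or weekdays with no births are dropped unless you add `scale_x_discrete(drop = FALSE)`.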

# your answer here


9. Using the topics you described last week or a new topic you have been thinking about, describe two social science questions that you would be interested in exploring for your final project. Do you think these questions might be answerable using real data?

# your answer here