Lab #5 Markdown File

In this lab we will be practicing the fundamentals of network analysis.

Be sure to do the required readings first. While many of the problems can be solved using approaches from the lecture videos, lab videos, or required readings, you may need to do some searching on the internet to solve some of the problems. This will be a valuable skill to learn as you develop your data science skills.

Submit this lab in two formats: the R Markdown file (.Rmd) and the knitted HTML web page (.html). Start from the Lab markdown file (download here) and replace the author name with your own. The R Markdown file should include all of the code you used to generate your solutions, but the knitted HTML file should display ONLY the solutions to the assignment, NOT the code you used to produce them. Imagine this is a document you will submit to a supervisor or professor: they should be able to reproduce the code and analysis if needed, but otherwise only need to see the results and your write-up. The TA should be able to run the R Markdown file and generate exactly the same HTML file you submitted. Submit to your TA via direct message on Slack by the deadline indicated on the course website.

Required reading:

Optional resources:


Questions


1. Describe the following concepts:

nodes: 
edges: 
vertices:
links: 
edge list: 
adjacency matrix: 
directed network: 
undirected network: 
weighted network: 
unweighted network: 
bipartite network: 


2. Describe three different metrics (e.g., betweenness centrality, degree, etc.) that can be used to summarize networks or specific nodes/vertices within a network.

1. 
2. 
3. 


3. For the following problems, we will be working with the committees dataframe. What is the format of this network representation? Is the network weighted/unweighted, directed/undirected, bipartite/unipartite?

# Your answer here or in text.


4. Create a network representing links between the five largest senate committees and the senators (not representatives) that compose them, and create a basic ggraph visualization to show this network. This will require you to use the data wrangling skills we’ve learned in the previous labs with some igraph functions. What do you see? Can we learn anything about congressional committees from this network?

Hint: you will need to join the congress and committee data in order to identify committees with Senate members (every committee is composed of either House or Senate members, never both).

# Your answer here
Written answer here.


5. Create a new bipartite network from committees that includes all committees and members of congress. Then create an adjacency matrix from it and use that matrix to make a new graph. Was the bipartite property preserved in the graph created from the adjacency matrix? Why does igraph work this way?

Hint: you can create a bipartite network by assigning a value to the type node attribute. You can verify that you have a bipartite network by using the is_bipartite function.
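The hint above can be sketched on a small made-up edge list. The data frame toy_edges and its columns person and group are hypothetical and are not the lab's committees data; only graph_from_data_frame, V, and is_bipartite are igraph functions.

```r
library(igraph)

# Hypothetical toy edge list (NOT the lab's committees data):
# each row links a person to a group they belong to.
toy_edges <- data.frame(
  person = c("ann", "bob", "bob", "cat"),
  group  = c("g1",  "g1",  "g2",  "g2")
)

g <- graph_from_data_frame(toy_edges, directed = FALSE)

# Assign a logical `type` vertex attribute: TRUE for one node set (groups),
# FALSE for the other (people). igraph uses this attribute to mark a
# graph as bipartite.
V(g)$type <- V(g)$name %in% toy_edges$group

# Check that igraph now sees the graph as bipartite, and count each node set.
is_bipartite(g)
table(V(g)$type)
```

Which node set gets TRUE is arbitrary here. What matters is that every edge connects a TRUE node to a FALSE node.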

# Your answer here.
written answer here


6. Starting with our bipartite network, we want to compute a network representing the number of committees each member of congress has in common with every other member. To do this, we can use the bipartite_projection function to create two separate networks: one indicating the number of committees that each pair of politicians serves on together, and the other indicating the number of members that each pair of committees has in common. In the politician network, add full names, gender, political party, and age as node properties. We will use this graph for future problems.
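As a rough illustration of what bipartite_projection returns, here is a minimal sketch on made-up data. The names toy_edges and toy_attrs and the age values are hypothetical; matching attributes on vertex name is one possible approach, not necessarily the intended one.

```r
library(igraph)

# Hypothetical toy membership data (NOT the lab's committees data).
toy_edges <- data.frame(
  person = c("ann", "bob", "bob", "cat", "cat"),
  group  = c("g1",  "g1",  "g2",  "g1",  "g2")
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)
V(g)$type <- V(g)$name %in% toy_edges$group

# bipartite_projection() returns a list of two graphs:
# proj1 holds the nodes with type == FALSE, proj2 those with type == TRUE.
proj   <- bipartite_projection(g)
people <- proj$proj1
groups <- proj$proj2

# With the default multiplicity = TRUE, the `weight` edge attribute counts
# how many neighbours two nodes shared in the bipartite graph.
igraph::as_data_frame(people, what = "edges")

# Attach a node attribute by matching on vertex name (hypothetical values).
toy_attrs <- data.frame(name = c("ann", "bob", "cat"), age = c(61, 48, 55))
V(people)$age <- toy_attrs$age[match(V(people)$name, toy_attrs$name)]
```

Using match() on the vertex names keeps attributes aligned even if the attribute table is in a different order than the vertices.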

# Your answer here


7. Make a visualization of the senator co-membership network where node color is based on the politician's gender and node size is based on the politician's age. For aesthetic purposes, you should try removing edges at or below a certain weight (removing edges with weight <= 2 seems reasonable). Try two different layouts and discuss their strengths and weaknesses.

Hint: see some available layouts here.

Note: it may be difficult to make a clean visualization with so many nodes and edges, but do your best - this is a common task when working with network data. You may also try removing vertices with a small number of edges if it is easier to read.
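One possible way to thin out a weighted graph before plotting is sketched below on a made-up graph. The data frame toy and the cutoff are illustrative assumptions; delete_edges, delete_vertices, and degree are igraph functions.

```r
library(igraph)

# Hypothetical weighted toy graph (NOT the lab's senator network).
toy <- data.frame(
  from   = c("a", "a", "b", "c"),
  to     = c("b", "c", "c", "d"),
  weight = c(5, 1, 3, 2)
)
g <- graph_from_data_frame(toy, directed = FALSE)

# Drop edges whose weight is at or below the cutoff.
cutoff   <- 2
g_strong <- delete_edges(g, which(E(g)$weight <= cutoff))

# Optionally drop vertices left with no edges after the filtering.
g_strong <- delete_vertices(g_strong, which(degree(g_strong) == 0))

vcount(g_strong)
ecount(g_strong)
```

Filtering changes what the picture shows, so it is worth reporting the cutoff alongside the plot.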

# Your answer here or in text.
describe strengths and weaknesses here.


8. Calculate the strength (weighted degree), eigenvector, closeness, and betweenness centrality for each politician in this network, and compute the correlations between strength and each of the others. Why are some of the correlations negative?

Hint: read the documentation for the centrality calculation functions carefully. The important difference is in how the edge weights are interpreted. This is an important consideration when thinking about the types of centrality you want to use for your analysis.
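To make the point about weight interpretation concrete, here is a small sketch on a made-up weighted graph. The data frame toy is hypothetical. It only contrasts how strength and the shortest-path functions read the same weight attribute, and it does not prescribe which convention to use in your analysis.

```r
library(igraph)

# Hypothetical weighted toy graph (NOT the lab's network).
toy <- data.frame(
  from   = c("a", "a", "b", "c"),
  to     = c("b", "c", "c", "d"),
  weight = c(5, 1, 1, 2)
)
g <- graph_from_data_frame(toy, directed = FALSE)

# strength() sums the weights of a node's edges, so a larger weight
# means a stronger connection.
strength(g)

# Shortest-path functions read the same weights as edge lengths (costs),
# so a larger weight means the two nodes are FARTHER apart. Here the
# heavy a-b edge is bypassed in favour of the route through c.
distances(g, v = "a", to = "b")

# closeness() and betweenness() are built on those weighted shortest paths.
closeness(g, weights = E(g)$weight)
betweenness(g, weights = E(g)$weight)

# eigen_centrality() treats weights as connection strength instead.
eigen_centrality(g, weights = E(g)$weight)$vector
```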

# Your answer here or in text.
written explanation here


9. Add the centrality measures from the previous question to the congress dataframe. Which politician has the highest strength (weighted degree) centrality, and how do you interpret this value? How about the highest eigenvector centrality?

# your answer here
written explanation here


10. Create a violin plot showing the distribution of centrality (you choose the measure) for men and women, and then do the same for each political party. What can you conclude from these?

# your answer here
written explanation here


11. In last week’s lab exercise, you were asked to identify several possible datasets you could use for your final project. Now write two specific data science research questions and describe the variables in that dataset that could allow you to answer them.

What is a good research question? A good data science research question specifies a relationship between two or more variables that you can measure. The question “why did the chicken cross the road?” is not a good research question because it does not explicitly describe the relationship between any variables. The question “do chickens cross the road more frequently than raccoons?” is good because it specifies a relationship between the type of animal (chickens and raccoons) and the frequency with which the animal crosses the road. Your question should be answerable given the specific variables available in your dataset.

# your answer here