Sophie Sanda
28 April 2022
Introduction:
The Covid-19 pandemic has reshaped the lives of people around the world. In the United States alone, there have been 80.3 million cases and 984,000 deaths since the beginning of the pandemic (NYT, 2020). Communications surrounding Covid-19 have focused on the dangers of the virus and the best ways to protect oneself against it. In turn, the Covid-19 pandemic has received some of the most media attention of any world event (Mach et al., 2021). Given its occurrence in the age of social media and the dangers of a modern pandemic, public interest in the pandemic has been high. However, public interest in Covid-19 has not been stable across the United States or across time.
Therefore, in this paper, I will investigate how public interest in Covid-19 is affected by the status of the pandemic and how the relationship between interest and status has changed over the course of the pandemic. More specifically, how has the number of Covid-19 related Google searches changed in relation to the number of cases, positive tests and deaths over the course of the pandemic? Do searches for “covid” increase when the number of cases, positive tests and deaths due to Covid-19 increase? Do these factors’ effects vary across different states?
Understanding how interest in Covid-19 changes during the pandemic is important, as it can help track how worried people are about the disease. When people are worried about a topic, they often try to get more information on it (Traczyk et al., 2018). As Google is the most widely used search engine, this phenomenon would likely increase Covid-19 searches. Search interest could also provide information on how safe people feel under current policies across states. Understanding public interest in the pandemic can help public and health officials communicate with the public about the danger of the disease and proper safety measures. Additionally, changes in interest can help explain why people may be willing to break policies related to Covid-19.
The research in this paper will compare weekly Google Trends data from March 1, 2020 to March 13, 2022 with weekly Covid-19 case data by state. The data will be analyzed using mixed-effects models with interaction terms for each pandemic year in order to view the relationship over time. Variable importance calculations will be done for each state in order to see which states were most affected by certain variables. There are three research hypotheses: that searches for “covid” increase when deaths, cases or positive tests due to Covid-19 increase; that the magnitude of the relationship between searches for “covid” and cases, deaths and positive tests decreases from the start of the pandemic to now; and that states with the highest numbers of cases or deaths had higher variable importance for predicting Google Trends.
Background:
Many studies have attempted to show the relationship between public interest in Covid-19 and the progression of Covid-19. The most notable study of Covid-19 public interest using Google searches is by Husain et al. and covered December 31, 2019 to March 24, 2020. The researchers found a positive correlation between a rise in cases and a rise in Covid-19 related Google searches (Husain et al., 2020). This supports our research hypothesis that people will search for “covid” more when there are spikes in cases. However, the study did not investigate how this relationship changed later in the pandemic. Additionally, the study focused on the beginning of the pandemic, when public interest was very different from what it is now.
Moreover, studies have shown that increases in Covid-19 related information have led to more online searches out of fear. Covid-19 has been very fear-inducing, both because of its physical danger and because of the lack of information about it. It is likely that the increased need for information comes from fear of the dangers associated with Covid-19. A study by Traczyk et al. on fear and search effort found that inciting fear in participants increased information search effort in some people (Traczyk et al., 2018). This provides some support for the idea that when a fearful event occurs, such as an increase in Covid-19 cases or deaths, people likely put more effort into searching for information, which could show up in Google searches.
Additionally, research on other diseases suggests that Covid-19 cases affected the number of Google searches for those diseases. Bansal et al. found that searches for “Myocarditis” (inflammation of the heart) increased significantly during the first wave of the pandemic, when cases began to peak in the United States. Potentially, this effect extends to searches for Covid-19 itself, as no illness is more closely tied to Covid-19 case counts. The researchers also found that in places with high levels of Covid-19, people searched more for “Myocarditis” (Bansal et al., 2021). This provides some evidence that states with more Covid-19 will search more.
Desensitization is a common phenomenon when people undergo long-term stress and danger. Stevens et al. found that people are becoming increasingly desensitized to information on Covid-19. They discovered that increasing death tolls were matched with lower levels of anxiety in the third and fourth quartiles of deaths compared to earlier numbers. Over time, people seemed to be less nervous about the rise in deaths related to Covid-19, even when exposed to tweets containing information on death tolls (Stevens et al., 2020). This supports the hypothesis that Google searches will decrease even with rising Covid-19 cases as the pandemic progresses. People are likely to be more desensitized now to the dangers of Covid-19 than at the beginning of the pandemic.
The research below will use Google Trends data as a measure of public interest. Choi and Varian used Google Trends data to predict economic indicators and showed that Google Trends closely tracked the true present values of many indicators (Choi and Varian, 2012). Because many economic indicators reflect human behavior, it is plausible that Google Trends can also track how interested people are in Covid-19 based on search measures.
While the studies above provide a strong foundation on public interest in Covid-19 with regard to the status of the pandemic, few studies measure the relationship between public interest and multiple factors (new cases, new positive tests, and new deaths) or how this relationship changes over both time and geographic location. This research aims to address this gap by looking at Google searches in relation to pandemic factors over time and space from March 1, 2020 to March 13, 2022. This is vital to understanding how to communicate with the public and create interest in safety practices, such as vaccines, and potential dangers, such as new variants. Based on the trends in past research, there is extensive background for the idea that increased danger will increase interest in Covid-19, but not for extended periods of time across all areas of the United States.
Datasets:
The research primarily draws on two sources of data: Google Trends data and Centers for Disease Control and Prevention (CDC) Covid-19 state data. To start, Google Trends data was used to track searches for “covid” over the course of the pandemic across states (Google Trends, 2022). “covid” was chosen as the word to track because, compared to other Covid-19 related words such as “covid-19” or “corona”, it had the highest relative searches (Google Trends, 2022). Google Trends reports normalized values in the range [0, 100] for the number of searches for a specific word. A value of 100 represents the most popular week of searches for that word within the selected time period and region, and lower values are scaled relative to that peak, with values near 0 representing the least popular weeks. The data was collected on a weekly basis, on Sundays, from March 1, 2020 to March 13, 2022.
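To make the normalization concrete, the following is a minimal sketch of peak-relative scaling in base R. The raw counts are hypothetical (Google does not publish them), and `normalize_to_peak` is an illustrative helper, not a Google Trends or gtrendsR function.

```r
# Hypothetical raw weekly search counts for one state (not real data).
raw_counts <- c(120, 300, 600, 450, 60)

# Scale each week relative to the peak week, which becomes 100.
normalize_to_peak <- function(x) round(100 * x / max(x))

hits <- normalize_to_peak(raw_counts)
print(hits)
```

Under this kind of scaling, each series is relative to its own peak, so a value of 50 in one state and 50 in another need not correspond to the same number of searches.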
The gtrendsR package was installed and used to retrieve data quickly from Google for each state. There was a limit of five on how many states could be requested at once. Therefore, a function was created to request the data for each state and consolidate it into a single data frame. The gtrends function was used to get interest over time for “covid” searches from March 1, 2020 to March 13, 2022 in each state. The output of each gtrends function call was a data frame with five columns: date, hits (normalized search value), keyword (“covid”), geo (state), and time. The data frame was narrowed down to include only date, hits and geo.
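The batching step can be sketched as follows. This is not the original function: `make_batches`, `fetch_all`, and `fake_fetch` are hypothetical names, and the fetcher is a stand-in so the example runs without network access. A real fetcher might wrap `gtrendsR::gtrends()` as indicated in the comment.

```r
# Split a vector of geo codes into batches of at most `batch_size`.
make_batches <- function(geos, batch_size = 5) {
  split(geos, ceiling(seq_along(geos) / batch_size))
}

# Fetch every batch with `fetch_fn` and stack the results into one data frame.
# `fetch_fn` is a placeholder; with gtrendsR it might return something like
#   gtrendsR::gtrends("covid", geo = batch,
#                     time = "2020-03-01 2022-03-13")$interest_over_time
fetch_all <- function(geos, fetch_fn, batch_size = 5) {
  pieces <- lapply(make_batches(geos, batch_size), fetch_fn)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out[, c("date", "hits", "geo")]  # keep only the columns used later
}

# Stand-in fetcher (made-up values): two weeks of data per requested state.
fake_fetch <- function(batch) {
  data.frame(
    date = rep(as.Date(c("2020-03-01", "2020-03-08")), times = length(batch)),
    hits = "50",
    keyword = "covid",
    geo = rep(batch, each = 2),
    time = "2020-03-01 2022-03-13",
    stringsAsFactors = FALSE
  )
}

geos <- paste0("US-", state.abb[1:12])
trends <- fetch_all(geos, fake_fetch)
```

One thing worth checking with a real fetcher is whether states requested together in a single call are scaled relative to one another rather than each to its own peak, since that would affect how the hits values compare across batches.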
However, the package could only pull half of the states due to its limitations. R's built-in state.abb vector of state abbreviations was compared against the data already pulled in order to identify which states needed to be pulled manually from Google’s website. Each of these states’ data had to be downloaded individually. The CSV file names were changed to correspond to the state, and the files were then read into R. They included two columns, one for the date and one for the hits. These datasets did not include the state names, which had to be added manually using the mutate function. rbind() was then used to combine all of these datasets into the same data frame. The final dataset included three columns, geo (state abbreviation), date (week), and hits (normalized values of Google searches), for all 50 states. Additionally, the date column had to be converted with the lubridate package to correctly represent the dates.
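A minimal base-R sketch of this step is shown below, assuming the manually downloaded files have the two columns described. The data frames stand in for the CSV files, `label_state` is a hypothetical helper that plays the role of the mutate call, and `as.Date()` is used in place of lubridate so the example has no package dependencies.

```r
# Suppose the automated pull returned only these states (hypothetical).
pulled <- data.frame(geo = c("AL", "AK", "AZ"), stringsAsFactors = FALSE)

# state.abb is R's built-in vector of the 50 state abbreviations.
missing_states <- setdiff(state.abb, unique(pulled$geo))

# A manually downloaded file has only date and hits; add the state label.
label_state <- function(df, abb) {
  df$geo <- abb
  df[, c("geo", "date", "hits")]
}

# Stand-ins for two downloaded CSV files (made-up values).
ar_csv <- data.frame(date = c("2020-03-01", "2020-03-08"),
                     hits = c("12", "<1"), stringsAsFactors = FALSE)
ca_csv <- data.frame(date = c("2020-03-01", "2020-03-08"),
                     hits = c("15", "20"), stringsAsFactors = FALSE)

# Stack the labeled files and convert the date strings to Date objects.
manual <- rbind(label_state(ar_csv, "AR"), label_state(ca_csv, "CA"))
manual$date <- as.Date(manual$date)
```

The "<1" entry illustrates one possible reason the hits column arrives as text: Google Trends exports very low values in that form.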
The second dataset was obtained from the CDC Covid-19 database (CDC, 2022). The data was downloaded from the CDC data website as a CSV file and read into R. The dataset included 15 columns; however, only the columns for state, submission_date, tot_cases, new_case, pnew_case, tot_death and new_death were used in the analysis. The dataset had the number of new cases, new positive tests, and new deaths for each day and state. The date variable had to be converted to a date type using the lubridate package. The data also had to be aggregated from daily to weekly values. Then, the Covid-19 dataset was combined through an inner join with the state Google Trends data on “date” and “geo”. The hits column was stored as a character and had to be converted to a numeric value. The final dataset (state_covid_trends) included eight columns: date, hits, geo, tot_cases, new_case, pnew_case, tot_death and new_death.
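The weekly aggregation and join could look roughly like the sketch below, written in base R with made-up values. It assumes that each week runs Sunday through Saturday and that daily counts are summed within the week; the paper does not state the exact rule used, and the mapping of "<1" to 0.5 is an arbitrary choice for illustration.

```r
# Hypothetical daily CDC-style rows for one state.
daily <- data.frame(
  geo = "NY",
  date = seq(as.Date("2020-03-01"), as.Date("2020-03-14"), by = "day"),
  new_case = 1:14,
  pnew_case = 0,
  new_death = rep(c(0, 1), 7)
)

# Map each day to the Sunday that starts its week ("%w" is 0 for Sunday).
daily$week <- daily$date - as.integer(format(daily$date, "%w"))

# Sum the daily counts within each state-week.
weekly <- aggregate(cbind(new_case, pnew_case, new_death) ~ geo + week,
                    data = daily, FUN = sum)
names(weekly)[names(weekly) == "week"] <- "date"

# Weekly Google Trends rows; hits arrive as text and are converted to numbers.
trends <- data.frame(geo = "NY",
                     date = as.Date(c("2020-03-01", "2020-03-08", "2020-03-15")),
                     hits = c("<1", "14", "100"), stringsAsFactors = FALSE)
trends$hits <- as.numeric(sub("^<1$", "0.5", trends$hits))

# merge() keeps only keys present in both tables by default (an inner join).
state_covid_trends <- merge(trends, weekly, by = c("date", "geo"))
```

In this toy example the week of March 15 is dropped by the join because it has no matching Covid-19 rows.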
Methodology:
Three forms of analysis were used to answer the research questions. First, a mixed-effects model was created using the lme4 package. A year variable was created from the date variable in order to group dates by year and see variation in relationships between different time periods. This mixed model was created using the lmer function, where hits was the response variable and new_case, pnew_case and new_death were the independent variables. Interactions between the independent variables and year were included in order to see how the effects of the variables varied from year to year. Additionally, the model included a random intercept for each state in order to account for baseline differences between states, which was the motivation for using a mixed-effects model. In the model output, the interaction effects for each variable and year were added to the base effect of each variable in order to see how the effect changed in different years.
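Written out, a model of this form can be sketched as below. The notation is introduced here for illustration and does not appear in the original analysis; it assumes that year enters the model as a categorical variable with 2020 as the reference level, where $s$ indexes states, $t$ indexes weeks, and $v$ ranges over new_case, pnew_case and new_death.

```latex
\begin{aligned}
\text{hits}_{st} &= \beta_0
  + \sum_{v} \beta_v\, x_{v,st}
  + \sum_{y \in \{2021,\,2022\}} \gamma_y\, \mathbb{1}[\text{year}_t = y]
  + \sum_{v} \sum_{y \in \{2021,\,2022\}} \delta_{v,y}\, x_{v,st}\, \mathbb{1}[\text{year}_t = y]
  + u_s + \varepsilon_{st}, \\
u_s &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{st} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Under this assumption, the effect of variable $v$ in 2020 is $\beta_v$, and its effect in a later year $y$ is $\beta_v + \delta_{v,y}$, which corresponds to adding the interaction effect to the base effect. The term $u_s$ is the state-level random intercept.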
Secondly, a lagged mixed-effects model was used. This model had the same specification as the model above, but each hits entry was paired with the previous week of Covid-19 data for its state. This lag was used to study whether Covid-19 activity has a greater effect on searches after some time has passed. Potentially, people need time to learn about a spike in cases or deaths before searching online. This model’s output was compared to the first mixed-effects model’s output to see whether waiting some time to record searches had any effect on the number of searches. The lagged mixed model included hits as the response variable and new_case, pnew_case and new_death as the independent variables, along with the interactions between year and the independent variables (see below for model calls). In the results, the interaction effects for each variable and year were added to the base effect of each variable in order to see how the effect changed in different years.
library(lme4)  # provides lmer()
no_lag_model <- lmer(hits ~ new_case*year + pnew_case*year + new_death*year + (1 | geo), data = state_covid_trends)
lagged_model_lmer <- lmer(hits ~ new_case*year + pnew_case*year + new_death*year + (1 | geo), data = state_covid_trends_lag)
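The lagged dataset can be built by shifting the Covid-19 variables by one week within each state. The following is a minimal base-R sketch with made-up values; `lag_one` is a hypothetical helper, only new_case is lagged here for brevity, and the original code may have constructed the lag differently.

```r
# Toy weekly panel for two states (values are made up).
panel <- data.frame(
  geo = rep(c("NY", "CA"), each = 3),
  date = rep(as.Date(c("2020-03-01", "2020-03-08", "2020-03-15")), times = 2),
  hits = c(10, 40, 100, 5, 20, 80),
  new_case = c(1, 50, 400, 2, 30, 200)
)

# Shift a vector down by one position, padding the first slot with NA.
lag_one <- function(x) c(NA, head(x, -1))

# Sort by state and week, then lag the predictor within each state so that
# this week's hits line up with last week's cases.
panel <- panel[order(panel$geo, panel$date), ]
panel$new_case <- ave(panel$new_case, panel$geo, FUN = lag_one)

# Each state's first week has no prior week and is dropped.
state_covid_trends_lag <- panel[!is.na(panel$new_case), ]
```

Lagging within each state, rather than across the whole table, keeps one state's last week from being paired with the next state's first week.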
Finally, the variable importance of each independent variable was tested across each state. A function was created to run a linear regression model for each state. The linear model was run using the lm function, where hits was the response variable and new_case, pnew_case and new_death were the independent variables. The caret package was then loaded in order to run the varImp() function. For linear models, varImp() uses the absolute value of the t-statistic of each model parameter as the variable importance value. This function was run individually on a linear model for each state. A consolidated data frame was then created with the columns: state, new_case varImp, pnew_case varImp, and new_death varImp. These values were then converted from characters to numeric values. The total Covid-19 cases and deaths (tot_cases and tot_death) on March 13, 2022 (the last day of data) for each state were then added from the state_covid_trends dataset based on corresponding states. Correlations between variable importance for new_case and total cases, as well as between variable importance for new_death and total deaths, were computed across the dataset. The data frame was then arranged in descending order to see which states had the highest variable importance for each variable and which states had the highest levels of cases and deaths. These were then compared to the states that had the highest and lowest cases and deaths. Many states did not have a variable importance value for pnew_case, as it was not significant to the individual models. Therefore, this variable was not further analyzed.
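The per-state importance calculation can be sketched in base R as follows. This assumes, per caret's documentation for linear models, that the importance is the absolute t-statistic of each coefficient, so it is computed directly from summary() rather than by calling varImp(). The data are simulated, `state_importance` is a hypothetical helper, and pnew_case is omitted for brevity.

```r
set.seed(1)
# Simulated weekly data for two states (purely illustrative).
toy <- data.frame(
  geo = rep(c("NY", "CA"), each = 30),
  new_case = runif(60, 0, 1000),
  new_death = runif(60, 0, 50)
)
toy$hits <- 10 + 0.05 * toy$new_case + rnorm(60, sd = 5)

# Importance of each predictor in one state's linear model: |t-statistic|.
state_importance <- function(df) {
  fit <- lm(hits ~ new_case + new_death, data = df)
  tvals <- summary(fit)$coefficients[, "t value"]
  abs(tvals[names(tvals) != "(Intercept)"])
}

# Fit one model per state and collect one row per state.
imp <- t(sapply(split(toy, toy$geo), state_importance))
imp <- data.frame(geo = rownames(imp), imp, row.names = NULL)
print(imp)
```

With the full 50-state table, the correlation step would then be a call such as `cor(imp$new_case, total_cases)`, where `total_cases` is a hypothetical vector of each state's cumulative cases in the same row order.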
Results:
Non-Lagged Mixed-Effects Model:
Overall, the results supported some of the research hypotheses. New cases, positive tests and new deaths all initially had a positive effect on the number of Google searches for “covid”. However, the variables’ effects changed in different ways over time. The estimated variance of the state-level random intercept was 92.52, and the fixed effects below are reported after accounting for this between-state variation. New cases had a positive effect on the number of Google searches, with an estimate of 0.0003623 in 2020. The estimates were 0.001473 for 2021 and 0.0017281 for 2022. Although the number of Google searches did increase when new cases increased, the relationship did not weaken over time as hypothesized. New cases had a greater effect on Google searches for “covid” in 2022 than in 2021 or 2020. Positive tests also had a positive effect on the number of Google searches, except in 2022, when the relationship became negative. For 2020, the estimated effect on searches was 0.0041216. This effect decreased over the course of the pandemic: in 2021 and 2022, the estimated effects were 0.0023695 and -0.0029447, respectively. Therefore, positive tests corresponded to more Google searches for “covid” in the early pandemic but to fewer searches towards the end. Finally, new deaths increased the number of Google searches for “covid” only in 2020, when the estimated effect was 0.0157210. This effect became negative in 2021, with an estimate of -0.0213441, and the trend continued in 2022, when the effect fell further to -0.1357917. Therefore, while increased deaths corresponded to increased searches in 2020, they corresponded to decreased searches in 2021 and 2022.
Based on the non-lagged mixed-effects model, new cases was the only variable that corresponded to increased Google searches throughout the pandemic, and its relationship with searches did not decrease over time. On the other hand, the effects of positive tests and new deaths on Google searches decreased as the pandemic progressed and became negative in later years.
Lagged Mixed-Effects Model:
The lagged model measured the effects of the prior week's Covid-19 status on the number of Google searches. The trends were similar to those in the non-lagged model. New cases had a positive effect on the number of Google searches in 2020, 2021, and 2022, with estimated effects of 0.0004475, 0.000982, and 0.0014544, respectively. The effect of new cases on Google searches increased over the course of the pandemic, as in the non-lagged model. However, relative to the non-lagged model, the effects for 2021 and 2022 were slightly lower, while the 2020 effect was slightly higher (0.0004475 compared to 0.0003623). In the later years, new cases had a larger effect on Google searches in the same week than a week later.
Positive tests increased the number of Google searches in 2020, with an estimate of 0.0060657. However, this effect decreased to 0.0019651 in 2021 and turned negative in 2022, with an estimate of -0.0034729. The lagged model shows that the effect of positive tests on Google searches shrank throughout the pandemic and was negative in 2022. This trend mirrors the non-lagged effects of positive tests. On the other hand, the 2020 effect of positive tests was larger in the lagged model than in the non-lagged model (0.0060657 compared to 0.0041216), suggesting that, early in the pandemic, additional time could increase the effect of positive tests on Google searches. The 2021 and 2022 estimates were slightly lower in the lagged model.
Lastly, new deaths had a negative relationship with Google searches in 2020, with an estimated effect of -0.0256324. The effect increased in 2021 to -0.0165029 but then decreased again in 2022 to -0.1505418. This is a very different trend from the non-lagged model, where the effect of new deaths consistently decreased throughout the pandemic. Additionally, the effect started off negative, showing that new deaths did not increase the number of Google searches a week after they were reported. Overall, new cases was the only variable to consistently have a positive effect on Google searches throughout the pandemic. However, the effect was increasing instead of decreasing over time. Positive tests had a positive effect on Google searches in the first two years of the pandemic but a negative one in 2022. In accordance with the hypothesis, the effect was decreasing. New deaths had a negative effect on Google searches in all three years, contradicting the hypothesis. Additionally, there was an increase in 2021 in the effect of new deaths on Google searches. The effects of new cases and new deaths were lower than in the non-lagged model in most years, while the effect of positive tests was higher in the lagged model in 2020.
Variable Importance in States:
Based on the values returned by the varImp() function for each state, the five states with the highest variable importance for each variable are shown on the graphs below, along with the five states with the lowest importance for each variable. The states where new cases had the highest importance in predicting Google searches were New York, Washington, Florida, Ohio and Rhode Island. The states where new cases had the lowest importance were Massachusetts, West Virginia, South Dakota, Minnesota and Illinois. The states with the highest variable importance for new deaths were Massachusetts, Pennsylvania, Montana, Florida and Vermont. The states with the lowest variable importance for new deaths were New Hampshire, Indiana, Kansas, Maine and Utah. Apart from Florida, there was no overlap between the two highest groups, and there was no overlap between the two lowest groups.
The correlation between new cases variable importance and total cases for each state was almost 0, showing almost no relationship between the two variables. The correlation between new deaths variable importance and total deaths was higher than the case correlation but still quite low, at 0.26. There seems to be little relationship between how important a variable is in predicting Google searches and how many cases or deaths each state had. The graphs show the states with the highest cases and deaths as well as the lowest cases and deaths. They do not match very well with the states’ variable importance metrics.
Discussion:
The results did not provide solid support for our research hypotheses. Our prediction that Google searches for “covid” would increase when cases, positive tests or deaths increase held for all three variables only in 2020, and only in the non-lagged analysis. However, new cases always maintained a positive effect on Google searches, and new positive tests had a positive effect in both 2020 and 2021 in both the lagged and non-lagged models. Surprisingly, the effect of new cases continued to increase throughout the pandemic. Therefore, it seems that the stage of the pandemic has a great effect on how these variables impact Google searches.
Several factors could explain these patterns. Covid-19 cases and tests were highly publicized (Mach et al., 2021). Potentially, people paid more attention to these factors because of high media attention. More people also took tests or had Covid-19 than died from it; therefore, people may be more interested in these factors, leading to increased searches. Additionally, there could be a reciprocal effect between searches and new cases. A study by Lee et al. on public interest in Covid-19 found that people often searched for Covid-19 related words before testing positive for the virus (Lee et al., 2020). The effect of searches on new cases was not studied in the current research, but it may play a role in why new cases continued to have a positive relationship with searches when other factors did not. This could explain why the effect of new cases on searches continued to increase throughout the pandemic, as there were more cases. Finally, throughout the pandemic, there was an emphasis on “flattening the curve” (Gandhi and Bienen, 2021), which meant getting to a very low level of cases or positive tests. Debecker and Modis studied this focus on flattening the curve of cases and argued that it slowed the process of getting to the total infection rate, leading to higher cases over time (Debecker and Modis, 2021). It is possible that, because of this emphasis, people were constantly thinking about how many cases there were, which led new cases to have a bigger effect on searches throughout the pandemic.
Although the effect of new positive tests remained positive in 2020 and 2021, it decreased as the years progressed. This could be related to increased testing capacity. Testing was not widely available or required until the end of 2020 (Davis, 2020). People were likely more nervous about testing when they could not easily get tests; once testing was available, it became less interesting, which decreased the effect on searches.
New deaths deviated greatly from the hypothesis. In the non-lagged model, new deaths had a positive effect only in 2020 and a negative effect in 2021 and 2022. All new death effects were negative in the lagged model. Death was less likely than testing positive for Covid-19; therefore, it is possible that people were less concerned with how many people were dying than with how many were testing positive. Many people also were not at risk of dying of Covid-19 (Mayo Clinic, 2022) and may have felt more motivated to search for “covid” when there was a spike in cases while feeling distanced from a spike in deaths. The new deaths results do follow the hypothesis that the variable would have less effect on Google searches later in the pandemic. Similar to the study by Stevens et al., the results suggest people are becoming more desensitized to Covid-19 deaths, as they are showing less public interest. With increases in vaccines and medications, vaccinated people were 93% less likely to die compared to unvaccinated people (Bove, 2022), making new deaths increasingly less anxiety-provoking.
The trends in the non-lagged model and the lagged model were largely the same. However, effects for new cases and new deaths were slightly lower in the lagged model in most years. This could show that case and death estimates are most impactful directly when they are released. Positive tests showed the opposite pattern in 2020, when the lagged model had a higher effect. This could be because positive test counts were less widely available and may not have been seen right away.
Finally, states’ total Covid-19 cases and deaths did not correlate with how important new cases and new deaths were for searches in different states. The results therefore do not support the idea that states with more cases or deaths place more importance on these factors in how much they search about Covid-19. More research is necessary to reach a conclusion. It is likely that many more factors, outside of new cases and new deaths, play a role in why people search for “covid” in different states.
The study has some limitations in investigating the research questions. The biggest limitation was the normalized values of the Google Trends data. Given that the data was not the raw number of searches over time, the variable coefficients from the models could not be interpreted literally but only comparatively between variables and models. Therefore, these models could not be used to predict the number of searches. Additionally, a limitation of the methodology was that the lmer() function does not report p-values by default, which limited the ability to say which variables were statistically significant in their effect on Google searches. Moreover, many factors in public interest were not controlled for within the analysis. Hospitalization could potentially play a large role in how many people search for “covid”. It would be important to look further into Covid-19 data to see if hospitalizations would change the effects in the model. Other factors to control for would be state characteristics, such as testing capacity, population and Covid-19 policies. Controlling for these would create a more precise understanding of how much new cases, new deaths and new positive tests impact the number of Google searches.
For future research, it would be important to have more data on Covid-19 hospitalization and testing. It would be interesting to see if these factors play a large role in how much people search on Google. Hospitalizations are likely to be significant given that they show how dangerous Covid-19 is to people. People may research how to avoid hospitalization or reasons for hospitalizations. Testing would be significant because it likely interacts with how important certain variables are in different states, as states have different testing capabilities. Another direction for potential research would be to get raw values of searches from Google in order to create a predictive model of searches based on Covid-19 factors. This could provide insight into what days people are most interested in the pandemic and when important Covid-19 information should be distributed. Finally, the motivation for why people search for “covid” could vary greatly. It would be valuable to investigate differing reasons for searches, such as fear or intrigue. This could provide insight into why people make the decisions they do.
This research shows that different Covid-19 information can have varied effects on public interest in the virus. This research built on previous studies to show that new cases, new positive tests and new deaths all have different effects on Google searches, and that these effects change throughout the pandemic but do not seem to be dependent on the state. This provides insight into how public and health officials can best communicate with the public on Covid-19 procedures. Potentially, using new cases to motivate people to get vaccinated is the best way, as it has the largest positive effect on public interest. There are many more factors that cause fluctuation in public interest, especially for something as important as Covid-19. Future research can build on these results in order to provide more insight into how public interest changes along with the pandemic.
Works Cited:
Bansal, Agam, et al. “Utilizing Google Trends to Assess Worldwide Interest in Covid-19 and Myocarditis.” Journal of Medical Systems, Springer US, 11 Jan. 2021, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7797199/.
Bove, Tristan. “Fully Vaccinated People Are 93% Less Likely to Die of COVID Compared to Unvaccinated People.” Fortune, Fortune, 4 Feb. 2022, https://fortune.com/2022/02/04/fully-vaccinated-93-percent-less-likely-covid-death-compared-unvaccinated/.
Bragazzi, Nicola Luigi, et al. “How Often People Google for Vaccination: Qualitative and Quantitative Insights from a Systematic Search of the Web-Based Activities Using Google Trends.” Taylor & Francis, 23 June 2016, https://www.tandfonline.com/doi/abs/10.1080/21645515.2017.1264742.
The New York Times. “Coronavirus in the U.S.: Latest Map and Case Count.” The New York Times, The New York Times, 3 Mar. 2020, https://www.nytimes.com/interactive/2021/us/covid-cases.html.
Centers for Disease Control and Prevention. “United States Covid-19 Cases and Deaths by State over Time.” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 13 Mar. 2022, https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36/data.
Davis, Kayla. “Better Late than Never: Covid-19 Testing across the United States.” Science in the News, 27 May 2020, https://sitn.hms.harvard.edu/flash/2020/covid-19-testing/.
Debecker, Alain, and Theodore Modis. “Poorly Known Aspects of Flattening the Curve of COVID-19.” Technological Forecasting and Social Change, Elsevier Inc., Feb. 2021, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7603980/.
Gandhi, Monica, and Leslie Bienen. “Why Covid-19 Case Counts Don’t Mean What They Used To.” Time, Time, 16 Dec. 2021, https://time.com/6129225/omicron-covid-19-case-counts/.
Google. “Google Trends Search for ‘Covid.’” Google Trends, Google, 13 Mar. 2022, https://trends.google.com/trends/.
Husain, Iltifat, et al. “Fluctuation of Public Interest in COVID-19 in the United States: Retrospective Analysis of Google Trends Search Data.” JMIR Public Health and Surveillance, JMIR Publications Inc., Toronto, Canada, 7 May 2020, https://publichealth.jmir.org/2020/3/e19969/.
Lee, Jinhee, et al. “Public Interest in Immunity and the Justification for Intervention in the Early Stages of the COVID-19 Pandemic: Analysis of Google Trends Data.” Journal of Medical Internet Research, JMIR Publications Inc., Toronto, Canada, 8 Dec. 2020, https://www.jmir.org/2021/6/e26368/.
Mach, Katharine J., et al. “News Media Coverage of COVID-19 Public Health and Policy Information.” Nature News, Nature Publishing Group, 28 Sept. 2021, https://www.nature.com/articles/s41599-021-00900-z.
Mayo Clinic Staff. “Covid-19: Who’s at Higher Risk of Serious Symptoms?” Mayo Clinic, Mayo Foundation for Medical Education and Research, 1 Mar. 2022, https://www.mayoclinic.org/diseases-conditions/coronavirus/in-depth/coronavirus-who-is-at-risk/art-20483301.
Stevens, Hannah R, et al. “Desensitization to Fear-Inducing COVID-19 Health News on Twitter: Observational Study.” JMIR Infodemiology, JMIR Publications Inc., Toronto, Canada, 31 Dec. 2020, https://infodemiology.jmir.org/2021/1/e26876.
Traczyk, Jakub, et al. “Does Fear Increase Search Effort in More Numerate People? an Experimental Study Investigating Information Acquisition in a Decision from Experience Task.” Frontiers, Frontiers, 3 Aug. 2018, https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01203/full.
Varian, Hal, and Hyunyoung Choi. “Predicting the Present with Google Trends.” Wiley Online Library, 27 June 2012, https://onlinelibrary.wiley.com/doi/full/10.1111/j.1475-4932.2012.00809.x.