Written by: Christina Gagliardo, MD, FAAP

Global Public Health Intelligence Network (GPHIN)
Google Flu Trends

These are a few of the web-based systems that have improved surveillance capacity for significant public health events such as infectious disease outbreaks. They collect information and perform syndromic surveillance for specific diseases, drawing on sources such as web-based queries, community-based health reporting via mobile phones, social media, and local news. When these sources are used in combination, detection of diseases and outbreaks is more timely and sometimes more sensitive than with traditional surveillance systems (1-4). Examples include the Ebola outbreak that began in 2014 and several polio outbreaks in 2013 and 2014, where digital reports generated through these informal surveillance channels often preceded official reports (such as those from WHO). For polio, digital surveillance reports were available 14.6 days (range 0-40 days) earlier than official WHO reports (4).

Another online surveillance tool, Google Flu Trends (GFT), was launched in 2008 and was intended to supplement traditional surveillance systems, such as the influenza monitoring done by CDC. GFT analyzed Google search queries about influenza-like illness and its symptoms and was able to detect influenza outbreaks one to two weeks ahead of CDC reports. GFT worked well to predict influenza outbreaks in the 2007-2008 flu season as well as during the 2009 H1N1 outbreak (5). In later years, however, it was noted that GFT grossly overestimated flu prevalence, predicting double the number of doctor visits that CDC surveillance reported (6-8). A proposed solution was to combine CDC data with GFT-derived information that was "recalibrated" to be more reliable, with the thought that the combination would perform better than GFT or CDC alone. This is in line with the original intention for GFT: to be used as a "'complementary signal', rather than a stand-alone forecasting tool" (9).
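As a rough illustration of what "recalibration" against CDC data could look like, the Python sketch below fits a straight line between past search-based estimates and the CDC values for the same weeks, then applies that line to the most recent weeks for which CDC figures are not yet available. This is a minimal sketch under stated assumptions and not the method used in the cited papers: the function names `fit_line` and `recalibrated_nowcast` are hypothetical, and every number is made up purely for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def recalibrated_nowcast(search_history, cdc_history, search_latest):
    """Rescale recent search-based estimates using the historical
    relationship between search-based estimates and CDC values."""
    intercept, slope = fit_line(search_history, cdc_history)
    return [intercept + slope * s for s in search_latest]


# Made-up illustrative values (percent of doctor visits for
# influenza-like illness); these are NOT real surveillance data.
search_history = [2.0, 3.0, 4.0, 6.0, 8.0]  # search-based estimates
cdc_history = [1.0, 1.5, 2.0, 3.0, 4.0]     # CDC values, same weeks
search_latest = [10.0, 9.0]                 # weeks CDC has not yet reported

nowcast = recalibrated_nowcast(search_history, cdc_history, search_latest)
print(nowcast)
```

In this toy example the search-based estimates run at about twice the CDC values, so the fitted line scales the latest estimates down by roughly half. A real model would need far more data and would have to be refitted as search behavior changes.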

The exponential increase in the use of social media, such as Twitter, provides another potential opportunity to perform public health surveillance and create predictive models. One group successfully used Twitter data to show that CDC influenza-like illness rates correlated with the rate of Tweets about influenza infection, and predicted influenza two to four weeks sooner (10). The New York City Department of Health and Mental Hygiene validated their model and showed a strong parallel to local influenza activity as well (11).
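To make the idea of a social media signal that both correlates with and precedes official rates more concrete, the Python sketch below computes the Pearson correlation between a weekly Tweet-rate series and an official influenza-like illness series at several lead times. It is only an illustrative sketch, not the model from the cited work: the helper names `pearson` and `lead_correlations` are hypothetical, and both series are invented, with the official series built as a copy of the Tweet series delayed by two weeks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lead_correlations(signal, reference, max_lead):
    """Correlate signal[t] with reference[t + lead] for each lead
    from 0 to max_lead weeks."""
    result = {}
    for lead in range(max_lead + 1):
        paired_signal = signal[:len(signal) - lead]
        paired_reference = reference[lead:]
        result[lead] = pearson(paired_signal, paired_reference)
    return result


# Invented weekly values, NOT real data. The official series is the
# Tweet series delayed by two weeks (first two weeks padded with 1).
tweet_rate = [1, 2, 4, 7, 9, 6, 3, 2, 1, 1]
official_ili = [1, 1, 1, 2, 4, 7, 9, 6, 3, 2]

correlations = lead_correlations(tweet_rate, official_ili, max_lead=4)
best_lead = max(correlations, key=correlations.get)
print(best_lead, correlations[best_lead])
```

Because the invented official series lags the Tweet series by exactly two weeks, the correlation peaks at a lead of two weeks. Real data would be noisier, and a high correlation alone would not establish that Tweets can forecast influenza.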

A leading researcher in the field of digital epidemiology is Marcel Salathé, PhD. His study showed a correlation between CDC-estimated vaccination rates by region and geographically localized vaccine sentiments expressed on Twitter (12). When simulating an infectious disease outbreak, he showed that clustered negative sentiments about vaccines led to clusters of unprotected people, which increased the likelihood of an outbreak in that area.

The analysis of big data generated through social media for "digital epidemiology" and surveillance continues to be developed and refined. Eventually it may become a reliable, easily accessible tool for infectious disease surveillance, both on a large scale and at the local level, where it is most relevant to our everyday practice. It is an emerging field with a great deal of untapped utility and the potential for collaboration between science, medicine, public health, and technology.


  1. Hulth A, Rydevik G, Linde A. Web queries as a source for syndromic surveillance. PLoS One 2009; 4: e4378.
  2. Freifeld CC, Chunara R, Mekaru SR, et al. Participatory epidemiology: use of mobile phones for community-based health reporting. PLoS Med 2010; 7: e1000376.
  3. Barboza P, Vaillant L, Le Strat Y, et al. Factors influencing performance of internet-based biosurveillance systems used in epidemic intelligence for early detection of infectious diseases outbreaks. PLoS One 2014;9: e90536.
  4. Anema A, Kluberg S, Wilson K, Hogg R, Khan K, Hay S, Tatem A, Brownstein J. Digital surveillance for enhanced detection and response to outbreaks. www.thelancet.com/infection Vol 14 November 2014
  5. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, et al. Detecting influenza epidemics using search engine query data. Nature. 2009 Feb 19;457(7232):1012-4. doi: 10.1038/nature07634.
  6. Lazer, David; Kennedy, Ryan; King, Gary; Vespignani, Alessandro (14 March 2014). "The Parable of Google Flu: Traps in Big Data Analysis". Science 343 (6176): 1203–1205. doi:10.1126/science.1248506.
  7. Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. "Google Flu Trends Still Appears Sick: an Evaluation of the 2013‐2014 Flu Season". Copy at http://j.mp/1m6JBX6 
  8. Butler, Declan (13 February 2013). "When Google got flu wrong". Nature 494: 155–156. doi:10.1038/494155a.
  9. Lohr S. Google Flu Trends: The Limits of Big Data. 3/28/14. NY Times. http://bits.blogs.nytimes.com/2014/03/28/google-flu-trends-the-limits-of-big-data 
  10. Paul MJ, Dredze M, Broniatowski D. Twitter Improves Influenza Forecasting. PLOS Currents Outbreaks. 2014 Oct 28. Edition 1. doi: 10.1371/currents.outbreaks.90b9ed0f59bae4ccaa683a39865d9117
  11. Sills J. Twitter: Big Data opportunities. Science, Letters. http://www.cs.jhu.edu/~mpaul/files/science_letter_twitter.pdf 
  12. Salathé M, Khandelwal S (2011) Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLoS Comput Biol 7(10): e1002199. doi:10.1371/journal.pcbi.1002199
  13. Costello, V. Researchers Changing the Way We Respond to Epidemics with Wikipedia and Twitter. PLoS Blogs. Jan 2015. http://blogs.plos.org/plos/2015/01/researchers-changing-way-respond-epidemics-wikipedia-twitt/