Article

Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 10, Issue 4

Publisher

PUBLIC LIBRARY OF SCIENCE
DOI: 10.1371/journal.pcbi.1003581

Funding

  1. National Institutes of Health
  2. National Library of Medicine [1R01LM010812-03]

Circulating levels of both seasonal and pandemic influenza require constant surveillance to ensure the health and safety of the population. While up-to-date information is critical, traditional surveillance systems can have data availability lags of up to two weeks. We introduce a novel method of estimating, in near-real time, the level of influenza-like illness (ILI) in the United States (US) by monitoring the rate of particular Wikipedia article views on a daily basis. We calculated the number of times certain influenza- or health-related Wikipedia articles were accessed each day between December 2007 and August 2013 and compared these data to official ILI activity levels provided by the Centers for Disease Control and Prevention (CDC). We developed a Poisson model that accurately estimates the level of ILI activity in the American population, up to two weeks ahead of the CDC, with an absolute average difference between the two estimates of just 0.27% over 294 weeks of data. Wikipedia-derived ILI models performed well through both abnormally high media coverage events (such as the 2009 H1N1 pandemic) and unusually severe influenza seasons (such as the 2012-2013 influenza season). Wikipedia usage accurately estimated the week of peak ILI activity 17% more often than Google Flu Trends data and was often more accurate in its measure of ILI intensity. With further study, this method could potentially be implemented for continuous monitoring of ILI activity in the US and to provide support for traditional influenza surveillance tools.

Author Summary

Although influenza is largely avoidable through vaccination, between 3,000 and 50,000 deaths attributed to this disease occur in the United States each year. The Centers for Disease Control and Prevention continuously monitors the amount of influenza present in the American population and compiles this information in weekly reports. However, because it can take a long time to collect and analyze all of this information, the data reported each week are typically 1-2 weeks old at the time of publishing. For this reason, we are interested in developing new techniques for determining the amount of influenza in the population that are accurate, can return results in real time, and can be used to supplement traditional monitoring. We have created a method of estimating the amount of influenza-like illness in the American population, at any time of year, by analyzing the amount of Internet traffic seen on certain influenza-related Wikipedia articles. This method accurately estimates the percentage of Americans with influenza-like illness, in real time, and is robust to influenza seasons that are more severe than normal and to events that attract heavy media attention, such as the H1N1 pandemic in 2009.
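
To make the approach described above more concrete, the following is a minimal Python sketch of a Poisson regression with a log link, together with the kind of mean absolute difference the abstract uses to compare two estimate series. It is not the authors' model: it assumes a single, already scaled page-view predictor, whereas the paper draws on a set of Wikipedia articles, and the function names `fit_poisson`, `predict`, and `mean_absolute_difference` as well as all numbers are hypothetical and purely illustrative.

```python
import math

def fit_poisson(x, y, max_iter=50, tol=1e-12):
    """Fit y ~ Poisson(exp(b0 + b1 * x)) by Newton-Raphson.

    x: one scaled page-view value per week (assumed single predictor).
    y: the weekly ILI measure to be modeled (non-negative).
    Requires at least two distinct x values, otherwise the
    2x2 system below is singular.
    """
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (Fisher information), a 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def predict(b0, b1, x):
    """Expected value under the fitted model for each x."""
    return [math.exp(b0 + b1 * xi) for xi in x]

def mean_absolute_difference(a, b):
    """Average of |a_i - b_i| over two equally long series."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)

# Toy, made-up numbers: scaled weekly views and an ILI measure.
views = [0.2, 0.4, 0.9, 1.6, 1.1, 0.5]
ili = [1.1, 1.3, 2.2, 4.0, 2.7, 1.5]
b0, b1 = fit_poisson(views, ili)
estimates = predict(b0, b1, views)
print(mean_absolute_difference(estimates, ili))
```

In practice a statistics library would normally replace the hand-written Newton step; it is written out here only to keep the sketch dependency-free.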
