Could Behavioral Medicine Lead the Web Data Revolution?
John W. Ayers, PhD, MA; Benjamin M. Althouse, PhD, ScM; Mark Dredze, PhD
JAMA. Published online February 27, 2014. doi:10.1001/jama.2014.1505
Digital footprints left on search engines, social media, and social networking sites can be aggregated and analyzed as health proxies, yielding anonymous and instantaneous insights. At present, nearly all the existing work has focused on acute diseases. This reduces the value added by web surveillance, because even high-profile systems such as Google Flu Trends are less effective than already strong traditional surveillance.1 Conversely, the future of web surveillance is promising in an area where traditional surveillance is largely incomplete: behavioral medicine, a multidisciplinary field incorporating medicine, social science, and public health and focusing on health behaviors and mental health.
The proportion of illness (or death) attributable to health behaviors or psychological well-being has steadily increased over the last half century, while surveillance of these outcomes has remained largely unchanged. Investigators simply ask people about their health on surveys. However, surveys have well-known limitations, such as respondents’ reluctance to participate, social desirability biases, difficulty in accurately reporting behaviors, long lags between data collection and availability, and provisions (sometimes legal) curtailing the inclusion of politically sensitive topics like gun violence. Most importantly, the expense of surveys means many topics are either not covered or are covered restrictively (eg, clinical depression screeners are included in the Behavioral Risk Factor Surveillance System just every other year). Given the current budget climate, survey capacity will likely worsen before it improves. To overcome these limits, behavioral medicine should now embrace web data.
First, behavioral medicine requires observing behavior or the manifestation of mental health problems. Doing so online is easier, more comprehensive, and more effective than with surveys because many outcomes are passively exhibited there. For example, one study showed how precise health concerns changed during the US recession of December 2008 through 2011 by systematically selecting Google search queries and using the content of each query to describe the concern and the change in volume to describe the prevalence of each concern. Stomach ulcer symptoms, for example, were 228% higher (95% CI, 35%-363%) than expected during the recession, with queries thematically related to arrhythmia, congestion, and pain (including many foci like head, tooth, and back) also elevated.2 This approach highlights how web data can reveal largely assumption-free insights via systematic data generation of hundreds of possible outcomes rather than arbitrary a priori selection of a few outcomes by investigators.
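The "higher than expected" comparison described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the expected volume is a straight-line extrapolation of a pre-event baseline; the cited study's actual model is not described here and may differ. The function names and all numbers below are hypothetical.

```python
def linear_trend_forecast(baseline, horizon):
    """Fit a least-squares line to the baseline series and extrapolate
    `horizon` further points (an assumed, simplified 'expected' model)."""
    n = len(baseline)
    if n < 2:
        raise ValueError("need at least two baseline points")
    mean_x = (n - 1) / 2
    mean_y = sum(baseline) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(baseline))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + k) for k in range(horizon)]


def percent_excess(observed, expected):
    """Percent by which total observed volume exceeds total expected volume."""
    total_expected = sum(expected)
    if total_expected <= 0:
        raise ValueError("expected volume must be positive")
    return 100.0 * (sum(observed) - total_expected) / total_expected


# Hypothetical relative query volumes, not data from any study.
baseline_volume = [10, 12, 14, 16]   # periods before the event
observed_volume = [27, 30]           # periods during the event

expected_volume = linear_trend_forecast(baseline_volume, len(observed_volume))
excess = percent_excess(observed_volume, expected_volume)
```

Repeating the same calculation over hundreds of systematically selected queries, rather than a handful chosen in advance, is what makes the approach largely assumption-free with respect to which outcomes are examined.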
Second, web data reflect more than the individual because social context can also be captured online. Online networks can reveal how mechanistic drivers such as social norms spread and influence population health. For example, social patterns in obesity promotion and suppression have been described by pooling Facebook posts that encourage television watching or going outdoors, and these patterns were associated with variability in neighborhood obesity rates.3 Moreover, social support concepts are often expressed in web data, such as observing specific instances of caregiving and confidence on Twitter. As a result, behavioral medicine can move away from understanding aggregation based purely on location and toward understanding health in the context of our human interconnectedness.
Third, web data are potentially the only source for real-time insights into behavioral medicine. Web data can be available almost immediately, compared with a 365-day lag between annual surveys. By harnessing these data around social events or interventions, programs can be evaluated as they are implemented, hypothetically generating real-time feedback to maximize their effectiveness. Web data used in this way also hold promise for guiding investigator resources. In 2011, when tobacco journals were debating snus (a smokeless tobacco product) and funders were soliciting proposals to understand the snus pandemic, electronic cigarettes already attracted more searches on Google than any other smoking alternative, snus included.4 In the same way, web data can guide traditional surveillance, for example, by using online proxies to vet which questions to include on surveys.
Fourth, given that all hypotheses are based on some data, web data can be an important source for identifying new hypotheses. Many hypotheses in behavioral medicine can be traced directly to data availability and can appear ad hoc to lay audiences. Many studies have explored birthdate seasonality in mental health problems. Why? Birthdates are routinely found in traditional surveillance, while some mental health problems are too rare to assess seasonality in the incidence or increased severity of these problems. As a result, investigators focus on peculiar questions, while obvious questions have never been explored, until now. Is schizophrenia seasonal? Online interest in schizophrenia and its symptoms, as well as 8 other outcomes, closely followed the timing of the seasons, peaking in the winter.5 What is the healthiest day? Online interest in quitting smoking across the globe is highest on Monday.6 Behavioral medicine needs to escape the confines of limited data to more fully specify the next frontier of research questions, and going online is one such escape.
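A question such as "What is the healthiest day?" reduces to a simple aggregation once daily query volumes are in hand. The following is a minimal sketch, assuming a series of (date, volume) pairs; the function name and the volumes are hypothetical and are not taken from the cited studies, which may have used different methods.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_volume_by_weekday(daily):
    """Average volume for each day of the week.

    `daily` is an iterable of (datetime.date, volume) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, volume in daily:
        name = WEEKDAYS[day.weekday()]  # date.weekday(): Monday is 0
        totals[name] += volume
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Two weeks of hypothetical daily volumes, starting on Monday, January 6, 2014.
start = date(2014, 1, 6)
weekly_pattern = [120, 110, 105, 100, 95, 90, 100]
daily_volume = [(start + timedelta(days=i), weekly_pattern[i % 7])
                for i in range(14)]

weekday_means = mean_volume_by_weekday(daily_volume)
peak_day = max(weekday_means, key=weekday_means.get)
```

The same grouping applied by month instead of weekday would give a first look at seasonality, although a real analysis would also need to account for long-term trends in search volume.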
Fifth, it is beyond present scientific limits for a hypothetical arm to reach out of the screen to inoculate against infection. In behavioral medicine, however, substantial resources have been used to develop online interventions that treat or prevent illness with effectiveness equivalent to their offline counterparts. For example, as early as the mid-1990s, investigators implemented online programs to promote behavioral health. A meta-analysis found these programs were associated with a relative increase of 44% in smoking cessation,7 yet a research agenda for harnessing the surveillance potential of the web has not been articulated. Improving online surveillance capacity means online interventions can be better disseminated via online screening or by linking potential users to existing online treatments (ie, identifying which advertisements for an online program are most effective).
Sixth, some of the most effective interventions in behavioral medicine involve changes in public policy. Web data can identify alerts for policy changes and pathways for health advocacy. For instance, by archiving online media, places considering policy changes can be identified and this information can then be passed on to advocacy groups. Case in point, Brazilian President Lula’s laryngeal cancer prompted broad changes in media coverage of tobacco control and soon thereafter, Brazil became the largest smoke-free nation to date.8 By prospectively analyzing news media content, it will be possible for advocacy resources to be spent more cost-effectively during opportunistic times, including events like Lula’s diagnosis.
A major criticism is that web data have sampling biases. However, such biases are increasingly eroding at the population level as more people go online. In addition, several studies have demonstrated that valid trends reflecting the entire population, and even subsets of the population, can be extracted from online data.9 For example, computer science has already developed approaches for identifying the sex, ethnicity, or education associated with a Twitter account using the content of a user’s Tweets. Going forward, the public health research community may mimic these studies and validate methods for obtaining high-quality, actionable information in behavioral medicine, thereby further realizing the comparative value of web data relative to traditional data.
Billions of digital footprints from nearly all parts of the United States and from countries around the world provide a powerful opportunity to expand the evidence base across medicine. However, for the reasons mentioned previously and more related reasons yet to be expressed, behavioral medicine potentially has the most to gain from web data and could be essential to the broader web data revolution.