Reading Comprehension

In 2009 a new flu virus was discovered. Combining elements of the viruses that cause bird flu and swine flu, this new virus, named H1N1, spread quickly. Within weeks, public health agencies around the world feared a terrible pandemic was under way. Some commentators warned of an outbreak on the scale of the 1918 Spanish flu. Worse, no vaccine was readily available. The only hope public health authorities had was to slow its spread. But to do that, they needed to know where it already was.
In the United States, the Centers for Disease Control and Prevention (CDC) required that doctors inform them of new flu cases. Yet the picture of the pandemic that emerged was always a week or two out of date. People might feel sick for days but wait before consulting a doctor. Relaying the information back to the central organizations took time, and the CDC only tallied the numbers once a week. With a rapidly spreading disease, a two-week lag is an eternity. This delay left public health agencies blind at the most urgent moments.
A few weeks before the H1N1 virus made headlines, engineers at the Internet giant Google published a paper in Nature. It caused a stir among experts but was otherwise overlooked. The authors explained how Google could "predict" the spread of the winter flu, not just nationally, but down to specific regions and even states. Since Google receives more than three billion search queries every day and saves them all, it had plenty of data to work with.
Google took the 50 million most common search terms that Americans type and compared the list with CDC data on the spread of seasonal flu between 2003 and 2008. The idea was to identify areas affected by the flu virus by what people searched for on the Internet. Others had tried to do this with Internet search terms, but no one else had as much data-processing power as Google.
While the Google engineers guessed that the searches might be aimed at getting flu information—typing phrases like "medicine for cough and fever"—that wasn't the point: they didn't know, and they designed a system that didn't care. All their system did was look for correlations between the frequency of certain search queries and the spread of the flu over time and space. In total, they processed 450 million different mathematical models in order to test the search terms, comparing their predictions against actual flu cases from the CDC in 2007 and 2008. And their software found a combination of 45 search terms that had a strong correlation between their prediction and the official figures nationwide. Like the CDC, they could tell where the flu had spread, but unlike the CDC they could tell it in near real time, not a week or two after the fact.
Thus, when the H1N1 crisis struck in 2009, Google's system proved to be a more useful and timely indicator than government statistics with their natural reporting lags. Public health officials were armed with valuable information.
Strikingly, Google's method is built on "big data"—the ability of society to handle information in new ways to produce useful insights or goods and services of significant value. However, the method is not always accurate. For example, in 2012 it identified a sudden rise in flu cases, but overstated the amount, perhaps because of too much media attention about the flu. Yet what is clear is that the next time a pandemic comes around, the world will have a better tool to predict and thus prevent its spread.