Eric Horvitz,
a distinguished scientist and co-director at Microsoft Research, and Kira
Radinsky, a PhD researcher at the Technion-Israel Institute of Technology, say they have
developed software that can predict upcoming events.
This prototype
uses a combination of archival material from the New York Times and data
from several websites, including Wikipedia. During its setup phase, the system
used 22 years of New York Times archives, dating from 1986 to 2007.
“One source
we found useful was DBpedia, which is a structured form of the information
inside Wikipedia constructed using crowd sourcing,” Radinsky told MIT
Technology Review. “We can understand, or see, the location of the places in
the news articles, how much money people earn there, and even information about
politics.” Other sources included WordNet, which helps software understand the
meaning of words, and OpenCyc, a database of common knowledge.
The system
could someday enable aid organizations to be more proactive in tackling disease
outbreaks, Horvitz said. “I truly view this as a foreshadowing of what’s to
come,” he added. “Eventually, this kind of work will start to have an influence
on how things go for people.”
The system
reportedly produces striking results when tested on historical
data. Reports of droughts in Angola in 2006 triggered a warning about possible
cholera outbreaks in the country, because previous events had taught the system
that cholera outbreaks were more likely in years following droughts.
A second
warning about cholera in Angola was triggered by news reports of large storms
in Africa in early 2007—and, less than a week later, reports appeared that
cholera had begun to spread. In similar tests involving forecasts of disease,
violence, and high numbers of deaths, the system’s warnings were correct
between 70 and 90 percent of the time.
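The drought-then-cholera pattern described above can be illustrated with a very simple count: how often one kind of event is followed by another in the same place, compared with how often the second event occurs at all. The sketch below is only an illustration of that idea, not the researchers' method; the records, region labels, and function names are all invented for the example.

```python
# Hypothetical toy records: (region, year, event). Invented for illustration only;
# these are not data from the New York Times archive or from the study.
EVENTS = [
    ("A", 2000, "drought"), ("A", 2001, "cholera"),
    ("B", 2000, "drought"), ("B", 2001, "cholera"),
    ("C", 2000, "drought"),
    ("D", 2001, "cholera"),
    ("E", 2000, "storm"),
    ("F", 2000, "storm"),
]


def follow_rate(events, cause, effect, lag=1):
    """Fraction of `cause` reports followed by `effect` in the same region `lag` years later."""
    seen = set(events)
    causes = [(region, year) for region, year, event in events if event == cause]
    if not causes:
        return 0.0  # no examples of the cause, so nothing can be learned
    hits = sum((region, year + lag, effect) in seen for region, year in causes)
    return hits / len(causes)


def base_rate(events, effect):
    """Fraction of all observed region-years in which `effect` was reported."""
    region_years = {(region, year) for region, year, _ in events}
    effect_years = {(region, year) for region, year, event in events if event == effect}
    return len(effect_years) / len(region_years)


if __name__ == "__main__":
    print("cholera after drought:", follow_rate(EVENTS, "drought", "cholera"))
    print("cholera overall:      ", base_rate(EVENTS, "cholera"))
```

On this toy data, cholera follows two of the three drought reports, while it appears in three of the eight region-years overall, so a drought report would raise the estimated risk. A real system would need far more data and would have to account for factors such as location and wealth, which the article says the researchers draw from sources like DBpedia.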
According to
Horvitz, the system performs well enough to suggest that a more refined version could be
used in real settings, to assist experts at aid agencies involved in planning
humanitarian response and readiness. “We’ve done some reaching out and plan to
do some follow-up work with such people,” says Horvitz.
Horvitz and
Radinsky are not the first to consider using online news and other data to
forecast future events, but they say they make use of more data sources—more
than 90 in total—which allows their system to be more general-purpose.
Microsoft
doesn’t have plans to commercialize Horvitz and Radinsky’s research as yet, but
the project will continue, says Horvitz, who wants to mine more newspaper
archives as well as digitized books.