Web Data Mining – A Tool for Real-time Detection of Disease Outbreaks

Epidemics and infectious diseases pose a continuous threat to global health. Recent research suggests that mining Google search data can help detect and track contagious diseases and epidemic outbreaks.

A recent report refers to a study published in Lancet Infectious Diseases, which found that digital surveillance can detect infectious diseases such as dengue fever, measles, malaria, and influenza up to two weeks earlier than traditional surveillance methods.

Traditional surveillance consists of reporting by physicians, veterinarians, infection control practitioners, laboratory personnel, and medical examiners, followed by epidemiological and laboratory investigation. This method involves two stages:

– the patient recognizing the symptoms and seeking treatment before diagnosis
– health care providers alerting the authorities about these developments through their health networks

This method is expensive and time-consuming, and it delays the detection of an emerging infectious disease. Digital surveillance, on the other hand, is faster, more efficient, and more cost-effective. According to the recent study, the 2005-06 “bird flu” outbreak could have been detected at least two weeks earlier if the authorities had relied on search engine tools such as Google Trends and Google Insights rather than the traditional technique.
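To make the idea concrete, here is a minimal sketch of how a rise in search interest might be flagged. It assumes weekly search counts for a symptom-related term are already available as a plain list. The function name `flag_unusual_weeks`, the sample numbers, and the rule of two standard deviations above a four-week trailing baseline are all illustrative assumptions, not the method used in the study.

```python
from statistics import mean, stdev

def flag_unusual_weeks(weekly_counts, baseline_weeks=4, threshold=2.0):
    """Return the indexes of weeks whose count is far above the trailing baseline."""
    flagged = []
    for i in range(baseline_weeks, len(weekly_counts)):
        baseline = weekly_counts[i - baseline_weeks:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Guard against a flat baseline, where the standard deviation is zero.
        limit = mu + threshold * max(sigma, 1.0)
        if weekly_counts[i] > limit:
            flagged.append(i)
    return flagged

# Hypothetical weekly search counts for a term such as "fever and cough".
counts = [100, 104, 98, 102, 101, 99, 160, 240]
print(flag_unusual_weeks(counts))  # prints [6, 7]
```

In this made-up series the last two weeks stand out from the preceding month, which is the kind of early signal that could prompt a closer look before clinical reports arrive.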

Real-time disease surveillance

Last year, a report published in ScienceDaily said that innovative new techniques that draw on news websites, blogs, and social media are being developed to track the spread of infectious diseases. Real-time detection of epidemics works by mining what people share about their experiences and opinions on health-related topics such as personal health issues, symptoms, treatments, and side effects. The latest study found that social media and microblogging sites such as Twitter and Facebook could prove effective in identifying disease outbreaks.

Another instance where digital data mining proved effective was the detection of severe acute respiratory syndrome (SARS), a serious form of pneumonia identified in 2003 that infected thousands of people around the world. The SARS outbreak was detected by a digital data collection network more than two months before it was reported by the World Health Organization (WHO).

Early detection using a digital disease surveillance system is extremely useful:

* It helps reduce and prevent the spread of epidemics
* Public health authorities are alerted to the situation and can adopt appropriate risk management strategies, including the provision of medication
* Patients and doctors can make more informed decisions

Web data mining to build a real-time infectious disease predictor

According to the researchers, it is possible to construct a real-time infectious disease predictor by blending approaches such as aggregator websites, social media, and search engines along with other factors such as climate and temperature.
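One simple way to picture such blending is a weighted score over normalized signals. The sketch below is illustrative only: the signal names, the weights, and the sample values are assumptions chosen for the example, not figures from the research.

```python
# Hypothetical weights for each data source; they sum to 1.0.
WEIGHTS = {
    "search_interest": 0.4,
    "social_media_mentions": 0.3,
    "news_reports": 0.2,
    "climate_suitability": 0.1,
}

def outbreak_risk_score(signals, weights=WEIGHTS):
    """Blend signals, each already scaled to the range 0..1, into one score."""
    score = 0.0
    for name, weight in weights.items():
        value = signals.get(name, 0.0)     # a missing source contributes nothing
        value = min(max(value, 0.0), 1.0)  # clamp out-of-range inputs
        score += weight * value
    return score

# Made-up readings for a single week in one region.
week = {
    "search_interest": 0.9,
    "social_media_mentions": 0.7,
    "news_reports": 0.5,
    "climate_suitability": 0.8,
}
print(f"risk score: {outbreak_risk_score(week):.2f}")  # prints risk score: 0.75
```

A real predictor would more likely learn its weights from historical outbreak data than fix them by hand; the fixed weights here only show how several sources can feed one number.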

Today, advanced data mining software is available to comb through data and extract useful information from it. Once the data is captured, it is cleansed. The cleansed data is then tagged and labeled in data sets so that valuable information can be gleaned from it. Detecting, monitoring, and controlling infectious diseases is a worldwide concern, and future research should focus on finding ways to use the new Internet-based disease surveillance systems on a global scale.
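The capture, cleanse, and tag sequence described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the symptom vocabulary, the sample posts, and the helper names `cleanse` and `tag` are hypothetical, and a production system would use far richer vocabularies and language processing.

```python
import re

# Hypothetical symptom vocabulary used for tagging.
SYMPTOM_TERMS = {"fever", "cough", "rash", "headache"}

def cleanse(text):
    """Lowercase the text, drop URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and spaces
    return " ".join(text.split())

def tag(text):
    """Return the sorted symptom terms mentioned in a cleansed post."""
    return sorted(SYMPTOM_TERMS.intersection(text.split()))

# Made-up captured posts.
raw_posts = [
    "Down with a FEVER and a bad cough!! http://example.com/pic",
    "Lovely weather today :)",
]
for post in raw_posts:
    clean = cleanse(post)
    print(clean, "->", tag(clean))
```

The first post is tagged with "cough" and "fever", while the second yields no tags and would be ignored by later analysis.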