"When you hear hoofbeats, think horses not zebras" - long a favorite expression - means that doctors are well advised to initially see common symptoms as evidence of common maladies, which is all well and good until the patient happens to be suffering from a rare disease.
It is for the latter circumstance that researchers at the Technical University of Denmark are developing a specialized search engine called FindZebra. Headed by computer scientist Radu Dragusin, they contend that early results show FindZebra (built upon Indri, open-source search technology from The Lemur Project) to be significantly better at helping diagnosticians than current tools, including Google.
(Coincidentally, Dragusin was in the news last fall for discovering a security breach on the IEEE's website that exposed some 100,000 passwords belonging to users from the likes of Apple, Google, IBM, Samsung and NASA.)
FindZebra is featured today in a post on MIT Technology Review, which notes: "The site comes with the forlorn message: 'Warning! FindZebra is a research project and it is to be used only by medical professionals.' FindZebra could obviously be a hypochondriac's charter. On the other hand, that's true of any medical dictionary."
What FindZebra does, according to the search site's FAQ page, is index 31,114 articles from a variety of freely available resources, including Orphanet, an online rare disease and orphan drug database; the National Organization for Rare Disorders (NORD); the Genetic and Rare Diseases Information Center (GARD); the Swedish Information Centre for Rare Diseases; the m-Power Rare Disease Database; the Health On the Net Foundation; and Wikipedia.
The FindZebra project website, which includes methodology details, explains the basics this way: "This project addresses the task of searching for relevant rare diseases given a query of patient data. The patient data is given as free text, which means that the queries do not have to use a controlled vocabulary or specific query language restrictions as in conventional diagnostic assistance systems. The patient data submitted as a query to the information retrieval (IR) system could consist of patient age, gender, demographic information, symptoms, evidence of diseases, test results, previous diagnoses, and other information that a clinician might find relevant in the differential diagnosis."
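To make the free-text idea concrete, here is a minimal sketch in Python of one common way a retrieval system can rank articles against an unstructured query: query-likelihood scoring with Dirichlet smoothing, a standard technique in language-model-based search. This is an illustration only, not FindZebra's or Indri's actual code. The function names, the smoothing value `mu`, and the three placeholder "articles" are all invented for the example and are not real medical data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs, mu=10.0):
    """Rank document names by Dirichlet-smoothed query likelihood.

    `docs` maps a name to its text; `mu` is a tunable smoothing weight
    (the default here is arbitrary, chosen for tiny toy documents).
    """
    doc_tokens = {name: tokenize(text) for name, text in docs.items()}
    doc_counts = {name: Counter(toks) for name, toks in doc_tokens.items()}

    # Term statistics over the whole collection, used for smoothing.
    collection = Counter()
    for counts in doc_counts.values():
        collection.update(counts)
    total = sum(collection.values())

    scores = {}
    for name, counts in doc_counts.items():
        length = len(doc_tokens[name])
        score = 0.0
        for term in tokenize(query):
            p_coll = collection[term] / total
            if p_coll == 0.0:
                continue  # term appears in no document, so it cannot discriminate
            score += math.log((counts[term] + mu * p_coll) / (length + mu))
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical placeholder articles, made up for this sketch.
docs = {
    "disease_a": "joint hypermobility stretchy skin easy bruising",
    "disease_b": "muscle weakness drooping eyelids fatigue",
    "disease_c": "chronic cough fever night sweats",
}

# Free text: no controlled vocabulary or query syntax is required.
query = "boy age 12 with stretchy skin and easy bruising"
ranking = rank(query, docs)
print(ranking[0])
```

Words in the query that never appear in the collection, such as "boy" or "12" here, are simply ignored, which is part of what lets a clinician type patient details in ordinary language.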
As for results, here's the researchers' summary from an article in Medical Informatics: "FindZebra outperforms Google Search in both default set-up and customized to the resources used by FindZebra. We extend FindZebra with specialized functionalities exploiting medical ontological information and UMLS medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular standard web search."
Compromising the sleep of hypochondriacs may be another matter.