A list friend recently noted that the latest International Statistical Classification of Diseases and Related Health Problems (otherwise known as “ICD codes”) is not only very interesting but also very amusing. The ICD system, now in its 10th revision, was designed and is administered by the World Health Organization and is described as:
… the standard diagnostic tool for epidemiology, health management and clinical purposes. This includes the analysis of the general health situation of population groups. It is used to monitor the incidence and prevalence of diseases and other health problems, providing a picture of the general health situation of countries and populations.
What’s so interesting about the ICD-10 is how remarkably specific it is in what it refers to. For example, there are codes for injuries related to walking into things:
W22.02XA, “walked into lamppost, initial encounter”
The coding W22 is for “striking against or struck by other objects” but not for “striking against or struck by object with subsequent fall” (which would be W18.09). The following “.0” indicates “striking against stationary object” while the subsequent “2” indicates a lamppost. If, instead of a “2”, it were a “3”, the thing struck would be furniture, while a “1” would denote a wall (unless, of course, it was a swimming pool wall, in which case it would be a “4”).
The “XA” suffix indicates an initial encounter (here “encounter” means “interview with medical professional”), while “XD” would indicate a subsequent encounter. The suffix “XS” would indicate “sequela” which means “a pathological condition resulting from a prior disease, injury, or attack” implying that the lamppost collision caused some kind of ongoing problem (to the patient, not the lamppost).
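For the programmatically inclined, the anatomy of that lamppost code can be sketched in a few lines of Python. This is only an illustration: the function name describe_w22_code and the lookup tables are made up for this example, the tables hold nothing beyond the values described above, and the real code tables may well subdivide some of these entries further.

```python
import re

# Illustrative lookup tables built only from the values described in the
# text; they are NOT a complete or authoritative ICD-10 reference.
STATIONARY_OBJECTS = {
    "1": "wall",
    "2": "lamppost",
    "3": "furniture",
    "4": "swimming pool wall",
}
ENCOUNTERS = {
    "A": "initial encounter",
    "D": "subsequent encounter",
    "S": "sequela",
}

# W22 = category, ".0" = stationary object, one digit for the object,
# "X" as a placeholder, then a letter for the kind of encounter.
CODE_PATTERN = re.compile(r"^W22\.0([1-4])X([ADS])$")


def describe_w22_code(code):
    """Split a code such as 'W22.02XA' into its described parts."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a code this sketch understands: {code!r}")
    object_digit, encounter_letter = match.groups()
    return {
        "category": "striking against or struck by other objects",
        "mechanism": "striking against stationary object",
        "object": STATIONARY_OBJECTS[object_digit],
        "encounter": ENCOUNTERS[encounter_letter],
    }


if __name__ == "__main__":
    print(describe_w22_code("W22.02XA"))
```

Anything outside the walked-into-things family (turtles included) is rejected with a ValueError rather than guessed at.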
You think that’s detailed? How about:
W59.21 Bitten by turtle
W59.22 Struck by turtle
W59.29 Other contact with turtle
The thing I can’t fathom here is how one gets “struck by turtle” … turtles are not generally prone to rising up and slapping people, nor are they often (if ever) found sailing through the air, so how this classification ever arose (ha!) has to be a mystery.
But wait! It gets better. How about:
V91.07 Burn due to water-skis on fire
Really? I mean, how often in the history of mankind has anyone, other than perhaps Evel Knievel, been burnt by their water-skis bursting into flame?
While I’m sure that this incredibly detailed data is considered crucial by someone, somewhere, it raises interesting and relevant questions about the use of such intelligence in Big Data and analytics.
To start with, the accuracy of data collected from real world sources by humans is subject to interpretation and misinterpretation, and the greater the specificity of the coding, the more likely it is that an event will be miscoded.
Secondly, it illustrates the a priori interest bias that is often found in coding systems. The fact that turtles are specifically referenced in the ICD-10 while, for example, okapis aren’t shows that someone on the coding committee had turtle issues and managed to convince the other committee members that her particular hobby horse was crucially important to world health (struck by a hobby horse would probably be coded as the rather more generic “W20.8 Other cause of strike by thrown, projected or falling object” unless, of course, she ran into it, in which case it would be “W22.8 Striking against or struck by other objects”).
Thirdly, it shows regionality bias because I’ll bet there aren’t a hell of a lot of burning water-ski accidents in, say, Iraq while “burn due to camel on fire” is most likely an everyday occurrence (I actually can’t figure out how to code this as it falls between the groups “W20-W49 Exposure to inanimate mechanical forces” and “W50-W64 Exposure to animate mechanical forces”).
With these kinds of issues in a coding system where entities are so minutely specified, detail can gain a prominence that isn’t necessarily useful while, at the same time, making the analysis task more complex. Sure, exactly what animal bit you matters, but rather than having a code for turtles and none for okapis, a better strategy would be a more general code with a qualifying text field, which would serve the same purpose as well as provide more relevant data.
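To make the general-code-plus-text-field idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the InjuryRecord type, its field names, the two helper functions and the sample records are invented for illustration and do not describe how any real medical records system works.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class InjuryRecord:
    """Hypothetical record: one general code plus a free-text qualifier."""
    code: str            # general code, e.g. "W55.81" (bitten by other mammals)
    qualifier: str = ""  # free-text detail, e.g. which animal was involved


def count_by_code(records):
    """Coarse analysis: how many records fall under each general code."""
    return Counter(record.code for record in records)


def count_matching(records, term):
    """Fine analysis: how many qualifiers mention the given term."""
    needle = term.lower()
    return sum(1 for record in records if needle in record.qualifier.lower())


# Made-up sample data, purely for illustration.
records = [
    InjuryRecord("W55.81", "okapi"),
    InjuryRecord("W55.81", "Okapi, provoked by burning wakeboard"),
    InjuryRecord("W55.81", "llama"),
]

if __name__ == "__main__":
    print(count_by_code(records))
    print(count_matching(records, "okapi"))
```

The coarse counts stay simple because there are few codes, while the detail (okapi or otherwise) is still there in the text field for anyone who wants to go looking for it.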
Perhaps the medical world moves in far more mysterious ways than I understand but this would seem to illustrate that the maxim “less is more” applies even when you’re dealing with Big Data.
Now, excuse me, I must put a bandage on my okapi bite (covered by the more general “W55.81 Bitten by other mammals”) and extinguish my wakeboard which was what got the okapi upset in the first place. Luckily I didn’t get burned by the wakeboard (which would be “V91.89 Other injury due to other accident to unspecified watercraft”).