
ICD9, the ninth revision of the International Classification of Diseases, is a coding system commonly used in medical databases to record patient conditions. A handy feature is that it is hierarchical. Just as the Dewey Decimal System lets someone quickly find a book on geometry (library aisle 500 for “Natural sciences and mathematics”, bookcase 510 for “Mathematics”, and shelf 516 for “Geometry”), ICD9 lets researchers drill down into disease categories, or move up from a patient’s specific diagnosis to a more general condition.

Introduction

I’ll be using the ICD9 taxonomy in later notebooks as a convenient way to reduce the number of features for a classifier. The problem I’ll be working on is predicting whether a patient has notes with a particular label (e.g. “substance abuse”) from their ICD9 codes. The difficulty is that I have very few labeled notes and there are thousands of ICD9 codes. A classifier trained on such a dataset is likely to overfit: it can essentially memorize the training set and score well on it, but it won’t generalize to new patients. By replacing specific codes with their parent conditions (e.g. “Intestinal infectious diseases” rather than “Salmonella gastroenteritis”), we can group together patients who have meaningfully similar conditions and shrink the feature space at the same time.
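
To make the feature-reduction idea concrete, here is a minimal sketch on toy data. The patients, the `category_of` mapping, and both helper functions are hypothetical and written by hand for illustration. A real pipeline would look the categories up in the ICD9 hierarchy rather than hard-code them.

```python
# Hypothetical toy data: each patient maps to the ICD9 codes on their record.
patient_codes = {
    'patient_a': ['003.0', '003.21'],
    'patient_b': ['003.1'],
    'patient_c': ['004.8'],
}

# Hand-written stand-in for the hierarchy (specific code -> three-digit category).
category_of = {
    '003.0': '003',
    '003.1': '003',
    '003.21': '003',
    '004.8': '004',
}

def feature_columns(records, mapping=None):
    """Return the sorted list of distinct features across all patients."""
    features = set()
    for codes in records.values():
        for code in codes:
            features.add(mapping[code] if mapping else code)
    return sorted(features)

def indicator_rows(records, columns, mapping=None):
    """Build one binary row per patient: 1 if the patient has the feature."""
    rows = {}
    for patient, codes in records.items():
        present = {mapping[c] if mapping else c for c in codes}
        rows[patient] = [1 if col in present else 0 for col in columns]
    return rows

raw_columns = feature_columns(patient_codes)
grouped_columns = feature_columns(patient_codes, category_of)
grouped_rows = indicator_rows(patient_codes, grouped_columns, category_of)

print(len(raw_columns), 'raw features ->', len(grouped_columns), 'grouped features')
print(grouped_rows)
```

With the raw codes, patients a and b share no feature at all. After grouping, both carry the same “003” indicator, which is the kind of similarity a classifier can learn from when labeled examples are scarce.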

Happily, I found a very convenient Python library for navigating this hierarchy, located here. The notebook below walks through a few simple operations with it, and in a later post I’ll show how I combined it with scikit-learn to help select medical notes for further annotation.

Notebook

In [1]:
import sys

sys.path.append('icd9')
from icd9 import ICD9
In [2]:
# feel free to replace with your path to the json file
tree = ICD9('icd9/codes.json')

# list of top level codes (e.g., '001-139', ...)
toplevelnodes = tree.children
toplevelcodes = [node.code for node in toplevelnodes]
print('\t'.join(toplevelcodes))
001-139	140-239	240-279	290-319	320-389	390-459	460-519	520-579	580-629	630-679	680-709	710-739	760-779	780-789	790-796	797	798	799	800-999	V01-V06	V07-V09	V10-V19	V20-V29	V30-V39	V40-V49	V50-V59	V60-V69	V70-V82	V83-V84	V85	V86	V87	V88	V89	E979	E849	E800-E807	E810-E819	E820-E825	E826-E829	E830-E838	E840-E845	E846-E848	E850-E858	E860-E869	E870-E876	E878-E879	E880-E888	E890-E899	E900-E909	E910-E915	E916-E928	E929	E930-E949	E959	E956	E954	E950	E951	E952	E953	E955	E957	E958	E960-E969	E970-E978	E980-E989	E990-E999
In [3]:
node = tree.find('003')
In [4]:
node.description
Out[4]:
'Other salmonella infections'
In [5]:
node.codes
Out[5]:
['003.9',
 '003.29',
 '003.24',
 '003.22',
 '003.20',
 '003.23',
 '003.0',
 '003.8',
 '003.21',
 '003.1']
In [6]:
code = tree.find('003.0')
In [7]:
code.description
Out[7]:
'Salmonella gastroenteritis'
In [8]:
code
Out[8]:
<icd9.Node at 0x108a2d1d0>
In [9]:
node.leaves[2].description
Out[9]:
'Salmonella osteomyelitis'
In [10]:
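def print_tree(node):
    """Print a node's ancestor path, the node itself, and its direct children."""
    if node is not None:
        # node.parents runs from ROOT down to the node itself (see the output below)
        print('Parents:')
        for c in node.parents:
            print('- {}: {}'.format(c.code, c.description))

        print('\n-> {}: {}\n'.format(node.code, node.description))

        # leaf codes such as '003.0' have no children, so nothing is listed for them
        print('Children:')
        for c in node.children:
            print('- {}: {}'.format(c.code, c.description))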
In [11]:
print_tree(node)
Parents:
- ROOT: ROOT
- 001-139: INFECTIOUS AND PARASITIC DISEASES 
- 001-009: INTESTINAL INFECTIOUS DISEASES 
- 003: Other salmonella infections

-> 003: Other salmonella infections

Children:
- 003.9: Salmonella infection, unspecified
- 003.8: Other specified salmonella infections
- 003.1: Salmonella septicemia
- 003.0: Salmonella gastroenteritis
- 003.2: Localized salmonella infections
In [12]:
print_tree(tree.find('003.0'))
Parents:
- ROOT: ROOT
- 001-139: INFECTIOUS AND PARASITIC DISEASES 
- 001-009: INTESTINAL INFECTIOUS DISEASES 
- 003: Other salmonella infections
- 003.0: Salmonella gastroenteritis

-> 003.0: Salmonella gastroenteritis

Children:
In [13]:
print_tree(tree.find('004.8'))
Parents:
- ROOT: ROOT
- 001-139: INFECTIOUS AND PARASITIC DISEASES 
- 001-009: INTESTINAL INFECTIOUS DISEASES 
- 004: Shigellosis
- 004.8: Other specified shigella infections

-> 004.8: Other specified shigella infections

Children:
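
The “Parents” listings above suggest that `node.parents` holds the full path from ROOT down to the node itself, so rolling a code up to a broader category could be as simple as indexing into that list. The sketch below rests on that assumption, and on the assumption that `tree.find` returns `None` for an unknown code. `ancestor_code`, `StubNode`, and `StubTree` are hypothetical names. The stubs mimic only the attributes used in this notebook so the example runs without the library or the JSON file.

```python
def ancestor_code(tree, code, depth):
    """Return the code of the ancestor `depth` levels below ROOT.

    Assumes tree.find(code) returns a node (or None) whose .parents list
    runs from ROOT down to the node itself.
    """
    node = tree.find(code)
    if node is None:
        return None
    path = node.parents
    # Clamp the index so a shallow code falls back to itself.
    return path[min(depth, len(path) - 1)].code


class StubNode:
    """Hypothetical stand-in exposing only .code and .parents."""
    def __init__(self, code, parents=None):
        self.code = code
        # Mirror the observed output: the path ends with the node itself.
        self.parents = (parents or []) + [self]


class StubTree:
    """Hypothetical stand-in exposing only .find()."""
    def __init__(self, nodes):
        self._nodes = {n.code: n for n in nodes}

    def find(self, code):
        return self._nodes.get(code)


# Rebuild the 003.0 path printed in the notebook output.
root = StubNode('ROOT')
chapter = StubNode('001-139', root.parents)
block = StubNode('001-009', chapter.parents)
salmonella = StubNode('003', block.parents)
gastro = StubNode('003.0', salmonella.parents)
stub = StubTree([root, chapter, block, salmonella, gastro])

print(ancestor_code(stub, '003.0', 1))  # top-level chapter
print(ancestor_code(stub, '003.0', 2))  # sub-chapter
```

If the assumptions hold for the real library, passing `tree` in place of `stub` should work the same way. Depth 1 gives the coarsest grouping and larger depths keep more detail.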
