We saw in the previous post that ICD9 codes show promise
for pre-classifying encounters that are more likely to contain our concepts of interest. In
this post we’ll walk through building simple logistic regression classifiers on a training
data set, and evaluate their performance on a test data set.
As described in the previous post, our goal here is
to build a classifier based on anything except free text data to select encounters more likely
to have notes containing concepts of interest (e.g. ‘substance abuse’). The reason for this is that
we want to build up our dataset and pre-select notes more likely to have our concepts, which are
normally of low prevalence in the overall dataset. We will later use these notes to train NLP
classifiers to detect the presence of concepts in individual notes based on the text of the note.
In the previous post we found that many of the concepts have patterns of ICD9 codes that are more
likely than normal to appear when the concept is present in a patient’s notes. This is promising
for us in building a classifier as we know that there is information in the ICD9 codes relating
to the class labels. It also suggests that we can build a classifier from purely additive
terms (e.g. a + b) and likely will not need to include cross terms (e.g. a*b) to get decent performance.
In this notebook I’ll be using scikit-learn’s implementation
of logistic regression, and specifically the
LogisticRegressionCV
class, which runs a cross-validation loop to choose the strength of the L2 regularization.
The regularization shrinks the weights of uninformative features toward zero, so that out of our
181 features only the relevant ones end up with substantial weight. Note that an L2 penalty
shrinks weights but does not set them exactly to zero; an L1 penalty would be needed to drop
features outright.
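As a rough illustration of the fitting step, here is a minimal sketch on synthetic data. The array shapes, the toy label rule, the random seed, and the choices of `Cs`, `cv`, and `scoring` are all assumptions made for the example, not the settings used in the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)

# Synthetic stand-in for the ICD9 indicator matrix: 400 encounters x 20 binary features.
X = rng.integers(0, 2, size=(400, 20)).astype(float)

# Toy labels that depend on the first two features only (purely illustrative).
logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 2.0
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Cross-validate over 10 candidate regularization strengths.
# The penalty is left at scikit-learn's default, which is L2.
clf = LogisticRegressionCV(Cs=10, cv=5, scoring="roc_auc", max_iter=1000)
clf.fit(X, y)

chosen_C = float(np.ravel(clf.C_)[0])
print("chosen C:", chosen_C)
print("features with non-zero weight:", int(np.count_nonzero(clf.coef_)))
```

Printing the count of non-zero weights is a quick way to check how much the penalty actually prunes: with L2, the uninformative features typically keep small but non-zero weights.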
One go-to option I could have used is
AdaBoost. AdaBoost
tends to over-weight mislabeled data points: each “decision stump” is trained in series, with
previously misclassified data points given extra weight. I don’t want that in this problem, because
ICD9 codes are not expected to give labels that match the notes well. This is especially
true because the ICD9 codes cover a patient’s entire multi-day ICU stay, which may have 20 notes, only some
of which contain our concepts. In those cases every note is assigned the ICD9 codes,
but only some notes have the concepts.
Another option was random forest,
but that also has its problems. In particular, I wanted to be able to look closely at feature importance.
RF has a way of inspecting that, but it is not as straightforward as reading logistic regression weights. That said,
RF is great when many features are continuous measurements; those are awkward
in LR because you have to normalize everything to comparable ranges for the weights to be comparable.
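To illustrate the normalization point, here is a small sketch with two hypothetical continuous features that carry the same signal but are measured on very different scales. None of this comes from the notebook (its features are binary indicators); the data, the scale factor of 100, and the seed are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical measurements with equal influence on the label,
# but the second is recorded in units that are 100x smaller.
a = rng.normal(0.0, 1.0, n)
b = rng.normal(0.0, 100.0, n)
logits = 1.0 * a + 1.0 * (b / 100.0)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
X = np.column_stack([a, b])

# Without scaling, the weights mostly reflect the units of each feature.
raw = LogisticRegression(max_iter=5000).fit(X, y)

# With standardization, the weights become comparable across features.
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)).fit(X, y)

raw_w = raw.coef_[0]
scaled_w = scaled[-1].coef_[0]
print("raw weights:   ", raw_w)
print("scaled weights:", scaled_w)
```

With binary indicator features, as used in this post, this step is not needed, which is part of why logistic regression is convenient here.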
The overall process followed here is:
- Randomly assign every note to either the test or training set by drawing a random number, then
  comparing this random number to a threshold (0.3) to create a 70% - 30% training/test data split.
- For each category, perform the following:
  - Create a logistic regression classifier using LogisticRegressionCV
  - Extract the feature weights and print the most important features
  - Use the classifier to predict labels for our test dataset
  - Find the 50% sensitivity point, i.e. the probability threshold at which a truly positive note
    has a 50-50 chance of being a true positive or a false negative. We’ll use this threshold
    for labeling our points later.
  - Evaluate performance by calculating and plotting the ROC curve
    and confusion matrix.
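The steps above can be sketched end to end on synthetic data. This is not the notebook’s code: the data, seed, and variable names are made up, and taking the median predicted probability among the positive test notes is just one simple way to land on a 50% sensitivity threshold (the notebook may instead have read it off the ROC curve).

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): 1000 notes x 15 binary features.
X = rng.integers(0, 2, size=(1000, 15)).astype(float)
logits = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 3.0
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# One uniform random number per note; values below 0.3 go to the test set.
r = rng.random(1000)
is_test = r < 0.3
X_train, y_train = X[~is_test], y[~is_test]
X_test, y_test = X[is_test], y[is_test]

# Fit on the training notes and score the held-out notes.
clf = LogisticRegressionCV(cv=5, max_iter=1000).fit(X_train, y_train)
p_test = clf.predict_proba(X_test)[:, 1]

# 50% sensitivity point: at least half of the true positives score at or above it.
threshold = np.median(p_test[y_test == 1])
y_pred = (p_test >= threshold).astype(int)

auc = roc_auc_score(y_test, p_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)

print("AUC =", auc)
print("0.5 Sensitivity Probability Threshold =", threshold)
print("Confusion matrix: [TN FP; FN, TP]")
print(np.array([[tn, fp], [fn, tp]]))
```

In the real notebook this loop would run once per category, with the category’s label column as `y`.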
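Overall, very promising results! For most categories the AUC falls between roughly 0.64 and 0.87,
with several clustered around 0.75 - 0.80. Unsure is near chance at 0.53, and
Developmental.Delay.Retardation reaches 0.97, though with only 7 positive notes in the test set.
This will definitely improve our selection of notes for annotation above chance, and improve our
final data set.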
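To make “above chance” concrete, here is the arithmetic for the Advanced.Cancer confusion matrix reported in the notebook output further down. The enrichment figure is my own derived summary, not something the notebook prints.

```python
# Advanced.Cancer test-set confusion matrix from the notebook output: [TN FP; FN, TP]
tn, fp, fn, tp = 479, 24, 13, 11

total = tn + fp + fn + tp
prevalence = (tp + fn) / total        # fraction of all test notes that have the concept
sensitivity = tp / (tp + fn)          # fraction of positive notes that get flagged
precision = tp / (tp + fp)            # fraction of flagged notes that are positive
enrichment = precision / prevalence   # how much denser in positives the flagged subset is

print(f"prevalence  = {prevalence:.3f}")   # 0.046
print(f"sensitivity = {sensitivity:.3f}")  # 0.458
print(f"precision   = {precision:.3f}")    # 0.314
print(f"enrichment  = {enrichment:.1f}x")  # 6.9x
```

So, at this threshold, picking only the flagged notes would give an annotation set with roughly seven times the concept density of a random sample, at the cost of missing about half of the positive notes.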
Looking in detail we see that the intuition we got from looking at the ICD9 code odds ratios
was confirmed by the logistic regression feature weights. A few examples below:
- Advanced.Heart.Disease
  - (code, 420-429) OTHER FORMS OF HEART DISEASE
  - (code, 410-414) ISCHEMIC HEART DISEASE
  - (code, 393-398) CHRONIC RHEUMATIC HEART DISEASE
  - (code, 785) Symptoms involving cardiovascular system
- Advanced.Lung.Disease
  - (code, 510-519) OTHER DISEASES OF RESPIRATORY SYSTEM
  - (code, V46) Other dependence on machines and devices
  - (code, 460-466) ACUTE RESPIRATORY INFECTIONS
  - (code, 490-496) CHRONIC OBSTRUCTIVE PULMONARY DISEASE AND ALLIED CONDITIONS
- Alcohol.Abuse
  - (code, 570-579) OTHER DISEASES OF DIGESTIVE SYSTEM
  - (code, 290-299) PSYCHOSES
  - (code, V60) Housing, household, and economic circumstances
  - (code, 070-079) OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE
  - (code, V08) Asymptomatic human immunodeficiency virus [HIV] infection status
- Obesity
  - (code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS
  - (code, 700-709) OTHER DISEASES OF SKIN AND SUBCUTANEOUS TISSUE
  - (code, 510-519) OTHER DISEASES OF RESPIRATORY SYSTEM
  - (code, 415-417) DISEASES OF PULMONARY CIRCULATION
  - (code, 327) ORGANIC SLEEP DISORDERS
There are some oddities, e.g. the top-weighted code for Advanced.Lung.Disease was “OSTEOPATHIES,
CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES”. Possibly a medication for these conditions is
related to lung disease, or the association could be spurious; this would require further investigation.
Also note that there is no way to assign causation here. For example looking at Obesity, obesity
can lead to various diseases and sleep disorders, but people with sleep disorders and disease
can tend to exercise less or have low metabolism that contributes to weight gain. Similarly,
alcohol abuse and addiction can lead to personal choices that contribute to poor economic
circumstances, or people in poor economic circumstances can be predisposed to become alcohol abusers.
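For reference, ranked weight tables like the ones discussed above can be pulled out of a fitted model along these lines. This is a sketch on synthetic data: the feature names are hypothetical, and a plain `LogisticRegression` is used instead of `LogisticRegressionCV` only to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical feature names standing in for the ICD9 code groups.
feature_names = [f"feature_{i}" for i in range(8)]

# Synthetic binary features; feature_3 raises the odds and feature_5 lowers them.
X = rng.integers(0, 2, size=(500, 8)).astype(float)
logits = 2.0 * X[:, 3] - 1.5 * X[:, 5] - 1.0
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# coef_ has shape (1, n_features) for a binary problem; label each weight by feature.
weights = pd.Series(clf.coef_[0], index=feature_names, name="weight")
top = weights.sort_values(ascending=False).head(3)
print(top)
```

A large positive weight only says the feature is associated with the label in the training data, which is why the odd entries and the causation caveat above matter when reading these tables.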
Full notebook available here
Out[133]:
First five rows of the feature table, indexed by subject_id and md5. The displayed columns are the
note category, the 15 concept label columns (Advanced.Cancer through Unsure), the ICD9 indicator
columns (code, 001-009) … (code, V88) with the middle columns elided, and the random split value.
Only the non-zero label and ICD9 entries of each row are listed here.

| subject_id | md5 | category | concept labels equal to 1 | displayed ICD9 columns equal to 1.0 | random |
|---|---|---|---|---|---|
| 68 | 27572b36bd4c26c322f50cf65d095d16 | Nursing/Other | None | (code, 042) | 0.432180 |
| 109 | 27d1f5907fa14b6702837a845f84c54e | Nursing/Other | Chronic.Pain.Fibromyalgia, None, Unsure | (code, 110-118), (code, V58) | 0.670607 |
| | 3e0fff775cfb678fdfa06ece68ebfab5 | Nursing/Other | Advanced.Heart.Disease, None | (code, V49) | 0.620208 |
| | 8efc0a2ff698b75ce183e3183c1bf204 | Nursing/Other | None | (all displayed columns are 0.0) | 0.027760 |
| | f5f69772c32f1b0ac05b7cf408f7a6db | Discharge | None | (code, V49) | 0.595216 |

5 rows × 198 columns
Advanced.Cancer

| icd9 | descr | weight |
|---|---|---|
| (code, 190-199) | MALIGNANT NEOPLASM OF OTHER AND UNSPECIFIED SITES | 2.526317 |
| (code, V10) | Personal history of malignant neoplasm | 1.244528 |
| (code, 160-165) | MALIGNANT NEOPLASM OF RESPIRATORY AND INTRATHORACIC ORGANS | 0.783067 |
| (code, 235-238) | NEOPLASMS OF UNCERTAIN BEHAVIOR | 0.749679 |
| (code, 150-159) | MALIGNANT NEOPLASM OF DIGESTIVE ORGANS AND PERITONEUM | 0.660028 |
| (code, 510-519) | OTHER DISEASES OF RESPIRATORY SYSTEM | 0.545930 |
| (code, V13) | Personal history of other diseases | 0.542035 |
| (code, E930) | Antibiotics | 0.514473 |
| (code, E933) | Primarily systemic agents | 0.493461 |
| (code, 260-269) | NUTRITIONAL DEFICIENCIES | 0.492130 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Advanced.Cancer_log_reg_roc.png
AUC = 0.7756792577866136
0.5 Sensitivity Probability Threshold = 0.10394196524343723
Confusion matrix: [TN FP; FN, TP]
[[479 24]
[ 13 11]]
----------------------------------
Advanced.Heart.Disease
| icd9 | descr | weight |
|---|---|---|
| (code, 420-429) | OTHER FORMS OF HEART DISEASE | 0.005062 |
| (code, 410-414) | ISCHEMIC HEART DISEASE | 0.003937 |
| (code, V45) | Other postprocedural states | 0.002815 |
| (code, 580-589) | NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS | 0.001645 |
| (code, 393-398) | CHRONIC RHEUMATIC HEART DISEASE | 0.001249 |
| (code, 785) | Symptoms involving cardiovascular system | 0.000874 |
| (code, 270-279) | OTHER METABOLIC AND IMMUNITY DISORDERS | 0.000752 |
| (code, V58) | Encounter for other and unspecified procedures and aftercare | 0.000590 |
| (code, V43) | Organ or tissue replaced by other means | 0.000558 |
| (code, 240-246) | DISORDERS OF THYROID GLAND | 0.000547 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Advanced.Heart.Disease_log_reg_roc.png
AUC = 0.7463864306784661
0.5 Sensitivity Probability Threshold = 0.1389763286339258
Confusion matrix: [TN FP; FN, TP]
[[348 104]
[ 38 37]]
----------------------------------
Advanced.Lung.Disease
| icd9 | descr | weight |
|---|---|---|
| (code, 730-739) | OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES | 1.608568 |
| (code, 510-519) | OTHER DISEASES OF RESPIRATORY SYSTEM | 1.206375 |
| (code, V46) | Other dependence on machines and devices | 1.174741 |
| (code, 460-466) | ACUTE RESPIRATORY INFECTIONS | 0.765785 |
| (code, 490-496) | CHRONIC OBSTRUCTIVE PULMONARY DISEASE AND ALLIED CONDITIONS | 0.735720 |
| (code, 240-246) | DISORDERS OF THYROID GLAND | 0.610108 |
| (code, 340-349) | OTHER DISORDERS OF THE CENTRAL NERVOUS SYSTEM | 0.580053 |
| (code, V49) | Other conditions influencing health status | 0.565586 |
| (code, V02) | Carrier or suspected carrier of infectious diseases | 0.533711 |
| (code, V13) | Personal history of other diseases | 0.514750 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Advanced.Lung.Disease_log_reg_roc.png
AUC = 0.8227696216826652
0.5 Sensitivity Probability Threshold = 0.19510015367599318
Confusion matrix: [TN FP; FN, TP]
[[438 45]
[ 23 21]]
----------------------------------
Alcohol.Abuse
| icd9 | descr | weight |
|---|---|---|
| (code, 570-579) | OTHER DISEASES OF DIGESTIVE SYSTEM | 1.163718 |
| (code, 290-299) | PSYCHOSES | 1.160067 |
| (code, V60) | Housing, household, and economic circumstances | 1.134736 |
| (code, 789) | Other symptoms involving abdomen and pelvis | 0.949451 |
| (code, 300-316) | NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS | 0.786526 |
| (code, 070-079) | OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE | 0.775466 |
| (code, V08) | Asymptomatic human immunodeficiency virus [HIV] infection status | 0.665920 |
| (code, V15) | Other personal history presenting hazards to health | 0.638283 |
| (code, V11) | Personal history of mental disorder | 0.594422 |
| (code, E888) | Other and unspecified fall | 0.565786 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Alcohol.Abuse_log_reg_roc.png
AUC = 0.7530608435983576
0.5 Sensitivity Probability Threshold = 0.15219003911941675
Confusion matrix: [TN FP; FN, TP]
[[433 37]
[ 30 27]]
----------------------------------
Chronic.Neurological.Dystrophies
| icd9 | descr | weight |
|---|---|---|
| (code, 340-349) | OTHER DISORDERS OF THE CENTRAL NERVOUS SYSTEM | 0.002613 |
| (code, 590-599) | OTHER DISEASES OF URINARY SYSTEM | 0.001986 |
| (code, 780) | General symptoms | 0.001867 |
| (code, 330-337) | HEREDITARY AND DEGENERATIVE DISEASES OF THE CENTRAL NERVOUS SYSTEM | 0.001600 |
| (code, 240-246) | DISORDERS OF THYROID GLAND | 0.001534 |
| (code, 350-359) | DISORDERS OF THE PERIPHERAL NERVOUS SYSTEM | 0.001523 |
| (code, 430-438) | CEREBROVASCULAR DISEASE | 0.001221 |
| (code, 500-508) | PNEUMOCONIOSES AND OTHER LUNG DISEASES DUE TO EXTERNAL AGENTS | 0.001212 |
| (code, 700-709) | OTHER DISEASES OF SKIN AND SUBCUTANEOUS TISSUE | 0.001206 |
| (code, 249-259) | DISEASES OF OTHER ENDOCRINE GLANDS | 0.001183 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Chronic.Neurological.Dystrophies_log_reg_roc.png
AUC = 0.7241262346684033
0.5 Sensitivity Probability Threshold = 0.14620750976262598
Confusion matrix: [TN FP; FN, TP]
[[362 82]
[ 43 40]]
----------------------------------
Chronic.Pain.Fibromyalgia
| icd9 | descr | weight |
|---|---|---|
| (code, 270-279) | OTHER METABOLIC AND IMMUNITY DISORDERS | 0.001499 |
| (code, 730-739) | OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES | 0.001358 |
| (code, 030-041) | OTHER BACTERIAL DISEASES | 0.001335 |
| (code, V58) | Encounter for other and unspecified procedures and aftercare | 0.001314 |
| (code, 725-729) | RHEUMATISM, EXCLUDING THE BACK | 0.001144 |
| (code, 580-589) | NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS | 0.001033 |
| (code, 590-599) | OTHER DISEASES OF URINARY SYSTEM | 0.000974 |
| (code, 710-719) | ARTHROPATHIES AND RELATED DISORDERS | 0.000923 |
| (code, 530-538) | DISEASES OF ESOPHAGUS, STOMACH, AND DUODENUM | 0.000849 |
| (code, 415-417) | DISEASES OF PULMONARY CIRCULATION | 0.000804 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Chronic.Pain.Fibromyalgia_log_reg_roc.png
AUC = 0.6382045539380365
0.5 Sensitivity Probability Threshold = 0.10599105530911747
Confusion matrix: [TN FP; FN, TP]
[[357 113]
[ 30 27]]
----------------------------------
Dementia
| icd9 | descr | weight |
|---|---|---|
| (code, 290-299) | PSYCHOSES | 0.002410 |
| (code, 330-337) | HEREDITARY AND DEGENERATIVE DISEASES OF THE CENTRAL NERVOUS SYSTEM | 0.000984 |
| (code, 410-414) | ISCHEMIC HEART DISEASE | 0.000682 |
| (code, V45) | Other postprocedural states | 0.000663 |
| (code, 420-429) | OTHER FORMS OF HEART DISEASE | 0.000591 |
| (code, 560-569) | OTHER DISEASES OF INTESTINES AND PERITONEUM | 0.000511 |
| (code, 580-589) | NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS | 0.000466 |
| (code, 590-599) | OTHER DISEASES OF URINARY SYSTEM | 0.000463 |
| (code, 785) | Symptoms involving cardiovascular system | 0.000449 |
| (code, 030-041) | OTHER BACTERIAL DISEASES | 0.000413 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Dementia_log_reg_roc.png
AUC = 0.757051282051282
0.5 Sensitivity Probability Threshold = 0.03856300219804359
Confusion matrix: [TN FP; FN, TP]
[[459 48]
[ 11 9]]
----------------------------------
Depression
| icd9 | descr | weight |
|---|---|---|
| (code, 300-316) | NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS | 0.003354 |
| (code, 290-299) | PSYCHOSES | 0.001928 |
| (code, 530-538) | DISEASES OF ESOPHAGUS, STOMACH, AND DUODENUM | 0.001889 |
| (code, 070-079) | OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE | 0.001476 |
| (code, 580-589) | NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS | 0.001269 |
| (code, 350-359) | DISORDERS OF THE PERIPHERAL NERVOUS SYSTEM | 0.001053 |
| (code, V45) | Other postprocedural states | 0.001042 |
| (code, 730-739) | OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES | 0.000959 |
| (code, V60) | Housing, household, and economic circumstances | 0.000855 |
| (code, 490-496) | CHRONIC OBSTRUCTIVE PULMONARY DISEASE AND ALLIED CONDITIONS | 0.000694 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Depression_log_reg_roc.png
AUC = 0.6824449748077434
0.5 Sensitivity Probability Threshold = 0.16291146320902805
Confusion matrix: [TN FP; FN, TP]
[[304 115]
[ 55 53]]
----------------------------------
Developmental.Delay.Retardation
| icd9 | descr | weight |
|---|---|---|
| (code, 317-319) | MENTAL RETARDATION | 4.468493 |
| (code, 150-159) | MALIGNANT NEOPLASM OF DIGESTIVE ORGANS AND PERITONEUM | 1.593733 |
| (code, E939) | Psychotropic agents | 1.532755 |
| (code, 500-508) | PNEUMOCONIOSES AND OTHER LUNG DISEASES DUE TO EXTERNAL AGENTS | 1.451041 |
| (code, 480-488) | PNEUMONIA AND INFLUENZA | 1.393855 |
| (code, 780) | General symptoms | 1.289353 |
| (code, 290-299) | PSYCHOSES | 1.198573 |
| (code, 240-246) | DISORDERS OF THYROID GLAND | 1.193123 |
| (code, 690-698) | OTHER INFLAMMATORY CONDITIONS OF SKIN AND SUBCUTANEOUS TISSUE | 1.041017 |
| (code, 555-558) | NONINFECTIOUS ENTERITIS AND COLITIS | 0.975399 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Developmental.Delay.Retardation_log_reg_roc.png
AUC = 0.968956043956044
0.5 Sensitivity Probability Threshold = 0.29711871330799156
Confusion matrix: [TN FP; FN, TP]
[[515 5]
[ 5 2]]
----------------------------------
Non.Adherence
| icd9 | descr | weight |
|---|---|---|
| (code, 530-538) | DISEASES OF ESOPHAGUS, STOMACH, AND DUODENUM | 0.001452 |
| (code, V15) | Other personal history presenting hazards to health | 0.001374 |
| (code, 350-359) | DISORDERS OF THE PERIPHERAL NERVOUS SYSTEM | 0.001329 |
| (code, 580-589) | NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS | 0.001171 |
| (code, 360-379) | DISORDERS OF THE EYE AND ADNEXA | 0.000959 |
| (code, 070-079) | OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE | 0.000900 |
| (code, V58) | Encounter for other and unspecified procedures and aftercare | 0.000789 |
| (code, 270-279) | OTHER METABOLIC AND IMMUNITY DISORDERS | 0.000754 |
| (code, 300-316) | NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS | 0.000565 |
| (code, V60) | Housing, household, and economic circumstances | 0.000385 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Non.Adherence_log_reg_roc.png
AUC = 0.8049337957124844
0.5 Sensitivity Probability Threshold = 0.07632155947215179
Confusion matrix: [TN FP; FN, TP]
[[447 41]
[ 20 19]]
----------------------------------
None
| icd9 | descr | weight |
|---|---|---|
| (code, V42) | Organ or tissue replaced by transplant | 0.210770 |
| (code, 996-999) | COMPLICATIONS OF SURGICAL AND MEDICAL CARE, NOT ELSEWHERE CLASSIFIED | 0.199843 |
| (code, 001-009) | INTESTINAL INFECTIOUS DISEASES | 0.121922 |
| (code, 786) | Symptoms involving respiratory system and other chest symptoms | 0.121564 |
| (code, 480-488) | PNEUMONIA AND INFLUENZA | 0.110942 |
| (code, 440-449) | DISEASES OF ARTERIES, ARTERIOLES, AND CAPILLARIES | 0.107852 |
| (code, 360-379) | DISORDERS OF THE EYE AND ADNEXA | 0.104099 |
| (code, 420-429) | OTHER FORMS OF HEART DISEASE | 0.100312 |
| (code, 788) | Symptoms involving urinary system | 0.099602 |
| (code, 393-398) | CHRONIC RHEUMATIC HEART DISEASE | 0.095794 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_None_log_reg_roc.png
AUC = 0.6605450236966824
0.5 Sensitivity Probability Threshold = 0.6548677574348872
Confusion matrix: [TN FP; FN, TP]
[[154 57]
[164 152]]
----------------------------------
Obesity
| icd9 | descr | weight |
|---|---|---|
| (code, 270-279) | OTHER METABOLIC AND IMMUNITY DISORDERS | 0.001250 |
| (code, 700-709) | OTHER DISEASES OF SKIN AND SUBCUTANEOUS TISSUE | 0.001237 |
| (code, 510-519) | OTHER DISEASES OF RESPIRATORY SYSTEM | 0.000885 |
| (code, 415-417) | DISEASES OF PULMONARY CIRCULATION | 0.000885 |
| (code, 327) | ORGANIC SLEEP DISORDERS | 0.000819 |
| (code, 580-589) | NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS | 0.000713 |
| (code, 680-686) | INFECTIONS OF SKIN AND SUBCUTANEOUS TISSUE | 0.000630 |
| (code, 780) | General symptoms | 0.000586 |
| (code, V58) | Encounter for other and unspecified procedures and aftercare | 0.000498 |
| (code, 590-599) | OTHER DISEASES OF URINARY SYSTEM | 0.000498 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Obesity_log_reg_roc.png
AUC = 0.6649800796812749
0.5 Sensitivity Probability Threshold = 0.04417569663751994
Confusion matrix: [TN FP; FN, TP]
[[387 115]
[ 14 11]]
----------------------------------
Other.Substance.Abuse
| icd9 | descr | weight |
|---|---|---|
| (code, 960-979) | POISONING BY DRUGS, MEDICINAL AND BIOLOGICAL SUBSTANCES | 2.868189 |
| (code, 070-079) | OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE | 2.261325 |
| (code, V60) | Housing, household, and economic circumstances | 2.162543 |
| (code, E939) | Psychotropic agents | 1.824646 |
| (code, E888) | Other and unspecified fall | 1.799809 |
| (code, E935) | Analgesics, antipyretics, and antirheumatics | 1.577303 |
| (code, 110-118) | MYCOSES | 1.473258 |
| (code, 300-316) | NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS | 1.347367 |
| (code, E928) | Other and unspecified environmental and accidental causes | 1.288913 |
| (code, E854) | Accidental poisoning by other psychotropic agents | 1.189443 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Other.Substance.Abuse_log_reg_roc.png
AUC = 0.8683927932227659
0.5 Sensitivity Probability Threshold = 0.33163980391797443
Confusion matrix: [TN FP; FN, TP]
[[477 16]
[ 18 16]]
----------------------------------
Schizophrenia.and.other.Psychiatric.Disorders
| icd9 | descr | weight |
|---|---|---|
| (code, 290-299) | PSYCHOSES | 0.003059 |
| (code, 300-316) | NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS | 0.002363 |
| (code, 730-739) | OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES | 0.001407 |
| (code, 340-349) | OTHER DISORDERS OF THE CENTRAL NERVOUS SYSTEM | 0.001083 |
| (code, 070-079) | OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE | 0.001064 |
| (code, 240-246) | DISORDERS OF THYROID GLAND | 0.000945 |
| (code, V60) | Housing, household, and economic circumstances | 0.000929 |
| (code, 030-041) | OTHER BACTERIAL DISEASES | 0.000914 |
| (code, 270-279) | OTHER METABOLIC AND IMMUNITY DISORDERS | 0.000852 |
| (code, 787) | Symptoms involving digestive system | 0.000769 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Schizophrenia.and.other.Psychiatric.Disorders_log_reg_roc.png
AUC = 0.7535087719298247
0.5 Sensitivity Probability Threshold = 0.11327678496224783
Confusion matrix: [TN FP; FN, TP]
[[394 76]
[ 30 27]]
----------------------------------
Unsure
| icd9 | descr | weight |
|---|---|---|
| (code, 420-429) | OTHER FORMS OF HEART DISEASE | 0.001652 |
| (code, 270-279) | OTHER METABOLIC AND IMMUNITY DISORDERS | 0.001351 |
| (code, 580-589) | NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS | 0.001012 |
| (code, V58) | Encounter for other and unspecified procedures and aftercare | 0.000998 |
| (code, 730-739) | OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES | 0.000880 |
| (code, 440-449) | DISEASES OF ARTERIES, ARTERIOLES, AND CAPILLARIES | 0.000816 |
| (code, 790) | Nonspecific findings on examination of blood | 0.000695 |
| (code, 393-398) | CHRONIC RHEUMATIC HEART DISEASE | 0.000681 |
| (code, 780) | General symptoms | 0.000679 |
| (code, 240-246) | DISORDERS OF THYROID GLAND | 0.000676 |
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Unsure_log_reg_roc.png
AUC = 0.5328101155439284
0.5 Sensitivity Probability Threshold = 0.18846364109275968
Confusion matrix: [TN FP; FN, TP]
[[213 204]
[ 56 54]]
----------------------------------
Out[202]:
['log_dir', 'input_dir', 'results_dir', 'repo_data_dir']
Saving classifiers to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_icd9_log_reg.pkl