Evaluating the C-section Rate of Different Physician Practices:
Using Machine Learning to Model Standard Practice
Rich Caruana (1), Radu S. Niculescu (2), R. Bharat Rao (3), Cynthia Simms, MD (4)
(1) Cornell University, Computer Science, 4157 Upson Hall, Ithaca, NY 14853
(2) Carnegie Mellon University, Computer Science, 5000 Forbes Avenue, Pittsburgh, PA 15213
(3) Siemens Medical Solutions, Inc., 51 Valley Stream Parkway, Malvern, PA 19355
(4) Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh, Magee-Womens Hospital, 300 Halket St., Pittsburgh, PA 15213
ABSTRACT
The C-section rate of a population of 22,175 expectant
mothers is 16.8%; yet the 17 physician groups that serve
this population have vastly different group C-section
rates, ranging from 13% to 23%. Our goal is to
determine retrospectively if the variations in the observed
rates can be attributed to variations in the intrinsic risk of
the patient sub-populations (i.e., some groups contain
more ``high-risk C-section'' patients), or to differences in
physician practice (i.e., some groups do more C-sections).
We apply machine learning to this problem by training
models to predict standard practice from retrospective
data. We then use the models of standard practice to
evaluate the C-section rate of each physician practice.
Our results indicate that although there is variation in
intrinsic risk among the groups, there is also much
variation in physician practice.
1. INTRODUCTION
Our goal is to determine if groups of patients seen by
different physician practices have different intrinsic risks
for C-section. Our approach is as follows: we train a
model to predict standard practice using machine learning
(in this study, bagged probabilistic decision trees). We
use the model to estimate the intrinsic risk of each group
by averaging the C-section risk the model predicts for
each patient in that group. Differences between the
observed and predicted C-section rates indicate physician
groups with behavior different from that predicted by the
standard practice model.
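The comparison described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `group_rates`, the record layout, and the toy numbers are all assumptions, and the per-patient predicted risks are taken as already computed by some model of standard practice.

```python
from collections import defaultdict

def group_rates(records):
    """Compare observed and predicted C-section rates per physician group.

    records: iterable of (group_id, had_c_section, predicted_risk), where
    predicted_risk is the model's probability of C-section for that patient.
    Returns {group_id: (observed_rate, predicted_rate, observed - predicted)}.
    """
    # Per group: [number of patients, observed C-sections, summed predicted risk]
    totals = defaultdict(lambda: [0, 0, 0.0])
    for group, had_cs, risk in records:
        t = totals[group]
        t[0] += 1
        t[1] += int(had_cs)
        t[2] += risk

    result = {}
    for group, (n, observed, risk_sum) in totals.items():
        observed_rate = observed / n
        predicted_rate = risk_sum / n  # intrinsic risk: mean predicted risk in the group
        result[group] = (observed_rate, predicted_rate, observed_rate - predicted_rate)
    return result

# Hypothetical toy data, not taken from the study.
records = [
    ("A", True, 0.30), ("A", False, 0.10), ("A", False, 0.20), ("A", False, 0.20),
    ("B", True, 0.15), ("B", True, 0.25), ("B", False, 0.10), ("B", False, 0.10),
]
rates = group_rates(records)
```

In this toy example, a large positive difference for a group would suggest more C-sections than the standard practice model predicts for its patients, while a difference near zero would suggest the observed rate is explained by intrinsic risk.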
Intrinsic factors are factors related to patient health that
should be used to make care decisions. Our data includes
82 intrinsic factors: pre-pregnancy health-and-physical
factors such as maternal age, weight, smoking, diabetes,
and prior pregnancy; mid-pregnancy factors such as
changes in maternal blood sugar and estimated fetal
weight; and labor factors such as maternal blood pressure
and fetal distress. These intrinsic factors are the inputs to
the model trained to predict C-section. Extrinsic factors
are all factors not entailed by these inputs. Extrinsic
factors include type of physician practice, type of patient
insurance, and patient socio-economic status. The model
trained to predict standard practice is allowed to use
intrinsic variables to predict patient risk. If the model is
accurate, it will compensate for differences between
patients (or groups of patients) caused by the intrinsic
variables, but will not compensate for differences due to
extrinsic variables it did not have access to. This will
allow us to determine if the variations in observed C-section rates can be attributed to variations in intrinsic
risk of the patient sub-populations (i.e., some groups see
more ``high-risk C-section'' patients), or if they are due to
differences in physician practice (i.e., some groups do C-sections more often).
Section 2 discusses the problem of C-section rate. Section
3 describes our methodology. We use bagged decision
trees to train a model of standard practice. Section 4 uses
this model to predict the intrinsic risk of different groups
of patients. Differences between observed and predicted
risk represent a possible difference between physician
behavior and standard practice. Section 5 discusses the
assumptions made by this approach.
2. BACKGROUND
2.1 Problem Definition
In the U.S. about 17% of births are by C-section. In
Europe, the C-section rate is substantially lower, but
outcomes do not appear to be worse. Poma notes that the
C-section rate in the U.S. increased significantly, yet there
has not been a related improvement in neonatal outcomes,
suggesting the rate is unnecessarily high [4].
The Pennsylvania Health Care Cost Containment Council
notes that cesarean deliveries carry increased risk of
complications and longer patient recovery times as well as
higher health care costs [3]. The average cost of a C-section in Southwestern PA in 1998 was $7,885 and the
average cost for a vaginal delivery was $4,787.
There are medical and financial benefits to a lower C-section rate if outcomes are not adversely affected.
AMIA 2003 Symposium Proceedings − Page 135
Insurance companies in the U.S. have begun applying
financial pressure to lower the C-section rate. One such
policy is to pay for a fixed percentage of C-sections. If a
practice has a rate higher than the quota, it must make up
the difference. If the rate is lower, it makes more profit.
There are problems with using financial pressure to reduce
C-sections. One problem is the tragedy of the commons:
individual doctors often have incentives not to lower their
C-section rate, even though groups of physicians would
benefit by lowering their group rate. This problem is
complicated by the fact that doctors do not see patients of
equal risk. Some doctors specialize in high-risk
pregnancies and thus should have a higher C-section rate.
To evaluate practices fairly, an objective model needs to
be developed that can predict whether or not patients
should have received a C-section.
In [1], the C-section rates of different hospitals are
compared after correcting for the fact that hospitals saw
patients with different risks. The authors constructed a logistic
regression model to predict patient risk. Recent studies by
members of our group indicated that machine learning
methods such as decision trees and neural nets might be
preferable to logistic regression [2].
Commonly agreed upon C-section risk factors were used
in [3] to distinguish between high and low-risk patients. In
[4], an attempt was made to determine obstetrician
characteristics that affect C-section rate. The extrinsic
factors correlated with lower C-section rates were:
younger obstetrician age, graduation from a domestic
medical school, belonging to a group practice, and a
smaller number of births.
leaf node. We often find that MML trees excel at
predicting probabilities. To further improve the predicted
probabilities, we applied bagging [9],[10] to the MML
decision trees. See [9] for a description of why bagging
usually improves the quality of probabilities predicted by
decision trees. The bagged trees were trained as follows:
1. Bootstrap samples are drawn to form 100 train
sets T1…T100.
2. An MML decision tree is grown on each Ti.
3. For each example in the dataset, we average the
predictions of the trees that did not contain this
example in their training set.
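The three steps above amount to out-of-bag averaging, which might be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the names `out_of_bag_predictions` and `fit_base_rate` are hypothetical, and the base learner is a trivial stand-in that predicts the training-set C-section rate, not an MML decision tree. Any function that takes a training list and returns a predictor could be passed as `fit`.

```python
import random

def fit_base_rate(train):
    """Stand-in learner (assumption, NOT an MML tree): predicts the
    training-set positive rate for every patient."""
    rate = sum(label for _, label in train) / len(train)
    return lambda features: rate

def out_of_bag_predictions(data, fit, n_bags=100, seed=0):
    """data: list of (features, label) pairs.

    Returns one averaged probability per example, using only the models
    whose bootstrap sample did not contain that example. An entry is None
    if the example happened to appear in every bootstrap sample.
    """
    rng = random.Random(seed)
    n = len(data)
    sums = [0.0] * n
    counts = [0] * n
    for _ in range(n_bags):
        # Step 1: draw a bootstrap sample (n indices, with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: train one model on this sample.
        model = fit([data[i] for i in idx])
        # Step 3: accumulate predictions only for examples left out of the sample.
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                sums[i] += model(data[i][0])
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because each example is scored only by models that never saw it during training, the averaged probabilities are not inflated by a model having memorized that example.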
4. RESULTS
It is critical that the probabilities generated by the mod (...truncated)