Keywords
neurosurgery - artificial intelligence - degenerative lumbar spine disease - machine
learning - Random Forest Classifier - lumbar canal stenosis
Introduction
Low back pain (LBP) is one of the leading causes of nonfatal health loss.[1] LBP can be caused by the involvement of any spinal structure that is innervated
and is susceptible to disease or injury.[2]
In the context of LBP, degenerative changes in the spine are subcategorized into
spondylolisthesis, disc degeneration, and spinal stenosis to outline the types of
pathology and the potential ramifications of surgical intervention.
Surgical treatment is effective, with lumbar discectomy being the standard surgical procedure
for patients with lumbar disc herniation.[3] However, herniated disc tissue can regress spontaneously in most patients,
who can then be treated with conservative strategies.[4]
Bony decompression by laminectomy is still considered the gold standard of surgery
and the most common technique for lumbar spinal stenosis.[5] Leg pain relief and better back-related functional status favored those initially
receiving surgical treatment.[6]
Laminectomy and posterior instrumented spinal fusion are the standard of care and
the most commonly performed surgical procedures for the management of degenerative
spondylolisthesis.[7] Patients with degenerative spondylolisthesis and spinal stenosis treated surgically
showed considerably more improvement in pain and function than patients treated conservatively.[8]
The surgical outcome can be measured as a change in symptom intensity, resumption
of activity, and patient satisfaction.
Machine learning (ML) classification is a domain of artificial intelligence that enables
algorithms to learn patterns in large, complex datasets and generate useful predictive
outputs.[9] It represents a set of powerful technologies capable of effectively predicting outcomes
to support decision-making in neurosurgery.[10]
Objective
The aim of the study was to develop a prognostic model using artificial intelligence
for patients undergoing lumbar spine surgery for degenerative spine disease for change
in pain, functional status, and patient satisfaction based on preoperative variables
(sociodemographic, clinical, and radiological).
Methods and Materials
Study Population
This prospective study was conducted in the Neurosurgery Department of Sawai Man Singh
Medical College, Jaipur, after obtaining ethical clearance from the University Ethics
Committee. Patients included those undergoing lumbar spine surgery for degenerative
spine disease. The procedures were open discectomy for disc degeneration with prolapsed
intervertebral disc, decompression by open laminectomy with foraminotomy for patients
with spinal canal stenosis, and decompression with posterolateral fixation for those with
spinal canal stenosis with instability. All patients had failed to respond to at least
6 weeks of conservative management, including physical therapy, anti-inflammatory
medications, and analgesics. A cohort of patients managed conservatively was followed
up for 6 months.
Inclusion Criteria
Patients older than 18 years of age who were admitted from January 2019 to January 2021
with complaints of LBP with or without radiculopathy and were diagnosed as having degenerative
spine disease using symptomatology and magnetic resonance imaging (MRI) were included.
Exclusion Criteria
Exclusion criteria included trauma, neoplasm, infection, congenital deformation, and
chronic illness such as rheumatoid arthritis. Patients with an extraspinal cause of
back/neck pain or radiculopathy were excluded.
Prognostic Factors
Sociodemographic Factors
Sociodemographic factors included age, gender, body mass index (BMI), occupation (sedentary,
light, medium, heavy), and smoking status.
Symptomatology
A reliable questionnaire was filled out by patients before surgery that included severity
of back pain as compared with leg pain, duration of symptoms, whether ambulant independently
or with support or bedridden, history of previous lumbar spine surgery, Hamilton Anxiety
scale (HAM-A), visual analog scale (VAS), Neurogenic Claudication Outcome Score (NCOS)
and Modified Oswestry Disability Index (MODI). A thorough neurological examination
was done and the presence of neurological deficit was noted that included objective
weakness and cauda equina syndrome.
Psychosocial Risk Factors
HAM-A was developed to measure the severity of anxiety symptoms.[11] It was used in an attempt to reduce the confounding effect of subjective patient-reported
scores.
Radiological Factors
The Pfirrmann grading system was used as a standardized and reliable assessment of disc
morphology based on MRI.[12] Disc herniation was categorized into normal, symmetric disc bulging, disc protrusion,
disc extrusion, and free fragment.[3]
Modic et al described the types of signal changes, classification criteria of the
lumbar endplate, and bone marrow changes on MRI scans.[13]
The central canal area at the level of maximum compression was used to grade central spinal
canal stenosis.
Lateral spinal canal stenosis was graded according to Bartynski and Lin.[14]
Foraminal spinal stenosis was graded according to perineural intraforaminal fat,[15] hypertrophic facet degeneration,[16] foraminal nerve root impingement,[17] and size and shape of the foramen.[16]
Outcome Measures
The primary outcome measures were the VAS, MODI, and NCOS collected preoperatively
and at 6 months postoperatively during clinic review.
Surgical Procedures
The study surgeons had an average of 10 years' experience in spinal surgery in the
neurosurgery department and regularly performed the surgeries employing standard procedures.
Bony decompression by laminectomy and foraminotomy for lumbar canal stenosis and discectomy
for disc herniation were classified as decompression (management one), and laminectomy
with posterior instrumented spinal fixation for degenerative spondylolisthesis was classified as
decompression with fixation (management two). Patients managed conservatively were
assigned to management zero.
Data Analysis
To predict improvement in the various indices, we computed the difference between the index
values before and after surgery. The improvement was then grouped into classes.
VAS improvement was divided into four classes (0, 1, 2, and 3) for the ranges
0 to 2, 3 to 4, 5 to 6, and 7 to 10, respectively. MODI improvement was also divided
into four classes for the ranges 0 to 5, 5 to 10, 11 to 15, and more than 15. NCOS
improvement was likewise divided into four classes for the ranges 0 to 10, 11 to 20,
21 to 30, and more than 30. Label encoding was used for the other discrete input variables
so that they could be used to train the model.
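The class assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's code: the function names are hypothetical, a negative change is placed in class 0, and because the quoted MODI ranges "0 to 5" and "5 to 10" share the value 5, that value is assigned to the lower class here. The label encoder mimics the common convention of numbering categories in sorted order.

```python
from bisect import bisect_left

# Inclusive upper bound of every class except the last, read from the ranges in the text.
# Assumption: a MODI improvement of exactly 5 belongs to the lower class.
UPPER_BOUNDS = {
    "VAS": [2, 4, 6],      # 0-2, 3-4, 5-6, 7-10
    "MODI": [5, 10, 15],   # 0-5, 6-10, 11-15, more than 15
    "NCOS": [10, 20, 30],  # 0-10, 11-20, 21-30, more than 30
}


def improvement_class(index_name, improvement):
    """Return the class label (0 to 3) for an improvement in the named index."""
    # bisect_left counts the upper bounds that lie strictly below the value,
    # which is exactly the class number. Negative values land in class 0 (assumption).
    return bisect_left(UPPER_BOUNDS[index_name], improvement)


def label_encode(values):
    """Map each distinct category to an integer, numbered in sorted order."""
    codes = {category: code for code, category in enumerate(sorted(set(values)))}
    return [codes[value] for value in values]


# Hypothetical example values
vas_classes = [improvement_class("VAS", x) for x in (1, 3, 6, 8)]
occupation_codes = label_encode(["sedentary", "light", "heavy", "light"])
```

The thresholds live in one dictionary so that a different resolution of the MODI boundary only requires changing a single entry.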
The classification algorithms used were Logistic Regression, Decision Tree
Classifier, Random Forest Classifier, Support Vector Machine, and K-Nearest Neighbor.
Two different models for each type of classifier were trained. The first model was
trained with the sociodemographic factors, symptomatology scoring, psychosocial factors,
and radiological factors as mentioned previously. The second model was trained with
the type of management used.
The dataset was divided into training and testing datasets. The first subset, referred
to as the training dataset, was used to fit the model and comprised 80% of the complete
dataset. The second subset was not used to train the model; instead, its input features
were provided to the model, and the resulting predictions were compared with
the expected values. This second subset, referred to as the test dataset, comprised
the remaining 20% of the dataset. The dataset was divided in such a way
that the class ratio of all classes remained proportional in the training and test
datasets. The objective was to estimate the performance of the ML model on new data
(data not used to train the model, with all the classes present for validation).
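A stratified 80/20 split of this kind can be illustrated with a small standard-library sketch. The function name, the fixed seed, and the class counts below are hypothetical and chosen only to make the example reproducible; they are not taken from the study. In practice, scikit-learn's `train_test_split` with its `stratify` argument serves the same purpose.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Keep at least one test sample per class so every class can be validated.
        # Note: a class with a single sample would then be absent from training.
        n_test = max(1, round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical class labels for 100 patients (classes 0 to 3)
labels = [0] * 40 + [1] * 30 + [2] * 20 + [3] * 10
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
```

Splitting within each class, rather than over the whole dataset at once, is what guarantees that rare improvement classes still appear in the test set.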
Example: Training of the VAS improvement model with logistic regression. The first logistic
regression model was trained with features such as sex and BMI, as mentioned previously.
The second model was trained with management class as input. The final predicted probability
was the average of the two individual probabilities, and the outcome was evaluated based on the ROC
(receiver-operating characteristic) AUC (area under the curve) score. A similar
approach was followed to train all other models.
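The averaging step can be sketched as follows. This is a minimal illustration, assuming each of the two models outputs one probability per improvement class; the function names and the probability values are hypothetical.

```python
def average_probabilities(proba_features, proba_management):
    """Element-wise mean of the two models' per-class probabilities."""
    if len(proba_features) != len(proba_management):
        raise ValueError("both models must score the same set of classes")
    return [(a + b) / 2 for a, b in zip(proba_features, proba_management)]


def predicted_class(probabilities):
    """Index of the class with the highest combined probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical outputs for one patient over VAS improvement classes 0 to 3
p_features = [0.10, 0.20, 0.50, 0.20]    # model trained on patient features
p_management = [0.05, 0.15, 0.40, 0.40]  # model trained on management class
combined = average_probabilities(p_features, p_management)
best_class = predicted_class(combined)
```

Because each input vector sums to 1, the averaged vector also sums to 1 and can be scored directly with a probability-based metric such as ROC-AUC.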
Results
There were a total of 180 patients with lumbar spine disease over 18 years of age
enrolled in our study. Our study had 53 patients treated conservatively, 96 patients
underwent decompression, and 31 patients underwent decompression with posterior instrumented
spinal fixation. The average age of patients in the study was 50.26 years. About 40.55%
of patients had complaints of back pain without radiculopathy, whereas the rest of
the patients complained of radiculopathy with or without back pain. Forty-two (23.33%)
patients had symptoms for less than 3 months of which 15 were treated conservatively,
22 underwent decompression, and 5 underwent decompression with fixation. Fifty-eight
(32.22%) patients had symptoms for 3 to 6 months of which 22 were treated conservatively,
30 underwent decompression, and 6 underwent decompression with fixation. Eighty (44.44%) patients
had symptoms for more than 6 months of which 16 were treated conservatively, 44 underwent
decompression, and 20 underwent decompression with fixation. Nine (5%) patients had
a history of previous lumbar spine surgery. Twenty-six (14.4%) patients presented
with neurological deficit out of which 25 were operated on and only 4 had improvement
in their weakness postoperatively. These four had symptoms for less than 3 months.
The average improvement in VAS, NCOS, and MODI scores categorized according to the
management is given in [Table 1].
Table 1
Average improvement in VAS, NCOS, and MODI scores according to management
| | Conservative | Decompression | Decompression + fixation | p-Value |
| --- | --- | --- | --- | --- |
| VAS score (0–10) | | | | |
| Before | 6.09 ± 1.19 | 8.22 ± 1.17 | 7.97 ± 1.14 | 0.02 |
| After | 3.34 ± 1.48 | 2.73 ± 1.50 | 3.42 ± 1.63 | < 0.001 |
| Improvement | 2.75 ± 1.81 | 5.49 ± 1.89 | 4.55 ± 1.69 | < 0.001 |
| MODI (6–60; 60 = maximum disability) | | | | |
| Before | 14.94 ± 4.07 | 16.60 ± 3.89 | 17.84 ± 3.62 | < 0.001 |
| After | 20.04 ± 5.11 | 29.47 ± 5.11 | 29.97 ± 3.82 | < 0.001 |
| Improvement | 5.09 ± 3.06 | 12.86 ± 4.95 | 12.13 ± 2.91 | < 0.001 |
| NCOS (0–100; 100 = asymptomatic, full function) | | | | |
| Before | 85.15 ± 7.30 | 82.70 ± 7.65 | 80.23 ± 7.87 | 0.011 |
| After | 78.66 ± 8.46 | 61.17 ± 10.48 | 59.81 ± 8.32 | < 0.001 |
| Improvement | 6.49 ± 4.32 | 21.53 ± 11.04 | 20.42 ± 8.55 | < 0.001 |
Abbreviations: MODI, Modified Oswestry Disability Index; NCOS, Neurogenic Claudication
Outcome Score; VAS, visual analog scale.
The table describes the average improvement in VAS, NCOS, and MODI scores between the preoperative
period and the 6-month follow-up, categorized by management
strategy into conservative, decompression, and decompression with fixation.
For patients managed conservatively, multiple logistic regression showed that active
occupation was associated with improved outcome in VAS, whereas lower
BMI, back pain without radiculopathy, and absence of foraminal nerve root impingement
on MRI were associated with improved outcome in MODI scores. The baseline scores in
the patients treated conservatively were not that poor, which could explain the lack
of significant improvement in the scores.
For patients undergoing lumbar decompression, multiple logistic regression analysis
suggested that younger age, independent ambulation, duration of symptoms less than 3 months,
back pain without associated radiculopathy, and absence of foraminal nerve root impingement
on MRI were associated with improved outcome in VAS; younger age, independent ambulation,
and lesser disc degeneration grading on MRI were associated with improved outcome
in MODI scores; and back pain without radiculopathy was associated with improved outcome
in NCOS.
For patients undergoing lumbar decompression with fixation, multiple logistic regression
showed that back pain without radiculopathy and absence of foraminal nerve
root impingement on MRI were associated with improved outcome in VAS, whereas lower BMI, involvement
in sports activities, no past spine surgery, and no neurological deficit were associated
with improved outcome in NCOS.
The AUC was calculated from the ROC analysis to evaluate the discrimination capability
of various ML models ([Table 2]). Random Forest Classifier gave the best ROC-AUC score in all three classes and
was therefore used. The AUC score for VAS, MODI, and NCOS was 0.863, 0.831, and 0.869,
respectively, and the macroaverage AUC score was found to be 0.842.
Table 2
Discrimination capability of the machine learning models for VAS, NCOS, and MODI scores
| Machine learning algorithm | VAS improvement ROC-AUC score | MOD index improvement ROC-AUC score | NCOS improvement ROC-AUC score |
| --- | --- | --- | --- |
| Logistic Regression | 0.817 | 0.829 | 0.826 |
| Decision Tree Classifier | 0.753 | 0.613 | 0.657 |
| Random Forest Classifier | 0.863 | 0.831 | 0.869 |
| Support Vector Machine | 0.768 | 0.732 | 0.673 |
| K-Nearest Neighbor | 0.681 | 0.689 | 0.614 |
Abbreviations: MOD, Modified Oswestry Disability Index; NCOS, Neurogenic Claudication
Outcome Score; ROC-AUC, receiver-operating characteristic-area under the curve; VAS,
visual analog scale.
A graphical user interface (GUI) tool was built to take patient details as input.
The surgeon selects the initial management, and the model predicts the improvement
in all three indices for that management based on the inputs. The surgeon
can then change the type of management, compare the predicted improvement in
the indices, and identify the most suitable management for the particular patient.
Discussion
Degenerative lumbar spine disease is a multifactorial entity in its causation, pathology,
and management. The anatomy of the spine and its relationship with the spinal cord
and nerve roots is complex; the degenerative disease process causing symptoms therefore has
to be managed with the multiple variables involved kept in mind.
A duration of symptoms longer than 3 months was associated with a less favorable
outcome as measured by improvement in VAS. Similar results have been found in previous
studies.[18] [19] [20]
Lesser BMI was associated with improved outcomes in patients managed conservatively
and those undergoing spinal decompression with instrumented fixation as measured by
improvement in MODI scores and NCOS, respectively. In the Spine Patient Outcomes Research
Trial, obese patients showed less improvement from baseline with conservative management.[21]
Independent ambulation preoperatively was associated with better outcomes in patients
undergoing decompression surgery as measured by improvement in VAS and MODI scores.
In our study, leg symptoms signifying radiculopathy were associated with poor outcomes
in all management groups. Neurological deficit was associated with poor outcomes following
decompression and fixation as denoted by NCOS. Many studies suggest that the presence
of a radicular deficit (i.e., foot drop) presurgery is a negative predictive factor
in terms of patient satisfaction.[22] [23]
History of past lumbar spine surgery was associated with poorer outcomes in patients
undergoing decompression with fixation as denoted by NCOS. A study by Hébert
et al showed previous spine surgery to be associated with poor leg pain outcome following
surgery for degenerative lumbar spine disease.[24]
Our aim with this study was to investigate prognostic factors for clinical outcome
after lumbar spine surgery for degenerative spine disease and create a prognostic
model using artificial intelligence that could potentially aid in decision-making.
This study shows that an ML model could be used as a predictive tool for deciding the
type of management that a patient should undergo to achieve the best results. A combination
of various parameters, in an ML model, could be applied to estimate the improvement
in patient scores with a high degree of accuracy.
The area under the ROC curve (AUC) is widely recognized as the measure of a diagnostic
test's discriminatory power. The maximum value for the AUC is 1.0, thereby indicating
a (theoretically) perfect test (i.e., 100% sensitive and 100% specific). An AUC value
of 0.5 indicates no discriminative value (i.e., 50% sensitive and 50% specific).[25]
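One way to make these reference values concrete is to compute the AUC directly as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The sketch below is a hand-rolled illustration for the binary case with hypothetical labels and scores; it is not the code used in the study, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical labels (1 = improved, 0 = not improved) and model scores
labels = [0, 0, 1, 1]
perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])        # every positive outranks every negative
uninformative = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])  # scores carry no information
partial = roc_auc(labels, [0.1, 0.8, 0.2, 0.9])        # one of four pairs is misordered
```

Here `perfect` evaluates to 1.0, `uninformative` to 0.5, and `partial` to 0.75, matching the interpretation of the two reference values given above.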
The macroaverage AUC score of 0.842 demonstrates moderate discriminatory
power and therefore suggests potential utility as a tool or support system that
experts could use as one of the inputs into the decision-making process.
The main limitation of this model is the small dataset. Repeating the study with a
larger dataset would make the predictions more accurate, and adding more
variables could help predict the outcome of management more faithfully.
Randomized clinical trials are needed to establish benefits and to confirm these findings.
Finally, artificial intelligence will never replace human expert decision-makers,
but it can assist in double-checking and enhancing the routine decision-making process
and thus help the patient.
The toolkit can be accessed online at http://134.209.148.167:5000.
Conclusion
This study demonstrates that ML can be used as a tool to help tailor the decision-making
process for a patient to achieve the best results. The GUI tool helps to incorporate
the study results into active decision-making. This study would encourage further
inroads into the use of artificial intelligence in the medical field for assistance
in the decision-making in patient management.