Keywords: hospital stay; machine learning; total hip arthroplasty
Introduction
In Chile, total hip arthroplasty (THA) for the treatment of severe osteoarthritis is guaranteed by law for patients over 65 years of age.[1 ] However, little is known about the outcomes of THA in this group of patients. To our knowledge, no Chilean scientific publication has addressed hospital stay, which plays a leading role in the era of value-based arthroplasty.
Worldwide, and particularly in the United States, the length of hospital stay after THA has decreased steadily without an increase in risk.[2 ] It has recently been shown that THA can even be performed successfully as an outpatient procedure in selected patients.[3 ][4 ] In the United States, the average length of hospital stay for patients over 65 years of age in 2015-2016 was 1.8 days.[5 ] In Chile, these data have not been published.
Several tactics can reduce hospital stay after THA. These include standardized management protocols[6 ][7 ] and strategies based on predicting potential perioperative complications.[8 ][9 ] Among the challenges of THA in our country, we have described the importance of keeping our perioperative approach up to date and aligned with the standards of the leading countries in the field.[10 ]
As the global COVID-19 pandemic continues, elective surgery is being performed with shorter hospital stays without compromising patient safety.[11 ][12 ] Surgeons should therefore be able to anticipate possible complications and estimate the likely length of hospital stay of their patients.
Machine learning is a branch of artificial intelligence.[13 ] In machine learning, computer algorithms (that is, machines) "learn" complex relationships or patterns from empirical data. They thereby produce mathematical models that link a large number of covariates to a target variable of interest.[14 ]
In medicine, among other applications, this means using data extracted from specialized electronic records to predict risk scores (for regression or prognosis tasks). Such scores help clinicians make more efficient and accurate decisions, so machine learning can serve as a support tool in clinical decision making. In arthroplasty specifically, studies[15 ][16 ][17 ] involving this technology have gained momentum, helping to solve complex problems that we face in our practice.[18 ]
Our hypothesis is that machine learning can predict the length of hospital stay in patients undergoing THA. Such a prediction has a dual purpose in clinical practice:
1) to identify patients with a high probability of a short stay, so that their stay can be shortened further; and
2) to identify patients with a low probability of a short stay, so that their perioperative care can be improved and they can eventually be brought safely into the short-stay group.
The objective of the present study is to develop and validate, using machine learning, a tool capable of predicting the length of hospital stay of patients over 65 years of age undergoing THA for osteoarthritis.
Materials and Methods
Funding
The present research project and manuscript were funded by the 2020 Research Grant of the Chilean Society of Orthopaedics and Traumatology.
Data Source and Study Population
This is a registry study. Hospital discharge databases for 2016, 2017 and 2018 were collected from the website of the Department of Health Statistics and Information (Departamento de Estadísticas e Información en Salud, DEIS, in Spanish) of the Chilean Ministry of Health.[19 ]

Each database contains de-identified records of all hospital discharges from both public and private centers in the country. Each individual discharge is described by 39 columns covering demographics, the hospital center, the discharge and the diagnosis, among other characteristics. In the study period, the data of 4,944,017 hospital discharges were collected. Across the 39 columns, this amounts to 192,816,663 individual data values (4,944,017 × 39) to be screened and evaluated.
The data for each case are de-identified and come from a public database. The identifier is an alphanumeric code that contains no personal data and cannot be linked to an individual patient. Approval from an ethics committee was therefore not required for this study.
From the primary data source, a derived database was created. It included only patients aged ≥ 65 years who underwent THA (or total hip endoprosthesis) for osteoarthritis. These cases are covered under the Explicit Guarantees in Health.[1 ]

Cases were selected through two codes of the Chilean National Health Fund (Fondo Nacional de Salud, FONASA, in Spanish):
- 2104129 (Total hip endoprosthesis, does not include prosthesis); and
- 2104229 (Total hip endoprosthesis, includes prosthesis).

These codes correspond to diagnosis M16 (coxarthrosis) of the International Classification of Diseases, 10th revision (ICD-10), with all its subcategories. Patients with any type of health insurance, from all regions of Chile, operated on between 2016 and 2018 were included.

Two groups were excluded:
- procedures coded as 2104129 or 2104229 but performed for a proximal femur fracture (ICD-10 diagnosis S72); and
- discharges categorized as “deceased”.

The sample therefore included all cases registered in the country for the study period.
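The selection rules above can be expressed as a filter over the DEIS discharge table. The following is a minimal pandas sketch, not the authors' extraction script. It assumes the column names listed in Table 1 and that the intervention code is stored as text. The numeric value used for “deceased” in CONDICION_EGRESO (DECEASED_CODE) is a hypothetical placeholder that must be checked against the DEIS data dictionary.

```python
import pandas as pd

# FONASA intervention codes for total hip endoprosthesis, as given in the text
THA_CODES = {"2104129", "2104229"}
# HYPOTHETICAL: value of CONDICION_EGRESO meaning "deceased"; confirm in the DEIS data dictionary
DECEASED_CODE = 2


def select_cohort(discharges: pd.DataFrame) -> pd.DataFrame:
    """Keep THA discharges for coxarthrosis (ICD-10 M16*) in patients aged >= 65."""
    codes = discharges["CODIGO_INTERV_Q_PPAL"].astype(str).str.strip()
    diag = discharges["DIAG1"].astype(str).str.strip().str.upper()
    mask = (
        codes.isin(THA_CODES)
        # Requiring M16 as the main diagnosis also drops S72 (proximal femur fracture) cases
        & diag.str.startswith("M16")
        & (discharges["EDAD_AÑOS"] >= 65)
        & (discharges["CONDICION_EGRESO"] != DECEASED_CODE)
    )
    return discharges.loc[mask].copy()


# Toy records: only the first row satisfies every rule
example = pd.DataFrame({
    "CODIGO_INTERV_Q_PPAL": ["2104129", "2104229", "2104229", "2104129", "9999999"],
    "DIAG1": ["M169", "S720", "M161", "M160", "M169"],
    "EDAD_AÑOS": [72, 80, 60, 70, 75],
    "CONDICION_EGRESO": [1, 1, 1, 2, 1],
})
cohort = select_cohort(example)
print(len(cohort))
```

The sketch filters on the main diagnosis rather than listing S72 explicitly. A record whose main diagnosis starts with M16 cannot carry an S72 main diagnosis.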
Clinically-Relevant Outcome (Variable to Predict)
According to the literature,[20 ] hospital stays longer than three days can be considered prolonged in the context of elective THA. In the present study, a short stay was defined as three days or fewer, and a prolonged stay as more than three days. This threshold should be read with one caveat: during the study period, experience with outpatient THA in Chile was limited to certain groups.[4 ]
Length of stay was therefore predicted as a binary variable derived from the number of days of hospitalization. The variable to be modeled takes one of two values: “short stay” or “prolonged stay”.
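Deriving the binary outcome from DIAS_ESTAD is a single threshold. In this sketch, short stay is coded as 1 and prolonged stay as 0. This coding is an assumption: the text does not state which class is labeled 1 in Table 2.

```python
import numpy as np

SHORT_STAY_MAX_DAYS = 3  # from the text: a stay of three days or fewer is "short"


def label_stay(days_of_stay) -> np.ndarray:
    """Binary target from DIAS_ESTAD: 1 = short stay (<= 3 days), 0 = prolonged stay (> 3 days).

    ASSUMPTION: which class is coded as 1 is not specified in the text.
    """
    days = np.asarray(days_of_stay)
    return (days <= SHORT_STAY_MAX_DAYS).astype(int)


print(label_stay([1, 3, 4, 143]))
```

A stay of exactly three days falls in the short-stay class, matching the definition above.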
Predictive Variables
Of the 39 variables available for each DEIS hospital discharge in the study population, 21 were selected ([Table 1 ]) because the authors considered them relevant at the time of data processing. Of these, 16 variables were used as candidate predictors of length of stay.

In addition, the variable “percentage of poverty in the borough”, extracted from the database of the Chilean Ministry of Social Development, was included.[21 ] The records were complete for every selected variable, so no imputation techniques were required.[22 ]

It is important to note that the DEIS database contains variables collected for epidemiological purposes. It does not capture enough information at the level of the individual patient. Consequently, the model excludes variables such as comorbidities, functional status and surgical details, which could certainly influence the length of hospital stay.
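Adding the borough poverty rate amounts to a left join on the borough code. This pandas sketch assumes the poverty table has already been mapped to the same COMUNA_RESIDENCIA codes used by DEIS. The column name POBREZA_COMUNA and the toy borough codes are hypothetical. The check for missing values mirrors the statement that no imputation was needed.

```python
import pandas as pd


def add_borough_poverty(discharges: pd.DataFrame, poverty: pd.DataFrame) -> pd.DataFrame:
    """Left-join a borough-level poverty rate onto each discharge record."""
    merged = discharges.merge(
        # HYPOTHETICAL column name for the poverty rate in the external table
        poverty[["COMUNA_RESIDENCIA", "POBREZA_COMUNA"]],
        on="COMUNA_RESIDENCIA",
        how="left",
        validate="many_to_one",  # raises if a borough appears more than once in the poverty table
    )
    if merged["POBREZA_COMUNA"].isna().any():
        raise ValueError("some boroughs have no poverty rate; check the code mapping")
    return merged


# Toy data with hypothetical borough codes
discharges = pd.DataFrame({"COMUNA_RESIDENCIA": ["101", "102", "101"], "DIAS_ESTAD": [3, 5, 2]})
poverty = pd.DataFrame({"COMUNA_RESIDENCIA": ["101", "102"], "POBREZA_COMUNA": [4.1, 12.3]})
enriched = add_borough_poverty(discharges, poverty)
print(enriched)
```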
Table 1
Items from the DEIS hospital discharge database

| No | Variable name | Description | Datatype | Used in the model |
|---|---|---|---|---|
| 1 | ID_PACIENTE | Unique and anonymous identifier of the patient | Text | Only to discard duplicates |
| 2 | ESTABLECIMIENTO_SALUD | Hospital code | Number | Included as a possible predictor |
| 3 | GLOSA_ESTABLECIMIENTO_SALUD | Hospital name | Text | Not included in the model |
| 4 | PERTENENCIA_ESTABLECIMIENTO_SALUD | Hospital classification (part of the National Health Services System or not) | Text | Included as a possible predictor |
| 5 | SEREMI | SEREMI (Regional Ministerial Health Department) code | Number | Included as a possible predictor |
| 6 | SERVICIO_DE_SALUD | Health Service code | Number | Included as a possible predictor |
| 7 | SEXO | Code of the biological sex of the patient | Number | Included as a possible predictor |
| 8 | FECHA_NACIMIENTO | Patient's birthdate | Date | Not included in the model |
| 9 | EDAD_CANT | Numerical record of the patient's age at admission | Number | Included as a possible predictor |
| 10 | TIPO_EDAD | Unit of measurement of the age recorded in EDAD_CANT | Number | Not included in the model |
| 11 | EDAD_AÑOS | Age in years of the patient at the time of admission | Number | Not included in the model |
| 12 | PUEBLO_ORIGINARIO | Code of the indigenous people the patient belongs to | Number | Not included in the model |
| 13 | PAIS_ORIGEN | Code of the country of origin | Number | Not included in the model |
| 14 | GLOSA_PAIS_ORIGEN | Description of the country of origin | Text | Used to exclude foreign patients |
| 15 | COMUNA_RESIDENCIA | Code of the borough of residence | Text | Included as a possible predictor |
| 16 | GLOSA_COMUNA_RESIDENCIA | Name of the borough of residence | Text | Not included in the model |
| 17 | REGION_RESIDENCIA | Code of the region of residence | Text | Included as a possible predictor |
| 18 | GLOSA_REGION_RESIDENCIA | Name of the region of residence | Text | Not included in the model |
| 19 | PREVISION | Patient's health insurance code at the time of admission | Number | Included as a possible predictor |
| 20 | BENEFICIARIO | FONASA beneficiary code | Text | Included as a possible predictor |
| 21 | MODALIDAD | FONASA modality code | Number | Included as a possible predictor |
| 22 | PROCEDENCIA | Code of origin of the patient at the time of admission | Number | Not included in the model |
| 25 | ANO_EGR | Year of discharge | Number | Not included in the model |
| 26 | FECHA_EGR | Date of discharge | Date | Not included in the model |
| 27 | AREA_FUNCIONAL_EGRESO | Code of the level of care or functional area from which the patient was discharged | Number | Included as a possible predictor |
| 28 | DIAS_ESTAD | Days of total stay | Number | Target variable |
| 29 | CONDICION_EGRESO | Code of the condition at patient discharge | Number | Used to exclude discharges due to death |
| 30 | DIAG1 | International Classification of Diseases, 10th revision (ICD-10) code of the main diagnosis | Text | Included as a possible predictor |
| 31 | GLOSA_DIAG1 | Description of the main diagnosis | Text | Included as a possible predictor |
| 32 | DIAG2 | Code of the external cause | Text | Not included in the model |
| 33 | GLOSA_DIAG2 | Description of the external cause | Text | Not included in the model |
| 34 | INTERV_Q | Surgical intervention code | Number | Used to exclude discharges without associated surgery |
| 35 | CODIGO_INTERV_Q_PPAL | FONASA main surgical intervention code | Text | Used to identify cases |
| 36 | GLOSA INTERV_Q_PPAL | Description of the main surgical intervention | Text | Included as a possible predictor |
| 37 | PROCED | Procedure code | Number | Not included in the model |
| 38 | CODIGO_PROCED_PPAL | FONASA main procedure code | Text | Not included in the model |
| 39 | GLOSA_PROCED_PPAL | Description of the main procedure | Text | Not included in the model |
| *40 | % POBREZA COMUNA | Poverty rate in the borough | Number | Included as a possible predictor |

*Variable 40 is not part of the DEIS database; it was obtained from the Chilean Ministry of Social Development.[21 ]
Data Preparation (Sample Balance)
Nominal variables were transformed using one-hot encoding, that is, into multiple dichotomous columns. Each column indicates whether a specific hospital discharge has a particular characteristic. Continuous variables were rescaled to the range between 0 and 1, where 0 corresponds to the minimum and 1 to the maximum observed value of each variable. Finally, prolonged stays (more than three days) outnumbered short stays, so the training sample had to be balanced[23 ] by oversampling the underrepresented class.[24 ]
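The preprocessing steps described above map onto standard scikit-learn components. This is a minimal sketch on toy data, and the column choices are illustrative. It assumes simple random oversampling with replacement: the text cites an oversampling procedure but does not name the exact method. Oversampling is applied to the training split only, so duplicated cases do not leak into the test sample.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.utils import resample


def build_preprocessor(nominal_cols, continuous_cols) -> ColumnTransformer:
    """One-hot encode nominal variables and rescale continuous variables to [0, 1]."""
    return ColumnTransformer([
        # One dichotomous column per category; categories unseen during fitting are ignored
        ("nominal", OneHotEncoder(handle_unknown="ignore"), nominal_cols),
        # (x - min) / (max - min), using the minimum and maximum of the fitted data
        ("continuous", MinMaxScaler(), continuous_cols),
    ])


def oversample_minority(X: pd.DataFrame, y: pd.Series, random_state: int = 0):
    """ASSUMED method: random oversampling that keeps all rows and adds minority duplicates."""
    counts = y.value_counts()
    minority, majority = counts.idxmin(), counts.idxmax()
    n_extra = int(counts[majority] - counts[minority])
    if n_extra == 0:
        return X, y
    X_extra, y_extra = resample(
        X[y == minority], y[y == minority],
        replace=True, n_samples=n_extra, random_state=random_state,
    )
    return pd.concat([X, X_extra]), pd.concat([y, y_extra])


# Toy training split (hypothetical values)
X_train = pd.DataFrame({
    "REGION_RESIDENCIA": ["13", "13", "5", "5", "13", "5"],
    "EDAD_AÑOS": [65, 70, 72, 80, 90, 68],
})
y_train = pd.Series([0, 0, 0, 0, 1, 1])

X_bal, y_bal = oversample_minority(X_train, y_train)
preprocessor = build_preprocessor(["REGION_RESIDENCIA"], ["EDAD_AÑOS"])
X_ready = preprocessor.fit_transform(X_bal)
print(y_bal.value_counts().to_dict(), X_ready.shape)
```

Fitting the scaler after oversampling gives the same result as fitting it before. Duplicated rows leave each variable's minimum and maximum unchanged.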
Training and Testing of the Classification Algorithms
For the present study, different algorithms and hyperparameter configurations available in code libraries for the Python programming language were tested. In particular, seven algorithms from the scikit-learn (sklearn) package were tested: logistic regression, decision tree classifier, linear support vector machine, naive Bayes, random forest classifier, AdaBoost and multilayer perceptron.

A detailed description of each algorithm is beyond the scope of this article. The rationale for this selection was the trade-off between predictive power and the interpretability and transparency of the resulting models. This was intended to keep the assessment of the model's predictors free from the authors' influence once the predictors had been incorporated.

In the machine learning literature, algorithms are commonly grouped by their fundamental modeling strategy. Some use systems of mathematical equations; others generate computational decision rules, which tend to be easier to interpret. Logistic regression, support vector machines, naive Bayes and multilayer perceptron are based on systems of mathematical equations. Decision trees, random forest and AdaBoost, in contrast, generate sets of computational decision rules. The most advanced models can contain thousands of decision rules or mathematical equations, potentially with millions of parameters to estimate and interpret. Examples are random forest and the multilayer perceptron (a type of artificial neural network).
The predictive performance of an algorithm is typically expected to improve as the number of equations or decision rules it generates increases. However, adding equations or rules also makes the resulting models harder for humans to interpret. Algorithms can therefore also be grouped into “open boxes” and “closed boxes”. Under this classification:
- logistic regression, decision trees, support vector machines and naive Bayes are closer to the “open-box” type; and
- random forest, AdaBoost and multilayer perceptron are considered “closed boxes”.

Within each group, the algorithms are listed in approximately increasing order of the number of equations or rules they generate.
A further family of algorithms, gradient-boosted trees, was also included because of its good performance in other, similar binary classification tasks. These algorithms also belong to the “closed-box” group and generate a large number of computational rules. They were implemented through the XGBoost package (an open-source software library).
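A comparison of the candidate algorithms can be organized as a loop over classifiers that share the scikit-learn fit/predict interface. The sketch below uses default hyperparameters and synthetic data from make_classification, so it does not reproduce the authors' configurations or results. XGBoost is added only if the xgboost package is installed.

LinearSVC does not provide predict_proba, so its decision function is used as the score for the AUC-ROC. This is valid because the AUC depends only on the ordering of the scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def candidate_models(random_state: int = 0) -> dict:
    """The algorithm families named in the text, with default (illustrative) settings."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Linear SVM": LinearSVC(),
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "Multilayer Perceptron": MLPClassifier(max_iter=500, random_state=random_state),
    }
    try:  # XGBoost is a separate package with a scikit-learn-compatible wrapper
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass
    return models


def scores_for_auc(model, X):
    """Probability of class 1 when available, otherwise the decision function (e.g. LinearSVC)."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)


def rank_by_test_auc(X, y, random_state: int = 0) -> list:
    """Fit every model on an 80% split and rank them by AUC-ROC on the held-out 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    results = []
    for name, model in candidate_models(random_state).items():
        model.fit(X_tr, y_tr)
        results.append((name, roc_auc_score(y_te, scores_for_auc(model, X_te))))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Synthetic, imbalanced toy data (about two thirds in class 0)
X, y = make_classification(n_samples=600, n_features=12, weights=[0.67], random_state=0)
ranking = rank_by_test_auc(X, y)
for name, auc in ranking:
    print(f"{name}: {auc:.3f}")
```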
The models were trained on 80% of the available data. The remaining 20%, traditionally called the test sample, was reserved to confirm their predictive capabilities. Additionally, a resampling (bootstrapping) procedure of 100 iterations was carried out to obtain confidence intervals for the fit and performance figures of the selected models.
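One way to obtain the bootstrap figures is to resample the held-out test sample with replacement and recompute the metric each time. The text does not specify whether the 100 iterations resampled only the test sample or repeated the whole split-and-fit procedure. This sketch shows the former for the AUC-ROC. It returns the mean, the standard deviation (as reported in Table 2) and a 95% percentile interval.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc(y_true, y_score, n_iterations: int = 100, random_state: int = 0):
    """Bootstrap the AUC-ROC of a fixed test sample: mean, SD and 95% percentile interval."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    if np.unique(y_true).size < 2:
        raise ValueError("y_true must contain both classes")
    rng = np.random.default_rng(random_state)
    aucs = []
    while len(aucs) < n_iterations:
        idx = rng.integers(0, len(y_true), size=len(y_true))  # draw cases with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined if a resample contains only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    aucs = np.asarray(aucs)
    low, high = np.percentile(aucs, [2.5, 97.5])
    return aucs.mean(), aucs.std(ddof=1), (low, high)


# Toy test sample: informative but noisy scores
rng = np.random.default_rng(1)
y_test = rng.integers(0, 2, size=300)
scores = y_test + rng.normal(0, 1, size=300)
mean_auc, sd_auc, (ci_low, ci_high) = bootstrap_auc(y_test, scores)
print(f"AUC {mean_auc:.3f} (SD {sd_auc:.4f}), 95% interval {ci_low:.3f} to {ci_high:.3f}")
```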
Evaluation and Adjustment of Models
To evaluate the performance of the algorithms and predictive models, we used their discrimination power. This was quantified as the area under the receiver operating characteristic curve (AUC-ROC).[25 ]
The models were ranked by AUC-ROC, which reflects how well a model distinguishes between the two groups. The level of discrimination was classified as follows:[26 ]
- excellent: 0.9–1
- good: 0.8–0.89
- fair: 0.7–0.79
- poor: 0.6–0.69
- failed: 0.5–0.59
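The discrimination categories can be written as a small lookup. This sketch makes two assumptions not stated in the text. First, it treats the published ranges as contiguous, so, for example, 0.895 counts as good. Second, it adds a “below chance” label for values under 0.5.

```python
def discrimination_level(auc: float) -> str:
    """Map an AUC-ROC value to the qualitative categories used in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC-ROC must lie in [0, 1]")
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    if auc >= 0.6:
        return "poor"
    if auc >= 0.5:
        return "failed"
    return "below chance"  # ASSUMPTION: category not defined in the text


print(discrimination_level(0.86))
```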
Other traditional metrics for classification problems are also reported:
- “accuracy”: the proportion of correct predictions over the total number of samples;
- “average precision”: a summary of the precision-recall curve, computed as the mean precision across decision thresholds, weighted by the increase in recall at each threshold;
- “precision”: the percentage of positive predictions that are correct;
- “recall”: the percentage of actual positives in the dataset that are correctly predicted as positive; and
- “F1”: the harmonic mean of precision and recall, whose best value is 1 (perfect precision and recall) and worst value is 0.

For each of these metrics, confidence intervals estimated with the resampling (bootstrapping) procedure are also reported.
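To make the definitions concrete, the metrics can be computed directly from confusion-matrix counts. The sketch below uses made-up counts. Per-class figures such as “class precision 0” in Table 2 follow by treating class 0 as the positive class. This swaps true positives with true negatives, and false positives with false negatives. Average precision needs continuous scores rather than counts, so it is not included here; scikit-learn provides it as average_precision_score.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 for the positive class from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts with class 1 as the positive class
m1 = classification_metrics(tp=8, fp=2, fn=4, tn=6)
# Same predictions, class 0 as the positive class: tp<->tn and fp<->fn swap roles
m0 = classification_metrics(tp=6, fp=4, fn=2, tn=8)
print(m1)
print(m0)
```

Accuracy is the same from both viewpoints (0.7 here), while precision, recall and F1 differ by class. This is why Table 2 reports them separately for classes 0 and 1.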
Model Report
The model is reported following international recommendations for this type of study,[27 ][28 ] using the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) checklist.[28 ]
Results
In total, 8,970 cases were included ([Figure 1 ]): 5,662 women (63.12%) and 3,308 men (36.88%). The median age was 72 years, with an interquartile range of 9 years and a range of 65 to 97 years ([Figure 2 ]).
Fig. 1 Total hip arthroplasty due to arthrosis between 2016 and 2018 (codes 2104129 and 2104229, with ICD-10 diagnosis: M16 and its derivatives).
Fig. 2 Population pyramid according to gender for the 8,970 cases of primary THA due to coxarthrosis.
The final sample included:
- 6,746 (75.21%) FONASA patients;
- 1,599 (17.82%) patients from private health insurers (instituciones de salud previsional, ISAPRES, in Spanish); and
- 625 (6.97%) patients from other health insurers.

Of the FONASA patients, 286 (4.2%) were type-A beneficiaries; 4,801 (71.2%), type-B; 469 (6.9%), type-C; and 1,191 (13.3%), type-D. Within the same FONASA group, 5,321 patients (78.9%) were operated on under the institutional-care modality and 1,425 (21.1%) under the free-choice modality.
The 4 most frequent diagnoses were M169 (6,124 cases; 68.27%), M161 (1,623 cases; 18.09%), M160 (862 cases; 9.61%), and M167 (176 cases; 1.96%).
The five most frequent boroughs of residence were Las Condes (426 cases; 4.75%), Viña del Mar (365 cases; 4.07%), La Florida (253 cases; 2.82%), Puente Alto (239 cases; 2.66%) and Santiago (235 cases; 2.62%). Together, they account for 16.92% of all cases in Chile.
One hundred hospital centers performed THA in patients with osteoarthritis during the study period. A total of 5,133 cases (81.88%) were operated on in centers that are part of the National Health Services System, and 1,136 cases (18.12%) in private centers.
The median length of stay was 4 days, with an interquartile range of 2 days and a range of 1 to 143 days. The histogram of days of stay is shown in [Figure 3 ].
Fig. 3 Days of stay.
The days of stay categorized by type of hospital and health insurance are shown in [Figure 4 ].
Fig. 4 Days of stay according to health insurance and type of hospital center.
In total, 2,968 patients had a short stay (33.09%), and 6,002 had a prolonged stay (66.91%).
Performance of the Decision Algorithms
Eight algorithms were evaluated in both the training and the test samples. They were ranked according to their performance in the test sample, which is considered a better measure of how a model will perform in real-world settings.

The XGBoost algorithm had the best performance, with a mean AUC-ROC of 0.86 (SD: 0.0087). In other words, it discriminated best between short and prolonged hospital stays (three days or fewer versus more than three days). The linear SVM algorithm had a very similar AUC-ROC of 0.8568 (SD: 0.0086), with a marginally lower SD.
[Table 2 ] shows the classification metrics for each of the evaluated algorithms. In terms of accuracy (the proportion of correct predictions over the total number of samples), the XGBoost algorithm correctly classified 81.74% of the cases in the test sample as short or prolonged stays.
Table 2
Results of the training sample (bootstrap of 100 samples; standard deviation in parentheses)

| Algorithm | Overall accuracy | Class recall 0 | Class recall 1 | Class precision 0 | Class precision 1 | F1 score 0 | F1 score 1 | Area under the curve |
|---|---|---|---|---|---|---|---|---|
| XGBoost – Gradient-Boosted Trees | 81.56% (0.86%) | 77.44% (1.40%) | 86.05% (1.34%) | 84.76% (1.20%) | 79.24% (1.00%) | 80.92% (0.94%) | 82.50% (0.85%) | 90.46% (0.77%) |
| Support Vector Machines | 81.19% (0.38%) | 78.76% (0.62%) | 83.94% (0.68%) | 83.07% (0.57%) | 79.81% (0.44%) | 80.86% (0.39%) | 81.82% (0.39%) | 89.55% (0.27%) |
| AdaBoost | 79.65% (0.43%) | 76.79% (0.75%) | 83.11% (0.93%) | 81.98% (0.74%) | 78.17% (0.47%) | 79.30% (0.41%) | 80.56% (0.45%) | 88.16% (0.27%) |
| Logistic Regression | 81.13% (0.42%) | 78.32% (0.61%) | 84.37% (0.79%) | 83.37% (0.68%) | 79.56% (0.44%) | 80.76% (0.42%) | 81.89% (0.45%) | 89.62% (0.27%) |
| Random Forest | 79.40% (1.15%) | 74.91% (2.07%) | 83.68% (1.88%) | 82.15% (1.62%) | 76.96% (1.44%) | 78.34% (1.37%) | 80.16% (1.20%) | 86.99% (0.91%) |
| Neural Net – Multilayer Perceptron | 89.99% (0.50%) | 91.03% (1.21%) | 88.79% (0.69%) | 89.04% (0.57%) | 90.84% (1.09%) | 90.02% (0.62%) | 89.80% (0.54%) | 97.19% (0.31%) |
| Decision Tree | 66.04% (2.33%) | 63.32% (27.95%) | 68.33% (25.14%) | 74.35% (14.06%) | 70.46% (10.91%) | 61.45% (13.47%) | 64.69% (8.31%) | 74.05% (2.03%) |
| Naive Bayes | 65.07% (1.60%) | 38.05% (3.89%) | 94.97% (0.68%) | 88.33% (0.89%) | 60.56% (1.38%) | 53.07% (3.81%) | 73.94% (0.89%) | 67.51% (1.73%) |
Test sample results (bootstrap of 100 samples; standard deviation in parentheses)

| Algorithm | Overall accuracy | Class recall 0 | Class recall 1 | Class precision 0 | Class precision 1 | F1 score 0 | F1 score 1 | Area under the curve |
|---|---|---|---|---|---|---|---|---|
| XGBoost – Gradient-Boosted Trees | 81.74% (0.87%) | 75.62% (1.60%) | 80.23% (2.24%) | 88.56% (1.19%) | 61.97% (1.73%) | 81.56% (0.92%) | 69.90% (1.31%) | 86.01% (0.87%) |
| Support Vector Machines | 81.35% (0.37%) | 77.21% (1.40%) | 78.81% (1.98%) | 88.05% (1.08%) | 63.12% (1.86%) | 82.26% (0.90%) | 70.07% (1.48%) | 85.68% (0.86%) |
| AdaBoost | 79.95% (0.40%) | 75.81% (1.33%) | 79.98% (1.81%) | 88.45% (0.99%) | 62.06% (1.61%) | 81.63% (0.83%) | 69.87% (1.26%) | 85.55% (0.90%) |
| Logistic Regression | 81.34% (0.43%) | 76.60% (1.33%) | 78.49% (1.88%) | 87.81% (1.03%) | 62.40% (1.73%) | 81.81% (0.87%) | 69.51% (1.39%) | 85.16% (0.90%) |
| Random Forest | 79.30% (1.23%) | 72.70% (2.32%) | 77.43% (2.88%) | 86.70% (1.54%) | 58.43% (2.33%) | 79.06% (1.56%) | 66.56% (2.04%) | 82.32% (1.36%) |
| Neural Net – Multilayer Perceptron | 89.91% (0.58%) | 82.12% (1.16%) | 64.44% (2.43%) | 82.37% (1.13%) | 64.07% (1.77%) | 82.24% (0.81%) | 64.23% (1.70%) | 82.07% (0.95%) |
| Decision Tree | 65.82% (2.47%) | 62.70% (28.09%) | 66.65% (25.86%) | 83.63% (8.78%) | 53.84% (12.33%) | 66.05% (17.75%) | 54.06% (4.52%) | 72.58% (2.15%) |
| Naive Bayes | 66.51% (1.70%) | 36.80% (4.05%) | 90.04% (1.36%) | 88.14% (1.63%) | 41.39% (1.73%) | 51.81% (4.14%) | 56.69% (1.59%) | 64.35% (1.94%) |
To explore the relative importance of the explanatory variables, the importance scores assigned by the algorithm to the thirty most important variables are reported in [Figure 5 ]. The variables that contributed most to predicting length of stay were:
- the region of residence;
- the health service;
- the health center where the patient was operated on; and
- the care modality.
Fig. 5 Relative importance of the 30 most important variables of the model for length of stay.
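Ranking variables by importance, as in Figure 5, only requires sorting the scores exposed by a fitted tree-based model. For self-containment, this sketch uses a scikit-learn decision tree on toy data. XGBoost's scikit-learn wrapper (XGBClassifier) also exposes feature_importances_. However, the meaning of that score depends on the configured importance type, which the text does not report.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def top_importances(model, feature_names, k: int = 30):
    """Return up to k (feature, score) pairs sorted by the model's feature_importances_."""
    importances = np.asarray(model.feature_importances_)
    order = np.argsort(importances)[::-1][:k]
    return [(feature_names[i], float(importances[i])) for i in order]


# Toy data: only the first feature carries signal
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(top_importances(tree, ["region", "age", "sex"], k=2))
```

After one-hot encoding, each category receives its own score. To rank original variables instead of individual categories, one option is to sum the scores of the dummy columns derived from the same variable.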
[Figure 6 ] shows a representative classification tree of the XGBoost algorithm.
Fig. 6 A representative classification tree of the XGBoost algorithm.
Discussion
Our research project successfully developed and validated a model to predict the length of hospital stay in Chilean patients over 65 years of age undergoing THA. The model was built with machine learning and big data of Chilean origin. The XGBoost algorithm had the best predictive performance in discriminating between shortened and prolonged hospital stays (three days or fewer versus more than three days).

We also found that the most important factors in this prediction were the region of residence, the health service, the hospital and the FONASA modality of care. All of these are freely accessible in the ministerial database. The classification performance of the algorithm was good.
According to Ramkumar et al.,[29 ] machine learning can be described as software that performs tasks automatically based on a data source, without explicit programming. This technology has rapidly been incorporated into medicine and represents a natural extension of traditional statistical methods.

In the arthroplasty literature specifically, several recent publications have used machine learning to create prediction models for:
- length of hospital stay and surgery-related payments;[29 ]
- probability of complications;[26 ] and
- satisfaction,[30 ] among others.

All of these publications, like the present one, use large databases that can be considered big data.[31 ]
Our study has several limitations and some notable aspects.

The first limitation is that this is a registry study. There may therefore be collection and coding bias that could alter the results, especially since ICD-10 and FONASA codes were used to identify the studied cases. Nevertheless, because this is a ministerial database, with all the rigor that implies, we believe it is robust enough to overcome this limitation.

Secondly, as is typical of database studies, our data do not contain sufficient patient-level information.[32 ] This is especially important in our work. Most studies using this methodology in the Northern Hemisphere include patient-level variables such as comorbidities and, in some cases, functional status.[16 ][26 ][30 ] We consider this to be the main weakness of our work. However, the database used was the only one that gave us free access to national-level big data.

It should also be emphasized that individual patient characteristics may not be the most relevant factor in explaining the length of hospital stay in elective arthroplasty:
- Kang et al.[33 ] demonstrated, in a series of two thousand patients, that the main determinants of prolonged stay in arthroplasty are social factors: admission to hospital the day before surgery and a late start of postoperative rehabilitation.
- Burn et al.[34 ] showed that individual patient factors are relevant in explaining the length of hospital stay in arthroplasty. Even so, the reduction in length of stay achieved in the United Kingdom between 1997 and 2014 was due to improved efficiency of practices, since the patient profile remained stable over that period.
- The Cleveland Clinic OME Arthroplasty Group demonstrated, using American big data, that in elective THA patients “while the factors related to patients explain some variation in the hospital stay, the main culprits are the factors related to the procedure, specifically the hospital”[35 ] where the patient was operated on. The surgical approach also played a decisive role.

This evidence reinforces the idea that individual patient characteristics are secondary in explaining variability in the timing of hospital discharge. It helps to explain the results of our study and supports regarding the lack of individual variables as a non-critical limitation of our model.

Thirdly, the COVID-19 pandemic may have influenced the practice of THA[11 ] in Chile, in terms of its postoperative period and earlier discharge from hospital.[12 ][36 ] We therefore believe that the data from 2016-2018 may not be fully representative of the scenario we will face in 2021. However, the fundamentals of our algorithm can be used to evaluate hospital discharges after THA registered in 2020 and beyond.
The question that arises is: is this calculator useful in our setting? THA is a high-frequency surgery guaranteed by law, so evaluating the likelihood of early or late discharge after it is highly relevant to public policy. Estimating the probability of early discharge for a FONASA patient operated on in hospital A versus hospital B, or clinic X, helps visualize the variability that exists in practices. When designing bundled-payment models, it is important to predict whether a patient operated on in hospital A will have a longer hospital stay than one operated on in hospital B.

The usefulness of the calculator at the bedside may be limited by the absence of freely accessible clinical big data in Chile. On the other hand, its usefulness for evaluating institutional performance is very high. As stated in the objectives of the study, it can serve two purposes:
- identifying groups with a high probability of a shortened stay (certain patients in some hospitals) can help institutions further improve their practices; and
- identifying hospitals that manage hospital stays inefficiently may help the authorities allocate resources to improve their practices.
Regarding the strengths of our study, we believe the first and most important is its multidisciplinary nature. It brought together four experts: two surgeons and two engineers with formal training in artificial intelligence. To our knowledge, they carried out the first study in our specialty in Chile to combine big data and artificial intelligence.
Conclusion
In the present study, we developed machine-learning algorithms based on free-access Chilean big data, and we were able to validate a tool that demonstrates an adequate discriminatory capacity to predict the probability of a shortened versus prolonged hospital stay in elderly patients undergoing THA for osteoarthritis.