Keywords
training - educational program - simulation - guidelines and protocols - emergency
Background and Significance
Trauma is an unpredictable medical emergency that requires a rapid response. Reducing the time to treatment, and with it the number of preventable deaths that occur during the first peak of trauma deaths, remains a challenge.[1][2] To provide this treatment, trauma specialists must respond immediately and without hesitation, which requires proper training.[3][4][5][6]
As noted by Wurmb et al,[7] severely injured patients are those who most need treatment free of preventable errors, because such errors have a direct impact on mortality and morbidity. Therefore, a standardized educational program is required for the management of these patients. Advanced Trauma Life Support (ATLS), developed by the American College of Surgeons Committee on Trauma, is recognized and practiced worldwide.[8] Additionally, there is evidence that this training provides techniques and skills for the rapid and efficient treatment of trauma patients, although its principles need to be translated into local contexts.[9][10][11]
Given the potential for unpredictable situations, there is a debate about the adequacy of trauma management guidelines and protocols.[12][13][14][15] Nevertheless, there are studies that show clear improvements in trauma management after following trauma guidelines, clinical pathways, or protocols.[16][17][18][19][20][21][22][23] These are three levels of standardized procedures, ranging from general guidelines, through more detailed clinical pathways, to protocols.[12][24][25][26][27][28] However, what has not been discussed is the ability to study deviations from defined protocols to improve training, build flexibility into protocols, and continually evaluate whether changes in the analyzed protocols have been considered.[29][30][31][32][33][34][35][36] How to develop and implement these standardized procedures is an important aspect that should be addressed[37][38][39][40][41][42][43][44]; nevertheless, only a few studies have focused on their evaluation.[27][45][46][47][48][49]
Objectives
Since simulation is an important part of trauma management training, including an objective assessment system in simulation is a clear need.[50] The use of simulators to learn and, more specifically, to practice standard trauma procedures is already a reality. New technologies keep adding modalities, so the current simulation environment is quite diverse, and different simulation modalities allow different medical aspects to be learned and practiced.[51][52][53][54][55][56] A web-based trauma simulator offers flexibility and a good opportunity to implement standard procedures that a large number of participants can practice simultaneously, and it allows an objective assessment system to be incorporated.[57][58][59] To do this, it is important to consider all possible treatments for one patient and to keep in mind that some actions may be equivalent, similar, or completely different.
Current assessment methods available within simulation rely on written assessment checklists that are not automatically linked to the simulation data. Thus, the objective of this article is to develop and implement an objective evaluation system within a web-based trauma simulator to obtain objective information, in real time, about how the simulation is performed.
Methods
Several aspects should be considered when evaluating how a sequence of actions is performed in a simulation environment. The number of correct and incorrect actions is important, but so is the order in which the actions are performed. Therefore, a distance metric should be defined to measure the distance between the sequence of actions performed by the trainee and the sequence of actions that should be performed according to the trauma protocol. There are several families of distance measures: edit distance, token-based distance, and sequence-based distance.[60] Edit distance compares two strings by counting the minimum number of operations required to convert one string into the other. Token-based distance compares two strings by checking string units (tokens). Sequence-based distance compares two strings by examining different sequences of the strings. Based on these criteria, it was decided to focus on the Needleman–Wunsch algorithm.[60] It is an edit distance algorithm that distinguishes between matches, mismatches, and gaps between two sequences, which gives us the flexibility to tailor the algorithm to compare two trauma management sequences. In addition, two other metrics were established. One is the diagonal score (DS), which provides information about the correct sequence of actions, and the other is the subsequence score (SS), which provides information about the number and length of the correctly ordered subsequences of actions performed during the simulation.
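To make the notion of edit distance concrete, the following minimal sketch counts the minimum number of insertions, deletions, and substitutions between two sequences (the classic Levenshtein distance). It is only an illustration of the general idea, not the scoring used in this article; the function name `edit_distance` is an assumption introduced here.

```python
def edit_distance(performed, expected):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `performed` into `expected`."""
    rows, cols = len(performed) + 1, len(expected) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every remaining item of `performed`
    for j in range(cols):
        dist[0][j] = j  # insert every remaining item of `expected`
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if performed[i - 1] == expected[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance([1, 2, 3], [1, 3]))    # 1
```

Because every operation has the same cost here, this plain distance cannot express that some wrong actions are worse than others, which is the flexibility the Needleman–Wunsch scoring scheme adds.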
Modified Needleman–Wunsch algorithm
Initially, the Needleman–Wunsch algorithm appeared as a dynamic programming algorithm that would provide a global solution to the problem of comparing two amino acid sequences.[61] This global alignment (GA) would allow the distance between two sequences to be measured. This algorithm consists of three steps: the first is to initialize the score matrix, the second is to calculate the scores to construct the traceback matrix, and finally, the third step is to derive the best alignment from the traceback matrix.
Following this algorithm, a modification has been included to account for the fact that some actions have the same impact on the patient's vital signs, others have a similar impact, others have the opposite impact, and others are simply different. All these options are included in the algorithm to create a traceback matrix from which the best alignment is then found. The score used to build the traceback matrix, $D(i, j)$, is calculated as follows (Eq. 1):

$$D(i, j) = \max \begin{cases} D(i-1, j-1) + s(x_i, y_j) \\ D(i-1, j) + S_{\mathrm{gap}} \\ D(i, j-1) + S_{\mathrm{gap}} \end{cases} \quad (1)$$

where $s(x_i, y_j)$ takes one of four values: $S_{\mathrm{match}}$ for actions that match; $S_{\mathrm{swap}}$ for actions that are equivalent and could therefore be swapped; $S_{\mathrm{contrary}}$ for actions that are opposed; and $S_{\mathrm{mismatch}}$ for actions that are simply different and are therefore considered a mismatch. The score assigned to a gap is $S_{\mathrm{gap}}$.
Assuming that the sequence of actions to be performed is: “5 7 8 6 3 2 4 1 9 6” and that a trainee performed the following sequence: “7 4 1 3 2 8 6 9” as shown in [Fig. 1]; the traceback matrix is built ([Fig. 2]) considering the values of the scores mentioned above and the best alignment obtained is shown in [Fig. 3].
Fig. 1 Example of the sequence to accomplish: “5 7 8 6 3 2 4 1 9 6” and the sequence of actions performed by the trainee “7 4 1 3 2 8 6 9.”
Fig. 2 Example of the traceback matrix calculation and the best alignment obtained for two sequences: “5 7 8 6 3 2 4 1 9 6,” which is the sequence that should be done and “7 4 1 3 2 8 6 9,” which is the sequence of actions performed by the trainee.
Fig. 3 Best sequence alignment obtained after applying the modified Needleman–Wunsch algorithm for two sequences: “5 7 8 6 3 2 4 1 9 6,” the ideal one and “7 4 1 3 2 8 6 9,” the one performed by the trainee.
The best sequence alignment is shown in [Fig. 3]. The blue color in [Fig. 3] means that a gap has been introduced in the sequence; that is, at that position either nothing was done where an action was expected, or an action was done that should not have been. The green color in [Fig. 3] means that the actions in the two sequences match, and the dark red color means that there is a mismatch. In the best alignment path shown in [Fig. 2], vertical arrows indicate gaps introduced in the sequence along the top of the matrix, diagonal arrows indicate a match or a mismatch, and horizontal arrows indicate gaps introduced in the sequence on the left side of the matrix. In this example, the best sequence alignment has four matches, one mismatch, and eight gaps.
The maximum score is given to a match, followed by a swap. A gap scores higher than a mismatch or a contrary action because, when dealing with trauma patients, doing something that should not be done is worse than doing nothing. The worst case is an action whose impact is the opposite of what the patient needs. A total of 15 trauma experts were asked to define the four categories of actions above. For example, actions that are considered equivalent are “oxygenate the patient with an oxygen mask,” “oxygenate with a self-inflating bag,” and “use an oropharyngeal airway.”
Therefore, all scores used to fill the traceback matrix shown in [Fig. 2] follow the criterion: $S_{\mathrm{match}} > S_{\mathrm{swap}} > S_{\mathrm{gap}} > S_{\mathrm{mismatch}} > S_{\mathrm{contrary}}$.
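The scoring scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the simulator: the numeric values in `SCORES` are hypothetical (the article does not report the values used) and are chosen so that a contrary action is the most heavily penalized, as the text describes; the names `classify` and `align`, and the way equivalent and contrary pairs are passed in as sets of pairs, are also assumptions.

```python
from collections import Counter

# Hypothetical score values (not reported in the article).
SCORES = {"match": 2, "swap": 1, "gap": -3, "mismatch": -4, "contrary": -5}


def classify(a, b, swaps, contraries):
    """Label a pair of actions as match, swap, contrary, or mismatch."""
    if a == b:
        return "match"
    pair = frozenset((a, b))
    if pair in swaps:
        return "swap"
    if pair in contraries:
        return "contrary"
    return "mismatch"


def align(performed, expected, swaps=frozenset(), contraries=frozenset(),
          scores=SCORES):
    """Global alignment; returns the raw score and the counts of each event."""
    n, m, gap = len(performed), len(expected), scores["gap"]
    # Step 1: initialise the score matrix with cumulative gap penalties.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    # Step 2: fill the matrix with the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            kind = classify(performed[i - 1], expected[j - 1], swaps, contraries)
            D[i][j] = max(D[i - 1][j - 1] + scores[kind],
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    # Step 3: trace back from the bottom-right corner, counting events.
    counts, i, j = Counter(), n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            kind = classify(performed[i - 1], expected[j - 1], swaps, contraries)
            if D[i][j] == D[i - 1][j - 1] + scores[kind]:
                counts[kind] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + gap:
            counts["gap"] += 1  # performed action aligned against a gap
            i -= 1
        else:
            counts["gap"] += 1  # expected action aligned against a gap
            j -= 1
    return D[n][m], counts


performed = [7, 4, 1, 3, 2, 8, 6, 9]
expected = [5, 7, 8, 6, 3, 2, 4, 1, 9, 6]
raw, counts = align(performed, expected)
print(raw, dict(counts))
```

One design point worth noting: in any Needleman–Wunsch scheme, a pair of actions is only aligned against each other when its score is at least twice the gap score, since otherwise two gaps are cheaper. The hypothetical values above keep the contrary score above that threshold so that contrary actions can still appear in the alignment.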
Then, once the traceback matrix is created and the best alignment is obtained, a score is computed for that comparison. This score is called the GA score and is calculated as follows (Eq. 2):

$$GA = n_{\mathrm{match}} S_{\mathrm{match}} + n_{\mathrm{swap}} S_{\mathrm{swap}} + n_{\mathrm{gap}} S_{\mathrm{gap}} + n_{\mathrm{mismatch}} S_{\mathrm{mismatch}} + n_{\mathrm{contrary}} S_{\mathrm{contrary}} \quad (2)$$
where $n_{\mathrm{match}}$, $n_{\mathrm{swap}}$, $n_{\mathrm{gap}}$, $n_{\mathrm{mismatch}}$, and $n_{\mathrm{contrary}}$ are the numbers of matches, swaps, gaps, mismatches, and contrary actions in the best alignment. Given the maximum and minimum possible score values, this score is normalized to GA ∈ [−1, 1]. A negative value means that the two sequences are different, and a positive value means that they are similar; the larger the absolute value, the more different or the more alike the two sequences are. Therefore, this score provides information on how the trainee performed the simulation, considering all the options above.
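As a worked illustration of Eq. 2, the sketch below computes a raw GA score from the event counts of the example alignment (four matches, one mismatch, and eight gaps) and scales it to [−1, 1]. Everything numeric here is an assumption: the score values are hypothetical, and the article does not detail its normalization, so the scheme shown (dividing positive scores by an upper bound and negative scores by the magnitude of a lower bound) is only one plausible choice. The names `ga_raw` and `ga_normalized` are introduced for this example.

```python
# Hypothetical score values (not reported in the article).
SCORES = {"match": 2, "swap": 1, "gap": -3, "mismatch": -4, "contrary": -5}


def ga_raw(counts, scores=SCORES):
    """Eq. 2: weighted sum of the event counts found in the best alignment."""
    return sum(counts.get(kind, 0) * value for kind, value in scores.items())


def ga_normalized(raw, best, worst):
    """Scale a raw score to [-1, 1] while keeping 0 fixed (assumed scheme).

    `best` is the largest attainable raw score (positive) and `worst`
    the smallest one considered (negative).
    """
    if raw >= 0:
        return raw / best
    return raw / abs(worst)


counts = {"match": 4, "mismatch": 1, "gap": 8}
raw = ga_raw(counts)              # 4*2 + 1*(-4) + 8*(-3) = -20
best = 10 * SCORES["match"]       # all 10 expected actions matched
worst = (10 + 8) * SCORES["gap"]  # assumed bound: every action faces a gap
print(raw, round(ga_normalized(raw, best, worst), 3))
```

Keeping zero fixed preserves the interpretation given in the text: negative values indicate dissimilar sequences and positive values indicate similar ones.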
Diagonal Score
This score is created to provide information about the correct actions taken during the simulation. If an action is executed at the correct time, it will receive more points than if it is executed at any other time during the simulation. Therefore, a score matrix, S (i, j), is built representing the actions performed by the trainee as rows and the actions to be performed as columns. Then, if the two actions match, a score of 1 is entered into the matrix, otherwise zero is entered as shown in Eq. 3.
$$S(i, j) = \begin{cases} 1 & \text{if } seq1(i) = seq2(j) \\ 0 & \text{otherwise} \end{cases} \quad (3)$$

where seq1 is the sequence of actions performed by the trainee and seq2 is the sequence of actions that should have been performed. [Fig. 4] shows an example of how to construct a score matrix, in which the sequence “7 4 1 3 2 8 6 9” is performed by the trainee and the sequence “5 7 8 6 3 2 4 1 9 6” is the one that should have been performed.
Fig. 4 Example of a score matrix for two sequences: “5 7 8 6 3 2 4 1 9 6” and “7 4 1 3 2 8 6 9,” in which S (i, j) is 1 when the two actions match and 0 when they do not.
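The construction of the score matrix can be sketched in a few lines. The function name `score_matrix` is an assumption introduced here; rows correspond to the actions performed by the trainee and columns to the actions that should have been performed, as described above.

```python
def score_matrix(performed, expected):
    """S[i][j] is 1 when the i-th performed action equals the j-th expected one."""
    return [[1 if p == e else 0 for e in expected] for p in performed]


performed = [7, 4, 1, 3, 2, 8, 6, 9]
expected = [5, 7, 8, 6, 3, 2, 4, 1, 9, 6]
S = score_matrix(performed, expected)

# Print the matrix with the expected actions as a header row.
print("    " + " ".join(str(e) for e in expected))
for action, row in zip(performed, S):
    print(f"{action} | " + " ".join(str(v) for v in row))
```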
To calculate the DS, the individual scores in the score matrix are summed along each diagonal of the matrix, $diag_i$, where $i$ is the index of the diagonal. Following the same example, the values of the diagonals are shown in [Fig. 5]: $diag_{-7} = 0$, $diag_{-6} = 0$, $diag_{-5} = 0$, $diag_{-4} = 0$, $diag_{-3} = 2$, $diag_{-2} = 0$, $diag_{-1} = 0$, $diag_{0} = 0$, $diag_{1} = 4$, $diag_{2} = 0$, $diag_{3} = 1$, $diag_{4} = 0$, $diag_{5} = 2$, $diag_{6} = 0$, $diag_{7} = 0$, $diag_{8} = 0$, and $diag_{9} = 0$.
Fig. 5 Diagonals used to sum the individual values of the score matrix, S (i, j), for two sequences: “5 7 8 6 3 2 4 1 9 6” and “7 4 1 3 2 8 6 9.”
Then, the diagonal sums are squared and added together according to Eq. 4 to obtain the DS:

$$DS = \sum_{i=0}^{n} diag_i^{2} \quad (4)$$
where $i$ indexes the diagonals, from 0 to $n$. Because the maximum DS is obtained when the sequences are identical, this score is normalized to DS ∈ [0, 1]. As the values are never negative, a value of 1 is obtained when the two sequences are identical, and a value of 0 means that they are completely different.
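The diagonal sums and Eq. 4 can be sketched as follows. Two choices here are assumptions rather than details from the article: diagonals are indexed by the column index minus the row index, and the normalization divides by the raw score obtained when the expected sequence is compared with itself. The function names are hypothetical.

```python
def score_matrix(performed, expected):
    """S[i][j] is 1 when the i-th performed action equals the j-th expected one."""
    return [[1 if p == e else 0 for e in expected] for p in performed]


def diagonal_sums(S):
    """Sum the score matrix along every diagonal, keyed by offset j - i."""
    rows, cols = len(S), len(S[0])
    sums = {offset: 0 for offset in range(-(rows - 1), cols)}
    for i in range(rows):
        for j in range(cols):
            sums[j - i] += S[i][j]
    return sums


def raw_diagonal_score(performed, expected):
    """Eq. 4: sum of the squared diagonal sums."""
    sums = diagonal_sums(score_matrix(performed, expected))
    return sum(value * value for value in sums.values())


def diagonal_score(performed, expected):
    """DS scaled by the self-comparison score (assumed normalization)."""
    return raw_diagonal_score(performed, expected) / raw_diagonal_score(expected, expected)


performed = [7, 4, 1, 3, 2, 8, 6, 9]
expected = [5, 7, 8, 6, 3, 2, 4, 1, 9, 6]
sums = diagonal_sums(score_matrix(performed, expected))
print({offset: value for offset, value in sums.items() if value})
print(diagonal_score(performed, expected))
```

Squaring the diagonal sums rewards long runs of correctly timed actions: several matches on the same diagonal contribute more than the same number of matches spread over different diagonals.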
Subsequences Score
This score focuses on identifying the correct subsequences performed by the trainee and the length of each subsequence. As mentioned for DS, a matrix S (i, j) is built with both sequences containing a 1 when two actions match and a 0 when they do not match. The SS identifies the actions that are executed in order and joins them together until an action that should not be executed is found. This is done by comparing the value of S (i, j) and the value of S (i + 1, j + 1) according to Eq. 5.
(Eq. 5)
The actions that fulfill the first condition are added to a subsequence vector until the value of S (i + 1, j + 1) = 0, which means that the next actions of the two sequences are not the same. When this is the case, the subsequence ends, and the algorithm looks for any other subsequences that might appear. By applying this algorithm to the example presented, the subsequences identified are [4, 1], [3, 2], and [8, 6]. Therefore, as shown in [Fig. 6], there are three subsequences with a length of two.
Fig. 6 Comparing the sequence of actions performed by the trainee, “7 4 1 3 2 8 6 9,” with the actions that should have been performed, “5 7 8 6 3 2 4 1 9 6,” yields a first subsequence when he or she performs the actions [4,1], then a second one, [3,2], and finally the actions [8,6].
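The subsequence search described above can be sketched as a scan for diagonal runs of ones in the score matrix. This is a minimal illustration, assuming that only runs of at least two actions count as subsequences (which agrees with the example, where isolated matches are not listed); the function name `find_subsequences` is hypothetical, and the SS formula itself is not reproduced here.

```python
def find_subsequences(performed, expected):
    """Return the runs of consecutive performed actions that also appear
    consecutively, in the same order, in the expected sequence."""
    S = [[1 if p == e else 0 for e in expected] for p in performed]
    rows, cols = len(performed), len(expected)
    found = []
    for i in range(rows):
        for j in range(cols):
            if not S[i][j]:
                continue
            # Skip cells that merely continue a run started earlier.
            if i > 0 and j > 0 and S[i - 1][j - 1]:
                continue
            run, k = [], 0
            # Extend the run while S(i + k, j + k) stays equal to 1.
            while i + k < rows and j + k < cols and S[i + k][j + k]:
                run.append(performed[i + k])
                k += 1
            if len(run) >= 2:  # assumed minimum length of a subsequence
                found.append(run)
    return found


performed = [7, 4, 1, 3, 2, 8, 6, 9]
expected = [5, 7, 8, 6, 3, 2, 4, 1, 9, 6]
print(find_subsequences(performed, expected))  # [[4, 1], [3, 2], [8, 6]]
```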
Then, SS is calculated as follows: (Eq. 6)
where $i$ indexes the subsequences, from 0 to $n$. The value of this score is also normalized to SS ∈ [0, 1]. As for the DS, considering the values assigned in the score matrix and that the length of a sequence is always positive, a value of 1 means that there is a single subsequence that matches the entire sequence of actions that should have been done, and a value of 0 means that there is not a single subsequence in the actions performed by the trainee.
Pilot Study
Once these scores were defined, a pilot study was conducted using the web-based simulator developed in Larraga-García et al.[62] The pilot study involved 24 participants: 14 final-year medical students from Universidad Autónoma de Madrid and 10 first-year residents from Hospital Universitario La Paz. On the first day, they were asked to manage four different trauma scenarios. Then, for the next 2 weeks, the participants trained with the simulator, completing at least one simulation per day. After each simulation, the trainee could download a PDF document with all the steps taken as well as their impact over the course of the simulation. Finally, the four initial trauma scenarios were repeated. Within this pilot study, 91 simulations were performed before training with the web-based simulator and 66 simulations were repeated after the training. This difference arises because not all the trainees performed all the scenarios after the training period. The scores defined above were then obtained for the pre- and post-training simulations. To compare them, Wilcoxon's rank-sum test was used, with statistical significance set at a p-value lower than 0.05.[63]
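For readers unfamiliar with the rank-sum test, the sketch below implements it with the normal approximation, using average ranks for ties and no continuity correction. It is an illustration under these assumptions, not the authors' analysis code, and the two score lists are invented placeholder values, not data from this study. SciPy's `scipy.stats.ranksums` provides the same test.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test; returns the z statistic and two-sided p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2                # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal tail
    return z, p


# Placeholder scores for illustration only (not study data).
pre = [0.030, 0.040, 0.050, 0.020, 0.035]
post = [0.050, 0.060, 0.045, 0.070, 0.055]
z, p = rank_sum_test(pre, post)
print(f"z = {z:.3f}, p = {p:.4f}, significant = {p < 0.05}")
```

The rank-sum test suits this design because the pre- and post-training groups have different sizes (91 and 66 simulations), so the samples cannot all be paired.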
Results
Once the pilot study was accomplished, all the data were gathered and analyzed. The four trauma scenarios trained were: a prehospital pelvic trauma, a hospital pelvic trauma, a prehospital lower limb trauma, and a hospital lower limb trauma. All the results are shown comparing the pre-training simulation, which is shown in blue, and the post-training simulation, which is in orange. From the 91 simulations performed pre-training, 26 are prehospital pelvic, 22 are hospital pelvic, 23 are prehospital lower limb, and 20 are hospital lower limb scenarios. From the 66 simulations performed post-training, 17 are prehospital pelvic, 17 are hospital pelvic, 16 are prehospital lower limb, and 16 are hospital lower limb scenarios.
Modified Needleman–Wunsch Algorithm
The evolution of the GA score obtained with the modified Needleman–Wunsch algorithm, pre- and post-training, is shown in [Fig. 7].
Fig. 7 Global alignment scores pre- and post-training for different trauma scenarios: (A) prehospital pelvic trauma scenario, (B) hospital pelvic trauma scenario, (C) prehospital lower limb trauma scenario, and (D) hospital lower limb trauma scenario.
This score remains practically the same in [Fig. 7a, c], with the same median value in the first case and a slightly lower value in the second. In [Fig. 7a], the median values are zero, whereas in [Fig. 7c], the median values are negative. In [Fig. 7b], however, the GA score drops, with negative median values both pre- and post-training. Finally, in [Fig. 7d], there is a clear improvement from pre- to post-training, which is statistically significant with a p-value of 0.025. All the relevant values are shown in [Table 1], in which the median values, the interquartile ranges, the statistics, and the p-values are presented.
Table 1
Main results of the global alignment scores for all the trauma scenarios
| Scenario | Median | Q1 | Q3 | Stats | p-Value | Pre/post |
| Prehospital pelvic | 0.0 | −0.06578 | 0.10045 | 0.36343 | 0.14660 | Pretraining |
| | 0.0 | −0.13544 | 0.04574 | | | Posttraining |
| Hospital pelvic | −0.02542 | −0.15477 | 0.10541 | 1.61158 | 0.20426 | Pretraining |
| | −0.18758 | −0.20147 | −0.01478 | | | Posttraining |
| Prehospital lower limb | −0.10445 | −0.14978 | 0.01445 | 1.88409 | 0.69861 | Pretraining |
| | −0.11254 | −0.15007 | −0.03014 | | | Posttraining |
| Hospital lower limb | −0.17585 | −0.20144 | −0.02504 | 2.88435 | 0.02500 | Pretraining |
| | −0.09887 | −0.11578 | 0.02355 | | | Posttraining |
Diagonal Score
The DS improves in all scenarios except the prehospital lower limb scenario, in which it drops slightly. In [Fig. 8a, b, d], the DS improves, reaching statistical significance only in the hospital lower limb scenario ([Fig. 8d]), with a p-value of 0.03. In the scenario in which the DS decreases, the median value drops only from 0.04687 to 0.04437. All the relevant values are shown in [Table 2], in which the median values, the interquartile ranges, the statistics, and the p-values are presented.
Table 2
Main results of the diagonal scores for all the trauma scenarios
| Scenario | Median | Q1 | Q3 | Stats | p-Value | Pre/post |
| Prehospital pelvic | 0.04687 | 0.03472 | 0.05769 | 0.36343 | 0.54660 | Pretraining |
| | 0.05228 | 0.04273 | 0.06923 | | | Posttraining |
| Hospital pelvic | 0.04769 | 0.04127 | 0.06201 | 0.01747 | 0.89483 | Pretraining |
| | 0.05128 | 0.03921 | 0.06070 | | | Posttraining |
| Prehospital lower limb | 0.04687 | 0.03125 | 0.05555 | 0.02078 | 0.88537 | Pretraining |
| | 0.04437 | 0.03375 | 0.05236 | | | Posttraining |
| Hospital lower limb | 0.03703 | 0.03047 | 0.04887 | 4.69810 | 0.03019 | Pretraining |
| | 0.05428 | 0.03928 | 0.07164 | | | Posttraining |
Fig. 8 Diagonal scores pre- and post-training for different trauma scenarios: (A) prehospital pelvic trauma scenario, (B) hospital pelvic trauma scenario, (C) prehospital lower limb trauma scenario, and (D) hospital lower limb trauma scenario.
Subsequences Score
For the subsequences score, the median value improves in all cases in the post-training simulation. However, [Fig. 9a] shows that, in some simulations, the subsequences score values are better in the pre-training simulation than in the post-training one. Nevertheless, the median value increases from 0 to 0.125 in the post-training simulation. All the relevant values are shown in [Table 3], in which the median values, the interquartile ranges, the statistics, and the p-values are presented.
Table 3
Main results of the subsequences scores for all the trauma scenarios
| Scenario | Median | Q1 | Q3 | Stats | p-Value | Pre/post |
| Prehospital pelvic | 0.0 | 0.0 | 0.25 | 0.36343 | 0.54660 | Pretraining |
| | 0.125 | 0.0 | 0.13333 | | | Posttraining |
| Hospital pelvic | 0.0 | 0.0 | 0.09090 | 1.99754 | 0.15755 | Pretraining |
| | 0.125 | 0.0 | 0.16666 | | | Posttraining |
| Prehospital lower limb | 0.0 | 0.0 | 0.0 | 1.17371 | 0.27863 | Pretraining |
| | 0.0 | 0.0 | 0.1357 | | | Posttraining |
| Hospital lower limb | 0.039 | 0.035 | 0.0498 | 0.0 | 1 | Pretraining |
| | 0.04 | 0.0 | 0.1357 | | | Posttraining |
Fig. 9 Subsequences scores pre- and post-training for different trauma scenarios: (A) prehospital pelvic trauma scenario, (B) hospital pelvic trauma scenario, (C) prehospital lower limb trauma scenario, and (D) hospital lower limb trauma scenario.
Discussion
The results show that simulation performance improves after training with the web-based simulator. However, the improvements are clearly seen in the DS and SS, whereas in the GA score they are less clear. This may be because the DS and SS each consider a single aspect: the DS analyzes the correct sequence of actions, and the SS examines the number of correct subsequences and their length. The GA score, in contrast, takes several aspects into account. It provides additional information that could be given to trainees as real-time feedback, such as when and in what order to perform the actions. The web-based simulator used for this pilot study did not provide this real-time feedback to trainees during the simulation, which may explain the poor improvement achieved for this score.
Hospital Pelvic Trauma Scenario
Interestingly, for hospital pelvic trauma, the DS increases, meaning that more actions are performed correctly at the right time, but the GA score decreases. This is because, even if more of the right actions are performed at the right moments, there is still a significant number of mismatches, gaps, or contrary actions, all of which contribute negative values to the GA score. Thus, a negative GA score reflects the balance between positive contributions (matching and equivalent actions) and negative contributions (contrary actions, mismatches, and gaps). The Needleman–Wunsch literature uses different score values for match, mismatch, and gap penalties, but there are no guidelines recommending which criteria should be used. In general, a negative value for the GA score means that the two sequences are different, and a positive value means that they are similar. In addition, the inclusion of different scores for matches, swaps, gaps, contrary actions, and mismatches in the GA score provides more comprehensive information for trauma management. Furthermore, it is important to include this information in web-based simulators, because it provides real-time information about student performance, highlights errors, and suggests what to do next. Integrating this information into the simulator supports the learning of the trauma protocol. The DS and SS provide information not included in the GA score, giving a more comprehensive assessment of simulation performance against predefined clinical trauma guidelines.
Prehospital Pelvic Trauma Scenario
In the prehospital pelvic trauma scenario, there are improvements in the DS and SS, indicating better trainee performance in the post-training simulations, although neither reaches statistical significance. The GA score, however, has not improved. The overall improvement is less clear than in the hospital pelvic trauma scenario, as the improvement in the DS is small and the GA score worsens. An analysis of these simulations was then performed with trauma specialists, who considered that the trainees performed better after the training. Nevertheless, incorporating all the information provided by these new metrics would support the trauma training process, because it allows us to identify areas that need further reinforcement to enhance learning of trauma management protocols.
Prehospital Lower Limb Trauma Scenario
The prehospital lower limb trauma scenario shows a decrease in the GA score and the DS, but a slight improvement in the SS. This contrasts with the apparent improvement in the hospital lower limb trauma scenario.
Hospital Lower Limb Trauma Scenario
In this scenario, the improvement is statistically significant for the GA score and the DS, with p-values of 0.025 and 0.03, respectively. This is the only trauma scenario in which a statistically significant difference is observed. This suggests that hospital management of lower limb injuries may be learned better, whereas learning in the prehospital setting must be improved.
Conclusion
The goal of this article was to develop and implement an objective evaluation system obtaining real-time objective information. To do so, new metrics have been developed. This supports the need to include, in clinical simulation, an objective evaluation system and, additionally, to provide valuable and objective information about how the simulation has been performed.
These metrics have been successfully built and implemented in a web-based simulator to evaluate trauma protocols for two specific trauma injuries, pelvic and lower limb lesions, and they could be easily adapted for other types of trauma.
Additionally, these metrics could be used to provide real-time information to trainees during the learning process, for example about which actions are equivalent or which ones are not done in order. This would make the simulation a self-explanatory learning tool that could be adapted to different levels of expertise. Moreover, these metrics are a powerful tool for trainers, as they make it possible to evaluate a simulation objectively with all the data coming directly from the simulator.
Nonetheless, this pilot study should be extended to a larger community and to other trauma injuries to continue analyzing the evolution of the trauma management learning process. Additionally, with a larger sample of studies, a deeper statistical analysis should be performed.
Clinical Relevance Statement
Trauma management training is of key importance, considering that trauma is one of the leading causes of death worldwide. Trauma deaths have followed a classical trimodal distribution, and even though the epidemiology of these deaths has shifted since the year 2000 toward a bimodal distribution, the number of immediate deaths is still quite high. Therefore, trauma training remains necessary. Several trauma training programs have been created since ATLS started in 1978, and they use different simulation modalities to train trauma management skills. Considering that few studies in the literature were found to incorporate an objective evaluation system, this research presents metrics to evaluate trauma management learning. This allows us to evaluate knowledge acquisition objectively and automatically. Incorporating this information in real time would therefore help encourage trainees and help them acquire the skills more efficiently.
Multiple Choice Questions
1. Which is the third step of the Needleman–Wunsch algorithm?
   a. The initialization of the score matrix
   b. The calculation of the individual scores
   c. The traceback matrix
   d. To find the best alignment
Correct Answer: The correct answer is option d. The calculation of the individual scores, together with the traceback matrix, forms the second step, since accomplishing one implies accomplishing the other. Therefore, the third step is to find the best alignment.
2. Which type of action evaluates if the trainee did not perform an action that should have been performed?
   a. A mismatch action
   b. A contrary action
   c. A gap action
   d. A swap action
Correct Answer: The correct answer is option c. A gap means that nothing has been done in the sequence of actions performed by the trainee when, according to the trauma management protocol being evaluated, he or she should have been doing something.
3. Which is the score that provides information about how well the actions are accomplished with respect to the correct timing in which they should be done?
   a. Subsequence score
   b. Diagonal score
   c. Match score
   d. Swap score
Correct Answer: The correct answer is option b. The diagonal score is built to provide information about the correct actions accomplished during the simulation: if they are done at the correct time, the score will be higher than if they are done at a different moment.