
Machine Learning Decision Support Tool For Trauma Activation Level



Professor Ajay Anand, Deputy Director, Goergen Institute of Data Science, University of Rochester


Nicole A. Wilson, PhD, MD, University of Rochester Medical Center, Department of Pediatrics


With the advent of technology-based decision-making in the health sector, there have been immense improvements in accuracy and time management, along with reduced dependence on staff members. Determining the triage of patients with traumatic injuries is one such use case involving complex decision-making. Our machine learning-based model (an ensemble technique) is designed to process pre-treatment inputs from patients and classify them as Full Activation (critical patients) or Partial Activation (general patients), so that appropriate resources are used for their treatment. We attain a satisfactory level of over-triage (false positive rate) and under-triage (false negative rate) error compared with Emergency Department (ED) practitioners, along with meaningful comparative insights based on demographics (age), time of day, and mechanism of injury.



Trauma centers exist to provide immediate care for injuries that require urgent attention. To ensure patients are treated with a level of care appropriate to their injury, practitioners must decide how to prioritize the resources allocated to each admission. Ideally, these decisions would be impartial and unaffected by practitioners’ emotional responses. This goal is not feasible with human judgment alone: individuals differ unavoidably in their decision-making tendencies and in their sensitivity to the factors that bear on prioritizing patient care, which leads to inconsistency in how trauma center resources are allocated. For this reason, a machine learning-based decision support tool to aid practitioners has been recommended.


Our goal was to use machine learning techniques and a dataset containing patient information to develop a classification tool for assigning an appropriate level of care to a patient upon arrival at the ED.


The raw data were provided by the sponsor. The dataset contains 10,959 records of patients admitted to the ED from 2014 to 2021. There are over 200 features in the raw data, relating to the patient’s health history and pre-hospital treatment. These features include patient demographic information, mechanism of injury, prehospital interventions, patient comorbidities, ED procedures performed, and so on.

The data include two ground truth variables: the Cribari variable and the NFTI variable. Each indicates the level of care the patient needs, and each is determined after the patient’s treatment using different criteria. Neither ground truth variable is optimal. The Cribari variable is available in all records, whereas the NFTI variable is not available in records from 2014 to 2019.

Both ground truth variables are binary. ‘1’ is ‘Full Activation’, which means the patient needs a high level of care. ‘0’ is ‘Partial Activation’, which means the patient does not need a high level of care.
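The two error rates used throughout this summary follow directly from this labeling: over-triage is the false positive rate and under-triage is the false negative rate. The sketch below shows one way to compute them. The function name and the toy labels are illustrative assumptions and are not taken from the project code or data.

```python
def triage_error_rates(y_true, y_pred):
    """Return (over_triage, under_triage) for binary activation labels.

    Labels follow the convention above: 1 = Full Activation, 0 = Partial Activation.
    over_triage  = false positive rate = FP / (number of true 0s)
    under_triage = false negative rate = FN / (number of true 1s)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    over = fp / negatives if negatives else float("nan")
    under = fn / positives if positives else float("nan")
    return over, under


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
over, under = triage_error_rates(y_true, y_pred)
print(f"over-triage={over:.2f}, under-triage={under:.2f}")
```

The same function can be applied once to the ED staff decisions and once to the model predictions, so that both are scored against the same ground truth variable.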

Data Pre-Processing

Categorical: One-Hot Encoding

Null Values: Removed features with more than 70% null values and imputed the remaining missing values using the Multiple Imputation by Chained Equations (MICE) algorithm.

Normalization: Min-Max Scaling

Basic Feature Selection: Removed all post-treatment related features.
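The steps listed above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration and not the project pipeline: the column names, the `POST_TREATMENT` set, and the toy rows are hypothetical. The MICE imputation step is deliberately left out, so the toy data has missing values only in the column that gets dropped. In practice a library imputer (for example scikit-learn's `IterativeImputer`, which is inspired by MICE) would run between the null filter and the scaling step.

```python
# Hypothetical set of post-treatment columns to exclude (names are assumptions).
POST_TREATMENT = {"ed_procedure"}


def drop_columns(rows, names):
    """Remove the given columns from every record."""
    return [{k: v for k, v in r.items() if k not in names} for r in rows]


def drop_sparse_columns(rows, max_null_frac=0.70):
    """Drop columns whose fraction of None values exceeds max_null_frac."""
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= max_null_frac]
    return [{c: r[c] for c in keep} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new)
    return encoded


def min_max_scale(rows, column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    values = [r[column] for r in rows]
    lo, span = min(values), max(values) - min(values)
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]


# Toy records, invented for illustration only.
rows = [
    {"age": 20.0, "mechanism": "fall", "rare_lab": None, "ed_procedure": "ct"},
    {"age": 60.0, "mechanism": "mvc", "rare_lab": None, "ed_procedure": "none"},
    {"age": 40.0, "mechanism": "fall", "rare_lab": None, "ed_procedure": "ct"},
    {"age": 30.0, "mechanism": "mvc", "rare_lab": 1.2, "ed_procedure": "none"},
]

processed = drop_columns(rows, POST_TREATMENT)     # basic feature selection
processed = drop_sparse_columns(processed)         # rare_lab is 75% null, so dropped
processed = one_hot(processed, "mechanism")        # categorical encoding
processed = min_max_scale(processed, "age")        # normalization
print(processed[0])
```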

Exploratory Data Analysis

Effect of Age: Undertriage rates in the ED for patients below the median age of 47.57 years were significantly lower than for patients above the median age, at a significance level of 0.05.

Effect of Time of Day: Practitioners tend to undertriage significantly more often during the daytime (6 am to 6 pm) based on a two-sample proportion test.

Effect of Mechanism of Injury: Triage levels differ significantly by mechanism of injury when the decision is made by practitioners, whereas the model gives comparatively stable results across mechanisms.

Feature Selection

Final Model


Two-sample proportion tests were performed at the alpha=0.05 significance level for each metric. Each test showed that the model classification performed better than ED staff classification for the metric being studied.
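A two-sample proportion test of this kind can be sketched with the standard library alone. The version below is a pooled two-sided z-test; whether the original analysis used a one-sided or two-sided alternative is not stated, so that choice is an assumption, and the counts in the example are invented for illustration and are not results from this study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for H0: p1 == p2.

    x1, x2 are event counts (e.g. under-triaged patients) and n1, n2 are
    group sizes. Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 200 errors for one classifier, 15 of 200 for the other.
z, p_value = two_proportion_z_test(30, 200, 15, 200)
alpha = 0.05
print(f"z={z:.3f}, p={p_value:.4f}, reject H0 at alpha={alpha}: {p_value < alpha}")
```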

Model Explainability

To explain model predictions to end users with no technical background, we used LIME (Local Interpretable Model-agnostic Explanations), which applies small perturbations to the model input to see which features affect the output the most.
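The perturbation idea can be illustrated with a much-simplified probe. The sketch below is not LIME itself: LIME samples perturbations jointly, weights them by proximity to the instance, and fits a local surrogate model, whereas this version perturbs one feature at a time and reports the average change in output. The `toy_model` scorer and the feature names are hypothetical stand-ins for the trained ensemble.

```python
import random


def toy_model(x):
    """Hypothetical scorer returning a probability-like value in [0, 1]."""
    score = 0.6 * x["f_strong"] + 0.3 * x["f_weak"] + 0.0 * x["f_unused"]
    return min(1.0, max(0.0, score))


def perturbation_sensitivity(model, instance, scale=0.1, n_samples=200, seed=0):
    """Mean absolute change in model output when each feature is jittered alone."""
    rng = random.Random(seed)
    base = model(instance)
    sensitivity = {}
    for name in instance:
        total = 0.0
        for _ in range(n_samples):
            perturbed = dict(instance)
            # Add small Gaussian noise to a single feature, leave the rest fixed.
            perturbed[name] = instance[name] + rng.gauss(0.0, scale)
            total += abs(model(perturbed) - base)
        sensitivity[name] = total / n_samples
    return sensitivity


instance = {"f_strong": 0.5, "f_weak": 0.5, "f_unused": 0.5}
sensitivity = perturbation_sensitivity(toy_model, instance)
ranking = sorted(sensitivity, key=sensitivity.get, reverse=True)
print(ranking)
```

Features whose perturbation moves the output the most appear first in `ranking`, which is the kind of per-patient ordering an explanation shown to a practitioner would be built from.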

Future Work

Trauma Center-Specific Threshold: Determine a method for finding the optimal activation threshold for trauma centers with different cost/benefit trade-offs between under-triage and over-triage.

Including Feedback from Practitioners: Some features might make more practical sense in determining the trauma activation level, and should therefore be made readily available for modeling.

Working with a Larger Dataset: More data is currently being labeled with ground truth activation levels. Our modeling pipeline will be available for training on this data.
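As one possible starting point for the trauma-center-specific threshold item above, a threshold could be chosen by minimizing a weighted count of under-triage and over-triage errors on held-out data. The sketch below assumes the model outputs a score in [0, 1] and that each center supplies its own relative costs. The cost values, scores, and labels are hypothetical and do not come from this project.

```python
def expected_cost(y_true, scores, threshold, cost_under, cost_over):
    """Average cost per patient when scores >= threshold mean Full Activation."""
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return (cost_under * fn + cost_over * fp) / len(y_true)


def best_threshold(y_true, scores, cost_under, cost_over, candidates=None):
    """Return the candidate threshold with the lowest expected cost.

    Ties are resolved in favor of the smallest candidate.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda th: expected_cost(y_true, scores, th, cost_under, cost_over),
    )


# Toy validation labels and model scores, invented for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]

# A center where under-triage is costlier prefers a lower threshold.
th_under_averse = best_threshold(y_true, scores, cost_under=5, cost_over=1)
# A center where over-triage is costlier prefers a higher threshold.
th_over_averse = best_threshold(y_true, scores, cost_under=1, cost_over=5)
print(th_under_averse, th_over_averse)
```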


To the best of our knowledge, this is the first study to provide a detailed analysis of multiple factors, such as mechanism of injury, age, and time of day, that affect the decision-making process for determining the trauma activation level. The study is also distinguished by its use of only pre-treatment features for decision-making and by meeting acceptable success criteria when compared with the predictions made by ED staff.


The team would like to thank Dr. Nicole Wilson for sponsoring this Capstone Project and the staff members of the Department of Pediatric Surgery at URMC for curating the data. The team would also like to thank Prof. Ajay Anand for mentoring us throughout the process.


  1. Mohan, Deepika & Barnato, Amber & Rosengart, Matthew & Angus, Derek & Smith, Kenneth. (2012). Optimal Approach to Improving Trauma Triage Decisions: A Cost-Effectiveness Analysis. The American Journal of Managed Care. 18. e91-e100.
  2. M. Schellenberg, “Trauma Team Activation: Optimizing Prehospital Triage of the Injured Patient.” The American Association for the Surgery of Trauma. trauma-team-activation-optimizing-pr#. Accessed 4/27/2023.
  3. M. Ribeiro, S. Singh, and C. Guestrin. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016.
  4. Azur, Melissa J et al. “Multiple imputation by chained equations: what is it and how does it work?” International Journal of Methods in Psychiatric Research vol. 20,1 (2011): 40-9.