Authors
- Anthony Corbett
- Jennifer Dutra
- Jeewoo Park
- Jocelyn Wood
Sponsor
Jinjiao Wang, Ph.D., RN
Instructors
Ajay Anand, Ph.D.
Cantay Caliskan, Ph.D.
Abstract
This project aimed to identify which older adults are most at risk of being rehospitalized after receiving home healthcare, with a focus on those taking multiple medications. The team analyzed data from over 6,800 patients, cleaning and standardizing medication and diagnosis information, then using clustering to group patients and medications into meaningful categories. To predict rehospitalization, the team applied several machine learning models, including tree-based methods and a model that used BERT text embeddings to summarize medication data. The best-performing model achieved 96% accuracy and a 97% AUC score. Results highlighted important risk factors such as high medication burden, limited physical function, and certain drug classes. These findings support safer prescribing decisions and more personalized care for older adults.
Introduction
Many older adults are taking multiple medications to manage chronic conditions.
Older adults taking multiple medications may be at higher risk for adverse outcomes, including falls, cognitive issues, and overall frailty. Studies have shown that the most successful deprescribing interventions are ones that are personalized to individual patients [1].
Our goal is to identify patterns and clusters of medication use and to identify factors that increase the risk of hospitalization.
Methods
- Standardized medication and diagnosis data using RxNorm and PhecodeX; merged with patient demographics and clinical variables.
- Conducted exploratory data analysis and clustering on medication categories (SHED Med), potentially inappropriate medication (PIM) criteria, and patient features to uncover risk patterns.
- Identified statistically significant predictors of rehospitalization using logistic regression.
- Trained and evaluated multiple machine learning models; BERT embeddings combined with XGBoost achieved the highest accuracy (96%) and ROC-AUC (97%).
Data
Older adults (N = 6855) were followed while receiving home health care services after a hospitalization or other acute event. All-cause rehospitalization was 11%. Data collected included:
- Patient demographics, activities of daily living (ADLs), and OASIS (Outcome and Assessment Information Set) variables
- Patient medications
- Patient medical diagnoses



Medications Associated with Geriatric Syndromes (MAGS)

- Among the MAGS categories, only depression and urinary incontinence show a notable difference between the percentage of rehospitalized patients and the percentage of patients who were not rehospitalized.


- The Beers Criteria, developed by the American Geriatrics Society, flag potentially inappropriate medications (PIMs) whose risks outweigh their benefits for older adults.
Screening Tool of Older Persons’ Prescriptions (STOPP)


- Developed by the European Geriatric Society to flag potentially inappropriate prescriptions by considering patient-specific clinical conditions.
RASP


- Adapted from STOPP, the RASP list is tailored for community and home care pharmacy settings.
Data Processing
- Include only active medications
- Map free-text medications and medication categories to standardized RxNorm codes
- Merge medication files by RxNorm code
- Map ICD-10 diagnosis codes to PhecodeX phenotype categories and roll them up to the most general disease categories to reduce dimensionality
- Integrate all medication and diagnosis data with patient data
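The processing steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the team's actual pipeline: the lookup table, the `RX001`-style codes (placeholders, not real RxNorm identifiers), the drug and category names, and the field names are all hypothetical.

```python
# Hypothetical lookup from cleaned free-text names to RxNorm codes.
# The codes are placeholders, NOT real RxNorm identifiers.
RXNORM_LOOKUP = {"drug a": "RX001", "drug b": "RX002"}


def normalize(name):
    """Lowercase a free-text medication name and collapse whitespace."""
    return " ".join(name.lower().split())


def process_medications(med_rows, category_by_code):
    """Keep active medications, attach a code, and merge category info by code."""
    processed = []
    for row in med_rows:
        if not row["active"]:
            continue  # step 1: include only active medications
        code = RXNORM_LOOKUP.get(normalize(row["name"]))  # step 2: map free text
        if code is None:
            continue  # unmapped names are skipped in this sketch
        processed.append({
            "patient_id": row["patient_id"],
            "rxnorm": code,
            # step 3: merge the category file on the shared code
            "category": category_by_code.get(code, "unknown"),
        })
    return processed


med_rows = [
    {"patient_id": 1, "name": "Drug A ", "active": True},
    {"patient_id": 1, "name": "drug  b", "active": False},
    {"patient_id": 2, "name": "DRUG B", "active": True},
    {"patient_id": 2, "name": "drug c", "active": True},
]
category_by_code = {"RX001": "category_x", "RX002": "category_y"}

result = process_medications(med_rows, category_by_code)
for record in result:
    print(record)
```

In this sketch unmapped names are dropped silently; a real pipeline would more likely log them for manual review.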




- Polypharmacy (32%) and hyper-polypharmacy (61%) are very common; only about 7% of patients have no polypharmacy
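For illustration, patients could be assigned to these groups from their active medication counts as follows. The cutoffs are an assumption: the poster does not state its thresholds, so this sketch uses the commonly cited convention of 5 to 9 medications for polypharmacy and 10 or more for hyper-polypharmacy, and the counts are made up.

```python
from collections import Counter


def polypharmacy_category(n_meds, poly_min=5, hyper_min=10):
    """Classify a patient by active medication count (assumed thresholds)."""
    if n_meds >= hyper_min:
        return "hyper-polypharmacy"
    if n_meds >= poly_min:
        return "polypharmacy"
    return "no polypharmacy"


# Made-up medication counts for six patients.
med_counts = [3, 5, 9, 10, 14, 12]
distribution = Counter(polypharmacy_category(n) for n in med_counts)
print(distribution)
```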

Clustering


- Patient data clustered on several sets of related variables
- Medication data clustered based on SHED Med medication categories
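The poster does not name the clustering algorithm, so the following is only a sketch of one plausible approach: k-means (Lloyd's algorithm) on binary indicators of medication categories. The four category columns and the six patient rows are hypothetical, and the centroids are seeded from the first `k` rows so the result is reproducible.

```python
from math import dist


def kmeans(points, k, max_iters=50):
    """Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Rows are patients; columns are four hypothetical medication categories
# (1 = patient takes at least one drug in that category).
patients = [
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (1, 0, 0, 0),
    (0, 0, 0, 1),
    (1, 1, 0, 0),
    (0, 1, 1, 1),
]

labels, centroids = kmeans(patients, k=2)
print(labels)
```

Each centroid coordinate can be read as the share of patients in that cluster who take a drug from the corresponding category, which is what makes the clusters interpretable.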

Results
- Medications and diagnoses were independently screened for statistically significant associations with rehospitalization.
- Significant variables, along with patient demographics, were entered into a logistic regression model to estimate effect sizes (odds ratios) for the increased risk.
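As a reminder of what an odds ratio measures, the sketch below computes an unadjusted odds ratio and a Wald 95% confidence interval from a 2x2 table. The counts are invented for illustration and are not study data. The poster's estimates come from a logistic regression adjusted for other covariates, so they would generally differ from this unadjusted calculation; for a single binary predictor with no covariates, the exponentiated regression coefficient equals this ratio.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exposed_events, exposed_nonevents,
                  unexposed_events, unexposed_nonevents, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 gives 95%)."""
    a, b = exposed_events, exposed_nonevents
    c, d = unexposed_events, unexposed_nonevents
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: patients on a given drug class vs. not,
# split by whether they were rehospitalized.
odds_ratio, lower, upper = odds_ratio_ci(30, 70, 20, 180)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level.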

- Patient data and service intensity data proved to be the most robust individual predictors
- Tree-based models (XGBoost, Random Forest) were the most efficient and accurate models

- Numeric data alone did not significantly improve model performance
- Adding text data from the cleaned dataset improved the performance metrics
- Summarizing the text data with a BERT transformer and stacking the embeddings with XGBoost gave the best result: 96% accuracy, 97% ROC-AUC, and 69% recall
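High accuracy and moderate recall can coexist because rehospitalization is the minority class (11%). The sketch below uses an invented confusion matrix for 1,000 patients, chosen only to roughly match the reported accuracy and recall; it is not the study's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute basic metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        # Accuracy of always predicting the larger class.
        "majority_baseline_accuracy": max(tp + fn, tn + fp) / total,
    }


# Invented counts: 110 of 1,000 patients rehospitalized (11%).
metrics = classification_metrics(tp=76, fp=6, fn=34, tn=884)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With this class balance, a model that never predicts rehospitalization would already be 89% accurate, which is why recall and ROC-AUC are more informative than accuracy alone.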
Conclusion
- Hyper-polypharmacy, limited ADL function, and the Charlson Comorbidity Index are associated with increased rehospitalization risk
- Key drug classes and clinically relevant diagnoses are associated with increased risk
- Tree-based ML models perform best at predicting rehospitalization
- Text embeddings outperform structured, high-dimensional, sparse data overall
Acknowledgements
We would like to thank Dr. Wang for her support and feedback and for giving us the opportunity to work on this project. We’d also like to thank Dr. Anand and Dr. Caliskan for their guidance throughout the semester.
Reference
1. Liacos M, Page AT, Etherton-Beer C. Deprescribing in older people. Aust Prescr. 2020 Aug;43(4):114-120. doi: 10.18773/austprescr.2020.033. Epub 2020 Aug 3. PMID: 32921886; PMCID: PMC7450772.