
Data Science (DSC)

The Goergen Institute for Data Science welcomes you to its showcase of data science capstone and practicum projects from its undergraduate and graduate degree programs. Our students engage with industry, government, non-profits, and UR departments to conduct real-world analytics projects using data provided by sponsoring organizations. Students work in teams over a semester to understand the business problem, process and analyze the data, and devise a solution. They engage with the project sponsor throughout the semester via bi-weekly meetings and project presentations. Since the program was launched in 2016, over 75 data science projects from 45 companies have been offered to students, spanning a broad range of industry segments including consumer retail, healthcare, agriculture, government, education, and finance. Students apply their skills in predictive modeling, machine learning, data mining, statistical analysis, and data visualization to extract insights for business problems posed by the sponsor. We invite you to explore our data science capstone project exhibits. For additional information, please visit our website. Additional examples of recent capstone projects are available here. We look forward to hearing from you!

Contact Information

Ajay Anand, PhD – Associate Professor and Deputy Director
ajay.anand@rochester.edu

Cantay Caliskan
cantay.caliskan@rochester.edu

DSC Capstone Projects

1. Current Year Programs
May 4, 2026

Rel8ed

Meet the Team. Alvin Yao (Model Training & Evaluation): developed performance diagnostics and visualized stage-by-stage model behavior; authored the next-steps and challenges discussion. Grayson Gong (Pipeline Architect): designed the three-stage…

1. Current Year Programs
May 4, 2026

Finch AI

Portfolio risk analysis & daily reporting. This portfolio risk analysis project was completed as a University of Rochester Data Science Capstone for Senior Design Day, with sponsorship from FinchAI. We…

topics: Data Science, finance, Machine Learning
1. Current Year Programs
May 4, 2026

MathWorks

Team Members: Bryce Tyler, Carter Schmitt, Mia Alex, Tianyou Tu, Yaxun Chen. Sponsor: MathWorks. Instructors: Ajay Anand, Ph.D., and Cantay Caliskan, Ph.D. Abstract: The FOMC, or Federal Open Market Committee, gathers…

1. Current Year Programs
May 4, 2026

URMC-CTSI

Leveraging LLMs to Identify Engaging Positive Mental Health Messages on Social Media. Introduction. Public health need: 60 million U.S. adults experience mental illness annually; 1 in 5 youth experience major…

Archive

This project investigated whether patterns in medication use among older adults could help predict rehospitalization after home healthcare. Using data from over 6,800 patients, we analyzed active medications, diagnoses, and patient characteristics, mapping them to standardized codes such as RxNorm and PhecodeX. We applied clustering techniques and machine learning models, including XGBoost and BERT-based text embeddings, to identify potential risk factors. Although some variables (hyper-polypharmacy, reduced physical function measured by activities of daily living (ADL), and certain medication classes) were associated with increased risk, no clear or consistent clusters emerged as highly predictive. Our best-performing model achieved 96% accuracy and a ROC-AUC of 0.97, reinforcing the value of advanced methods while also underscoring the need for individualized deprescribing strategies in geriatric care.

topics: Machine Learning, Polypharmacy

DSCC 383 Team 10: Brennan Kalinowski, Tarun Paravasthu, Sean Tian, Madeleine Johnson. Advisor: Cantay Caliskan, Ph.D. Sponsor: Benchmark Labs. Introduction. Background: Organizations like the National Weather Service use numerical weather models…

topics: Climate Technology, CNN, Data Science, environmental, LSTM, Machine Learning, Time Series Analysis

Team Members: Chengze Miao, Xinyu Wang, Yamin Zheng, Bruce Zhang, Isabel Liu. Academic Advisor: Professor Ajay Anand. Sponsor: Dr. Zidian Xie. Abstract: In response to rising public concerns over youth…

topics: Behavioral Science, Data Science, Machine Learning, Public Health
3. Programs Archive
May 5, 2025

URMC – Dreisbach

Social Determinants of Health Factors as Upstream Predictors of Postpartum Hemorrhage. Sponsor: Caitlin Dreisbach. Coauthors: Aditi Marupaka, Katie Nguyen, Tracy Tan, Peter Zhao, Yuki Li. Team 4, DSCC 383W Data…

3. Programs Archive
May 5, 2025

CircleStar | How to Make Your Resume Fool LLM

Contributors. Abstract: As AI tools enter hiring, understanding how they interpret resumes is critical. We studied how machine learning models and LLM APIs classify resumes and infer experience, using 2,484…

3. Programs Archive
May 3, 2024

Strategies Exploration For Quality Improvement

Authors: Jingyan Yu, Lucy Chen, Xinyi Liu, Veronica Chistaya. Advisor: Cantay Çalışkan, PhD. Sponsors: Jack Bramley and Irena P. Boyce, Ph.D. Overview: Working closely with the UR Medicine Quality Institute,…

DiscSense aims to advance athletes' throwing skills by developing a gyroscopic sensor that tracks the end conditions of throws. Throughout our capstone project, we concentrated on building a classification model that helps athletes recognize patterns of successful throws and pinpoint common errors.

topics: Data Science

1. Introduction. Investment, as defined by Robinhood (a well-known online brokerage platform), is the attempt to buy assets (stocks, real estate, etc.) with one's own resources (money or credit)…

topics: Data Science, finance, Investment, Machine Learning
DSC Archive
March 17, 2023

Pickleball Analytics

Our project aims to support the development of a pickleball analytics platform by improving ball detection and tracking. The baseline is a TrackNetV2 model (Sun et al., 2020) trained on badminton, and the purpose of this project is to adapt that model with transfer learning techniques to improve its performance on pickleball.

The Magnetic Couple Study collected data from heterosexual couples of mixed HIV status and recorded their prevention methods, including condom use, viral load, and the newer method, PrEP. This project used unsupervised learning algorithms to examine the main predictors associated with protection strategies.

topics: HIV Prevention, PrEP, URMC Nursing

An ML-based classification model that detects the triage level of patients arriving at a trauma centre so that appropriate resources can be allocated. The model was built using patient data from URMC (Department of Paediatrics).

topics: Data Science, Health care, Machine Learning, Pediatrics, Technology, Trauma, Triage, URMC

Virufy has created machine learning models that analyze coughs to provide a COVID-19 diagnosis. Training these models requires an even balance between COVID-positive and COVID-negative data, but Virufy has very little positive data. To address this imbalance, the team aimed to generate synthetic coughs that closely resemble real coughs.

DSC Archive
April 15, 2022

Rochester Transit Service

Team: Yihe Chen, Harry Huang, Junting Chen, Kehan Yu. Mentor: Cantay Caliskan. Abstract: Predictive Analytics for Demand-Responsive Paratransportation. Vision & Goal: create a productive schedule for Demand…

DSC Archive
April 15, 2022

MacroX-Nightlights

This project uses the luminosity of nighttime lights as a predictive feature for economic activity.

DSC Archive
April 15, 2022

URMC-COVID Resource Allocation

This project aims to observe, visualize, and model trends in how ventilators were allocated to COVID-19 patients at the University of Rochester Medical Center. Descriptive analyses investigate the relationships between outcomes, such as recovery rate and length of ventilator allocation, and patient characteristics, including but not limited to gender, race, and age.

DSC Archive
April 15, 2022

City of Rochester

This project aims to build a model that detects features such as crosswalks and curb ramps at intersections in the City of Rochester.

topics: road structures
DSC Archive
April 15, 2022

GIDS-2

Team: Qianqian Gu (Project Manager), Wei Wu, Chen Yao, Hanyang Zhang. Mentor: Ajay Anand. Abstract: The Goergen Institute for Data Science (GIDS) master's admissions office wants to better understand applicants'…

DSC Archive
April 15, 2022

Vnomics

Team: Steven Dai, Zachary Mustin, Uzoma Ohajekwe, Duy Pham. Sponsor: Vnomics Corporation, Matt Mayo. Mentor: Prof. Ajay Anand. Abstract: Our task is to predict imminent failures in Diesel Particulate Filters…

DSC Archive
April 15, 2022

GIDS-1: Masters Admissions

Team: Xiaoen Ding, Jiecheng Gu, Sung Beom Park, Joseph Smith. Mentor: Ajay Anand. Sponsors: Lisa Altman, Gretchen Briscoe. Abstract: The Goergen Institute for Data Science wants to understand the types…

DSC Archive
April 15, 2022

URMC Geriatric Oncology

This project investigates the associations between features derived from geriatric assessment and the relative dose intensity of chemotherapy. It is part of the early phases of research by the Geriatric Oncology Research team at the Wilmot Cancer Institute, University of Rochester Medical Center. The team refined the data preprocessing pipeline, built predictive models, and applied feature selection to the dataset, offering suggestions for future work in cancer studies.

topics: chemotherapy, geriatric assessment, oncology
DSC Archive
April 18, 2021

Vnomics 1

Built autoencoder models with MLflow and Keras to predict truck failures from sensor data for Vnomics, a fuel-optimization startup. The model was improved through comprehensive time-series feature engineering with tsfresh, achieving a recall of 56% on unseen data.

topics: autoencoders, classification, feature engineering, keras, time series analysis, spark, truck failure, tsfresh, vnomics
DSC Archive
March 28, 2021

URMC Geriatric Oncology

The Geriatric Oncology Research Team at URMC wants to better understand chemotherapy tolerability in vulnerable older adults.

The Rochester Monroe Anti-Poverty Initiative (RMAPI) launched a new survey to better understand the impact of COVID-19 on community members' income and basic needs, as well as what community members need to be safe and financially secure. The goal of the project was to analyze the survey responses to inform United Way which kinds of assistance should be provided and which living necessities matter most to respondents.

DSC Archive
March 28, 2021

Predictive Maintenance for Trucks

Identify scenarios where DPF (Diesel Particulate Filter) failure is likely to happen so that the trucking customer can be alerted in advance to avoid costly roadside breakdowns.

Community Engagement Archive
March 28, 2021

Identify Mental Health Issues during COVID-19 using Twitter

The project had three aims: 1) understand how the degree of mental health issues changed over time and across locations during COVID-19; 2) find out which topics people are concerned about; and 3) infer which groups of people are more likely to have mental health issues.

Moses-Saunders Dam, St. Lawrence Seaway, Lake Ontario
Community Engagement Archive
April 27, 2020

Verifying Lake Ontario’s Water Level

The Caldwell-Fay equation (2002) attempts to model what Lake Ontario’s current water level would be if dam construction had never taken place along the St. Lawrence Seaway (i.e. the natural hydraulic state of the lake).

Lake Ontario data dating back to the 1860s was recently unearthed, and we had the rare opportunity to be the first to digitize and publicly analyze it.

Because this dataset predates any dam construction, it captures the lake's natural state. It can therefore be used to verify the Caldwell-Fay equation, which is used to govern the lake's inflow and outflow rates on a daily basis.

Ward Clustering on Dataset

We were given a dataset of patient-reported symptoms (PRO-CTCAE) and applied a variety of clustering methods. The clusters were then statistically tested for associations with selected outcomes such as hospitalization. We found significant associations between clusters and outcomes and compared them with linear regression results.

Regional Transit Service

RTS is a regional transportation authority established by New York State, and the goal of the project is to identify potential reasons for preventable accidents caused by bus operators. First, descriptive and exploratory analyses are performed on all of the provided data, including driver-related and environment-related variables. Then, frequent pattern mining is applied and conditional probabilities are calculated from the accident histories of operators at high risk of accidents to extract accident patterns.

DSC Archive
April 22, 2020

DSC Capstone: Wegmans

Wegmans grocery stores experience changes in consumer demand due to weather-related events, which may result in item shortages. Our goal was to generate a list of items expected to see large increases in sales, allowing Wegmans to prepare in advance. We correlated changes in consumer demand over time with weather warning data and detected anomalous behavior in item sales.

4. Keywords Archive
April 22, 2020

URMC-CTSI Networking Rhythm Badge Analysis

In this project, we apply data science and machine learning techniques to identify and analyze group communication and interaction patterns in the collected data, e.g., "Who interacts with whom" and "Who attended which breakout sessions." These patterns can serve as indicators of team performance, group intelligence, and meeting efficiency. We can further use this information to increase the productivity of Un-meetings by modifying related elements.