
Data Science (DSC)

The Goergen Institute for Data Science welcomes you to its showcase of data science capstone and practicum projects from its undergraduate and graduate degree programs. Our students engage with industry, government, non-profits, and UR departments to conduct real-world analytics projects using data provided by sponsoring organizations. Students work in teams over a semester to understand the business problem, process and analyze the data, and devise a solution. They engage with the project sponsor throughout the semester via bi-weekly meetings and project presentations. Since the program was launched in 2016, over 75 data science projects from 45 companies have been offered to students, spanning a broad range of industry segments including consumer retail, healthcare, agriculture, government, education and finance. Students apply their skills in predictive modeling, machine learning, data mining, statistical analysis and data visualization to extract insights for business problems posed by the sponsor. We invite you to visit our data science capstone project exhibits. For additional information, please visit our website. Additional examples of recent capstone projects are available here. We look forward to hearing from you!

Contact Information

Ajay Anand, PhD – Associate Professor and Deputy Director

Cantay Caliskan

DSC Capstone Projects

1. Current Year Programs
May 3, 2024 | 09:30 am

Enhancing Disc Sport Performance: Insights and Innovations

DiscSense aims to advance athletes’ throwing skills by developing a gyroscopic sensor that tracks the end conditions of throws. Throughout our capstone project, we concentrated on building a classification model that will help athletes recognize patterns of successful throws and pinpoint prevalent errors.

topics: Data Science


DSC Archive
May 1, 2023 | 04:39 pm

Pairs Trading Algorithm Development for FLXAI

1. Introduction Investment, as defined by Robinhood (a well-known online brokerage platform), is the attempt to buy assets (stocks, real estate, etc.) with one's own resources (money or credit)…

topics: Data Science, finance, Investment, Machine Learning
DSC Archive
March 17, 2023 | 01:51 pm

Pickleball Analytics

Our project aids the development of a pickleball analytics platform by improving ball detection and tracking. The baseline model is a TrackNetV2 (Sun et al., 2020) model trained on badminton, and the purpose of this project is to adapt that model with transfer learning techniques to improve its performance on pickleball.

The Magnetic Couple Study collected data from heterosexual couples of mixed HIV status and recorded their prevention methods, including condom use, viral load, and a newer method, PrEP. This project used unsupervised learning algorithms to examine the main predictors associated with protection strategies.

topics: HIV Prevention, PrEP, URMC Nursing

An ML-based classification model determines the triage level of patients arriving at a trauma centre so that appropriate resources can be allocated. It was built using patient data from URMC (Department of Paediatrics).

topics: Data Science, Health care, Machine Learning, Pediatrics, Technology, Trauma, Triage, URMC

Virufy has created machine learning models that analyze coughs in order to provide a COVID-19 diagnosis. Training these models requires an even balance between COVID-positive and COVID-negative data, but Virufy has very little positive data. To address this imbalance, the team aimed to generate synthetic coughs that closely resemble real coughs.

DSC Archive
April 15, 2022 | 05:50 pm

Rochester Transit Service

Team: Yihe Chen, Harry Huang, Junting Chen, Kehan Yu. Mentor: Cantay Caliskan. Abstract: Predictive Analytics for Demand Responsive Para-transportation. Vision & Goal: Create a productive schedule for Demand…

DSC Archive
April 15, 2022 | 05:45 pm


This project uses the luminescence of the nighttime sky as a predictive feature for economic activity.

DSC Archive
April 15, 2022 | 05:44 pm

URMC-COVID Resource Allocation

This project aims to observe, visualize, and model trends in how COVID-19 patients at the University of Rochester Medical Center were allocated ventilators. Descriptive analyses investigate the relationships among variables including, but not limited to, recovery rate, length of ventilator allocation, gender, race, and age.

DSC Archive
April 15, 2022 | 05:44 pm

City of Rochester

This project aims to build a model that detects features such as crosswalks and curb ramps at intersections in the City of Rochester.

topics: road structures
DSC Archive
April 15, 2022 | 05:44 pm


Team: Qianqian Gu (Project Manager), Wei Wu, Chen Yao, Hanyang Zhang. Mentor: Ajay Anand. Abstract: The Goergen Institute for Data Science (GIDS) masters admission office wants to better understand applicants’…

DSC Archive
April 15, 2022 | 05:43 pm


Team: Steven Dai, Zachary Mustin, Uzoma Ohajekwe, Duy Pham. Sponsor: Vnomics Corporation, Matt Mayo. Mentor: Prof. Ajay Anand. Abstract: Our task is to predict imminent failures in Diesel Particulate Filters…

DSC Archive
April 15, 2022 | 05:43 pm

GIDS-1: Masters Admissions

Team: Xiaoen Ding, Jiecheng Gu, Sung Beom Park, Joseph Smith. Mentor: Ajay Anand. Sponsor: Lisa Altman, Gretchen Briscoe. Abstract: The Goergen Institute for Data Science wants to understand the types…

DSC Archive
April 15, 2022 | 05:37 pm

URMC Geriatric Oncology

This project investigates the associations between geriatric-assessment-based features and the relative dose intensity of chemotherapy. It is in the early phases of work by the Wilmot Cancer Institute’s Geriatric Oncology Research team at the University of Rochester Medical Center. The team refined the data preprocessing pipeline, built predictive models, and applied feature selection to the dataset, providing suggestions for future work in cancer studies.

topics: chemotherapy, geriatric assessment, oncology
DSC Archive
April 18, 2021 | 12:47 pm

Vnomics 1

The team built autoencoder models with MLflow and Keras to predict truck failures from sensor data for Vnomics, a fuel optimization startup. The model was optimized through comprehensive time series feature engineering with tsfresh and achieved a recall of 56% on unseen data.

topics: autoencoders, classification, feature engineering, keras, time series analysis, spark, truck failure, tsfresh, vnomics
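
For readers unfamiliar with the recall metric quoted in the Vnomics 1 summary, the following is a minimal sketch of how recall is computed for a binary failure classifier. The function name and the labels are hypothetical illustrations, not the team's code or data.

```python
def recall(y_true, y_pred):
    """Fraction of actual failures (label 1) that the model flagged as failures."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    actual_pos = true_pos + false_neg
    if actual_pos == 0:
        raise ValueError("recall is undefined when there are no actual failures")
    return true_pos / actual_pos


# Hypothetical labels (not project data): 1 = failure, 0 = no failure.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(recall(y_true, y_pred))  # 2 of the 4 actual failures were caught
```

A recall of 56% would mean that a little over half of the real failures in the held-out data were flagged by the model; it says nothing by itself about how many false alarms were raised.
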
DSC Archive
March 28, 2021 | 02:06 pm

URMC Geriatric Oncology

The Geriatric Oncology Research Team at URMC wants to better understand chemotherapy tolerability in vulnerable older adults.

Community Engagement Archive
March 28, 2021 | 01:58 pm

COVID-19 Survey Analysis to Understand the Community’s Socioeconomic Needs

The Rochester Monroe Anti-Poverty Initiative (RMAPI) launched a new survey to better understand the impact of COVID-19 on community members’ income and basic needs, as well as what community members need to be safe and financially secure. The goal of the project was to analyze the survey responses to inform United Way about which kinds of assistance need to be provided and which living necessities matter most to respondents.

DSC Archive
March 28, 2021 | 01:39 pm

Predictive Maintenance for Trucks

Identify scenarios where DPF (Diesel Particulate Filter) failure is likely to happen so that the trucking customer can be alerted in advance to avoid costly roadside breakdowns.

Community Engagement Archive
March 28, 2021 | 01:36 pm

Modeling of Lake St. Louis Water Levels

The main objective is to identify the maximum water flow tolerance of the Moses-Saunders Dam in order not to exceed the permissible limits of Lake St. Louis.

Community Engagement Archive
March 28, 2021 | 01:13 pm

Identify Mental Health Issues during COVID-19 using Twitter

The project aims were to: 1) understand how the degree of mental health issues changed over time and space during COVID-19; 2) find out what topics people are concerned about; and 3) infer which groups of people are more likely to have mental health issues.

Community Engagement Archive
April 27, 2020 | 10:45 am

Verifying Lake Ontario’s Water Level

The Caldwell-Fay equation (2002) attempts to model what Lake Ontario’s current water level would be if dam construction had never taken place along the St. Lawrence Seaway (i.e. the natural hydraulic state of the lake).

Lake Ontario data going back to the 1860s was recently unearthed, and we had the rare opportunity to be the first to digitize and publicly analyze it.

Since this data set predates any dam construction, it captures the lake’s natural state. Therefore, it can be used to verify the Caldwell-Fay equation, which is being used to govern the lake’s inflow and outflow rate on a daily basis.

We were given a patient-reported symptoms dataset (PRO-CTCAE) and applied a variety of clustering methods. The clusters were then statistically tested for associations with a selection of outcomes such as hospitalization. We found significant associations between clusters and outcomes and compared them with linear regression results.

Community Engagement Archive
April 22, 2020 | 10:32 pm

Exploring Reasons Behind the Preventable Accidents of RTS Drivers

RTS is a regional transportation authority established by New York State, and the goal of the project is to find potential reasons for preventable accidents caused by bus operators. First, descriptive and exploratory analysis is performed on all the data provided, including driver-related and environment-related variables. Then, frequent pattern mining is applied and conditional probabilities are calculated for the accident history of operators with a high risk of accidents to extract accident patterns.
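
As a rough illustration of the conditional-probability step described above, the sketch below estimates P(outcome | condition) as a relative frequency over a list of records. The field names (`weather`, `shift`, `accident`) and the records are hypothetical and are not taken from the RTS data; the team's actual analysis may have been structured differently.

```python
def conditional_probability(records, given, outcome):
    """Estimate P(outcome | given) as a relative frequency.

    `given` and `outcome` map field names to required values.
    Returns None when no record satisfies `given`.
    """
    def matches(record, spec):
        return all(record.get(key) == value for key, value in spec.items())

    given_rows = [r for r in records if matches(r, given)]
    if not given_rows:
        return None
    hits = sum(1 for r in given_rows if matches(r, outcome))
    return hits / len(given_rows)


# Hypothetical trip records, for illustration only.
trips = [
    {"weather": "snow", "shift": "night", "accident": True},
    {"weather": "snow", "shift": "day", "accident": False},
    {"weather": "snow", "shift": "night", "accident": True},
    {"weather": "clear", "shift": "day", "accident": False},
    {"weather": "clear", "shift": "night", "accident": False},
    {"weather": "snow", "shift": "night", "accident": False},
    {"weather": "clear", "shift": "day", "accident": True},
    {"weather": "clear", "shift": "day", "accident": False},
]

p_snow = conditional_probability(trips, {"weather": "snow"}, {"accident": True})
p_clear = conditional_probability(trips, {"weather": "clear"}, {"accident": True})
print(p_snow, p_clear)
```

Comparing such conditional rates across conditions (here, snow versus clear weather) is one simple way to surface candidate accident patterns before applying a fuller frequent pattern mining method.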

DSC Archive
April 22, 2020 | 10:32 pm

DSC Capstone: Wegmans

Wegmans grocery stores experience changes in consumer demand due to weather-related events, which may result in item shortages. Our goal was to generate a list of items expected to see a sharp increase in sales, which would allow Wegmans to prepare beforehand. We correlated the change in consumer demand over time with weather warning data and detected anomalous behaviors in item sales.
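
The summary above does not say which anomaly detection method was used. As one simple possibility, the sketch below flags days whose sales sit well above an item's average using a z-score threshold. The function name, the threshold of 2.0, and the sales figures are all assumptions for illustration, not the team's method or Wegmans data.

```python
from statistics import mean, stdev


def sales_spikes(daily_units, threshold=2.0):
    """Return indices of days whose sales exceed the mean by more than
    `threshold` sample standard deviations (upward spikes only)."""
    mu = mean(daily_units)
    sigma = stdev(daily_units)
    if sigma == 0:
        return []  # a perfectly flat series has no spikes
    return [
        day for day, units in enumerate(daily_units)
        if (units - mu) / sigma > threshold
    ]


# Hypothetical daily unit sales for one item; day 7 stands in for a storm warning.
units = [100, 98, 103, 101, 99, 102, 100, 180, 97, 100]
print(sales_spikes(units))
```

In practice the flagged days would then be matched against weather warning dates to see which items spike when a warning is issued.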

4. Keywords Archive
April 22, 2020 | 10:32 pm

URMC-CTSI Networking Rhythm Badge Analysis

In this project, we apply data science and machine learning techniques to identify and analyze group communication and interaction patterns from the data collected, e.g. “Who interacts with whom” and “Who attended which breakout sessions”, which can serve as indicators of team performance, group intelligence and meeting efficiency. We can further use this information to increase the productivity of Un-meetings by modifying related elements.
