Data Science (DSC)

The Goergen Institute for Data Science welcomes you to its showcase of data science capstone and practicum projects from its undergraduate and graduate degree programs. Our students engage with industry, government, non-profits, and UR departments to conduct real-world analytics projects using data provided by sponsoring organizations. Students work in teams over a semester to understand the business problem, process and analyze the data, and devise a solution. They engage with the project sponsor throughout the semester via bi-weekly meetings and project presentations. Since the program was launched in 2016, over 75 data science projects from 45 companies have been offered to students, spanning a broad range of industry segments including consumer retail, healthcare, agriculture, government, education, and finance. Students apply their skills in predictive modeling, machine learning, data mining, statistical analysis, and data visualization to extract insights for business problems posed by the sponsor. We invite you to visit our data science capstone project exhibits. For additional information, please visit our website. Additional examples of recent capstone projects are available here. We look forward to hearing from you!

Contact Information:

Ajay Anand, PhD – Associate Professor and Deputy Director

Cantay Caliskan

DSC Capstone Projects

  • A Comparison of MS and Ph.D. Programs for Three University of Rochester Departments between 2015-2022
    Mentor: Goergen Institute for Data Science (GIDS). Sponsor: Lisa Altman. Abstract: Due to the continuously increasing demand for Data Science degrees, our school will open a Data Science PhD program soon. Our project can help our school have a better understanding of the potential DS […]
  • Classifying Patient Perceptions of Tolerability of Cancer Treatment
    Academic Advisor: Prof. Ajay Anand and Prof. Cantay Caliskan. Project Sponsor: Dr. Erika Ramsdale and the URMC Geriatric Oncology Team. Introduction: Studies in recent years have shown that cancer rates have declined in the overall population, but are on the rise in people over the age of 65. In […]
  • Clustering Analysis of HIV Prevention Strategies on Magnetic Couples Study
    The Magnetic Couples Study collected data from heterosexual couples of mixed HIV status and recorded their prevention methods, including condom use, viral load, and a newer method, PrEP. This project focused on using unsupervised learning algorithms to examine the main predictors associated with protection strategies.
  • Machine Learning Decision Support Tool For Trauma Activation Level
    An ML-based classification model that determines the triage level of patients arriving at a trauma centre so that appropriate resources can be allocated. It was built using patient data from URMC (Department of Paediatrics).
  • Mitigating Class Imbalance by Generating Synthetic Coughs Using WaveGAN
    Virufy has created machine learning models that analyze coughs in order to provide a COVID-19 diagnosis. Training these models requires an even balance between COVID-positive and COVID-negative data, but Virufy has very little positive data. To address this issue, the team aimed to generate synthetic coughs that closely resemble real coughs.
  • Pairs Trading Algorithm Development for FLXAI
    1. Introduction: Investment, as defined by Robinhood (a well-known online brokerage platform), is the attempt to buy assets (stocks, real estate, etc.) with one’s own resources (money or credit) for the sake of future profits [1]. In the stock market, investors typically trade stocks in two different ways to […]
  • Pickleball Analytics
    Our project aids the development of a pickleball analytics platform by improving ball detection and tracking. The baseline model is a TrackNetV2 (Sun et al. 2020) model trained on badminton, and the purpose of this project is to adapt the model using transfer learning techniques to improve its performance in pickleball.
  • Public perception of marijuana/cannabis on Twitter in the US
    Team Members: Runtao Zhou, Qihao Yun, Jiahang Wu, Zhengyuan Wang, Mengmeng Yu. Project Sponsor: Dr. Zidian Xie. Project description and motivation: Our project aims to explore the public’s perception of marijuana/cannabis in the US through Twitter data analysis. We aim to achieve four objectives, namely gaining insights into public perception […]
  • Revenue Forecast Using Time Series-Based Deep Learning Model
    Mentor: Professor Ajay Anand, Dr. Preston Countryman. Sponsor: Corning Inc. – Data Science & Intelligence (DSI) Team. Abstract: Corning wants to develop a deep-time-series model to perform accurate customer-level demand forecasting using daily purchasing data. The goal of this project is to predict the revenue income in a given […]
  • Sentiment Analysis on Twitter Data Regarding Dental Issues associated with Opioid Consumption
    DSCC383 Group I. Team: Youssef Ouenniche, Ian Kaplan, Michael Kingsley, Goutham Swaminathan, Shiva Rahul Edara. Advisor: Professor Ajay Anand | Sponsor: Dr. Zidian Xie. Analysis & Modeling. Background: Opioid Use Disorder (OUD) is a chronic brain disease characterized by persistent opioid use despite harmful consequences. There are a number of […]
  • University of Rochester: Corporate Purchasing Non-Clinical Spend Analysis
    Team: Amanda Pignataro (B.S. in Data Science), Avery Girksy (B.S. in Data Science), Ryan Hilton (B.S. in Data Science), Vaarya Srivastava (B.S. in Data Science). Mentor: Prof. Cantay Caliskan, Goergen Institute for Data Science. Sponsor: University of Rochester Corporate Purchasing: Katherine Sadoff-Herrick, Neil Pierce, Jeff Meteyer, […]


  • A Model to Predict Paychex 401(k) Services’ Potential Clients and Explainers for Analysis
    The goal of the project was to identify upsell opportunities for Paychex’s 401(k) service products to their existing clients.
  • Analyze Membership Trends at RMSC
    The goal of the project was to spur museum membership growth, encourage donations from members, and increase overall museum revenue.
  • Benchmark Labs – Powdery Mildew Prediction
    Team: Yihan Shao, Chuqin Wu, Melanie Xue, Zihe Zheng. Mentor: Cantay Çalışkan. Abstract: The goal of this project is to forecast the pest pressure of Grape Powdery Mildew at a specific location to allow growers to treat this plant disease in time. We will experiment with various Time Series Forecasting […]
  • City of Rochester
    This project aims to build a model which detects features such as crosswalks and curb ramps at intersections in the city of Rochester.
  • City of Rochester Crime & Convenience Stores
    The City of Rochester wants to understand if physical proximity to a convenience store or liquor store affects the likelihood of different types of part 1 crimes.
  • Clustering Methods for Finding Insights in Patient Reported Data
    We were given a patient-reported symptoms dataset, PRO-CTCAE, and applied a variety of clustering methods. The clusters were then statistically tested for associations with a selection of outcomes such as hospitalization. We found significant associations between clusters and outcomes and compared them to linear regression results.
  • COVID-19 Survey Analysis to Understand the Community’s Socioeconomic Needs
    The Rochester Monroe Anti-Poverty Initiative (RMAPI) launched a new survey to better understand the impact of COVID-19 on community members’ income and basic needs, as well as what community members need to be safe and financially secure. The goal of the project was to analyze the survey and responses to inform United Way which kinds of assistance need to be provided and which living necessities matter most to the respondents.
  • DSC Capstone: Wegmans
    Wegmans grocery stores experience changes in consumer demand due to weather-related events which may result in item shortages. Our goal was to generate a list of items that are expected to have a huge increase in sales which would allow Wegmans to prepare beforehand. We correlated the change in consumer demand over time with weather warning data and detected anomalous behaviors in item sales.
  • Exploring Reasons Behind the Preventable Accidents of RTS Drivers
    RTS is a regional transportation authority established by New York State, and the goal of the project is to find the potential reasons for preventable accidents caused by bus operators. First, descriptive and exploratory analysis is performed on all the data provided, as well as on driver-related and environment-related variables. Then, frequent pattern mining is applied and conditional probabilities are calculated for the accident history of operators with a high risk of accidents to extract accident patterns.
  • GIDS-1: Masters Admissions
    Team: Xiaoen Ding, Jiecheng Gu, Sung Beom Park, Joseph Smith. Mentor: Ajay Anand. Sponsor: Lisa Altman, Gretchen Briscoe. Abstract: The Goergen Institute for Data Science wants to understand the types of institutions and programs that students are choosing to attend. Thus, the goal of this project is to better understand […]
  • GIDS-2
    Team: Qianqian Gu (Project Manager), Wei Wu, Chen Yao, Hanyang Zhang. Mentor: Ajay Anand. Abstract: The Goergen Institute for Data Science (GIDS) masters admission office wants to better understand applicants’ decisions and the overall application cycle from 2015 to 2021. The goal of this project is to generate meaningful insights […]
  • Identify Mental Health Issues during COVID-19 using Twitter
    The project aims were to: 1) understand how the degree of mental health issues changed over time and space during COVID-19; 2) find out what topics people are concerned about; and 3) infer which groups of people are more likely to have mental health issues.
  • Improve Efficiency of Chilled Water Production
    The project supported the UR Utilities and Energy Management department’s goal of improving the efficiency of chilled water production through predictive modeling.
  • Machine Learning Decision Support Tool For Trauma Activation Level
    An ML-based classification model that determines the triage level of patients arriving at a trauma centre so that appropriate resources can be allocated. It was built using patient data from URMC (Department of Paediatrics).
  • MacroX-Nightlights
    This project uses the luminescence of the nighttime sky as a predictive feature for economic activity.
  • Modeling of Lake St. Louis Water Levels
    The main objective is to identify the maximum water flow tolerance of the Moses-Saunders Dam in order not to exceed the permissible limits of Lake St. Louis.
  • Predictive Maintenance for Trucks
    Identify scenarios where DPF (Diesel Particulate Filter) failure is likely to happen so that the trucking customer can be alerted in advance to avoid costly roadside breakdowns.
  • Public Perception on COVID-19 Vaccines
    The goal of the project was to explore public perception of COVID-19 vaccines by analyzing social media data (Twitter).
  • Rochester Transit Service
    Team: Yihe Chen, Harry Huang, Junting Chen, Kehan Yu. Mentor: Cantay Caliskan. Abstract: Predictive Analytics for Demand Responsive Para-transportation. Vision & Goal: ● Create a productive schedule for Demand Responsive Para-transportation by predicting the customers’ cancellation. ● Provide executable Python code and classification model. ● Discover best performance metrics. […]
  • URMC Geriatric Oncology
    This project investigates the associations between geriatric-assessment-based features and the relative dose intensity of chemotherapy. It is in the early phases of work with Wilmot Cancer Institute’s Geriatric Oncology Research team at the University of Rochester Medical Center. The team refined the data preprocessing pipeline, built predictive models, and employed feature selection on the dataset, providing suggestions for future work in cancer studies.
  • URMC Geriatric Oncology
    The Geriatric Oncology Research Team at URMC wants to better understand chemotherapy tolerability in vulnerable older adults.
  • URMC-COVID Resource Allocation
    This project aims to observe, visualize, and model the trends in how COVID-19 patients at the University of Rochester Medical Center were allocated ventilators. Descriptive analyses are performed to investigate the relationships between variables including, but not limited to, recovery rate, length of ventilator allocation, gender, race, and age.
  • URMC-CTSI Networking Rhythm Badge Analysis
    In this project, we want to apply DSC and machine learning techniques to identify and analyze group communication and interaction patterns from the data collected, e.g. “Who interacts with whom” and “Who attended which breakout sessions”, which can function as an indicator of team performance, group intelligence and meeting efficiency. We can further use the information to increase the productivity of Un-meetings by modifying related elements.
  • Verifying Lake Ontario’s Water Level
    The Caldwell-Fay equation (2002) attempts to model what Lake Ontario’s current water level would be if dam construction had never taken place along the St. Lawrence Seaway (i.e. the natural hydraulic state of the lake). Lake Ontario data going back to the 1860s has recently been unearthed, and we had the rare opportunity to be the first to digitize and publicly analyze it. Since this data set predates any dam construction, it captures the lake’s natural state. It can therefore be used to verify the Caldwell-Fay equation, which is being used to govern the lake’s inflow and outflow rate on a daily basis.
  • Vnomics
    Team: Steven Dai, Zachary Mustin, Uzoma Ohajekwe, Duy Pham. Sponsor: Vnomics Corporation, Matt Mayo. Mentor: Prof. Ajay Anand. Abstract: Our task is to predict imminent failures in Diesel Particulate Filters (DPFs) of truck trailers up to fourteen days before breakdown occurs and to identify critical indicators of DPF failures. Upon […]
  • Vnomics 1
    The team built autoencoder models with MLflow and Keras to predict truck failures from sensor data for Vnomics, a fuel optimization startup. The model is optimized through comprehensive time series feature engineering with tsfresh, achieving a recall of 56% on unseen data.