Team
Yihe Chen
Harry Huang
Junting Chen
Kehan Yu
Mentor
Cantay Caliskan
Abstract
Predictive Analytics for Demand Responsive Para-transportation

Vision & Goal
● Create a productive schedule for Demand Responsive Para-transportation by predicting customer cancellations.
● Provide executable Python code and classification model.
● Identify the best performance metrics.
● Generate well-organized supporting
Data Overview
Internal data
● We acquired the internal data from our sponsor.
● The original dataset contains 102,754 observations and 21 explanatory variables, covering May 17, 2021 to December 5, 2021.

External Data
● We acquired daily weather information from NOAA (National Oceanic and Atmospheric Administration).

Data Visualization





Feature engineering
● Created a label for the cancellation (1 for canceled, 0 for performed)
● Transformed ‘date’ variables into informative variables (e.g., month, day, weekday)
● Encoded categorical variables
● Aggregated passengers by type (e.g., traveling with children, needing a lift)
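
The first three steps above can be sketched as follows. This is a minimal illustration using only the Python standard library; the records, the field names (`date`, `status`, `purpose`), and the helper `engineer` are hypothetical and do not reflect the sponsor's actual schema.

```python
from datetime import date

# Hypothetical raw trip records; field names and values are illustrative only.
trips = [
    {"date": "2021-05-17", "status": "canceled", "purpose": "medical"},
    {"date": "2021-12-05", "status": "performed", "purpose": "work"},
]


def engineer(trip, categories):
    """Turn one raw trip record into a flat, numeric feature row."""
    d = date.fromisoformat(trip["date"])
    row = {
        "canceled": 1 if trip["status"] == "canceled" else 0,  # label: 1 = canceled, 0 = performed
        "month": d.month,
        "day": d.day,
        "weekday": d.weekday(),  # Monday = 0 ... Sunday = 6
    }
    # One-hot encode the categorical field (an assumed encoding scheme).
    for c in categories:
        row[f"purpose_{c}"] = 1 if trip["purpose"] == c else 0
    return row


categories = sorted({t["purpose"] for t in trips})
features = [engineer(t, categories) for t in trips]
print(features)
```

One-hot encoding is only one possible choice here; the poster does not state which encoding was used.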

Modeling
Handle the Class Imbalance


Random Forest with SMOTE
Accuracy: 81.5% -> 84.8%
Precision: 32.2% -> 62.1%
Recall: 53.9% -> 57.7%

Precision improves by a relative 92.8% (from 32.2% to 62.1%), while recall and accuracy improve slightly.
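
The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified, standard-library illustration of that idea, not the implementation used in this project; the function name `smote_sketch`, the toy points, and the parameters `k` and `seed` are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])  # pick one of the k nearest neighbors
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Toy minority-class points (illustrative values only).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Every synthetic point lies on a line segment between two real minority points. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains real trips alone.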
Weighted Random Forest Classifier
Compared with the standard Random Forest Classifier, the Weighted Random Forest Classifier penalizes misclassification of the minority class more heavily.
Confusion matrix
| | Actually Canceled | Actually Uncanceled |
| Predicted Canceled | TP = 1808 | FP = 598 |
| Predicted Uncanceled | FN = 2207 | TN = 15938 |
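
One common way to penalize minority-class errors more heavily is to weight each class inversely to its frequency. The sketch below computes such weights with the "balanced" heuristic, total / (number of classes × class count). The poster does not state which weights were actually used, so this is an assumption, and the class counts are taken from the columns of the confusion matrix above purely for illustration.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Class counts implied by the confusion matrix above (TP + FN and FP + TN).
counts = {"canceled": 1808 + 2207, "performed": 598 + 15938}
weights = balanced_class_weights(counts)
print({label: round(w, 3) for label, w in weights.items()})
```

Under this heuristic a misclassified canceled trip counts roughly four times as much as a misclassified performed trip.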
XGBoost Classifier

Accuracy: 81.5% -> 86.1%
Precision: 32.2% -> 65.2%
Recall: 53.9% -> 62.3%
Confusion matrix
| | Actually Canceled | Actually Uncanceled |
| Predicted Canceled | TP = 2503 | FP = 1336 |
| Predicted Uncanceled | FN = 1512 | TN = 15200 |
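
Accuracy, precision, and recall follow directly from the four cells of a confusion matrix. The short sketch below recomputes them from the counts reported for the two models; the helper name `metrics` is an assumption made for illustration.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),  # share of predicted cancellations that were real
        "recall": tp / (tp + fn),  # share of real cancellations that were caught
    }


models = {
    "Weighted Random Forest": metrics(tp=1808, fp=598, fn=2207, tn=15938),
    "XGBoost": metrics(tp=2503, fp=1336, fn=1512, tn=15200),
}
for name, m in models.items():
    print(name, {k: f"{v:.1%}" for k, v in m.items()})
```

The XGBoost counts reproduce the figures reported above (86.1% accuracy, 65.2% precision, 62.3% recall). The Weighted Random Forest counts give about 86.4% accuracy, 75.1% precision, and 45.0% recall.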
Key Insights
● Our sponsor (RTS) keeps an extra bus on standby to cover any missed trips.
● During busy hours (from 8 am to 3 pm):
○ Excessively running the extra bus is costly when the prediction is not precise
○ It’s better to use the Weighted Random Forest Classifier, which gives the highest precision
● During other times:
○ Running the extra bus is less costly because fewer clients use the service
○ It’s better to use the XGBoost Classifier, which balances recall (covering more canceled trips) and precision (making fewer errors when predicting cancellations)
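
The recommendation above amounts to a simple time-based rule for choosing a model. A minimal sketch follows, assuming that "8 am to 3 pm" means hours 8 through 14 inclusive on a 24-hour clock; the function name `choose_model` and the returned labels are hypothetical.

```python
def choose_model(hour):
    """Pick the classifier to use for a trip starting at the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # Busy hours (assumed 08:00-14:59): favor precision to avoid needless extra-bus runs.
    if 8 <= hour < 15:
        return "weighted_random_forest"
    # Other times: favor the recall/precision balance of XGBoost.
    return "xgboost"


for h in (7, 8, 14, 15):
    print(h, choose_model(h))
```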