{"id":46662,"date":"2021-03-28T13:39:22","date_gmt":"2021-03-28T17:39:22","guid":{"rendered":"https:\/\/seniordesign.digitalscholar.rochester.edu\/ds2021\/?p=84"},"modified":"2022-04-13T10:16:45","modified_gmt":"2022-04-13T14:16:45","slug":"vnomics-2","status":"publish","type":"post","link":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/","title":{"rendered":"Predictive Maintenance for Trucks"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/embarktruck-TA-1024x768.jpeg\" alt=\"\" class=\"wp-image-704\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Team members<\/h2>\n\n\n\n<div class=\"wp-block-coblocks-author\"><figure class=\"wp-block-coblocks-author__avatar\"><img decoding=\"async\" class=\"wp-block-coblocks-author__avatar-img\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/WechatIMG1.jpeg\" alt=\"Yu Hao\"\/><\/figure><div class=\"wp-block-coblocks-author__content\"><span class=\"wp-block-coblocks-author__name\">Yu Hao<\/span><p class=\"wp-block-coblocks-author__biography\">Graduate Student &#8217;21<br\/>Data Science Major<\/p><\/div><\/div>\n\n\n\n<div class=\"wp-block-coblocks-author\"><figure class=\"wp-block-coblocks-author__avatar\"><img decoding=\"async\" class=\"wp-block-coblocks-author__avatar-img\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/ececa_e__-copy.jpg\" alt=\"Xinyu Guo\"\/><\/figure><div class=\"wp-block-coblocks-author__content\"><span class=\"wp-block-coblocks-author__name\">Xinyu Guo<\/span><p class=\"wp-block-coblocks-author__biography\">Undergraduate Student &#8217;21<br\/>Data Science, Economics<\/p><\/div><\/div>\n\n\n\n<div class=\"wp-block-coblocks-author\"><figure class=\"wp-block-coblocks-author__avatar\"><img decoding=\"async\" class=\"wp-block-coblocks-author__avatar-img\" 
src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/IMG_2581.jpg\" alt=\"Yuman Xie\"\/><\/figure><div class=\"wp-block-coblocks-author__content\"><span class=\"wp-block-coblocks-author__name\">Yuman Xie<\/span><p class=\"wp-block-coblocks-author__biography\">Undergraduate Student &#8217;21<br\/>Data Science, Financial Econ<\/p><\/div><\/div>\n\n\n\n<div class=\"wp-block-coblocks-author\"><figure class=\"wp-block-coblocks-author__avatar\"><img decoding=\"async\" class=\"wp-block-coblocks-author__avatar-img\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/Screen-Shot-2021-05-06-at-11.59.24-PM.png\" alt=\"Pinyi Wu\"\/><\/figure><div class=\"wp-block-coblocks-author__content\"><span class=\"wp-block-coblocks-author__name\">Pinyi Wu<\/span><p class=\"wp-block-coblocks-author__biography\">Undergraduate Student &#8217;21<br\/>Data Science, Brain and Cognitive Science<\/p><\/div><\/div>\n\n\n\n<div class=\"wp-block-coblocks-author\"><div class=\"wp-block-coblocks-author__content\"><span class=\"wp-block-coblocks-author__name\">Yuqi Zeng<\/span><p class=\"wp-block-coblocks-author__biography\">Undergraduate Student &#8217;21<br\/>Data Science, Business, Economics<\/p><\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Supervisors<\/h2>\n\n\n\n<p>Ajay Anand, Pedro Fernandez<\/p>\n\n\n\n<p>Department of Data Science, UR<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sponsors<\/h2>\n\n\n\n<p>Lloyd Palum, Mathew Mayo<\/p>\n\n\n\n<p>Vnomics Corporation<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p><strong>Vision:<\/strong> Identify scenarios where DPF (Diesel Particulate Filter) failure is likely to happen so that the customer can be alerted in advance to avoid costly roadside breakdowns.<\/p>\n\n\n\n<p><strong>Goal:<\/strong> By performing data preprocessing, data visualization, and building classification models, we would like to have our outputs be recall scores and 
confusion matrices of classification results.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Terminology and Features<\/h2>\n\n\n\n<p><strong>DPF (Diesel Particulate Filter):<\/strong> a filter in the truck that removes environmentally harmful matter from exhaust gases; the truck self-cleans the filter during operation.<\/p>\n\n\n\n<p><strong>DPF failure<\/strong>: a situation where the filter\u2019s self-cleaning (regeneration) does not work properly, to the point that the filter is so clogged it can no longer be cleaned through regeneration alone and needs maintenance.<\/p>\n\n\n\n<p><strong>dpf_regen_inhibited_duration:<\/strong> the total duration (minutes) for which DPF regeneration is inhibited during the day; in this state, regeneration cannot occur even if it is needed. dpf_regen_not_inhibited_duration is the reverse.<\/p>\n\n\n\n<p><strong>dpf_regen_active:<\/strong> the total duration (minutes) for which DPF regeneration is active during the day, meaning regeneration is taking place. dpf_regen_not_active is the reverse.<\/p>\n\n\n\n<p><strong>dpf_regen_needed_duration_mins:<\/strong> the total duration (minutes) for which DPF regeneration is reported as needed during the day, that is, regeneration is needed but is not taking place. We regard this parameter as a critical sign of an upcoming failure.<\/p>\n\n\n\n<p><strong>DTC (Diagnostic Trouble Codes):<\/strong> leading up to a DPF failure, a truck may throw one or more DTCs. Each DTC consists of an SPN code and an FMI code. Together, these codes can describe any possible issue with a truck, which yields thousands of possible combinations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data Preprocessing &amp; Cleaning<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>Data merging<\/li><\/ul>\n\n\n\n<p>The raw data consisted of one dataset per truck, for a total of 161 CSV files. 
We merged them into a single dataframe containing all the data.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Data cleaning<\/li><\/ul>\n\n\n\n<p>We imputed null values from the previous day&#8217;s data since we needed continuous time series. We added the features \u201cmonth\u201d and \u201cvehicle year\u201d, which we thought might reveal some patterns. We also added DTC data. After consulting our sponsor, we kept only the 8 DTC pairs that were related to DPF failures. We kept only the trucks with DPF failures and dropped all others. For these trucks, we initially kept the 30 days of data before the service date, and later, based on our tuning results, extended the length to 60 days. As illustrated in the graph below, the service date is indicated by the arrow, and the bold lines are the data we chose. No matter when the service date was, we kept only the 60 days before it. If a truck, like truck D, didn&#8217;t have enough data, we dropped the truck.<\/p>\n\n\n\n<p><img decoding=\"async\" width=\"589px;\" height=\"401px;\" src=\"https:\/\/lh3.googleusercontent.com\/3HE4KyBT-VKRg2AKPeeIsscDCJlaTJK02-A3GeJlOBXeHWQGgAic4ywztrsYgYW1vd0as-MvdqyHswdhCoLLHbPSngrwKLbBw_uPZ7ijS9V1zKaNT2vF-y8TQcqf1pEperk5XYNlglM\"><\/p>\n\n\n\n<p>In the end, 87 trucks remained, for a total of 5220 rows of raw data (87 trucks &times; 60 days).<\/p>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>Target variable<\/li><\/ul>\n\n\n\n<p>We set the target variable to y=1 on the row 3 days before the actual failure date so that the model predicts the failure in advance. For all other dates (rows), y was set to 0.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Data Normalization<\/li><\/ul>\n\n\n\n<p>We normalized the data by duration mins, because data from a truck that works over 10 hours per day is not comparable to data from a truck that works only a few minutes per day. We therefore divided every numeric variable by the corresponding duration mins. 
In this way, we normalized the numerical data and were able to compare them across different trucks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Exploratory Data Analysis<\/h2>\n\n\n\n<p>As mentioned earlier, it is necessary to normalize the numerical data. Here, we used histograms and correlation coefficients to decide which normalization method to use. We first normalized the data by &#8220;distance_miles&#8221;. However, as can be seen in the histograms, the distributions of all features were heavily affected by a large number of outliers. We were concerned that such a large number of outliers might decrease the quality of our model. Therefore, we tried normalization by &#8220;duration_mins&#8221;, and the distributions were much more reasonable. All values fell in the range of 0 to 1, and the shapes of the distributions were clearer. 
In addition, we didn&#8217;t see many outliers.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-28-1-1024x515.png\" alt=\"\" class=\"wp-image-601\"\/><figcaption>Histogram distribution of data normalized by distance miles<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/Screen-Shot-2021-05-06-at-9.32.54-PM-1024x517.png\" alt=\"\" class=\"wp-image-603\"\/><figcaption>Histogram distribution of data normalized by duration mins<\/figcaption><\/figure>\n\n\n\n<p>The graph below shows the correlation coefficients across all features. The data here has been normalized by duration mins. Compared to the correlations we had from data normalized by distance miles, we saw an increase in the linear relationship between the dependent variable \u201cdpf_failure_2weeks\u201d and all other explanatory variables. In addition, we found two pairs of explanatory variables that had very high correlations. The first is &#8220;fuel_used_gallons&#8221; and &#8220;distance_miles&#8221;, highlighted in the blue boxes, with a correlation of 0.96. The other is &#8220;dpf_regen_not_active_duration_mins&#8221; and &#8220;dpf_regen_inhibited_duration_mins&#8221;, highlighted in the yellow boxes, with a correlation of 0.85. 
<\/p>\n\n\n\n<p>We also noticed that the correlation coefficients between most of the explanatory features decreased considerably after we normalized the data by \u201cduration_mins\u201d.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-27-1.png\" alt=\"\" class=\"wp-image-600\"\/><figcaption>Correlation heat map <\/figcaption><\/figure>\n\n\n\n<p>Since the data was more reasonable when normalized by duration_mins, we decided to use this normalization for our data.<\/p>\n\n\n\n<p>Then we compared line plots for different windowing intervals. We selected 7 days, 14 days, and 30 days as windowing intervals. The x-axis is the number of days to the service date, i.e. the end of the x-axis is the service date (failure date). The y-axis is the value of the corresponding feature.<\/p>\n\n\n\n<p>The graph below is the plot for 7 days. Dark green lines are the average trend, and light green areas are the 95 percent confidence intervals. We can see from this graph that, starting from 4 days before failure, there was a downward trend for the first two rows (8 graphs), but an upward trend for the last graph. Our interpretation is that when a failure is about to occur, the truck is driven less, so distance miles, fuel used, etc. decrease, while the regeneration-needed duration increases because the truck is not able to regenerate the filter itself.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-29-1-1024x506.png\" alt=\"\" class=\"wp-image-604\"\/><figcaption>Line plot for 7 days<\/figcaption><\/figure>\n\n\n\n<p>Then we wanted to see for how many days this downward trend holds. 
So we extended the windowing interval to 14 and 30 days.<\/p>\n\n\n\n<p>The graph below is for 14 days. Here the trend holds between 3 and 5 days before failure.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-30-1-1024x500.png\" alt=\"\" class=\"wp-image-606\"\/><figcaption>Line plot for 14 days<\/figcaption><\/figure>\n\n\n\n<p>The graph below is for 30 days, where we can see more clearly that the trend holds for the 5 days before the failure date.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-31-1-1024x494.png\" alt=\"\" class=\"wp-image-607\"\/><figcaption>Line plot for 30 days<\/figcaption><\/figure>\n\n\n\n<p>Thus, according to our exploratory data analysis, our raw data (also called the base data) was normalized by &#8220;duration_mins&#8221;, with the 5 days of data before the service date counting as failure. That is, only the rows within 5 days before the service date are labeled y = 1. Later, based on our tsfresh results, we extended the failure period to 10 days before the service date.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Methods and Results<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>tsfresh<\/li><\/ul>\n\n\n\n<p>In feature engineering, we used the Python package tsfresh to generate windowed data and to calculate and select a large number of time-series features. 
We also incorporated SMOTE for oversampling, and PCA to further reduce the dependency between calculated features and the data dimensionality.<\/p>\n\n\n\n<p>The graph below gives an overview of our feature engineering process. We separated it into two scenarios: the upper one is a test\/train split based on windows, and the one below splits by trucks. The key difference is where the test\/train split takes place. Instead of splitting at the start, the split-by-windows scenario splits after feature generation. Since there are only 88 trucks used in our data, we believe splitting by windows can focus on driving behaviors and eliminate the differences between trucks. Therefore, we proceeded with splitting by windows (the one below).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"575\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-32-1-1024x575.png\" alt=\"\" class=\"wp-image-625\" srcset=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-32-1-1024x575.png 1024w, https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-32-1-300x168.png 300w, https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-32-1-768x431.png 768w, https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-32-1-1200x674.png 1200w, https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-32-1.png 1288w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>To help explain the process, we will use our baseline model to introduce each stage, as illustrated in the graph below. (Because it&#8217;s the baseline, the days we used are different from our final model. Nevertheless, it is the concept that matters.) 
The first part is rolling and windowing, which converts the data into windows. This part is powered by tsfresh. In order to predict failure in advance, we left 1 day out and moved y = 1 to one day before the service date, shown in blue. We used 30 days of data, so the data runs from 31 days before the service date to 1 day before the service date. The rectangles in the graph represent the windowing period. Based on the results of our EDA, we used a window size of 5 days, shown in yellow. In order to generate more data, we used an overlap of 40%, giving 2 days of overlap between windows, shown in red. After windowing, all windows except the last have y = 0, and the last window, which includes the day before the service date, has y = 1. The resulting data frame has 4752 rows and 21 columns.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-33-1-1024x426.png\" alt=\"\" class=\"wp-image-630\"\/><figcaption>tsfresh feature selection<\/figcaption><\/figure>\n\n\n\n<p>After we got the windowed data, we used tsfresh to generate a large number of time-series features. Each window has 5 days of readings and 21 features to feed into tsfresh. We used ComprehensiveFCParameters, which calculates all tsfresh features. After feature generation, each window becomes a feature vector of length 13379. 
The resulting data frame has 792 rows and 13379 columns.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/05\/image-34-1.png\" alt=\"\" class=\"wp-image-632\"\/><figcaption>tsfresh feature generation<\/figcaption><\/figure>\n\n\n\n<p>Out of these 792 rows, only 88 have y = 1. To address this class imbalance, we used SMOTE to oversample the minority class, which gave us balanced classes. Then we conducted the test\/train split. After splitting, we ran the feature selection method provided by tsfresh on the train set, and then extracted the same selected features from the test set. As tsfresh produced highly correlated features and we still had too many features, we applied PCA to further reduce the dimensionality and deal with the highly correlated features. We fit the PCA model on the train set and found that 4 principal components preserve 99% of the variance. Then we applied the same PCA transformation to the test data. At this point, our test and train data sets were ready to feed into the models.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Modeling<\/li><\/ul>\n\n\n\n<p>We evaluated our models based on recall. 
Recall is calculated by dividing the number of true positives by the sum of true positives and false negatives. We would like to see how sensitive our model is, and recall tells us what fraction of the actual positive cases we predict correctly. Since the cost of raising a DPF failure alert is relatively low, and since being able to predict ahead of time is the main goal of this project, we chose recall as our criterion. In short, it\u2019s better to be safe than sorry.<\/p>\n\n\n\n<p>We fit the data obtained previously to 9 models: Logistic Regression, SVM, Random Forest, KNN, Extra Trees, Naive Bayes, Decision Tree, Bagging, and Gradient Boosting. Here we&#8217;re using Logistic Regression as an example to show our result.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Tuning<\/li><\/ul>\n\n\n\n<p>After constructing the baseline model, we tuned five groups of parameters.<\/p>\n\n\n\n<p>The first is how far in advance we set y=1. Our baseline model sets y=1 only one day before the actual recorded service date, which means we can predict failure only one day ahead. So we tried moving y=1 to 2 to 5 days before the service date. Results show that 1 day and 3 days before the service date have similar recall, higher than any other setting. Since 3 days before the service date lets us predict failure earlier and thus gives truck drivers more flexibility, we chose 3 days before the service date for y=1.<\/p>\n\n\n\n<p>The second parameter we tuned was the number of overlap days in tsfresh windowing. 
In the baseline model, we tried 40 to 80 percent overlap, that is, an overlap of 2 to 4 days in a 5-day window. Results show that 80 percent (4 days) of overlap gives the highest recall from the best-performing model.<\/p>\n\n\n\n<p>Our third tuning target was the failure period. We changed the failure days along with the overlap days. For the failure period, we tried 5 days, 6 days, and 10 days. Along the way, we found that increasing the windowing period from 5 days to 10 days cut the number of data entries after rolling almost in half. To make them comparable, we enlarged the total data used for 10-day windowing from 30 days to 60 days. For overlap, we tried a range from 70% to 90%. Note that the reported percentage is the actual percentage. For example, in our baseline model of 5 days, 70% to 90% overlap all correspond to 4 days of overlap after rounding up, but we report 80%, as 4 days out of 5 days is exactly 80%. The values we compared are the recall rates for y=1 from the best-performing models for each combination of failure days and overlap days. Due to a small mistake in our method, we&#8217;re currently re-processing this part. Final results should be available soon.<\/p>\n\n\n\n<p>The fourth parameter we tuned was a setting in tsfresh\u2019s extraction function. There are three predefined dictionaries: ComprehensiveFCParameters, EfficientFCParameters, and MinimalFCParameters. We ran each of them. Results show that Comprehensive and Efficient differ by only 2% in recall. But Comprehensive has a runtime of around half an hour while Efficient takes only 5 minutes, so we decided to use EfficientFCParameters.<\/p>\n\n\n\n<p>The fifth group, which we will focus on once we fix the small mistake in our method, is the model parameters. 
By changing the internal parameters of each model, we were able to raise model performance by up to about 27%.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusions<\/h2>\n\n\n\n<p>Due to the mistake that we made in tsfresh, we are still tuning the last part of our model. The highest recall we got before tuning is around 60%, which means that, of the trucks that actually experienced a failure, the model correctly flagged about 60%. This result is better than we expected given that we didn\u2019t have a lot of data and that the data is very unbalanced (for 161 trucks in two years, we only have 87 rows of data that can be counted as y=1, that\u2019s 87 out of 10225110). We do think we can improve the model performance by further tuning. Beyond that, we also think there are several other things to do.<\/p>\n\n\n\n<p>Firstly, we could still try to split our data by trucks at the tsfresh rolling step. By doing that we can test the validity of our model.<\/p>\n\n\n\n<p>Secondly, we could try to model normal trucks. We can characterize the performance of normal trucks and set a standard from it. Then, if a truck doesn\u2019t meet this standard, we can predict that it will have a failure. The problem with this method is that if we don\u2019t have enough normal trucks, we won\u2019t be able to capture all normal driving behavior in our \u201cnormal standard\u201d.<\/p>\n\n\n\n<p>We will try these methods and some other steps to improve our model performance before the semester ends. 
We will post further updates, and if you have any questions or suggestions, please leave us a comment below!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Acknowledgment<\/h2>\n\n\n\n<p>We would like to express our sincere gratitude to Professor Anand, Professor Palum, and Mr. Mayo. Thank you for all your help and support throughout the semester!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Reference<\/h2>\n\n\n\n<p><a href=\"https:\/\/tsfresh.readthedocs.io\/en\/latest\/text\/introduction.html\">https:\/\/tsfresh.readthedocs.io\/en\/latest\/text\/introduction.html<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/blue-yonder\/tsfresh\/tree\/main\/notebooks\/advanced\">https:\/\/github.com\/blue-yonder\/tsfresh\/tree\/main\/notebooks\/advanced<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Identify scenarios where DPF (Diesel Particulate Filter) failure is likely to happen so that the trucking customer can be alerted in advance to avoid costly roadside 
breakdowns.<\/p>\n","protected":false},"author":6242,"featured_media":60452,"comment_status":"closed","ping_status":"open","sticky":false,"template":"templates\/template-full-width.php","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[106,2956,3006,2986],"tags":[],"coauthors":[8612],"class_list":["post-46662","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dsc-archive","category-energy-environmental-archive","category-machine-learning-archive","category-transportation-archive"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Predictive Maintainence for Trucks - Senior Design Day<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Predictive Maintainence for Trucks - Senior Design Day\" \/>\n<meta property=\"og:description\" content=\"Identify scenarios where DPF (Diesel Particulate Filter) failure is likely to happen so that the trucking customer can be alerted in advance to avoid costly roadside breakdowns.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/\" \/>\n<meta property=\"og:site_name\" content=\"Senior Design Day\" \/>\n<meta property=\"article:published_time\" content=\"2021-03-28T17:39:22+00:00\" \/>\n<meta property=\"article:modified_time\" 
content=\"2022-04-13T14:16:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/Vnomics_Picture.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"840\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#\\\/schema\\\/person\\\/351018fbcf84ed8cac6d8072ba5b347c\"},\"headline\":\"Predictive Maintainence for Trucks\",\"datePublished\":\"2021-03-28T17:39:22+00:00\",\"dateModified\":\"2022-04-13T14:16:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/\"},\"wordCount\":2649,\"image\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Vnomics_Picture.jpg\",\"articleSection\":[\"DSC Archive\",\"Energy and Environmental Archive\",\"Machine Learning Archive\",\"Transportation 
Archive\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/\",\"name\":\"Predictive Maintainence for Trucks - Senior Design Day\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Vnomics_Picture.jpg\",\"datePublished\":\"2021-03-28T17:39:22+00:00\",\"dateModified\":\"2022-04-13T14:16:45+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#\\\/schema\\\/person\\\/351018fbcf84ed8cac6d8072ba5b347c\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Vnomics_Picture.jpg\",\"contentUrl\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Vnomics_Picture.jpg\",\"width\":960,\"height\":720},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/vnomics-2\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.h
ajim.rochester.edu\\\/senior-design-day\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Predictive Maintainence for Trucks\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#website\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/\",\"name\":\"Senior Design Day\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#\\\/schema\\\/person\\\/351018fbcf84ed8cac6d8072ba5b347c\",\"name\":\"admin\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/author\\\/seniordesign\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Predictive Maintainence for Trucks - Senior Design Day","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/","og_locale":"en_US","og_type":"article","og_title":"Predictive Maintainence for Trucks - Senior Design Day","og_description":"Identify scenarios where DPF (Diesel Particulate Filter) failure is likely to happen so that the trucking customer can be alerted in advance to avoid costly roadside breakdowns.","og_url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/","og_site_name":"Senior Design Day","article_published_time":"2021-03-28T17:39:22+00:00","article_modified_time":"2022-04-13T14:16:45+00:00","og_image":[{"url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/Vnomics_Picture.jpg","width":840,"height":630,"type":"image\/jpeg"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Est. 
reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/#article","isPartOf":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/"},"author":{"name":"admin","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#\/schema\/person\/351018fbcf84ed8cac6d8072ba5b347c"},"headline":"Predictive Maintainence for Trucks","datePublished":"2021-03-28T17:39:22+00:00","dateModified":"2022-04-13T14:16:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/"},"wordCount":2649,"image":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/#primaryimage"},"thumbnailUrl":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/Vnomics_Picture.jpg","articleSection":["DSC Archive","Energy and Environmental Archive","Machine Learning Archive","Transportation Archive"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/","name":"Predictive Maintainence for Trucks - Senior Design 
Day","isPartOf":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/#primaryimage"},"image":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/#primaryimage"},"thumbnailUrl":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/Vnomics_Picture.jpg","datePublished":"2021-03-28T17:39:22+00:00","dateModified":"2022-04-13T14:16:45+00:00","author":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#\/schema\/person\/351018fbcf84ed8cac6d8072ba5b347c"},"breadcrumb":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/#primaryimage","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/Vnomics_Picture.jpg","contentUrl":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2021\/03\/Vnomics_Picture.jpg","width":960,"height":720},{"@type":"BreadcrumbList","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/vnomics-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/"},{"@type":"ListItem","position":2,"name":"Predictive Maintainence for Trucks"}]},{"@type":"WebSite","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#website","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/","name":"Senior Design 
Day","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#\/schema\/person\/351018fbcf84ed8cac6d8072ba5b347c","name":"admin","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/author\/seniordesign\/"}]}},"_links":{"self":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts\/46662","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/users\/6242"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/comments?post=46662"}],"version-history":[{"count":2,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts\/46662\/revisions"}],"predecessor-version":[{"id":60242,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts\/46662\/revisions\/60242"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/media\/60452"}],"wp:attachment":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/media?parent=46662"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/categories?post=46662"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/tags?post=46662"},{"ta
xonomy":"author","embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/coauthors?post=46662"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}