DSCC383 Team 10: Brennan Kalinowski, Tarun Paravasthu, Sean Tian, Madeleine Johnson

Advisor: Cantay Caliskan, Ph.D.

Sponsor: Benchmark Labs

Introduction

Background: Organizations like the National Weather Service use numerical weather models that divide the Earth’s surface into a grid of uniform-sized cells to make weather predictions. Each grid cell represents the average atmospheric conditions over a specific area. [1] In reality, each cell contains a variety of distinct microclimates: localized areas where weather conditions differ significantly due to factors like elevation, vegetation, urban structures, or bodies of water.

Benchmark Labs aims to create point-specific wave models that extend forecasting beyond 1 hour.

Method & Result: We evaluated multiple forecasting models, including ARIMA, XGBoost, and Long Short-Term Memory (LSTM) neural networks. These models performed reliably for short-term forecasts (1–12 hours). However, further refinement is needed to improve predictive accuracy at longer horizons extending beyond 24 hours.
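One common way to pose multi-horizon forecasting for models such as XGBoost or an LSTM is to turn the hourly series into lagged input windows paired with a target some hours ahead. The sketch below illustrates that framing only; the function name `make_windows`, the window length, and the horizon are hypothetical choices for illustration, not the configuration used in this project.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Build (X, y) pairs: X holds n_lags past values, y is the value
    `horizon` steps after the end of each window (hypothetical helper)."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window and horizon")
    # Each row of X is one window of consecutive past observations.
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    # The target for window i sits horizon steps past the window's last value.
    offset = n_lags + horizon - 1
    y = series[offset:offset + n_samples]
    return X, y

# Toy hourly series 0..9: use 3 past hours to predict 2 hours ahead.
X, y = make_windows(np.arange(10), n_lags=3, horizon=2)
```

Training one such dataset per horizon (1 hour, 2 hours, and so on) is one way to obtain the per-horizon error figures reported later.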

Data Collection

Our primary data source was the National Data Buoy Center (NDBC), part of the National Weather Service. It provides publicly available buoy observations from around the world, reported hourly. To account for missing values, we merged the data with the European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) dataset, which aims to provide a complete historical record of past weather and climate. The final dataset contains variables describing wind, wave, and atmospheric behaviour. [3]

NDBC Variable	Description
WDIR	Wind direction (degrees clockwise from true North)
WSPD	Wind speed (m/s), averaged over an 8-minute period
GST	Peak 5- or 8-second gust speed (m/s)
WVHT	Significant wave height (m), average of the highest ⅓ of waves over a 20-minute sampling period
DPD	Dominant wave period (s)
APD	Average wave period (s)
MWD	Direction from which waves at the dominant period (DPD) arrive (degrees clockwise from true North)
PRES	Sea level pressure (hPa)
ATMP	Air temperature (°C)
WTMP	Sea surface temperature (°C)
DEWP	Dew point temperature (°C)
VIS	Visibility (nautical miles)
PTDY	Pressure tendency (hPa) over last 3 hours
TIDE	Water level (ft) relative to Mean Lower Low Water

ERA5 Variable	Description
swh	Significant wave height (m)
hmax	Maximum individual wave height (m)
mwp	Mean wave period (s), average wave period over all waves
mwd	Mean wave direction (° from true North)
pp1d	Peak wave period (s)
sp	Surface pressure (Pa), atmospheric pressure at sea level
sst	Sea surface temperature (°C)
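The gap-filling step described in this section can be sketched with pandas. This is a minimal illustration under stated assumptions: both sources are already aligned on a shared hourly timestamp index, and missing buoy wave heights (`WVHT`) are filled from the ERA5 `swh` value for the same hour. The function name `fill_wave_height` and the toy values are hypothetical; the actual merge logic used in the project is not specified here.

```python
import numpy as np
import pandas as pd

def fill_wave_height(ndbc: pd.DataFrame, era5: pd.DataFrame) -> pd.DataFrame:
    """Left-join ERA5 onto the buoy record and fill missing WVHT from swh.
    Assumes both frames share the same hourly DatetimeIndex (hypothetical helper)."""
    merged = ndbc.join(era5[["swh"]], how="left")
    # Keep the buoy observation where it exists; fall back to reanalysis otherwise.
    merged["WVHT_filled"] = merged["WVHT"].fillna(merged["swh"])
    return merged

# Toy data: four consecutive hours with two missing buoy readings.
idx = pd.date_range("2024-01-01", periods=4, freq=pd.Timedelta(hours=1))
ndbc = pd.DataFrame({"WVHT": [1.2, np.nan, 1.5, np.nan]}, index=idx)
era5 = pd.DataFrame({"swh": [1.1, 1.3, 1.4, 1.6]}, index=idx)
filled = fill_wave_height(ndbc, era5)
```

Keeping the original `WVHT` column alongside the filled one makes it possible to check later how much of the training data came from reanalysis rather than direct observation.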

Feature Importance

Model Structure

Model Results

Financial Impact

Forecast Horizon	Average RMSE (m)	Estimated Downtime Reduction [2]
1-4 Hours	0.184	2-3%
5-8 Hours	0.31	1-2%
9-12 Hours	0.403	0.05%

Assumptions for Financial Model

1) $800,000: Daily installation cost lost to weather delays.

2) 30: Average number of days per year impacted by severe weather at each wind farm.

3) 5: Number of wind farms to which the model will be applied.

Forecast Horizon	Estimated Downtime Reduction [2]	Savings for 5 wind farms per year
1-4 Hours	2-3%	$2.4-$3.6 Million
5-8 Hours	1-2%	$1.2-$2.4 Million
9-12 Hours	0.05%	$0.06 Million
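The savings figures follow from multiplying the three assumptions together and applying the downtime reduction. The sketch below reproduces that arithmetic for the 1-4 hour horizon, assuming savings scale linearly with the percentage reduction; the constant and function names are illustrative only.

```python
# Values taken from the assumptions listed for the financial model.
DAILY_LOSS_USD = 800_000   # daily installation cost lost to weather delays
IMPACTED_DAYS = 30         # severe-weather days per year, per wind farm
NUM_FARMS = 5              # wind farms covered by the model

def annual_savings(reduction_pct: float) -> float:
    """Yearly savings in USD for a given downtime reduction (in percent),
    assuming savings are proportional to the reduction."""
    baseline_loss = DAILY_LOSS_USD * IMPACTED_DAYS * NUM_FARMS  # $120 million
    return baseline_loss * reduction_pct / 100

# 1-4 hour horizon: 2-3% downtime reduction.
low, high = annual_savings(2), annual_savings(3)
print(f"${low / 1e6:.1f}-${high / 1e6:.1f} Million")
```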

Conclusion

While both models performed similarly, the CNN+LSTM model outperformed XGBoost at longer forecast horizons and seemed to produce smoother forecasts, making it the better choice.

We achieved high accuracy for the first 5 hours, with R² ranging from as high as 0.96 down to around 0.8 and RMSE below 0.3 m. Extending to 10-12 hours, accuracy dropped to an R² of around 0.6 and an RMSE of around 0.4 m, which can still be useful. Forecasts beyond that point are unreliable, and more work is needed to predict them accurately.
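For reference, the two metrics quoted above can be computed per forecast horizon as in the sketch below. It assumes predictions are stored as an array with one row per sample and one column per horizon; the array layout, function names, and toy numbers are illustrative and are not the project's results.

```python
import numpy as np

def rmse_per_horizon(y_true, y_pred):
    """Root-mean-square error for each horizon (column), in the target's units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

def r2_per_horizon(y_true, y_pred):
    """Coefficient of determination for each horizon (column)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Toy example: column 0 is predicted exactly, column 1 is off by 0.5 m everywhere.
y_true = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
y_pred = y_true + np.array([0.0, 0.5])
rmse = rmse_per_horizon(y_true, y_pred)
r2 = r2_per_horizon(y_true, y_pred)
```

Reporting both metrics per horizon makes the degradation with lead time visible, since RMSE is in meters while R² is relative to the variance of the observed wave heights.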

Next Steps

Going forward, there are several opportunities to optimize our models. First, additional fine-tuning may yield further improvements. Second, exploring variables with stronger correlations to wave height, such as salinity, could enhance accuracy. Finally, a full VMD-LSTM framework (variational mode decomposition followed by an LSTM) without data leakage remains a promising direction for future work.
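The data leakage concern arises whenever a preprocessing step sees future values before the model is evaluated on them. The sketch below shows the general principle with a simple standardization step rather than VMD itself: split chronologically, then fit any data-dependent transform on the training portion only. The helper names and the 80/20 split are assumptions for illustration.

```python
import numpy as np

def chronological_split(series, train_frac=0.8):
    """Split a time-ordered series without shuffling, so the test set
    lies strictly after the training set."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def fit_standardizer(train):
    """Compute scaling statistics from the training portion only."""
    return train.mean(), train.std()

def apply_standardizer(values, mean, std):
    """Scale any segment using statistics learned from training data."""
    return (values - mean) / std

series = np.arange(10.0)  # stand-in for an hourly wave-height record
train, test = chronological_split(series)
mean, std = fit_standardizer(train)
train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)  # no test statistics used
```

A leakage-free decomposition would follow the same pattern: any decomposition applied to a forecast window would have to use only observations available up to the forecast time.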

Acknowledgements

We gratefully acknowledge Ulrik Soderstrom and the entire Benchmark Labs team for their guidance and continued support throughout this project. We also extend our sincere thanks to Professor Cantay Caliskan for his insights and constructive feedback during our meetings.

References

1. National Oceanic and Atmospheric Administration (NOAA). Weather Models. JetStream – An Online School for Weather. https://www.noaa.gov/jetstream/upper-air-charts/weather-models

2. Song, T., Han, R., Meng, F., Wang, J., Wei, W., & Peng, S. (2022). A significant wave height prediction method based on deep learning combining the correlation between wind and wind waves. Frontiers in Marine Science, 9, 983007. https://doi.org/10.3389/fmars.2022.983007

3. National Data Buoy Center (NDBC). Measurement Descriptions and Units. National Oceanic and Atmospheric Administration. https://www.ndbc.noaa.gov/faq/measdes.shtml