Benchmark Labs: Point-Specific Wave Height Forecasting Using Time Series Analysis

DSCC383 Team 10: Brennan Kalinowski, Tarun Paravasthu, Sean Tian, Madeleine Johnson

Advisor: Cantay Caliskan, Ph.D.

Sponsor: Benchmark Labs

Introduction

Background: Organizations like the National Weather Service use numerical weather models that divide the Earth’s surface into a grid of uniformly sized cells to make weather predictions. Each grid cell represents an average of atmospheric conditions over a specific area. [1] In reality, each grid cell contains a variety of distinct microclimates: localized areas where weather conditions differ significantly due to factors like elevation, vegetation, urban structures, or bodies of water.

Benchmark Labs aims to create point-specific wave models that extend forecasting beyond 1 hour.

Method & Result: We evaluated multiple forecasting models, including ARIMA, XGBoost, and Long Short-Term Memory (LSTM) neural networks. Our analysis demonstrated that these models yielded reliable performance for short-term forecasts (1–12 hours). However, further refinement is necessary to enhance predictive accuracy for longer-term horizons extending beyond 24 hours.
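Models such as XGBoost and LSTM are usually trained on fixed-length windows of past observations paired with a target some number of hours ahead. The sketch below shows one common way to build those pairs from an hourly series. The function name `make_windows`, the parameter names, and the sample values are illustrative assumptions, not the team's actual pipeline.

```python
def make_windows(series, n_lags, horizon):
    """Turn a time series into (past window, future target) pairs.

    n_lags  -- number of past observations in each input window
    horizon -- how many steps after the window the target lies (1 = next step)
    """
    inputs, targets = [], []
    # t is the index of the first value after the input window.
    for t in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t + horizon - 1])
    return inputs, targets


# Hypothetical hourly wave heights in meters.
hourly_wvht = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_windows(hourly_wvht, n_lags=3, horizon=2)
```

Building one set of pairs per horizon (1 hour, 2 hours, and so on) lets each horizon be evaluated separately, which matches how the results are reported by forecast horizon.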

Data Collection

Our primary data source was the National Data Buoy Center (NDBC), part of the National Weather Service. The data consists of publicly available buoy observations from around the world, reported hourly. To account for missing values, we merged the data with the European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) dataset, which aims to provide a complete historical record of past weather and climate. The final dataset contains variables describing wind, wave, and atmospheric behaviour. [3]
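The exact merge procedure is not described here. As a minimal sketch, assuming both sources are aligned on hourly timestamps and that a missing buoy reading is replaced by the reanalysis value for the same hour, gap filling could look like the following. The function name, record layout, and sample values are hypothetical.

```python
import math


def fill_missing_wave_height(ndbc_rows, era5_rows):
    """Fill missing NDBC WVHT values with ERA5 swh from the same timestamp."""
    era5_by_time = {row["time"]: row["swh"] for row in era5_rows}
    filled = []
    for row in ndbc_rows:
        wvht = row["WVHT"]
        missing = wvht is None or math.isnan(wvht)
        if missing:
            # Stays None when ERA5 has no record for this hour.
            wvht = era5_by_time.get(row["time"])
        filled.append({
            "time": row["time"],
            "WVHT": wvht,
            # Flag filled values so later analysis can tell them apart.
            "from_era5": missing and wvht is not None,
        })
    return filled


# Illustrative values only, not real buoy data.
ndbc = [
    {"time": "2024-01-01T00:00", "WVHT": 1.2},
    {"time": "2024-01-01T01:00", "WVHT": None},
    {"time": "2024-01-01T02:00", "WVHT": 1.5},
]
era5 = [
    {"time": "2024-01-01T00:00", "swh": 1.1},
    {"time": "2024-01-01T01:00", "swh": 1.3},
    {"time": "2024-01-01T02:00", "swh": 1.4},
]
merged = fill_missing_wave_height(ndbc, era5)
```

Keeping a `from_era5` flag is a design choice: it makes it possible to check later whether model error differs on observed versus filled hours.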

NDBC Variable | Description
WDIR | Wind direction (degrees clockwise from true North)
WSPD | Wind speed (m/s), averaged over an 8-minute period
GST | Peak 5- or 8-second gust speed (m/s)
WVHT | Significant wave height (m), average of the highest ⅓ of waves over a 20-minute sampling period
DPD | Dominant wave period (s)
APD | Average wave period (s)
MWD | Dominant wave direction (degrees from true North)
PRES | Sea level pressure (hPa)
ATMP | Air temperature (°C)
WTMP | Sea surface temperature (°C)
DEWP | Dew point temperature (°C)
VIS | Visibility (nautical miles)
PTDY | Pressure tendency (hPa) over the last 3 hours
TIDE | Water level (ft) relative to Mean Lower Low Water

ERA5 Variable | Description
swh | Significant wave height (m)
hmax | Maximum individual wave height (m)
mwp | Mean wave period (s), average wave period over all waves
mwd | Mean wave direction (degrees from true North)
pp1d | Peak wave period (s)
sp | Surface pressure (Pa), atmospheric pressure at the sea surface
sst | Sea surface temperature (°C)
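Significant wave height, the forecast target, is defined in the NDBC table as the average of the highest one-third of waves in the sampling period. A minimal sketch of that definition, using made-up individual wave heights:

```python
def significant_wave_height(wave_heights):
    """Mean of the highest one-third of the individual wave heights."""
    if not wave_heights:
        raise ValueError("need at least one wave height")
    ordered = sorted(wave_heights, reverse=True)
    n_top = max(1, len(ordered) // 3)  # size of the top third, at least one wave
    top_third = ordered[:n_top]
    return sum(top_third) / n_top


# Hypothetical individual wave heights (m) from one sampling period.
heights = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
hs = significant_wave_height(heights)  # mean of 3.0 and 2.5
```

Because only the largest waves enter the average, the significant wave height sits well above the plain mean of all waves (1.75 m in this example).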

Feature Importance

Model Structure

Model Results

Financial Impact

Forecast Horizon | Average RMSE (m) | Estimated Downtime Reduction [2]
1-4 Hours | 0.184 | 2-3%
5-8 Hours | 0.31 | 1-2%
9-12 Hours | 0.403 | 0.5%

Assumptions for Financial Model

1) $800,000: Daily installation cost lost to weather delays.

2) 30: Average number of days per year affected by severe weather at each wind farm.

3) 5: Number of wind farms the model is applied to.

Forecast Horizon | Estimated Downtime Reduction [2] | Savings for 5 wind farms per year
1-4 Hours | 2-3% | $2.4-$3.6 Million
5-8 Hours | 1-2% | $1.2-$2.4 Million
9-12 Hours | 0.5% | $0.6 Million
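The savings figures follow from the three stated assumptions: $800,000 per day × 30 days × 5 wind farms gives a baseline annual weather-delay loss of $120 million, and each savings value is that baseline multiplied by the downtime reduction. The sketch below reproduces the arithmetic. The function and constant names are illustrative. Under these assumptions, $0.6 million corresponds to a 0.5% reduction, while 0.05% would give $0.06 million.

```python
# Values taken from the stated financial assumptions.
DAILY_LOSS_USD = 800_000
SEVERE_WEATHER_DAYS = 30
WIND_FARMS = 5


def annual_savings_musd(reduction_percent):
    """Annual savings in millions of USD for a given downtime reduction (%)."""
    baseline_usd = DAILY_LOSS_USD * SEVERE_WEATHER_DAYS * WIND_FARMS
    return round(baseline_usd * reduction_percent / 100 / 1e6, 2)


# Savings for each downtime-reduction percentage of interest.
savings = {pct: annual_savings_musd(pct) for pct in (3, 2, 1, 0.5, 0.05)}
```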

Conclusion

While XGBoost and the CNN+LSTM model performed similarly overall, the CNN+LSTM model outperformed XGBoost at longer forecast horizons and appeared to produce smoother forecasts, making it the better choice.

We achieved high accuracy for the first 5 hours, with R² ranging from 0.96 down to around 0.8 across these horizons and RMSE staying below 0.3 m. Extending to 10-12 hours reduced accuracy to an R² of around 0.6 and an RMSE of around 0.4 m, which can still be useful. Forecasts beyond that point are unreliable and require more work to predict accurately.
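For reference, RMSE is the square root of the mean squared error, in the same units as wave height (meters). R² is one minus the ratio of the residual sum of squares to the total sum of squares around the observed mean. A minimal sketch of both metrics, with made-up observations and predictions:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted wave heights (m) at one horizon.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 2.0]
horizon_rmse = rmse(observed, predicted)
horizon_r2 = r2_score(observed, predicted)
```

Computing both metrics separately for each forecast horizon gives the kind of per-horizon comparison reported above.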

Next Steps

Going forward, there are several opportunities to optimize our models. First, additional fine-tuning may yield further improvements. Second, exploring additional variables that may correlate more strongly with wave height, such as salinity, could enhance accuracy. Finally, a full VMD-LSTM (variational mode decomposition plus LSTM) framework without data leakage remains a promising direction for future work.
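Data leakage typically arises when a preprocessing step is fitted on the full series before the train/test split, so that information from the test period reaches the model. The sketch below illustrates the leakage-free pattern with simple mean/standard-deviation scaling as a stand-in for the decomposition step. It does not implement VMD, and all names and values are hypothetical.

```python
import statistics


def chronological_split(series, train_fraction=0.8):
    """Split a time series in time order, with no shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def fit_scaler(train):
    """Estimate scaling parameters from the training period only."""
    return statistics.fmean(train), statistics.pstdev(train)


def apply_scaler(values, mean, std):
    return [(v - mean) / std for v in values]


# Illustrative hourly wave heights (m).
series = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8]
train, holdout = chronological_split(series)

# Parameters come from the training data and are reused unchanged on the holdout.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
holdout_scaled = apply_scaler(holdout, mean, std)
```

The same ordering would apply to a decomposition step: whatever is learned from the data is learned from the training period and only then applied to later observations.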

Acknowledgements

We gratefully acknowledge Ulrik Soderstrom and the entire Benchmark Labs team for their guidance and continued support throughout this project. We also extend our sincere thanks to Professor Cantay Caliskan for his insights and constructive feedback during our meetings.

References

1. National Oceanic and Atmospheric Administration (NOAA). Weather Models. JetStream – An Online School for Weather. https://www.noaa.gov/jetstream/upper-air-charts/weather-models

2. Song, T., Han, R., Meng, F., Wang, J., Wei, W., & Peng, S. (2022). A significant wave height prediction method based on deep learning combining the correlation between wind and wind waves. Frontiers in Marine Science, 9, 983007. https://doi.org/10.3389/fmars.2022.983007

3. National Data Buoy Center (NDBC). Measurement Descriptions and Units. National Oceanic and Atmospheric Administration. https://www.ndbc.noaa.gov/faq/measdes.shtml