Rethinking the Split-Sample Approach in Hydrological Model Calibration
Hydrological models, which have grown increasingly complex over the last half century with advances in computing capability and data collection, are widely used to support decision-making in water resources management. Such computer-based models generally contain many parameters that cannot be measured directly, so calibration and validation are required during model building (development) to ensure model transferability and robustness. The most widely used method for assessing model transferability in time is the split-sample test (SST), which has been a paradigm in the hydrological modeling community for decades. However, there is no clear guidance, nor empirical or numerical evidence, on how a dataset should be split into calibration and validation subsets, and SST decisions in the literature are often unclear and even subjective. Although past studies have devoted tremendous effort to improving model performance through various data splitting methods, data splitting remains a challenge and no consensus has been reached on which splitting method is optimal in the hydrological modeling community. A key reason is the lack of a robust evaluation framework for objectively comparing data splitting methods in the "out-of-sample" model application period. To address these gaps, this thesis assesses different data splitting methods using a large-sample hydrology approach to identify optimal splitting methods under different conditions, and explores alternative validation methods to improve the model robustness traditionally ensured by the SST. First, the thesis introduces a unique and comprehensive evaluation framework for comparing data splitting methods. The framework defines different model build years, so that models can be built under various data availability scenarios.
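The core idea of the build-year framework can be illustrated with a minimal Python sketch. All names and the 70/30 calibration-validation fraction here are hypothetical illustration choices, not the thesis's actual configuration: data up to a chosen build year form the model building period (split into calibration and validation subsets), and all later years are held out as the out-of-sample testing period.

```python
def split_record(years, build_year, cal_fraction=0.7):
    """Split a sequence of years around a hypothetical model build year.

    Years up to and including `build_year` form the model building period,
    further divided into calibration and validation subsets; all later
    years form the out-of-sample testing period, mimicking operational use.
    The 70/30 split fraction is an illustrative default, not prescribed.
    """
    building = [y for y in years if y <= build_year]
    testing = [y for y in years if y > build_year]
    n_cal = max(1, round(len(building) * cal_fraction))
    calibration = building[:n_cal]
    validation = building[n_cal:]
    return calibration, validation, testing

# Example: a 1990-2010 record with a 2005 build year yields a
# 2006-2010 testing period regardless of how 1990-2005 is split.
cal, val, tst = split_record(list(range(1990, 2011)), 2005)
```

Because the testing period depends only on the build year, any splitting scheme applied inside the building period can be compared fairly on the same held-out years.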
Years after the model build year are retained as the model testing period, which acts as "out-of-sample" data beyond the model building period and matches how models are applied in operational use. The framework can incorporate arbitrary data splitting methods into the comparison, because model performance is always compared over a common testing period regardless of how calibration and validation data are split within the model building period. Moreover, a reference climatology, derived purely from observations, is used to benchmark model simulations. Model inadequacy is handled by considering the decisions modelers are likely to make when faced with poor simulations, making model building more robust and realistic. Example approaches covering a wide range of aspects modelers care about in practice are provided for assessing large-sample modeling results. Two large-sample modeling experiments are performed within the proposed framework to compare data splitting methods. In the first experiment, two conceptual hydrological models are applied in 463 catchments across the United States to evaluate 50 continuous calibration sub-periods (CSPs) of varying length and recency, across five model build year scenarios, ensuring robust results across three testing period conditions. Model performance in the testing period is assessed from three independent perspectives: the frequency with which each short-period CSP outperforms its corresponding full-period CSP; the central tendency of the objective function metric computed in the testing period; and the frequency with which a CSP correctly classifies testing period failure and success. The second experiment assesses 44 representative continuous and discontinuous data splitting methods using a conceptual hydrological model in the same 463 catchments across the United States.
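A continuous CSP is simply a contiguous window of years inside the model building period. The sketch below enumerates non-overlapping windows of a few candidate lengths as one simple way to vary both length and recency; the actual 50-CSP design in the thesis is not reproduced here, and the function name and window lengths are assumptions for illustration only.

```python
def continuous_csps(first_year, build_year, lengths=(5, 10, 15)):
    """Enumerate continuous calibration sub-periods (CSPs) as
    non-overlapping (start, end) year windows of varying length
    within the model building period [first_year, build_year].

    Illustrative only: overlapping or recency-anchored window
    schemes are equally valid ways to generate candidate CSPs.
    """
    csps = []
    for length in lengths:
        start = first_year
        while start + length - 1 <= build_year:
            csps.append((start, start + length - 1))
            start += length  # step by the window length: no overlap
    return csps

# A 20-year building period (1990-2009) yields four 5-year
# windows and two 10-year windows under this scheme.
windows = continuous_csps(1990, 2009, lengths=(5, 10))
```

Each CSP is then calibrated independently and scored on the common testing period, which is what makes the short-period versus full-period comparisons in the first experiment possible.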
These data splitting methods cover all the ways split-sampling is currently performed in hydrological model calibration when only a single split sample is evaluated, plus one method drawn from data-driven modeling. This yields over 0.4 million model calibration-validation exercises and 1.7 million model testing exercises for an extensive analysis. Model performance in the testing period is assessed in a similar way to the first experiment, except that all model optimization trials are used to draw even more robust conclusions. Three SST recommendations emerge from this strong empirical evidence. First, calibrating models to older data and then validating them on newer data produced inferior testing period performance in every single analysis conducted and should be avoided. Second, calibrating a model to the full available data period and skipping temporal model validation entirely is the most robust choice. Third, hydrological modelers should rebuild models after their validation experiments, but before operational use, by recalibrating to all available data. Finally, alternative model validation methods are tested to further enhance model robustness based on these large-sample modeling results. A proxy validation replaces the traditional validation period of the SST by using the Split Kling-Gupta Efficiency (KGE) and Split Reference KGE in calibration to identify unacceptable models. The proxy validation is demonstrated to show some promise for enhancing model robustness when all data are used in calibration.
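The KGE metric underlying these validation schemes has a standard form, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation between simulated and observed flows, α the ratio of their standard deviations, and β the ratio of their means. Below is a minimal sketch of that base metric; the Split KGE variants used in the thesis additionally evaluate the metric over sub-periods of the calibration data, which is not reproduced here, and the function name is an illustrative assumption.

```python
import math

def kge(sim, obs):
    """Kling-Gupta Efficiency of a simulated series against observations.

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    with r the linear correlation, alpha = std(sim)/std(obs), and
    beta = mean(sim)/mean(obs). A perfect simulation scores 1.
    Assumes non-constant, non-zero-mean observations (no guards).
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    std_s = math.sqrt(sum((x - mean_s) ** 2 for x in sim) / n)
    std_o = math.sqrt(sum((x - mean_o) ** 2 for x in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (std_s * std_o)
    alpha = std_s / std_o
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# A simulation that doubles the observations keeps r = 1 but
# inflates both variability (alpha = 2) and bias (beta = 2).
score = kge([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

Benchmarking against a reference climatology amounts to computing the same metric for an observation-based prediction (e.g., the long-term mean flow for each day of year) and asking whether the calibrated model beats it.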
Cite this version of the work
Hongren Shen (2023). Rethinking the Split-Sample Approach in Hydrological Model Calibration. UWSpace. http://hdl.handle.net/10012/20038
Showing items related by title, author, creator and subject.
Nguyen, Khanh V. Q. (University of Waterloo, 2014-04-30) The full model of a double-wishbone suspension has more than 30 differential-algebraic equations which takes a remarkably long time to simulate. By contrast, the look-up table for the same suspension is simulated much ...
Information Matrices in Estimating Function Approach: Tests for Model Misspecification and Model Selection. Zhou, Qian (University of Waterloo, 2009-08-26) Estimating functions have been widely used for parameter estimation in various statistical problems. Regular estimating functions produce parameter estimators which have desirable properties, such as consistency and ...
Visakhamoorthy, Sona (University of Waterloo, 2011-08-30) With growing concerns over emissions, homogeneous charge compression ignition (HCCI) engines offer a promising solution through reducing NOx and particulate emissions and increasing efficiency. However, this technology is ...