GEML 1.0
Here, we introduce version 1.0 of the Global Environmental eMuLator (GEML), a data-driven model compatible with the ¼°, 13-level version of GraphCast (Lam et al. 2023, [1]). This model was trained by the Meteorological Research Division (MRD) and the Canadian Centre for Meteorological and Environmental Prediction (CCMEP), divisions of Environment and Climate Change Canada.
This model was trained "from scratch," using training code developed for other research projects. In inference (forecast production), this model is fully compatible with the model code in DeepMind's GraphCast repository.
License
These model weights are available under the Canada Open Government license, which permits derivative works and commercial use with attribution.
Variables
The model predicts the following meteorological variables on a ¼° latitude/longitude grid (with poles):
- At elevation: temperature, geopotential, u (zonal) component of wind, v (meridional) component of wind, vertical velocity, specific humidity
- At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation[†]
[†]: This variable is incorrect; please see the 'Erratum' section.
The atmospheric variables are predicted at the 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa pressure levels. For points that lie below the surface, extrapolated values are given (i.e. they are not masked).
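For reference, the predicted variables and pressure levels can be written out as Python constants. The names below follow ERA5/WeatherBench conventions and are illustrative only; they are not necessarily the exact keys used by the training or inference code.

```python
# Illustrative variable and level definitions (ERA5/WeatherBench-style names);
# the exact keys expected by the model code may differ.
PRESSURE_LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]

ATMOSPHERIC_VARIABLES = [
    "temperature",
    "geopotential",
    "u_component_of_wind",
    "v_component_of_wind",
    "vertical_velocity",
    "specific_humidity",
]

SURFACE_VARIABLES = [
    "2m_temperature",
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "mean_sea_level_pressure",
    "total_precipitation_6hr",  # see the Erratum section below
]
```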
The model timestep is 6 hours, and the model takes two time levels as input. That is, to produce a forecast valid at 12Z, the model needs input data at 6Z and 0Z.
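As a small worked example, the two input time levels needed for a given first valid time follow directly from the 6-hour timestep (plain Python, purely illustrative):

```python
from datetime import datetime, timedelta

TIMESTEP = timedelta(hours=6)

def input_times(first_valid_time: datetime) -> tuple[datetime, datetime]:
    """Return the two input time levels needed to produce a forecast whose
    first output step is valid at `first_valid_time`."""
    t0 = first_valid_time - TIMESTEP   # most recent input (analysis time)
    t_minus_1 = t0 - TIMESTEP          # previous input, 6 h earlier
    return t_minus_1, t0

# To get a forecast valid at 12Z, the model needs inputs at 00Z and 06Z:
print(input_times(datetime(2024, 1, 1, 12)))
```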
Input data
All forecast variables except accumulated precipitation are also taken as input values. The model also requires the surface geopotential, land-sea mask, and top-of-atmosphere incident solar radiation (accumulated over 1 h) as input values.
The surface geopotential and land-sea mask are static variables. The incident solar radiation must be provided at both input time levels and at the output time level. This value can be calculated, and both the DeepMind GraphCast repository and the training code repository contain an incident solar radiation model.
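Those repository implementations are the ones to use in practice; the sketch below only illustrates the basic idea (instantaneous top-of-atmosphere flux from the solar constant, declination, and hour angle, numerically integrated over 1 h) and ignores refinements such as the Earth-Sun distance correction and the equation of time.

```python
from datetime import datetime, timedelta
import numpy as np

SOLAR_CONSTANT = 1361.0  # W m^-2 (nominal value; full implementations vary this with Earth-Sun distance)

def toa_flux(lat_deg, lon_deg, time_utc):
    """Instantaneous top-of-atmosphere solar flux (W m^-2), simplified."""
    doy = time_utc.timetuple().tm_yday
    hour = time_utc.hour + time_utc.minute / 60.0 + time_utc.second / 3600.0
    # Approximate solar declination (radians); ignores the equation of time.
    decl = np.deg2rad(-23.44) * np.cos(2.0 * np.pi * (doy + 10) / 365.25)
    # Local hour angle (radians): solar noon at 12 UTC on the Greenwich meridian.
    hour_angle = np.deg2rad((hour - 12.0) * 15.0 + lon_deg)
    lat = np.deg2rad(lat_deg)
    cos_zenith = (np.sin(lat) * np.sin(decl)
                  + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    return SOLAR_CONSTANT * np.clip(cos_zenith, 0.0, None)

def toa_accumulated_1h(lat_deg, lon_deg, end_time, n_sub=12):
    """TOA incident solar radiation (J m^-2) accumulated over the hour ending at end_time."""
    dt = 3600.0 / n_sub
    midpoints = [end_time - timedelta(seconds=dt * (k + 0.5)) for k in range(n_sub)]
    return sum(toa_flux(lat_deg, lon_deg, t) for t in midpoints) * dt

# Example: accumulated TOA radiation at 45N, 75W over the hour ending 2024-06-21 18:00 UTC.
print(toa_accumulated_1h(45.0, -75.0, datetime(2024, 6, 21, 18)))
```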
Model training
Datasets
The model was pre-trained on ERA5 data (calendar years 1979–2015, inclusive), following the training configuration in Lam et al. Subsequently, it was fine-tuned on the "HRES initial conditions" dataset for calendar years 2016–2021.
Both of these datasets are available from the WeatherBench 2 project (Rasp et al. 2024, [2]).
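Both datasets are published as cloud-hosted Zarr stores; the exact store names and paths are listed in the WeatherBench 2 documentation. A sketch of browsing one with xarray (the path below is a placeholder, not a real store name):

```python
import xarray as xr

# Placeholder path: see the WeatherBench 2 documentation for the actual
# ERA5 and "HRES initial conditions" Zarr store URLs.
ERA5_ZARR = "gs://weatherbench2/datasets/era5/<store-name>.zarr"

ds = xr.open_zarr(ERA5_ZARR)
print(ds[["temperature", "geopotential"]].sel(level=500).isel(time=0))
```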
Erratum
Although the HRES dataset contains an accumulated precipitation variable, this value is always zero, so during the fine-tuning process the model was trained towards a prediction of zero precipitation.
Since precipitation is the one predicted variable that is not given as an input, we do not think that this error will have any impact on the prediction of the other output variables.
Loss function
The model was trained with the latitude- and level-weighted mean squared error loss function, equation (19) in the supplementary material of Lam et al.
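Schematically, that loss takes the form below (our notation, not a verbatim reproduction of the published equation; see Lam et al. for the exact per-variable and per-level weights):

$$
\mathcal{L} = \frac{1}{|D|}\sum_{d_0 \in D} \frac{1}{T}\sum_{\tau=1}^{T} \sum_{j} s_j\, w_j\, \frac{1}{|G|}\sum_{i \in G} a_i \left(\hat{x}_{i,j}^{\,d_0+\tau} - x_{i,j}^{\,d_0+\tau}\right)^2
$$

where $D$ is the batch of initialization dates, $T$ the number of autoregressive steps, $j$ indexes variable/level pairs, $s_j$ is the inverse variance of 6-hour differences for that variable, $w_j$ its per-variable/per-level weight, $G$ the latitude/longitude grid, $a_i$ the area (latitude) weight, and $\hat{x}$ and $x$ are the prediction and the target.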
Normalizations
The GraphCast architecture takes normalized input data (z-scores) and outputs a forecast difference normalized by the standard deviation of 6-hour differences in a climatological dataset.
For these fields, we used the same normalization factors as the DeepMind GraphCast model, computed over the ERA5 dataset. Since the HRES data is very close to the ERA5 data, we re-used the ERA5 normalization factors without change during the model fine-tuning.
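Concretely, this means inputs are standardized with per-variable climatological means and standard deviations, and the network output is a normalized increment that must be rescaled and added to the most recent input state. A minimal NumPy sketch of that round trip (the array and statistic names are illustrative):

```python
import numpy as np

def normalize_inputs(x, mean, std):
    """z-score the input fields using climatological per-variable statistics."""
    return (x - mean) / std

def denormalize_prediction(x_t, network_output, diff_std):
    """Turn the network's normalized residual into a full-field forecast.

    x_t            : state at the most recent input time (physical units)
    network_output : model output, a residual normalized by diff_std
    diff_std       : standard deviation of 6-hour differences (climatology)
    """
    return x_t + diff_std * network_output
```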
Training curriculum
The pre-training step closely followed the training curriculum of Lam et al.:
Pre-training
| Stage | Batches | Forecast Length | Learning Rate |
|---|---|---|---|
| 1 (Warmup) | 1,000 | 1 step (6 h) | linear warm-up |
| 2 | 299,000 | 1 step (6 h) | cosine decay |
| 3 | 1,000 per forecast length | 2–12 steps (12–72 h) | constant |
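The shape of this learning-rate schedule can be expressed with optax; the peak and final learning-rate values below are placeholders (the actual rates follow Lam et al. and are not restated in this document):

```python
import optax

# Placeholder values: the actual peak and final learning rates are those of
# Lam et al. and are not restated here.
PEAK_LR = 1e-3
FINETUNE_LR = 3e-7

schedule = optax.join_schedules(
    schedules=[
        # Stage 1: linear warm-up over the first 1,000 batches.
        optax.linear_schedule(init_value=0.0, end_value=PEAK_LR, transition_steps=1_000),
        # Stage 2: cosine decay over the next 299,000 batches.
        optax.cosine_decay_schedule(init_value=PEAK_LR, decay_steps=299_000),
        # Stage 3: constant learning rate for the autoregressive stages.
        optax.constant_schedule(FINETUNE_LR),
    ],
    boundaries=[1_000, 300_000],
)
```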
Fine-tuning
| Stage | Batches | Forecast Length | Learning Rate |
|---|---|---|---|
| Fine-tune | 5,000 | 12 steps (72 h) | constant |
In both cases, the batch size was 32 forecasts, and the training data was sampled with replacement. On average, each training forecast (initialization date) was seen about 184 times in the pre-training stage and 4.5 times in the fine-tuning stage.
Optimizer
As in Lam et al., training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with the same momentum parameters and with weight decay applied to the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
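A sketch of that optimizer setup in optax; the learning rate is a placeholder, the beta and weight-decay values follow those reported by Lam et al. rather than values restated here, and the weight-matrix mask is illustrative:

```python
import jax
import optax

# Placeholder learning rate; in practice this would be the schedule sketched
# in the training-curriculum section. The betas and weight decay follow the
# values reported by Lam et al. and are illustrative here.
optimizer = optax.adamw(
    learning_rate=1e-3,
    b1=0.9,
    b2=0.95,
    weight_decay=0.1,
    # Apply weight decay only to weight matrices (>= 2-D parameter leaves), not biases.
    mask=lambda params: jax.tree_util.tree_map(lambda p: p.ndim >= 2, params),
)
```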
Validation
Validation data/plots to come
Model weights
The fully-trained model weights are available as `geml_1.0.ckpt` in this repository.
For research purposes, we will also shortly update this repository to include intermediate checkpoints from the pre-training and fine-tuning process.
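Given the inference compatibility with DeepMind's GraphCast code, the weights can presumably be loaded with that repository's checkpoint utilities. The sketch below assumes the checkpoint uses the same serialization as the checkpoints distributed with the graphcast package; this has not been spelled out above, so treat it as an assumption.

```python
# Assumes the checkpoint is serialized the same way as the checkpoints
# distributed with DeepMind's graphcast package.
from graphcast import checkpoint, graphcast

with open("geml_1.0.ckpt", "rb") as f:
    ckpt = checkpoint.load(f, graphcast.CheckPoint)

params = ckpt.params              # trained network weights
model_config = ckpt.model_config  # architecture settings
task_config = ckpt.task_config    # input/target variables, pressure levels, timestep
```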
References
[1]: R. Lam et al., "Learning skillful medium-range global weather forecasting," Science, vol. 382, no. 6677, pp. 1416–1421, Dec. 2023, doi: 10.1126/science.adi2336.
[2]: S. Rasp et al., "WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models," Journal of Advances in Modeling Earth Systems, vol. 16, no. 6, p. e2023MS004019, 2024, doi: 10.1029/2023MS004019.
[3]: I. Loshchilov and F. Hutter, "Decoupled Weight Decay Regularization," Jan. 04, 2019, arXiv: arXiv:1711.05101, doi: 10.48550/arXiv.1711.05101.