GEML 1.0
Here, we introduce version 1.0 of the Global Environmental eMuLator (GEML), a data-driven model compatible with the ¼°, 13-level version of GraphCast (Lam et al. 2023, [1]). This model was trained by the Meteorological Research Division (MRD) and the Canadian Centre for Meteorological and Environmental Prediction (CCMEP), divisions of Environment and Climate Change Canada.
This model was trained "from scratch," using training code developed for other research projects. In inference (forecast production), this model is fully compatible with the model code in DeepMind's GraphCast repository.
License
These model weights are available under the Canada Open Government license, which permits derivative works and commercial use with attribution.
Variables
The model predicts the following meteorological variables on a ¼° latitude/longitude grid (with poles):
- At elevation: temperature, geopotential, u (zonal) component of wind, v (meridional) component of wind, vertical velocity, specific humidity
- At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation[†]
[†]: This variable is incorrect; please see the 'Erratum' section.
The atmospheric variables are predicted at the 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa pressure levels. For points that lie below the surface, extrapolated values are given (i.e. they are not masked).
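For reference, the predicted variables and pressure levels can be written out as Python constants. The names below follow ERA5/WeatherBench conventions and are illustrative only; they are not necessarily the exact keys used by the training or inference code.

```python
# Illustrative variable and level definitions (ERA5/WeatherBench-style names);
# the exact keys expected by the model code may differ.
PRESSURE_LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]

ATMOSPHERIC_VARIABLES = [
    "temperature",
    "geopotential",
    "u_component_of_wind",
    "v_component_of_wind",
    "vertical_velocity",
    "specific_humidity",
]

SURFACE_VARIABLES = [
    "2m_temperature",
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "mean_sea_level_pressure",
    "total_precipitation_6hr",  # see the Erratum section below
]
```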
The model timestep is 6 hours, and the model takes two time levels as input. That is, to produce a forecast valid at 12Z, the model needs input data at 6Z and 0Z.
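As a small worked example, the two input time levels needed for a given first valid time follow directly from the 6-hour timestep (plain Python, purely illustrative):

```python
from datetime import datetime, timedelta

TIMESTEP = timedelta(hours=6)

def input_times(first_valid_time: datetime) -> tuple[datetime, datetime]:
    """Return the two input time levels needed to produce a forecast whose
    first output step is valid at `first_valid_time`."""
    t0 = first_valid_time - TIMESTEP   # most recent input (analysis time)
    t_minus_1 = t0 - TIMESTEP          # previous input, 6 h earlier
    return t_minus_1, t0

# To get a forecast valid at 12Z, the model needs inputs at 00Z and 06Z:
print(input_times(datetime(2024, 1, 1, 12)))
```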
Input data
All forecast variables except accumulated precipitation are also taken as input values. The model also requires the surface geopotential, land-sea mask, and top-of-atmosphere incident solar radiation (accumulated over 1 h) as input values.
The surface geopotential and land-sea mask are static variables. The incident solar radiation must be provided at both input time levels and at the output time level. This value can be calculated, and both the DeepMind GraphCast repository and the training code repository contain an incident solar radiation model.
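Those repository implementations are the ones to use in practice; the sketch below only illustrates the basic idea (instantaneous top-of-atmosphere flux from the solar constant, declination, and hour angle, numerically integrated over 1 h) and ignores refinements such as the Earth-Sun distance correction and the equation of time.

```python
from datetime import datetime, timedelta
import numpy as np

SOLAR_CONSTANT = 1361.0  # W m^-2 (nominal value; full implementations vary this with Earth-Sun distance)

def toa_flux(lat_deg, lon_deg, time_utc):
    """Instantaneous top-of-atmosphere solar flux (W m^-2), simplified."""
    doy = time_utc.timetuple().tm_yday
    hour = time_utc.hour + time_utc.minute / 60.0 + time_utc.second / 3600.0
    # Approximate solar declination (radians); ignores the equation of time.
    decl = np.deg2rad(-23.44) * np.cos(2.0 * np.pi * (doy + 10) / 365.25)
    # Local hour angle (radians): solar noon at 12 UTC on the Greenwich meridian.
    hour_angle = np.deg2rad((hour - 12.0) * 15.0 + lon_deg)
    lat = np.deg2rad(lat_deg)
    cos_zenith = (np.sin(lat) * np.sin(decl)
                  + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    return SOLAR_CONSTANT * np.clip(cos_zenith, 0.0, None)

def toa_accumulated_1h(lat_deg, lon_deg, end_time, n_sub=12):
    """TOA incident solar radiation (J m^-2) accumulated over the hour ending at end_time."""
    dt = 3600.0 / n_sub
    midpoints = [end_time - timedelta(seconds=dt * (k + 0.5)) for k in range(n_sub)]
    return sum(toa_flux(lat_deg, lon_deg, t) for t in midpoints) * dt

# Example: accumulated TOA radiation at 45N, 75W over the hour ending 2024-06-21 18:00 UTC.
print(toa_accumulated_1h(45.0, -75.0, datetime(2024, 6, 21, 18)))
```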
Model training
Datasets
The model was pre-trained on ERA5 data (calendar years 1979–2015, inclusive), following the training configuration in Lam et al. Subsequently, it was fine-tuned on the "HRES initial conditions" dataset for calendar years 2016–2021.
Both of these datasets are available from the WeatherBench 2 project (Rasp et al. 2024, [2]).
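Both datasets are published as cloud-hosted Zarr stores; the exact store names and paths are listed in the WeatherBench 2 documentation. A sketch of browsing one with xarray (the path below is a placeholder, not a real store name):

```python
import xarray as xr

# Placeholder path: see the WeatherBench 2 documentation for the actual
# ERA5 and "HRES initial conditions" Zarr store URLs.
ERA5_ZARR = "gs://weatherbench2/datasets/era5/<store-name>.zarr"

ds = xr.open_zarr(ERA5_ZARR)
print(ds[["temperature", "geopotential"]].sel(level=500).isel(time=0))
```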
Erratum
Although the HRES dataset contains an accumulated precipitation variable, this value is always zero, so during the fine-tuning process the model was trained towards a prediction of zero precipitation.
Since precipitation is the one predicted variable that is not given as an input, we do not think that this error will have any impact on the prediction of the other output variables.
Loss function
The model was trained with the latitude- and level-weighted mean squared error loss function, equation (19) in the supplementary material of Lam et al.
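Schematically, that loss takes the form below (our notation, not a verbatim reproduction of the published equation; see Lam et al. for the exact per-variable and per-level weights):

$$
\mathcal{L} = \frac{1}{|D|}\sum_{d_0 \in D} \frac{1}{T}\sum_{\tau=1}^{T} \sum_{j} s_j\, w_j\, \frac{1}{|G|}\sum_{i \in G} a_i \left(\hat{x}_{i,j}^{\,d_0+\tau} - x_{i,j}^{\,d_0+\tau}\right)^2
$$

where $D$ is the batch of initialization dates, $T$ the number of autoregressive steps, $j$ indexes variable/level pairs, $s_j$ is the inverse variance of 6-hour differences for that variable, $w_j$ its per-variable/per-level weight, $G$ the latitude/longitude grid, $a_i$ the area (latitude) weight, and $\hat{x}$ and $x$ are the prediction and the target.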
Normalizations
The GraphCast architecture takes normalized input data (z-scores) and outputs a forecast difference normalized by the standard deviation of 6-hour differences in a climatological dataset.
For these fields, we used the same normalization factors as the DeepMind GraphCast model, computed over the ERA5 dataset. Since the HRES data is very close to the ERA5 data, we re-used the ERA5 normalization factors without change during the model fine-tuning.
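Concretely, this means inputs are standardized with per-variable climatological means and standard deviations, and the network output is a normalized increment that must be rescaled and added to the most recent input state. A minimal NumPy sketch of that round trip (the array and statistic names are illustrative):

```python
import numpy as np

def normalize_inputs(x, mean, std):
    """z-score the input fields using climatological per-variable statistics."""
    return (x - mean) / std

def denormalize_prediction(x_t, network_output, diff_std):
    """Turn the network's normalized residual into a full-field forecast.

    x_t            : state at the most recent input time (physical units)
    network_output : model output, a residual normalized by diff_std
    diff_std       : standard deviation of 6-hour differences (climatology)
    """
    return x_t + diff_std * network_output
```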
Training curriculum
The pre-training step closely followed the training curriculum of Lam et al.:
Pre-training
| Stage | Batches | Forecast Length | Learning Rate |
|---|---|---|---|
| 1 (Warmup) | 1,000 | 1 step (6 h) | linear warm-up |
| 2 | 299,000 | 1 step (6 h) | cosine decay |
| 3 | 1,000 per forecast length | 2–12 steps (12–72 h) | constant |
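The shape of this learning-rate schedule can be expressed with optax; the peak and final learning-rate values below are placeholders (the actual rates follow Lam et al. and are not restated in this document):

```python
import optax

# Placeholder values: the actual peak and final learning rates are those of
# Lam et al. and are not restated here.
PEAK_LR = 1e-3
FINETUNE_LR = 3e-7

schedule = optax.join_schedules(
    schedules=[
        # Stage 1: linear warm-up over the first 1,000 batches.
        optax.linear_schedule(init_value=0.0, end_value=PEAK_LR, transition_steps=1_000),
        # Stage 2: cosine decay over the next 299,000 batches.
        optax.cosine_decay_schedule(init_value=PEAK_LR, decay_steps=299_000),
        # Stage 3: constant learning rate for the autoregressive stages.
        optax.constant_schedule(FINETUNE_LR),
    ],
    boundaries=[1_000, 300_000],
)
```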
Fine-tuning
| Stage | Batches | Forecast Length | Learning Rate |
|---|---|---|---|
| Fine-tune | 5,000 | 12 steps (72 h) | constant |
In both cases, the batch size was 32 forecasts, and the training data was sampled with replacement. On average, each training forecast (initialization date) was seen about 184 times in the pre-training stage and 4.5 times in the fine-tuning stage.
Optimizer
As in Lam et al., training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with the same momentum parameters and with weight decay applied to the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
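A sketch of that optimizer setup in optax; the learning rate is a placeholder, the beta and weight-decay values follow those reported by Lam et al. rather than values restated here, and the weight-matrix mask is illustrative:

```python
import jax
import optax

# Placeholder learning rate; in practice this would be the schedule sketched
# in the training-curriculum section. The betas and weight decay follow the
# values reported by Lam et al. and are illustrative here.
optimizer = optax.adamw(
    learning_rate=1e-3,
    b1=0.9,
    b2=0.95,
    weight_decay=0.1,
    # Apply weight decay only to weight matrices (>= 2-D parameter leaves), not biases.
    mask=lambda params: jax.tree_util.tree_map(lambda p: p.ndim >= 2, params),
)
```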
Validation
Validation data/plots to come
Model weights
The fully-trained model weights are available as `geml_1.0.ckpt` in this repository.
For research purposes, we will also shortly update this repository to include intermediate checkpoints from the pre-training and fine-tuning process.
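Given the inference compatibility with DeepMind's GraphCast code, the weights can presumably be loaded with that repository's checkpoint utilities. The sketch below assumes the checkpoint uses the same serialization as the checkpoints distributed with the graphcast package; this has not been spelled out above, so treat it as an assumption.

```python
# Assumes the checkpoint is serialized the same way as the checkpoints
# distributed with DeepMind's graphcast package.
from graphcast import checkpoint, graphcast

with open("geml_1.0.ckpt", "rb") as f:
    ckpt = checkpoint.load(f, graphcast.CheckPoint)

params = ckpt.params              # trained network weights
model_config = ckpt.model_config  # architecture settings
task_config = ckpt.task_config    # input/target variables, pressure levels, timestep
```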
References
[1]: R. Lam et al., "Learning skillful medium-range global weather forecasting," Science, vol. 382, no. 6677, pp. 1416–1421, Dec. 2023, doi: 10.1126/science.adi2336.
[2]: S. Rasp et al., "WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models," Journal of Advances in Modeling Earth Systems, vol. 16, no. 6, p. e2023MS004019, 2024, doi: 10.1029/2023MS004019.
[3]: I. Loshchilov and F. Hutter, "Decoupled Weight Decay Regularization," Jan. 04, 2019, arXiv: arXiv:1711.05101, doi: 10.48550/arXiv.1711.05101.