
GEML 1.0

Here, we introduce version 1.0 of the Global Environmental eMuLator (GEML), a data-driven model compatible with the ¼°, 13-level version of GraphCast (Lam et al. 2023, [1]). This model was trained by the Meteorological Research Division (MRD) and the Canadian Centre for Meteorological and Environmental Prediction (CCMEP), divisions of Environment and Climate Change Canada.

This model was trained "from scratch," using training code developed for other research projects. In inference (forecast production), this model is fully compatible with the model code in DeepMind's GraphCast repository.

License

These model weights are available under the Canada Open Government license, which permits derivative works and commercial use with attribution.

Variables

The model predicts the following meteorological variables on a ¼° latitude/longitude grid (with poles):

  • At elevation: temperature, geopotential, u (zonal) component of wind, v (meridional) component of wind, vertical velocity, specific humidity
  • At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation[†]

[†] This variable is incorrect; please see the Erratum section.

The atmospheric variables are predicted at the 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa pressure levels. For points that lie below the surface, extrapolated values are given (i.e. they are not masked).
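
For orientation, the sketch below spells out the grid shape, pressure levels, and predicted variables using ERA5/WeatherBench-style names as adopted by the DeepMind GraphCast code. The exact identifiers are an assumption here and should be checked against the task configuration shipped with the checkpoint.

```python
# Assumed variable names (ERA5/GraphCast conventions); verify against the
# task_config stored in the checkpoint before relying on them.
PRESSURE_LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]

ATMOSPHERIC_VARS = [
    "temperature",
    "geopotential",
    "u_component_of_wind",
    "v_component_of_wind",
    "vertical_velocity",
    "specific_humidity",
]

SURFACE_VARS = [
    "2m_temperature",
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "mean_sea_level_pressure",
    "total_precipitation_6hr",  # see the Erratum section
]

# 0.25-degree grid including both poles: 721 x 1440 points.
N_LAT, N_LON = 721, 1440
```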

The model timestep is 6 hours, and the model takes two time levels as input. That is, to produce a forecast valid at 12Z, the model needs input data at 6Z and 0Z.
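
As a minimal illustration of the two-time-level input, this sketch computes the two input times required for a given valid time; nothing GEML-specific is assumed beyond the 6-hour step.

```python
from datetime import datetime, timedelta

def input_times(valid_time: datetime, step_hours: int = 6) -> tuple[datetime, datetime]:
    """Return the two input times needed to produce a forecast valid at `valid_time`."""
    step = timedelta(hours=step_hours)
    return valid_time - 2 * step, valid_time - step

# Example: a forecast valid at 12Z requires inputs at 0Z and 6Z.
print(input_times(datetime(2022, 1, 1, 12)))
```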

Input data

All forecast variables except accumulated precipitation are taken as input values. The model also requires the surface geopotential, land-sea mask, and top-of-atmosphere incident solar radiation (accumulated over 1h) as input values.

The surface geopotential and land-sea mask are static variables. The incident solar radiation must be provided at both input time levels and at the output time level. This value can be calculated, and both the DeepMind GraphCast repository and the training code repository contain an incident solar radiation model.
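
Since both repositories already include such a model, the following is only an illustrative sketch of how 1-hour-accumulated top-of-atmosphere incident solar radiation can be approximated from the solar constant and the solar zenith angle. The declination and hour-angle formulas are standard textbook approximations, not the formulas used in either repository.

```python
import numpy as np

SOLAR_CONSTANT = 1361.0  # W m**-2 (approximate)

def toa_incident_solar_radiation(lat_deg, lon_deg, day_of_year, utc_hour, accum_hours=1.0):
    """Approximate TOA incident solar radiation accumulated over `accum_hours` (J m**-2).

    Sketch: solar constant times cos(solar zenith angle), clipped at zero, and
    integrated assuming the zenith angle is constant over the accumulation window.
    """
    lat = np.deg2rad(lat_deg)
    # Solar declination (Cooper's approximation).
    decl = np.deg2rad(23.45) * np.sin(2.0 * np.pi * (284 + day_of_year) / 365.0)
    # Hour angle from local solar time (ignoring the equation of time).
    solar_time = utc_hour + lon_deg / 15.0
    hour_angle = np.deg2rad(15.0 * (solar_time - 12.0))
    cos_zenith = np.sin(lat) * np.sin(decl) + np.cos(lat) * np.cos(decl) * np.cos(hour_angle)
    flux = SOLAR_CONSTANT * np.clip(cos_zenith, 0.0, None)  # W m**-2
    return flux * accum_hours * 3600.0  # J m**-2

# Example: noon UTC at the equator near the spring equinox (day ~80).
print(toa_incident_solar_radiation(0.0, 0.0, 80, 12.0))
```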

Model training

Datasets

The model was pre-trained on ERA5 data (calendar years 1979–2015, inclusive), following the training configuration in Lam et al. Subsequently, it was fine-tuned on the "HRES initial conditions" dataset for calendar years 2016–2021.

Both of these datasets are available from the WeatherBench 2 project (Rasp et al. 2024, [2]).

Erratum

Although the HRES dataset contains an accumulated precipitation variable, this value is always zero, so during the fine-tuning process the model was trained towards a prediction of zero precipitation.

Since precipitation is the only predicted variable that is not also used as an input, we do not expect this error to affect the prediction of the other output variables.

Loss function

The model was trained with the latitude- and level-weighted mean squared error loss function, equation (19) in the supplementary material of Lam et al.:

$$
\mathrm{MSE} = \underbrace{\sum_{\tau=1}^{N_t} \frac{1}{N_t}}_{\text{Lead time}} \; \underbrace{\sum_{i,j} \frac{dA(i,j)}{4\pi}}_{\text{Space}} \; \underbrace{\sum_{k=0}^{N_k} w(k)}_{\text{Level}} \; \underbrace{\sum_{\mathrm{var}} \omega_{\mathrm{var}}}_{\text{Variable}} \; \frac{\left(\hat{x}_{\mathrm{var}}(i,j,k;\tau) - x_{\mathrm{var}}(i,j,k;\tau)\right)^2}{\sigma^2_{\Delta\mathrm{var}}(k)}
$$
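
A plain NumPy sketch of this loss follows. The array layout, and the specific values of the per-variable weights $\omega_\mathrm{var}$ and per-level weights $w(k)$, are assumptions (Lam et al. give the values used for GraphCast); surface variables are folded into the level/variable axes for simplicity.

```python
import numpy as np

def weighted_mse(pred, target, area_weights, level_weights, var_weights, diff_std):
    """Latitude-, level-, and variable-weighted MSE (sketch of eq. 19 in Lam et al.).

    pred, target: arrays of shape (lead_time, lat, lon, level, variable)
    area_weights: (lat, lon) normalized cell areas dA/(4*pi), summing to 1
    level_weights: (level,) per-level weights w(k)
    var_weights: (variable,) per-variable weights omega_var
    diff_std: (level, variable) standard deviation of 6-h differences, sigma_Delta
    """
    sq_err = (pred - target) ** 2 / diff_std**2               # normalize each squared error
    sq_err = sq_err * var_weights                              # per-variable weight
    sq_err = sq_err * level_weights[:, None]                   # per-level weight
    sq_err = sq_err * area_weights[None, :, :, None, None]     # area (latitude) weight
    return sq_err.sum(axis=(1, 2, 3, 4)).mean()                # sum over space/level/var, average over lead times
```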

Normalizations

The GraphCast architecture takes normalized input data (z-scores) and outputs a forecast difference normalized by the standard deviation of 6-hour differences in a climatological dataset.

For these fields, we used the same normalization factors as the DeepMind GraphCast model, computed over the ERA5 dataset. Since the HRES data is very close to the ERA5 data, we re-used the ERA5 normalization factors without change during the model fine-tuning.
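
In code, the normalization round trip looks roughly like this (a sketch; `mean`, `std`, and `diff_std` stand for the ERA5-derived normalization statistics):

```python
def normalize_inputs(x, mean, std):
    """Convert raw inputs to z-scores using the ERA5-derived statistics."""
    return (x - mean) / std

def apply_prediction(x_latest, model_output, diff_std):
    """Convert the network's normalized residual back into a forecast state.

    The network predicts the 6-hour change divided by the climatological standard
    deviation of 6-hour differences (`diff_std`), so the next state is recovered
    as previous state + residual * diff_std.
    """
    return x_latest + model_output * diff_std
```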

Training curriculum

The pre-training step closely followed the training curriculum of Lam et al.:

Pre-training

| Stage | Batches | Forecast Length | Learning Rate |
| --- | --- | --- | --- |
| 1 (Warmup) | 1000 | 1 step (6 h) | $0 \to 10^{-3}$ (linear) |
| 2 | 299000 | 1 step (6 h) | $10^{-3} \to 3 \cdot 10^{-7}$ (cosine) |
| 3 | 1000 each | 2–12 steps (12–72 h) | $3 \cdot 10^{-7}$ (constant) |

Fine-tuning

| Stage | Batches | Forecast Length | Learning Rate |
| --- | --- | --- | --- |
| Fine-tune | 5000 | 12 steps (72 h) | $3 \cdot 10^{-7}$ (constant) |

In both cases, the batch size was 32 forecasts, and the training data was sampled with replacement. On average, each training forecast (initialization date) was seen about 184 times in the pre-training stage and 4.5 times in the fine-tuning stage.
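
Expressed with optax, the pre-training learning-rate schedule is roughly the following (a sketch; the step counts and rates are taken from the tables above, and stage 3 and the fine-tuning stage simply continue at the constant final rate):

```python
import optax

WARMUP_STEPS = 1_000
COSINE_STEPS = 299_000
PEAK_LR = 1e-3
FINAL_LR = 3e-7

# Stage 1: linear warmup 0 -> 1e-3; stage 2: cosine decay 1e-3 -> 3e-7;
# stage 3 (and fine-tuning): constant 3e-7.
schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(init_value=0.0, end_value=PEAK_LR, transition_steps=WARMUP_STEPS),
        optax.cosine_decay_schedule(init_value=PEAK_LR, decay_steps=COSINE_STEPS, alpha=FINAL_LR / PEAK_LR),
        optax.constant_schedule(FINAL_LR),
    ],
    boundaries=[WARMUP_STEPS, WARMUP_STEPS + COSINE_STEPS],
)

# Spot-check the learning rate at the stage boundaries.
print(schedule(0), schedule(WARMUP_STEPS), schedule(WARMUP_STEPS + COSINE_STEPS))
```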

Optimizer

As in Lam et al., the training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with momentum parameters $\beta_1 = 0.9$ and $\beta_2 = 0.95$, and a weight decay of $0.1$ on the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
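
With optax, this optimizer configuration might look as follows. Restricting weight decay to weight matrices by masking on parameter rank (`ndim >= 2`) is an assumption about how that restriction was implemented.

```python
import jax
import jax.numpy as jnp
import optax

def weight_matrix_mask(params):
    """Apply weight decay only to parameters that are matrices (ndim >= 2)."""
    return jax.tree_util.tree_map(lambda p: p.ndim >= 2, params)

optimizer = optax.adamw(
    learning_rate=3e-7,  # or the schedule sketched above
    b1=0.9,
    b2=0.95,
    weight_decay=0.1,
    mask=weight_matrix_mask,
)

# Example: initialize the optimizer state for a toy parameter tree.
params = {"w": jnp.zeros((8, 8)), "b": jnp.zeros((8,))}
opt_state = optimizer.init(params)
```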

Validation

Validation data/plots to come

Model weights

The fully-trained model weights are available as geml_1.0.ckpt in this repository.

For research purposes, we will also shortly update this repository to include intermediate checkpoints from the pre-training and fine-tuning process.
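
Assuming geml_1.0.ckpt is serialized in the same format as the checkpoints shipped with DeepMind's GraphCast repository (this is not stated explicitly above, so treat it as an assumption), it could be loaded along these lines:

```python
# Sketch only: assumes geml_1.0.ckpt uses the same serialization as the
# checkpoints published with DeepMind's graphcast repository.
from graphcast import checkpoint, graphcast

with open("geml_1.0.ckpt", "rb") as f:
    ckpt = checkpoint.load(f, graphcast.CheckPoint)

params = ckpt.params              # model weights
model_config = ckpt.model_config  # architecture configuration
task_config = ckpt.task_config    # input/target variables, levels, lead times
```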

References

[1]: R. Lam et al., “Learning skillful medium-range global weather forecasting,” Science, vol. 382, no. 6677, pp. 1416–1421, Dec. 2023, doi: 10.1126/science.adi2336.

[2]: S. Rasp et al., “WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models,” Journal of Advances in Modeling Earth Systems, vol. 16, no. 6, p. e2023MS004019, 2024, doi: 10.1029/2023MS004019.

[3]: I. Loshchilov and F. Hutter, “Decoupled Weight Decay Regularization,” Jan. 2019, arXiv:1711.05101, doi: 10.48550/arXiv.1711.05101.
