Introducing BetaEarth - your own Earth embedding emulator [Pre-Release]
The past year has brought many notable embedding products, like AlphaEarth, TESSERA or OlmoEarth. We are entering a phase where embeddings begin to act as a substitute for real observation data.
BetaEarth is an attempt to explore how much one can learn from a model based on its embeddings alone, and whether those embeddings can serve as a useful training target for other models. Huge credit to the AlphaEarth team for releasing the embedding archive openly; it's what made this kind of community-built extension possible.
BetaEarth is a flexible (and relatively lightweight) emulator of the AlphaEarth annual product. It reproduces neither AlphaEarth's exact outputs nor the product itself, but it reaches ~0.87 cosine similarity to the AlphaEarth embeddings on held-out data and retains 97% of downstream land-cover classification accuracy. Training took only 1-2 days.
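For readers who want to run a similar check on their own tiles, here is a minimal sketch of how the two numbers above could be computed. It is not BetaEarth's evaluation code: the function names, the `(N, D)` array layout, and the synthetic data are assumptions made for illustration, and "retention" is assumed to mean the ratio of the two downstream accuracies.

```python
import numpy as np


def mean_cosine_similarity(pred, ref, eps=1e-12):
    """Mean per-sample cosine similarity between two (N, D) embedding arrays.

    `pred` holds emulated embeddings and `ref` the reference embeddings for
    the same N pixels (hypothetical layout, one row per pixel).
    """
    pred = np.asarray(pred, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    dots = np.sum(pred * ref, axis=1)
    norms = np.linalg.norm(pred, axis=1) * np.linalg.norm(ref, axis=1)
    # Guard against division by zero for all-zero (e.g. no-data) vectors.
    return float(np.mean(dots / np.maximum(norms, eps)))


def accuracy_retention(emulated_acc, reference_acc):
    """Share of the reference downstream accuracy kept by the emulator."""
    return emulated_acc / reference_acc


# Synthetic stand-in data: reference vectors plus noise play the emulator.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 64))
pred = ref + 0.5 * rng.normal(size=(1000, 64))

score = mean_cosine_similarity(pred, ref)
print(f"mean cosine similarity: {score:.3f}")
```

With real data, `ref` would come from the AlphaEarth archive and `pred` from the emulator for the same held-out pixels; the accuracies passed to `accuracy_retention` would come from the same land-cover classifier setup trained once on each set of embeddings.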
BetaEarth can encode any combination (including multi-temporal) of:
- Sentinel-2 L1C
- Sentinel-2 L2A
- Sentinel-1 RTC
- COP-DEM 30 product
The model weights are open, just like its training data (built exclusively using Major TOM). The GitHub repository provides a script for automated generation of embeddings across any footprint. You can also try the workflow over small bounding boxes on the free Hugging Face web app!