# CloudOpsBERT: Domain-Specific Language Models for Cloud Operations
CloudOpsBERT is an open-source project exploring domain-adapted transformer models for cloud operations log analysis: specifically, anomaly detection, reliability monitoring, and cost optimization.
This project fine-tunes lightweight BERT variants (e.g., DistilBERT) on large-scale system log datasets (HDFS, BGL) and provides ready-to-use models for the research and practitioner community.
## Motivation
Modern cloud platforms generate massive amounts of logs. Detecting anomalies in these logs is crucial for:
- Ensuring reliability (catching failures early),
- Improving cost efficiency (identifying waste or misconfigurations),
- Supporting autonomous operations (AIOps).
Generic LLMs and BERT models are not optimized for this domain. CloudOpsBERT bridges that gap by:
- Training on real log datasets (HDFS, BGL),
- Addressing imbalanced anomaly detection with class weighting,
- Publishing open-source checkpoints for reproducibility.
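The class weighting mentioned above can be sketched as inverse-frequency weights that up-weight the rare anomaly class. This is a minimal illustration, not the project's actual training code; the `compute_class_weights` helper and the normalization convention are assumptions:

```python
from collections import Counter

def compute_class_weights(labels):
    """Inverse-frequency class weights: rarer classes get larger weights.

    Returns a list indexed by class id, normalized so the weights sum to
    the number of classes (a common convention).
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    raw = {c: total / counts[c] for c in counts}       # inverse frequency
    scale = n_classes / sum(raw.values())              # normalize
    return [raw[c] * scale for c in sorted(raw)]

# Example: heavily imbalanced labels (0 = normal, 1 = anomaly).
labels = [0] * 95 + [1] * 5
weights = compute_class_weights(labels)
# The anomaly class ends up weighted 19x the normal class; in training,
# such weights would typically be passed to a weighted loss, e.g.
# torch.nn.CrossEntropyLoss(weight=...).
```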
## Inference (Pretrained)

Predict anomaly probability for a single log line:

```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --text "ERROR dfs.DataNode: Lost connection to namenode"
```

Batch inference (file with one log line per row):

```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --file samples/sample_logs.txt \
  --threshold 0.5 \
  --jsonl_out predictions.jsonl
```
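Under the hood, scoring a line presumably reduces to a softmax over two logits followed by a threshold on the anomaly probability. The sketch below assumes a two-logit `(normal, anomaly)` output layout, consistent with the label mapping shipped in the config; it is not `predict.py` itself:

```python
import math

def anomaly_probability(logits):
    """Softmax over (normal, anomaly) logits; returns P(anomaly)."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def classify(logits, threshold=0.5):
    """Map raw logits to a label using the decision threshold."""
    p = anomaly_probability(logits)
    return ("anomaly" if p >= threshold else "normal", p)

# Example logits for (normal, anomaly):
label, p = classify([0.2, 2.3])  # p ~ 0.89 -> "anomaly"
```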
## Results

- **HDFS (in-domain, test set)**
  - F1: 0.571
  - Precision: 0.992
  - Recall: 0.401
  - AUROC: 0.730
  - Threshold: 0.50 (tuneable)
- **Cross-domain (HDFS → BGL)**
  - Performance degrades significantly due to dataset/domain shift (see paper).
- **BGL (training in progress)**
  - Will be released as `cloudops-bert` (subfolder `bgl`) once full training is complete.
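Since the 0.50 decision threshold is tuneable, one generic way to pick an operating point is to sweep candidate thresholds over held-out scores and keep the F1-maximizing one. This is a self-contained sketch on synthetic data, not the project's evaluation code:

```python
def f1_at_threshold(scores, labels, threshold):
    """Precision/recall/F1 when predicting anomaly (1) for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic held-out anomaly scores and gold labels (1 = anomaly).
scores = [0.05, 0.20, 0.45, 0.55, 0.70, 0.95]
labels = [0,    0,    1,    0,    1,    1]

# Sweep thresholds and keep the F1-maximizing one.
best_f1, best_t = max((f1_at_threshold(scores, labels, t)[2], t)
                      for t in [0.1, 0.3, 0.5, 0.7, 0.9])
```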
## Models

- `vaibhav2507/cloudops-bert` (Hugging Face Hub)
  - `subfolder="distributed-storage"`: HDFS-trained CloudOpsBERT
  - `subfolder="hpc"`: BGL-trained CloudOpsBERT (coming soon)
- Each export includes:
  - Model weights (`pytorch_model.bin`)
  - Config with label mappings (`normal`, `anomaly`)
  - Tokenizer files
## Quickstart (Scripts)

Set up folders:

```bash
bash scripts/setup_dirs.sh
```

(Optional) Download a local copy of a submodel from Hugging Face:

```bash
bash scripts/fetch_pretrained.sh                 # downloads 'hdfs' by default
SUBFOLDER=bgl bash scripts/fetch_pretrained.sh   # downloads 'bgl'
```

Single-line prediction (directly from HF):

```bash
bash scripts/predict_line.sh "ERROR dfs.DataNode: Lost connection to namenode" hdfs
```

Batch prediction (using a local model folder):

```bash
bash scripts/make_sample_logs.sh
bash scripts/predict_file.sh samples/sample_logs.txt hdfs models/cloudops-bert-hdfs preds/preds_hdfs.jsonl
```
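The JSONL predictions can then be post-processed with a few lines of Python. The `label` and `score` field names below are assumptions about the output schema; adjust them to whatever `predict.py` actually emits:

```python
import json

def load_anomalies(jsonl_text, min_score=0.5):
    """Parse one JSON object per line; keep anomalies at or above min_score."""
    hits = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec["label"] == "anomaly" and rec["score"] >= min_score:
            hits.append(rec)
    return hits

# Example with two fake prediction records (schema assumed, see above).
sample = "\n".join([
    json.dumps({"text": "ERROR dfs.DataNode: Lost connection", "label": "anomaly", "score": 0.97}),
    json.dumps({"text": "INFO dfs.DataNode: heartbeat ok", "label": "normal", "score": 0.02}),
])
anomalies = load_anomalies(sample)
```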
## Related Work

Several prior works have explored using BERT for log anomaly detection:

- Leveraging BERT and Hugging Face Transformers for Log Anomaly Detection
  - Tutorial-style blog post demonstrating how to fine-tune BERT on log data with Hugging Face. Useful as an introduction, but not intended as a reproducible research artifact.
- LogBERT (HelenGuohx/logbert)
  - Academic prototype from ~2019-2020 focusing on modeling log sequences with BERT. Demonstrates feasibility but is limited to in-domain experiments and lacks integration with modern Hugging Face tooling.
- AnomalyBERT (Jhryu30/AnomalyBERT)
  - Another exploratory repository showing BERT-based anomaly detection on logs, with dataset-specific preprocessing. Similar limitations in generalization and reproducibility.
## How CloudOpsBERT is different

- Domain-specific adaptation: explicitly trained for cloud operations logs (HDFS, BGL) with a class-weighted loss.
- Cross-domain evaluation: includes in-domain and cross-domain benchmarks, highlighting generalization challenges.
- Reproducibility & usability: clean repo, scripts, and ready-to-use Hugging Face exports.
- Future directions: introduces MicroLM, compressed micro-language models for efficient edge/cloud hybrid inference.

In short: previous work showed that "BERT can work for logs." CloudOpsBERT operationalizes this idea into reproducible benchmarks, public models, and deployable tools for both researchers and practitioners.
## Citation

If you use CloudOpsBERT in your research or tools, please cite:

```bibtex
@misc{pandey2025cloudopsbert,
  title={CloudOpsBERT: Domain-Specific Transformer Models for Cloud Operations Anomaly Detection},
  author={Pandey, Vaibhav},
  year={2025},
  howpublished={GitHub, Hugging Face},
  url={https://github.com/vaibhav-research/cloudops-bert}
}
```
## Base model

- `distilbert/distilbert-base-uncased`
## Evaluation results (self-reported)

| Metric    | HDFS test set | BGL test set |
|-----------|---------------|--------------|
| F1        | 0.571         | 1.000        |
| Precision | 0.992         | 1.000        |
| Recall    | 0.401         | 1.000        |
| AUROC     | 0.730         | 1.000        |
| Threshold | 0.500         | 0.050        |