CBDC-Sentiment: A Domain-Specific BERT for CBDC-Related Sentiment Analysis
CBDC-Sentiment is a 3-class (negative / neutral / positive) sentence-level BERT-based classifier built for Central Bank Digital Currency (CBDC) communications. It is trained to identify overall sentiment in central-bank style text such as consultations, speeches, reports, and reputable news.
Base Model: bilalzafar/CentralBank-BERT
CentralBank-BERT is a domain-adapted BERT-base (uncased) model, pretrained on 66M+ tokens across 2M+ sentences from central-bank speeches published via the Bank for International Settlements (1996–2024). It is optimized for masked-token prediction within the specialized domains of monetary policy, financial regulation, and macroeconomic communication, enabling better contextual understanding of central-bank discourse and financial narratives.
Training data: The dataset consists of 2,405 custom, manually annotated sentences related to Central Bank Digital Currencies (CBDCs), extracted from BIS speeches. The class distribution is neutral: 1,068 (44.41%), positive: 1,026 (42.66%), and negative: 311 (12.93%). The data is split row-wise, stratified by label, into train: 1,924, validation: 240, and test: 241 examples.
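The stratified row-wise split can be reproduced with a standard two-stage `train_test_split`; the 80/10/10 fractions and `random_state=42` are assumptions for illustration (the card reports only the resulting sizes), and the labels are built from the class counts above:

```python
from sklearn.model_selection import train_test_split

# Rebuild the label distribution reported in the card: 311 / 1,068 / 1,026
X = list(range(2405))
y = [0] * 311 + [1] * 1068 + [2] * 1026  # 0=negative, 1=neutral, 2=positive

# 80% train, then split the remaining 20% evenly into validation and test,
# stratifying by label at each stage (seed is an assumption)
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 1924 240 241
```

These fractions yield exactly the 1,924 / 240 / 241 sizes reported above.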
Intended usage: Use this model to classify sentence-level sentiment in CBDC texts (reports, consultations, speeches, research notes, reputable news). It is domain-specific and not intended for generic or informal sentiment tasks.
Preprocessing & class imbalance
Sentences were lowercased (no stemming or lemmatization) and tokenized with the base tokenizer from `bilalzafar/CentralBank-BERT`, using `max_length=320` with truncation and dynamic padding via `DataCollatorWithPadding`. To address class imbalance, training used Focal Loss (γ=1.0) with class weights computed from the train split (`class_weight="balanced"`) applied in the loss, plus a `WeightedRandomSampler` with √(inverse-frequency) per-sample weights.
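The imbalance handling above can be sketched as follows. This is a minimal PyTorch sketch, not the card's exact training code: the class counts are taken from the full dataset for illustration (the card computes weights from the train split), and `gamma=1.0` matches the reported setting:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights, gamma=1.0):
    """Class-weighted focal loss: scales each sample's CE term by (1 - p_t)^gamma."""
    ce = F.cross_entropy(logits, targets, weight=class_weights, reduction="none")
    # Probability assigned to the true class, recovered from unweighted CE
    pt = torch.exp(-F.cross_entropy(logits, targets, reduction="none"))
    return ((1.0 - pt) ** gamma * ce).mean()

# "balanced" class weights: n_samples / (n_classes * count_c)
counts = torch.tensor([311.0, 1068.0, 1026.0])   # negative, neutral, positive
class_weights = counts.sum() / (len(counts) * counts)

# sqrt(inverse-frequency) per-sample weights for a WeightedRandomSampler
labels = torch.tensor([0, 1, 2, 1])              # toy batch of gold labels
sample_weights = (1.0 / counts[labels]).sqrt()

logits = torch.randn(4, 3)
print(focal_loss(logits, labels, class_weights, gamma=1.0))
```

With `gamma=0` this reduces to ordinary weighted cross-entropy; raising `gamma` progressively down-weights well-classified examples.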
Training procedure
Training used `bilalzafar/CentralBank-BERT` as the base, with a 3-label `AutoModelForSequenceClassification` head. Optimization used AdamW (HF Trainer) with learning rate 2e-5, batch size 16 (train/eval), and up to 8 epochs with early stopping (patience=2); the best checkpoint occurred around epoch 6. A warmup ratio of 0.06, weight decay of 0.01, and fp16 precision were applied. Runs were seeded (seed=42) and executed on Google Colab (T4 GPU).
Evaluation
On the validation split (~10% of data), the model achieved accuracy 0.8458, macro-F1 0.8270, and weighted-F1 0.8453. On the held-out test split (~10%), performance was accuracy 0.8216, macro-F1 0.8121, and weighted-F1 0.8216.
Per-class (test):
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| negative | 0.8214 | 0.7419 | 0.7797 | 31 |
| neutral | 0.7857 | 0.8224 | 0.8037 | 107 |
| positive | 0.8614 | 0.8447 | 0.8529 | 103 |
Note: On the entire annotated dataset (in-domain evaluation, no hold-out), the model reaches ~0.95 accuracy / weighted-F1. These should be considered upper bounds; the test split above is the main reference for generalization.
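As a quick consistency check, the aggregate test metrics can be recovered from the per-class F1 scores and supports in the table above:

```python
# Per-class F1 and support from the test-split table
f1 = {"negative": 0.7797, "neutral": 0.8037, "positive": 0.8529}
support = {"negative": 31, "neutral": 107, "positive": 103}

# Macro-F1: unweighted mean; weighted-F1: support-weighted mean
macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.8121 0.8216
```

Both values match the reported test-split macro-F1 (0.8121) and weighted-F1 (0.8216).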
Other CBDC Models
This model is part of the CentralBank-BERT / CBDC model family, a suite of domain-adapted classifiers for analyzing central-bank communication.
| Model | Purpose | Intended Use | Link |
|---|---|---|---|
| bilalzafar/CentralBank-BERT | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | CentralBank-BERT |
| bilalzafar/CBDC-BERT | Binary classifier: CBDC vs. Non-CBDC. | Flagging CBDC-related discourse in large corpora. | CBDC-BERT |
| bilalzafar/CBDC-Stance | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | CBDC-Stance |
| bilalzafar/CBDC-Sentiment | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | CBDC-Sentiment |
| bilalzafar/CBDC-Type | Classifies Retail, Wholesale, General CBDC mentions. | Distinguishing policy focus (retail vs. wholesale). | CBDC-Type |
| bilalzafar/CBDC-Discourse | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | CBDC-Discourse |
| bilalzafar/CentralBank-NER | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | CentralBank-NER |
Repository and Replication Package
All training pipelines, preprocessing scripts, evaluation notebooks, and result outputs are available in the companion GitHub repository:
🔗 https://github.com/bilalezafar/CentralBank-BERT
Usage
```python
from transformers import pipeline

# Load the sentiment-classification pipeline
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Sentiment")

# Example sentences
sentences = [
    "CBDCs will revolutionize payment systems and improve financial inclusion."
]

# Predict
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output:
# CBDCs will revolutionize payment systems and improve financial inclusion.
#  → positive (score=0.9789)
```
Citation
If you use this model, please cite as:
Zafar, M. B. (2025). CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse. SSRN. https://papers.ssrn.com/abstract=5404456
@article{zafar2025centralbankbert,
title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
author={Zafar, Muhammad Bilal},
year={2025},
journal={SSRN Electronic Journal},
url={https://papers.ssrn.com/abstract=5404456}
}