**Intended usage:** Use this model to **classify sentence-level sentiment** in **CBDC** texts (reports, consultations, speeches, research notes, reputable news). It is **domain-specific** and *not intended* for generic or informal sentiment tasks.

## Preprocessing & class imbalance

Sentences were **lowercased** (no stemming or lemmatization) and tokenized with the base tokenizer from [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) using **max\_length=320** with truncation and **dynamic padding** via `DataCollatorWithPadding`. To address class imbalance, training used *Focal Loss (γ=1.0)* with **class weights** computed from the *train* split (`class_weight="balanced"`) and applied in the loss, plus a *WeightedRandomSampler* with √(inverse-frequency) per-sample weights.
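The imbalance handling above can be sketched as follows, using an illustrative label list (the real train split is not shown); the `class_weight="balanced"` formula follows scikit-learn's convention, and the per-sample weights use √(inverse-frequency):

```python
import math
from collections import Counter

# Illustrative train-split labels (0=negative, 1=neutral, 2=positive); NOT the real data.
train_labels = [1, 1, 1, 1, 1, 1, 0, 0, 2, 2, 2, 2]

counts = Counter(train_labels)
n, k = len(train_labels), len(counts)

# class_weight="balanced": w_c = n_samples / (n_classes * count_c), applied in the loss
class_weights = {c: n / (k * counts[c]) for c in counts}

# sqrt(inverse-frequency) per-sample weights for a torch WeightedRandomSampler
sample_weights = [1.0 / math.sqrt(counts[y]) for y in train_labels]

# Focal-loss term for one example, where p_t is the predicted probability
# of the true class: FL = -w_c * (1 - p_t)**gamma * log(p_t), with gamma = 1.0
def focal_loss(p_t, label, gamma=1.0):
    return -class_weights[label] * (1.0 - p_t) ** gamma * math.log(p_t)
```

In training, `sample_weights` would drive a `torch.utils.data.WeightedRandomSampler` while `class_weights` scales the focal term, as described above.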

## Training procedure

Training used **[`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT)** as the base, with a 3-label `AutoModelForSequenceClassification` head. Optimization used *AdamW* (HF Trainer) with a *learning rate of 2e-5*, *batch size 16* (train/eval), and up to *8 epochs* with early stopping (*patience=2*); the best epoch was \~6. A *warmup ratio of 0.06*, *weight decay 0.01*, and *fp16* precision were applied. Runs were seeded (*42*) and executed on *Google Colab (T4)*.
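Collected as keyword arguments in the style of Hugging Face's `TrainingArguments` (a sketch: `load_best_model_at_end` and `metric_for_best_model` are assumptions needed for `EarlyStoppingCallback(early_stopping_patience=2)` to work):

```python
# Hyperparameters from the paragraph above, arranged as TrainingArguments kwargs.
training_args_kwargs = dict(
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=8,       # upper bound; early stopping halted near epoch 6
    warmup_ratio=0.06,
    weight_decay=0.01,
    fp16=True,
    seed=42,
    load_best_model_at_end=True,        # assumption: required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption: selection metric not stated above
)
```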

## Evaluation

On the **validation split** (\~10% of data), the model achieved **accuracy** *0.8458*, **macro-F1** *0.8270*, and **weighted-F1** *0.8453*.
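Macro-F1 averages the per-class F1 scores equally (so minority classes count as much as the majority class), while weighted-F1 weights each class by its support; a self-contained sketch on toy labels (not the model's actual predictions):

```python
from collections import Counter

# Toy gold labels and predictions (0=negative, 1=neutral, 2=positive).
y_true = [0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 1, 1, 2, 0]

labels = sorted(set(y_true))
support = Counter(y_true)

def f1(c):
    """Per-class F1 from true positives, false positives, and false negatives."""
    tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
    fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
    fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Macro-F1: unweighted mean over classes; weighted-F1: mean weighted by support.
macro_f1 = sum(f1(c) for c in labels) / len(labels)
weighted_f1 = sum(f1(c) * support[c] / len(y_true) for c in labels)
```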
---

## Other CBDC Models

This model is part of the **CentralBank-BERT / CBDC model family**, a suite of domain-adapted classifiers for analyzing central-bank communication.

| **Model** | **Purpose** | **Intended Use** | **Link** |
| --- | --- | --- | --- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT** | Binary classifier: CBDC vs. non-CBDC. | Flagging CBDC-related discourse in large corpora. | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT) |
| **bilalzafar/CBDC-Stance** | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance) |
| **bilalzafar/CBDC-Sentiment** | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment) |
| **bilalzafar/CBDC-Type** | Classifies Retail, Wholesale, and General CBDC mentions. | Distinguishing policy focus (retail vs. wholesale). | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type) |
| **bilalzafar/CBDC-Discourse** | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse) |
| **bilalzafar/CentralBank-NER** | Named-entity recognition (NER) model for central-banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER) |

## Repository and Replication Package

All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:

🔗 **[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**

---

## Usage

```python
from transformers import pipeline

# Load the fine-tuned CBDC sentiment classifier
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Sentiment")

sentences = [
    "CBDCs will revolutionize payment systems and improve financial inclusion.",
]

for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output:
# CBDCs will revolutionize payment systems and improve financial inclusion.
#  → positive (score=0.9789)
```

---

## Citation

If you use this model, please cite as:

**Zafar, M. B. (2025). *CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse*. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)**

```bibtex
@article{zafar2025centralbankbert,
  title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
  author={Zafar, Muhammad Bilal},
  year={2025},
  journal={SSRN Electronic Journal},
  url={https://papers.ssrn.com/abstract=5404456}
}
```