---
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
pipeline_tag: fill-mask
tags:
- masked-language-modeling
- fill-mask
- distilbert
- imdb
- domain-adaptation
- nlp
- transformers
model-index:
- name: distilbert-imdb_mask_model
  results:
  - task:
      name: Masked Language Modeling
      type: fill-mask
    dataset:
      name: IMDB Movie Reviews (unsupervised text)
      type: imdb
      split: train
    metrics:
      - name: Loss
        type: loss
        value: 2.2271
      - name: Perplexity
        type: perplexity
        value: 9.27
---

# Masked Language Modeling

## 📌 Model Overview
This model is a fine-tuned version of **distilbert-base-uncased** on the **IMDb dataset** using the **Masked Language Modeling (MLM)** objective.  
It is designed for **domain adaptation**, helping DistilBERT better understand the linguistic style of IMDb movie reviews.

---

## ✨ What this model does

- Learns to predict masked tokens in movie-review text (MLM / `fill-mask`).
- Helpful as a **domain-adapted backbone** (see the sketch after this list) for:
  - Sentiment analysis on reviews
  - Topic classification / intent
  - Review-specific QA / RAG preprocessing
  - Any task that benefits from in-domain representations

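Since only the language-model head is trained here, using this checkpoint downstream typically means loading it under a task-specific head. Below is a minimal, hypothetical sketch of fine-tuning it for IMDb sentiment classification; the hyperparameters, output path, and preprocessing are assumptions, not part of this release.

```python
# Hypothetical sketch: reuse this MLM-adapted checkpoint as a backbone for
# sentiment classification. Hyperparameters and paths are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "azherali/distilbert-imdb_mask_model"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# The classification head is newly initialized; only the encoder weights
# come from the domain-adapted checkpoint.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

imdb = load_dataset("imdb")  # labeled train/test splits

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = imdb.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="distilbert-imdb-sentiment",  # assumed output path
    num_train_epochs=2,                      # assumed; tune for your task
    per_device_train_batch_size=16,          # assumed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```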
---

## 🚀 Quickstart

### Use with `pipeline` (Fill-Mask)

```python
from transformers import pipeline

pipe = pipeline("fill-mask", model="azherali/distilbert-imdb_mask_model")

text = "This movie was absolutely [MASK] and the performances were stunning."
pipe(text)
# [{'sequence': 'this movie was absolutely fantastic ...', 'score': ...}, ...]

for x in pipe(text):
  print(x["sequence"])

output:
# this movie was absolutely fantastic and the performances were stunning.
# this movie was absolutely stunning and the performances were stunning.
# this movie was absolutely beautiful and the performances were stunning.
# this movie was absolutely brilliant and the performances were stunning.
# this movie was absolutely wonderful and the performances were stunning.

```
### Use with AutoModel (programmatic logits)


```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_checkpoint = "azherali/distilbert-imdb_mask_model"

model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

text ="This movie was absolutely [MASK] and the performances were stunning."

inputs = tokenizer(text, return_tensors="pt")
token_logits = model(**inputs).logits
# Find the location of [MASK] and extract its logits
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]
# Pick the [MASK] candidates with the highest logits
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

for token in top_5_tokens:
    print(f"'>>> {text.replace(tokenizer.mask_token, tokenizer.decode([token]))}'")

```
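If you want confidence scores rather than raw logits, normalize them with a softmax. The snippet below continues from the code above (it reuses `mask_token_logits` and `tokenizer`) and is an added illustration, not part of the original example:

```python
# Continuation of the snippet above: convert the [MASK] logits into
# probabilities and print the top-5 candidate tokens with their scores.
probs = torch.softmax(mask_token_logits, dim=-1)
top5 = torch.topk(probs, 5, dim=-1)

for score, token_id in zip(top5.values[0].tolist(), top5.indices[0].tolist()):
    print(f"{tokenizer.decode([token_id])!r}: {score:.3f}")
```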




## 📈 Training Results  

The model was trained for **5 epochs** on the IMDb dataset using the **Masked Language Modeling (MLM)** objective.  

**Loss Progression:**  
| Epoch | Training Loss | Validation Loss | Perplexity |
|-------|---------------|-----------------|-------------|
| 1     | 2.5249        | 2.3440          | 10.42       |
| 2     | 2.3985        | 2.2913          | 9.89        |
| 3     | 2.3441        | 2.2569          | 9.55        |
| 4     | 2.3079        | 2.2328          | 9.33        |
| 5     | 2.2869        | 2.2271          | 9.27        |

βœ”οΈ **Final Training Loss:** 2.28  
βœ”οΈ **Final Validation Loss:** 2.22  
βœ”οΈ **Final Perplexity:** 9.27  

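Perplexity is simply the exponential of the validation cross-entropy loss, so the reported numbers can be cross-checked directly:

```python
# Perplexity = exp(cross-entropy loss); check against the final epoch above.
import math

final_val_loss = 2.2271
print(math.exp(final_val_loss))  # ~9.27, matching the reported perplexity
```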
---

## ⚡ Training Configuration  

- **Model:** distilbert-base-uncased  
- **Dataset:** IMDb (unsupervised)  
- **Epochs:** 5  
- **Batch Size:** 32  
- **Optimizer:** AdamW  
- **Learning Rate Scheduler:** Linear warmup + decay  
- **Total Steps:** 9,580  
- **Total FLOPs:** 1.02e+16  

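For reference, a comparable run can be set up with the `Trainer` API roughly as follows. This is a hedged reconstruction from the configuration above: the learning rate, warmup ratio, masking probability, tokenization length, and validation split are assumptions, not the exact values used.

```python
# Hypothetical reconstruction of the MLM fine-tuning described above.
# Learning rate, warmup ratio, mlm_probability, max_length, and the
# validation split are assumptions; epochs and batch size are as reported.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Unsupervised IMDb text (labels are not used for MLM).
imdb = load_dataset("imdb", split="unsupervised")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = imdb.map(tokenize, batched=True, remove_columns=imdb.column_names)
splits = tokenized.train_test_split(test_size=0.1, seed=42)  # assumed split

# Randomly masks tokens for the MLM objective (15% is an assumed probability).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="distilbert-imdb_mask_model",
    num_train_epochs=5,              # as reported
    per_device_train_batch_size=32,  # as reported
    learning_rate=2e-5,              # assumption
    lr_scheduler_type="linear",      # linear warmup + decay
    warmup_ratio=0.1,                # assumption
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    data_collator=collator,
)
trainer.train()
print(trainer.evaluate())  # reports eval_loss; perplexity = exp(eval_loss)
```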
---