language:
- en
base_model:
- google-bert/bert-base-uncased
---

# Model Information

This model is based on BERT. It is fine-tuned with a regression head to predict the "formulaicness" of texts. It was created with logic-to-text generation as a case study in mind; therefore, it may not work well with all types of sentences.

# Model Details

- **Authors**: Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt, Kees van Deemter
- **Main Affiliation**: Utrecht University
- **GitHub Repository**: [Formulaicness](https://github.com/Eduardo-Calo/formulaicness)
- **Paper**: _Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation_
- **Contact**: [email protected]

# Usage Example

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel, BertTokenizer

# === Load tokenizer ===
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# === BERT regression model ===
class BertRegressionModel(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.regressor = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.regressor(pooled_output).squeeze(-1)
        loss = None
        if labels is not None:
            loss_fct = nn.MSELoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = BertRegressionModel.from_pretrained("eduardo-calo/formulaicness")
model.to(device)
model.eval()  # disable dropout for deterministic inference

def predict_formulaicness(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Only pass input_ids and attention_mask to the model
    inputs = {key: inputs[key].to(device) for key in ['input_ids', 'attention_mask']}
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs['logits'].item()

sents = [
    "for all x x is a cube",
    "for all x if x is a cube and y is a tetrahedron then x is to the right of y",
    "some primes are even",
    "if a is a cube and b is a tetrahedron then a is to the right of b",
    "no cube is to the right of a tetrahedron",
    "In case of a cube and a tetrahedron, the cube is to the right of the tetrahedron.",
    "F and E are beautiful letters.",
    "The cat sat on the mat.",
]

for sent in sents:
    score = predict_formulaicness(sent)
    print(f"Formulaicness score: {score:.2f}")
```
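
To score several sentences in one forward pass, the same tokenizer and model can be applied to a padded batch. A minimal sketch, assuming a hypothetical helper name `predict_formulaicness_batch` that is not part of the original example:

```python
from typing import List

def predict_formulaicness_batch(texts: List[str]) -> List[float]:
    # Tokenize all sentences together; padding aligns them to the longest one.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    inputs = {key: inputs[key].to(device) for key in ['input_ids', 'attention_mask']}
    with torch.no_grad():
        outputs = model(**inputs)
    # One regression score per input sentence.
    return outputs['logits'].tolist()

print(predict_formulaicness_batch(sents))
```

The attention mask returned by the tokenizer keeps padding tokens from influencing the pooled representation, so batched scores should match the single-sentence ones.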
# Citation

If you find this work helpful or use any artifact coming from it, please cite our paper as follows:

```bibtex
Coming Soon
```