language:
- en
base_model:
- google-bert/bert-base-uncased
---

# Model Information

This model is based on BERT. It is fine-tuned with a regression head to predict the "formulaicness" of texts. It was created with logic-to-text generation as a case study in mind; therefore, it may not work well with all types of sentences.

# Model Details

- **Authors**: Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt, Kees van Deemter
- **Main Affiliation**: Utrecht University
- **GitHub Repository**: [Formulaicness](https://github.com/Eduardo-Calo/formulaicness)
- **Paper**: _Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation_
- **Contact**: [email protected]

# Usage Example

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel, BertTokenizer

# === Load tokenizer ===
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# === BERT regression model ===
class BertRegressionModel(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.regressor = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.regressor(pooled_output).squeeze(-1)
        loss = None
        if labels is not None:
            loss_fct = nn.MSELoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = BertRegressionModel.from_pretrained("eduardo-calo/formulaicness")
model.to(device)
model.eval()  # disable dropout for deterministic inference

def predict_formulaicness(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Only pass input_ids and attention_mask to the model
    inputs = {key: inputs[key].to(device) for key in ['input_ids', 'attention_mask']}
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs['logits'].item()

sents = [
    "for all x x is a cube",
    "for all x if x is a cube and y is a tetrahedron then x is to the right of y",
    "some primes are even",
    "if a is a cube and b is a tetrahedron then a is to the right of b",
    "no cube is to the right of a tetrahedron",
    "In case of a cube and a tetrahedron, the cube is to the right of the tetrahedron.",
    "F and E are beautiful letters.",
    "The cat sat on the mat.",
]

for sent in sents:
    score = predict_formulaicness(sent)
    print(f"Formulaicness score: {score:.2f}")
```
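
To score several sentences in one forward pass, the same tokenizer and model can be applied to a padded batch. A minimal sketch, assuming a hypothetical helper name `predict_formulaicness_batch` that is not part of the original example:

```python
from typing import List

def predict_formulaicness_batch(texts: List[str]) -> List[float]:
    # Tokenize all sentences together; padding aligns them to the longest one.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    inputs = {key: inputs[key].to(device) for key in ['input_ids', 'attention_mask']}
    with torch.no_grad():
        outputs = model(**inputs)
    # One regression score per input sentence.
    return outputs['logits'].tolist()

print(predict_formulaicness_batch(sents))
```

The attention mask returned by the tokenizer keeps padding tokens from influencing the pooled representation, so batched scores should match the single-sentence ones.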
# Citation

If you find this work helpful or use any artifact coming from it, please cite our paper as follows:

```bibtex
Coming Soon
```