Eduardo-Calo committed · Commit 0b074f0 · verified · 1 Parent(s): 46f661e

Update README.md

Files changed (1):
  1. README.md +71 -1

README.md CHANGED
@@ -4,4 +4,74 @@ language:
  - en
  base_model:
  - google-bert/bert-base-uncased
- ---
+ ---
+
+ # Model Information
+
+ This model is based on BERT and fine-tuned with a regression head to predict the "formulaicness" of a text. It was developed for a case study in logic-to-text generation, so it may not perform well on all types of sentences.
+
+ # Model Details
+ - **Authors**: Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt, Kees van Deemter
+ - **Main Affiliation**: Utrecht University
+ - **GitHub Repository**: [Formulaicness](https://github.com/Eduardo-Calo/formulaicness)
+ - **Paper**: _Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation_
+ - **Contact**: [email protected]
+
+ # Usage Example
+ The snippet below defines the regression model, loads the fine-tuned weights, and scores a few example sentences:
+ ```python
+ import torch
+ import torch.nn as nn
+ from transformers import BertModel, BertPreTrainedModel, BertTokenizer
+
+ # === Load tokenizer ===
+ tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+
+ # === BERT regression model ===
+ class BertRegressionModel(BertPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.bert = BertModel(config)
+         # A single linear layer maps the pooled [CLS] representation to one score
+         self.regressor = nn.Linear(config.hidden_size, 1)
+         self.init_weights()
+
+     def forward(self, input_ids, attention_mask=None, labels=None):
+         outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
+         pooled_output = outputs.pooler_output
+         logits = self.regressor(pooled_output).squeeze(-1)
+         loss = None
+         if labels is not None:
+             loss_fct = nn.MSELoss()
+             loss = loss_fct(logits, labels)
+         return {"loss": loss, "logits": logits}
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ model = BertRegressionModel.from_pretrained("eduardo-calo/formulaicness")
+ model.to(device)
+ model.eval()
+
+ def predict_formulaicness(text: str) -> float:
+     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
+     # Only pass input_ids and attention_mask to the model
+     inputs = {key: inputs[key].to(device) for key in ["input_ids", "attention_mask"]}
+     with torch.no_grad():
+         outputs = model(**inputs)
+     return outputs["logits"].item()
+
+ sents = [
+     "for all x x is a cube",
+     "for all x if x is a cube and y is a tetrahedron then x is to the right of y",
+     "some primes are even",
+     "if a is a cube and b is a tetrahedron then a is to the right of b",
+     "no cube is to the right of a tetrahedron",
+     "In case of a cube and a tetrahedron, the cube is to the right of the tetrahedron.",
+     "F and E are beautiful letters.",
+     "The cat sat on the mat.",
+ ]
+
+ for sent in sents:
+     score = predict_formulaicness(sent)
+     print(f"Formulaicness score: {score:.2f}")
+ ```
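+
+ To score many sentences at once, the single-sentence helper can be extended to batched inference. The sketch below is illustrative, not part of the released code: the `predict_formulaicness_batch` helper and the default batch size are assumptions.
+
+ ```python
+ from typing import List
+
+ def predict_formulaicness_batch(texts: List[str], batch_size: int = 32) -> List[float]:
+     """Hypothetical helper: score a list of sentences in batches."""
+     scores: List[float] = []
+     for i in range(0, len(texts), batch_size):
+         batch = texts[i : i + batch_size]
+         # Tokenize the whole batch with padding so all tensors share one shape
+         inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
+         inputs = {k: inputs[k].to(device) for k in ["input_ids", "attention_mask"]}
+         with torch.no_grad():
+             outputs = model(**inputs)
+         scores.extend(outputs["logits"].tolist())
+     return scores
+
+ print(predict_formulaicness_batch(sents))
+ ```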
+
+ # Citation
+ If you find this work helpful or use any artifact derived from it, please cite our paper as follows:
+
+ ```bibtex
+ Coming Soon
+ ```