Model Information
This model is based on BERT and is fine-tuned with a regression head to predict the "formulaicness" of texts. It was developed for a case study in logic-to-text generation, so it may not work well on other types of sentences.
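As a rough sketch of what the regression head computes, read off the model class in the usage example: the pooled BERT output $\mathbf{h}$ is mapped to a single score by a linear layer, and when labels are supplied the loss is the mean squared error over the batch. The symbols $\mathbf{w}$, $b$, $H$ and $N$ are notation introduced here for illustration, and it is an assumption that fine-tuning used exactly this loss.

```latex
\hat{y} = \mathbf{w}^{\top}\mathbf{h} + b, \qquad \mathbf{h} \in \mathbb{R}^{H}

\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^{2}
```

Here $H$ is the BERT hidden size (`config.hidden_size`) and $N$ is the number of examples in the batch.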
Model Details
- Authors: Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt, Kees van Deemter
- Main Affiliation: Utrecht University
- GitHub Repository: Formulaicness
- Paper: Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation
- Contact: [email protected]
Usage Example
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel, BertTokenizer

# === Load tokenizer ===
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# === BERT regression model ===
class BertRegressionModel(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        # Single linear unit on top of the pooled output
        self.regressor = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.regressor(pooled_output).squeeze(-1)
        loss = None
        if labels is not None:
            loss_fct = nn.MSELoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertRegressionModel.from_pretrained("eduardo-calo/formulaicness")
model.to(device)
model.eval()  # inference only: make sure dropout is disabled

def predict_formulaicness(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Only pass input_ids and attention_mask to the model
    inputs = {key: inputs[key].to(device) for key in ["input_ids", "attention_mask"]}
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs["logits"].item()

sents = [
    "for all x x is a cube",
    "for all x if x is a cube and y is a tetrahedron then x is to the right of y",
    "some primes are even",
    "if a is a cube and b is a tetrahedron then a is to the right of b",
    "no cube is to the right of a tetrahedron",
    "In case of a cube and a tetrahedron, the cube is to the right of the tetrahedron.",
    "F and E are beautiful letters.",
    "The cat sat on the mat.",
]

for sent in sents:
    # The head is a plain linear layer, so this is a regression score, not a probability
    score = predict_formulaicness(sent)
    print(f"Formulaicness score: {score:.2f}")
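When comparing several candidate sentences, it can be handy to sort them by predicted score. The following is a minimal sketch, not part of the released code: `rank_by_score` is a hypothetical helper that accepts any scoring callable, such as `predict_formulaicness` from the usage example. The demo uses a toy word-count scorer so that it runs without downloading the model.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_score(
    sentences: Iterable[str],
    score_fn: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Return (sentence, score) pairs sorted from highest to lowest score.

    `score_fn` is any callable mapping a sentence to a number; in practice
    this would be `predict_formulaicness` (an assumption of this sketch).
    """
    scored = [(sentence, float(score_fn(sentence))) for sentence in sentences]
    # sorted() is stable, so sentences with equal scores keep their input order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-in scorer (word count), NOT the actual model
    def toy_score(sentence: str) -> float:
        return float(len(sentence.split()))

    examples = ["some primes are even", "The cat sat on the mat."]
    for sentence, score in rank_by_score(examples, toy_score):
        print(f"{score:.2f}\t{sentence}")
```

With the model loaded, the call would be `rank_by_score(sents, predict_formulaicness)`.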
Citation
If you find this work helpful or use any artifact coming from it, please cite our paper as follows:
Coming Soon
Model Tree
- Base model: google-bert/bert-base-uncased