Model Information

This model is based on BERT and is fine-tuned with a regression head to predict the "formulaicness" of a text. It was developed for a case study in logic-to-text generation, so it may not work well on other types of sentences.
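As a reading aid, the regression head can be written out as below. This is a sketch inferred from the usage code in this card (a single linear layer on BERT's pooled output, with a mean-squared-error loss when labels are given). The symbols are notation chosen here for illustration, not taken from the paper.

```latex
% h_pool : BERT pooled [CLS] representation (outputs.pooler_output)
% w, b   : weight and bias of the linear regressor, nn.Linear(hidden_size, 1)
\hat{y} = \mathbf{w}^{\top} \mathbf{h}_{\text{pool}} + b

% Loss over a batch of N examples with gold scores y_i
% (nn.MSELoss with its default mean reduction)
\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
```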

Model Details

  • Authors: Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt, Kees van Deemter
  • Main Affiliation: Utrecht University
  • GitHub Repository: Formulaicness
  • Paper: Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation
  • Contact: [email protected]

Usage Example

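# === Imports ===
import torch
from torch import nn
from transformers import BertModel, BertPreTrainedModel, BertTokenizer

# === Load tokenizer ===
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")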

# === BERT regression model ===
class BertRegressionModel(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.regressor = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.regressor(pooled_output).squeeze(-1)
        loss = None
        if labels is not None:
            loss_fct = nn.MSELoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits}
    
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = BertRegressionModel.from_pretrained("eduardo-calo/formulaicness")
model.to(device)

def predict_formulaicness(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Only pass input_ids and attention_mask to the model
    inputs = {key: inputs[key].to(device) for key in ['input_ids', 'attention_mask']}
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs['logits'].item()

sents = [
    "for all x x is a cube",
    "for all x if x is a cube and y is a tetrahedron then x is to the right of y",
    "some primes are even",
    "if a is a cube and b is a tetrahedron then a is to the right of b",
    "no cube is to the right of a tetrahedron",
    "In case of a cube and a tetrahedron, the cube is to the right of the tetrahedron.",
    "F and E are beautiful letters.",
    "The cat sat on the mat.",
    ]

for sent in sents:
    # The regression head returns an unbounded score, not a calibrated probability
    score = predict_formulaicness(sent)
    print(f"Predicted formulaicness: {score:.2f}")
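To compare sentences, it can help to sort them by predicted score. The sketch below is a hypothetical helper, not part of the released code: `rank_by_score` is a name introduced here, and it accepts any scoring function such as `predict_formulaicness` from the example above. The card does not state the range or direction of the score, so the helper simply orders by the raw model output, highest first.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_score(
    sentences: Iterable[str],
    score_fn: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score each sentence with `score_fn` and sort from highest to lowest score.

    Hypothetical helper for illustration; `score_fn` is assumed to map a
    single string to a float, as `predict_formulaicness` does.
    """
    scored = [(sentence, score_fn(sentence)) for sentence in sentences]
    # sorted() is stable, so sentences with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Example use with the objects defined in the usage example:
# for sentence, score in rank_by_score(sents, predict_formulaicness):
#     print(f"{score:.2f}\t{sentence}")
```

Passing the scorer as an argument keeps the helper independent of the model, so it can be tried with a dummy function before the weights are downloaded.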

Citation

If you find this work helpful or use any artifact coming from it, please cite our paper as follows:

Coming Soon

Model size: 109M params (Safetensors, tensor type F32)