Model Card for fvossel/Llama-3.1-8B-Instruct-nl-to-fol

This model contains LoRA adapter weights for the base model meta-llama/Llama-3.1-8B-Instruct. It was trained to translate natural language statements into First-Order Logic (FOL) representations.

Model Details

Model Description

  • Developed by: Vossel et al. at Osnabrück University
  • Funded by: Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 456666331
  • Model type: Decoder-only causal language model (LLaMA architecture)
  • Language(s) (NLP): English, first-order logic (FOL)
  • License: This repository contains only LoRA adapter weights, trained on top of the base model meta-llama/Llama-3.1-8B-Instruct, which is released under the Llama 3.1 Community License. The adapter weights themselves are released under the Apache 2.0 License.
  • Finetuned from model: meta-llama/Llama-3.1-8B-Instruct

Uses

Direct Use

This model is designed to translate natural language (NL) sentences into corresponding first-order logic (FOL) expressions. It can be used directly for:

  • Automated semantic parsing and formalization of NL statements into symbolic logic.
  • Supporting explainable AI systems that require symbolic reasoning based on language input.
  • Research in neurosymbolic AI, logic-based natural language understanding, and formal verification.
  • Integrating into larger pipelines for natural language inference, question answering, or knowledge base population.

Users should be aware that the model outputs symbolic formulas which may require further validation or post-processing depending on the application.
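
As a minimal sketch of such post-processing (the helper names are illustrative, not part of this repository), one can extract the formula after the '𝜙=' marker and run cheap well-formedness checks before accepting an output:

# Illustrative post-processing sketch: extract and sanity-check a generated formula.
ALLOWED_SYMBOLS = set("∀∃¬∧∨→↔⊕")

def extract_formula(model_output: str) -> str:
    """Return the text after the last '𝜙=' marker."""
    _, _, formula = model_output.rpartition("𝜙=")
    return formula.strip()

def looks_well_formed(formula: str) -> bool:
    """Cheap checks: non-empty, balanced parentheses, only expected characters."""
    if not formula:
        return False
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    if depth != 0:
        return False
    # Remaining characters should be predicate/variable names, punctuation, or logic symbols.
    return all(c.isalnum() or c in " (),._" or c in ALLOWED_SYMBOLS for c in formula)

assert looks_well_formed(extract_formula("𝜙=∀x (Dog(x) → Animal(x))"))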

Downstream Use

This LoRA adapter can be further fine-tuned or combined with other models for specialized domains, e.g., legal, biomedical, or scientific text formalization. It is also suitable for integration into interactive systems that require formal reasoning capabilities.
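
As a sketch of the continued fine-tuning path (assuming the peft API; dataset and trainer setup omitted), the adapter can be loaded in trainable mode so its weights stay unfrozen:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", device_map="auto")
# is_trainable=True keeps the LoRA weights unfrozen so they can be updated further.
model = PeftModel.from_pretrained(base, "fvossel/Llama-3.1-8B-Instruct-nl-to-fol", is_trainable=True)
model.print_trainable_parameters()  # only the adapter parameters should be trainable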

Out-of-Scope Use

  • This model is not intended for general-purpose natural language generation.
  • It may not perform well on ambiguous, highly figurative, or out-of-domain sentences.
  • Users should not rely on the model output as final legal, medical, or safety-critical logic without expert review.

Recommendations

  • The model may produce logically incorrect or syntactically invalid formulas, especially on complex or ambiguous input. Users should validate outputs before use in critical applications.
  • As the training data contains inherent biases from the source datasets and GPT-4 generation, users should be cautious about potential biases reflected in the model’s predictions.
  • The model is specialized for English natural language and first-order logic; it may not generalize well to other languages or more expressive logics.
  • Users applying this model in downstream tasks should carefully evaluate its performance and consider human-in-the-loop validation to ensure correctness.
  • This model is intended for research and prototyping purposes and should not be used as a standalone solution in high-stakes decision making without expert oversight.

How to Get Started with the Model

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model repository identifiers
base_model_name = "meta-llama/Llama-3.1-8B-Instruct"
lora_weights = "fvossel/Llama-3.1-8B-Instruct-nl-to-fol"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Load the base model
model = AutoModelForCausalLM.from_pretrained(base_model_name, trust_remote_code=True, device_map="auto")

# Load the LoRA adapter weights
model = PeftModel.from_pretrained(model, lora_weights, device_map="auto")

# Define formatting function for preparing the input prompt
def formatting_func(text):
    return tokenizer.apply_chat_template(
        [
            {
                "role": "system",
                "content": (
                    "You are a helpful AI assistant that translates Natural Language (NL) text "
                    "into First-Order Logic (FOL) using only the given quantors and junctors: "
                    "∀ (for all), ∃ (there exists), ¬ (not), ∧ (and), ∨ (or), → (implies), "
                    "↔ (if and only if), ⊕ (xor). "
                    "Start your answer with '𝜙=' followed by the FOL-formula. Do not include any other text."
                ),
            },
            {"role": "user", "content": text},
        ],
        tokenize=False,
        add_generation_prompt=True,  # append the assistant header so the model writes its reply
    )

# Example NL input
input_text = "All dogs are animals."

# Use formatting_func to create the model input
prompt = formatting_func(input_text)

# Tokenize the prompt (the chat template already added special tokens such as BOS)
inputs = tokenizer(prompt, return_tensors="pt", padding=True, add_special_tokens=False).to(model.device)

# Generate output
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode and print the result
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
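
For the example input above, the decoded text should contain an assistant reply of the form 𝜙=∀x (Dog(x) → Animal(x)); exact predicate and variable names may vary between runs.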

Training Details

Training Data

The model was fine-tuned on two datasets:

  1. WillowNLtoFOL: A dataset containing over 16,000 pairs of natural language (NL) sentences and first-order logic (FOL) expressions. Originally published as part of the following Master's thesis:

    Deveci, İ. E. (2024). Transformer models for translating natural language sentences into formal logical expressions [M.S. - Master of Science]. Middle East Technical University.
    Available at: https://open.metu.edu.tr/handle/11511/109445

    Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

  2. MALLS-v0: A dataset with 34,000 pairs of NL statements and corresponding FOL rules, generated using GPT-4. The FOL rules were checked for syntactic correctness but not for semantic alignment with the NL statements. More details: https://huggingface.co/datasets/yuan-yang/MALLS-v0

    Licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). As the data was collected using GPT-4, usage is also subject to OpenAI's terms of use: https://openai.com/policies/terms-of-use
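
As a sketch, the MALLS data can be inspected with the datasets library (the split and column layout is left unspecified here; see the dataset page for the actual schema):

from datasets import load_dataset

# Download the MALLS NL/FOL pairs from the Hugging Face Hub.
malls = load_dataset("yuan-yang/MALLS-v0")
print(malls)  # shows the available splits and columns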

Training Procedure

The model was fine-tuned using LoRA adapters on top of the pre-trained meta-llama/Llama-3.1-8B-Instruct model, using prompt-based instruction tuning.

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Batch size: 8 (per device)
  • Learning rate: 1e-5
  • Number of epochs: 12
  • Optimizer: AdamW
  • Scheduler: Cosine learning rate scheduler
  • Warmup ratio: 0.05
  • Gradient accumulation steps: 2
  • Weight decay: 0.01
  • LoRA rank (r): 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Target modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  • Bias: none
  • Task type: CAUSAL_LM
  • Early stopping patience: 4 epochs
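
As a rough sketch (not the authors' actual training script; the output path and evaluation cadence are assumptions), these settings map onto peft and transformers configuration objects as follows:

from peft import LoraConfig
from transformers import TrainingArguments, EarlyStoppingCallback

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama-nl-to-fol-lora",   # illustrative path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,       # effective batch of 16 per optimizer step
    learning_rate=1e-5,
    num_train_epochs=12,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    bf16=True,
    eval_strategy="epoch",               # evaluation_strategy in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,         # required for early stopping
)

early_stopping = EarlyStoppingCallback(early_stopping_patience=4)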