metadata

language:
  - en
license: mit
tags:
  - lora
  - tool-calling
  - llama3
  - instruction-tuning
  - json-generation
base_model: meta-llama/Meta-Llama-3-8B-Instruct

Tool-Calling LoRA for LLaMA-3-8B-Instruct

This is a LoRA (Low-Rank Adaptation) model fine-tuned on tool-calling datasets to enhance the model's ability to generate structured JSON responses for tool execution.

Model Details

Base Model: meta-llama/Meta-Llama-3-8B-Instruct
Fine-tuning Method: LoRA (Low-Rank Adaptation)
LoRA Rank: 16
LoRA Alpha: 32
Training Dataset: Custom tool-calling dataset with 357 samples
Training Epochs: 5
Learning Rate: 5.0e-5

Usage

Load the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Load and merge LoRA
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/llama-traces")
model = model.merge_and_unload()

# Generate tool-calling responses
def generate_tool_call(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Check the weather in New York"
response = generate_tool_call(prompt)
print(response)

Expected Output Format

The model generates structured JSON responses like:

{
  "trace_id": "002",
  "steps": [
    {
      "action": "call_api",
      "api": "weather_api",
      "arguments": {"location": "New York"}
    },
    {
      "action": "respond",
      "message": "The weather in New York is currently sunny with a temperature of 72°F."
    }
  ]
}

Training Details

Dataset: Custom tool-calling dataset with instruction/input/output format
Template: llama3 chat template
Cutoff Length: 4096 tokens
Batch Size: 2 (effective batch size: 8 with gradient accumulation)
Optimizer: AdamW with cosine learning rate scheduling
Warmup Ratio: 0.1

Performance

The model shows improved capability in:

Generating structured JSON responses
Following tool-calling patterns
Maintaining context for multi-step tool execution
Producing consistent output formats

Limitations

Requires the base LLaMA-3-8B-Instruct model to function
May generate invalid JSON in some edge cases
Performance depends on the quality of the training data

License

This model is released under the MIT License.