# Tool-Calling LoRA for LLaMA-3-8B-Instruct

This is a LoRA (Low-Rank Adaptation) adapter for LLaMA-3-8B-Instruct, fine-tuned on a tool-calling dataset to improve the model's ability to generate structured JSON responses for tool execution.
## Model Details
- Base Model: meta-llama/Meta-Llama-3-8B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 16
- LoRA Alpha: 32
- Training Dataset: Custom tool-calling dataset with 357 samples
- Training Epochs: 5
- Learning Rate: 5.0e-5
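
The hyperparameters above correspond to a standard `peft` LoRA setup; a minimal sketch of that configuration is shown below. The `target_modules` and `lora_dropout` values are assumptions, since the card does not state them.

```python
from peft import LoraConfig, TaskType

# Hypothetical reconstruction of the adapter configuration from the values above.
# target_modules and lora_dropout are assumptions; the card does not list them.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                 # LoRA rank
    lora_alpha=32,        # LoRA alpha (scaling factor)
    lora_dropout=0.05,    # assumed; not stated in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projections
)
```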
## Usage

### Load the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Load the LoRA adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/llama-traces")
model = model.merge_and_unload()

# Generate tool-calling responses
def generate_tool_call(prompt):
    # Move inputs to the model's device (base model is loaded with device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Check the weather in New York"
response = generate_tool_call(prompt)
print(response)
```
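
Since the adapter was trained with the llama3 chat template (see Training Details below), prompts generally match the training format more closely when passed through the tokenizer's chat template. The sketch below is a minimal example; the system prompt is illustrative and not taken from the original training data.

```python
# Optional: format the request with the Llama-3 chat template before generation.
# The system prompt is an assumption; adapt it to your own tool-calling setup.
messages = [
    {"role": "system", "content": "You are a tool-calling assistant. Respond with a JSON trace of steps."},
    {"role": "user", "content": "Check the weather in New York"},
]
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(generate_tool_call(chat_prompt))
```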
## Expected Output Format
The model generates structured JSON responses like:
```json
{
  "trace_id": "002",
  "steps": [
    {
      "action": "call_api",
      "api": "weather_api",
      "arguments": {"location": "New York"}
    },
    {
      "action": "respond",
      "message": "The weather in New York is currently sunny with a temperature of 72°F."
    }
  ]
}
```
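
Because generation can occasionally produce malformed JSON (see Limitations), it may help to parse the response defensively before acting on any steps. The sketch below is a minimal example, assuming the field names shown above; the `parse_trace` helper is hypothetical and not part of this repository.

```python
import json

def parse_trace(response: str):
    """Extract and parse the JSON trace from the model's raw response.

    Returns the parsed dict, or None if no valid JSON object is found.
    """
    # The decoded response may include the prompt or extra text around the JSON,
    # so grab the outermost brace-delimited span before parsing.
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return None

trace = parse_trace(response)
if trace is not None:
    for step in trace.get("steps", []):
        print(step.get("action"))
```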
## Training Details
- Dataset: Custom tool-calling dataset with instruction/input/output format
- Template: llama3 chat template
- Cutoff Length: 4096 tokens
- Batch Size: 2 (effective batch size: 8 with gradient accumulation)
- Optimizer: AdamW with cosine learning rate scheduling
- Warmup Ratio: 0.1
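
As a rough illustration, these hyperparameters map onto a Hugging Face `TrainingArguments` configuration along the lines of the sketch below. This is an assumption about the training stack, not the exact script used; the output directory is a placeholder, and the original run may have used a different framework.

```python
from transformers import TrainingArguments

# Sketch of the training hyperparameters listed above; paths and bf16 are assumptions.
training_args = TrainingArguments(
    output_dir="llama-traces-lora",     # placeholder
    num_train_epochs=5,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,      # effective batch size 8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                          # assumed; matches the bfloat16 inference setup above
    logging_steps=10,
)
```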
## Performance
The model shows improved capability in:
- Generating structured JSON responses
- Following tool-calling patterns
- Maintaining context for multi-step tool execution
- Producing consistent output formats
## Limitations
- This is an adapter, so it requires the base LLaMA-3-8B-Instruct model to function
- May generate invalid JSON in some edge cases
- Performance depends on the quality of the training data
## License
This model is released under the MIT License.