utkmst/chimera-beta-test2-lora-merged

Model Description

This model is a fine-tuned version of Meta's Llama-3.1-8B-Instruct, created by LoRA fine-tuning on multiple instruction datasets and then merging the adapter weights back into the base model.

Architecture

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Size: 8.03B parameters
  • Type: Decoder-only transformer
  • Format: SafeTensors (FP16 weights)

Training Details

  • Training Method: LoRA fine-tuning followed by adapter merging (see the sketch after this list)
  • LoRA Configuration:
    • Rank: 8
    • Alpha: 16
    • Trainable modules: Attention layers and feed-forward networks
  • Training Hyperparameters:
    • Learning rate: 2e-4
    • Batch size: 2
    • Training epochs: 1
    • Optimizer: AdamW with a constant learning-rate schedule
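
For reference, this configuration corresponds roughly to the PEFT setup sketched below. This is a hedged sketch, not the actual training script: the target module names and the save path are assumptions based on common Llama attention/feed-forward projection names.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Rank 8, alpha 16, applied to attention and feed-forward projections
# (this target_modules list is an assumption, not taken from the training run)
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# ... LoRA fine-tuning on the instruction datasets would run here ...

# Merging folds the adapter weights into the base weights, producing a
# standalone checkpoint like this one
merged = model.merge_and_unload()
merged.save_pretrained("chimera-beta-test2-lora-merged")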

Intended Use

This model is designed for:

  • General purpose assistant capabilities
  • Question answering and knowledge retrieval
  • Creative content generation
  • Instructional guidance

Limitations

  • Inherits base-model limitations, including potential hallucinations and factual inaccuracies
  • Limited context window compared to larger models
  • Knowledge cutoff from the base Llama-3.1 model
  • May exhibit biases present in training data
  • Performance on specialized tasks may vary

Usage with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# The adapter is already merged, so the model loads like any other checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "utkmst/chimera-beta-test2-lora-merged",
    torch_dtype="auto",   # keeps the checkpoint's FP16 weights
)
tokenizer = AutoTokenizer.from_pretrained("utkmst/chimera-beta-test2-lora-merged")
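
The snippet below sketches a minimal chat-style generation call using the tokenizer's built-in Llama 3.1 chat template; the prompt text and generation settings are illustrative, not taken from the model card.

messages = [
    {"role": "user", "content": "Summarize what LoRA fine-tuning does."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # appends the assistant header so the model replies
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Strip the prompt tokens and decode only the newly generated reply
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))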

License

This model inherits Meta's Llama 3.1 Community License from its base model.
