This model is an initial test version, fine-tuned for the Nepali language from the LLaMA-3-8B base model provided by UnslothAI.

Model Details

A 4-bit model quantized directly with bitsandbytes (an explicit loading sketch follows the list below). Built with Meta Llama 3; base model provided by UnslothAI.

  • Developed by: Norden Ghising Tamang under DarviLab Pvt. Ltd.
  • Model type: Transformer-based language model
  • Language(s) (NLP): Nepali
  • License: Meta Llama 3 Community License, available at: https://llama.meta.com/llama3/license
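
Since the checkpoint is quantized to 4-bit with bitsandbytes, it can also be loaded with an explicit quantization config rather than the load_in_4bit shortcut. A minimal sketch: the nf4 quant type and bfloat16 compute dtype here are common defaults and assumptions, not confirmed parameters of this checkpoint.

import torch
from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

# Assumed quantization settings: nf4 and bfloat16 are common defaults,
# not confirmed parameters of this checkpoint
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoPeftModelForCausalLM.from_pretrained(
    "nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
    quantization_config=bnb_config,
)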

How To Use

Using Hugging Face's AutoPeftModelForCausalLM

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the PEFT adapter together with its 4-bit quantized base model
model = AutoPeftModelForCausalLM.from_pretrained(
    "nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1")
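
The loaded model can then be prompted like any other causal LM. A minimal generation sketch, assuming a CUDA device; it reuses the Nepali question from the Unsloth example below:

import torch

# Nepali for "In which country was Gautam Buddha born?"
prompt = "गौतम बुद्धको जन्म कुन देशमा भएको थियो?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The Unsloth example below formats prompts with an Alpaca-style template, so wrapping the question in that same template will likely give more reliable completions.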

Using UnslothAI [2x Faster Inference]

from unsloth import FastLanguageModel

# dtype=None auto-detects the best dtype for the GPU
# (bfloat16 on Ampere and newer, float16 otherwise)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "गौतम बुद्धको जन्म कुन देशमा भएको थियो?",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

# Generate up to 64 new tokens; use_cache reuses the KV cache for faster decoding
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
tokenizer.batch_decode(outputs)
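
For interactive use, generation can be streamed token by token with transformers' TextStreamer, which model.generate accepts through its streamer argument. A minimal sketch:

from transformers import TextStreamer

# Print tokens to stdout as they are generated,
# rather than waiting for the full sequence
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=64, use_cache=True)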