Inference:
!pip install -q "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install -q --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes
from unsloth import FastLanguageModel
import torch
max_seq_length = 512
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Hinglish-Project/llama-3-8b-English-to-Hinglish",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
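def pipe(prompt):
    # Alpaca-style prompt template; {} is filled with the text to translate.
    alpaca_prompt = """### Instrucion: Translate given text to Hinglish Text:
### Input:
{}
### Response:
"""
    # Tokenize a batch of one prompt and move the tensors to the GPU.
    inputs = tokenizer(
        [
            alpaca_prompt.format(prompt),
        ], return_tensors = "pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
    raw_text = tokenizer.batch_decode(outputs)[0]
    # Keep only the text between the response marker and the end-of-text token.
    return raw_text.split("### Response:\n")[1].split("<|end_of_text|>")[0]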
text = "This is a fine-tuned Hinglish translation model using Llama 3."
pipe(text)
## yeh ek fine-tuned Hinglish translation model hai jisme Llama 3 ka use kiya gaya hai.
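The post-processing step in `pipe` indexes `[1]` after splitting on the response marker, which raises an `IndexError` if the decoded text does not contain that marker. The sketch below separates prompt construction from response extraction so both can be checked without a GPU. The names `build_prompt`, `extract_response`, `PROMPT_TEMPLATE`, `RESPONSE_MARKER`, and `EOS_MARKER` are hypothetical helpers, not part of the model card or of Unsloth. The template string is copied verbatim from the snippet above on the assumption that it matches the prompt the model was fine-tuned on.

```python
# Hypothetical helpers for illustration; not part of the original model card.

# Template copied verbatim from the inference snippet (spelling included),
# assuming it matches the prompt format used during fine-tuning.
PROMPT_TEMPLATE = (
    "### Instrucion: Translate given text to Hinglish Text:\n"
    "### Input:\n"
    "{}\n"
    "### Response:\n"
)
RESPONSE_MARKER = "### Response:\n"
EOS_MARKER = "<|end_of_text|>"


def build_prompt(text: str) -> str:
    """Fill the template with the text to translate."""
    return PROMPT_TEMPLATE.format(text)


def extract_response(raw_text: str) -> str:
    """Return the text after the first response marker, up to the EOS token.

    Returns an empty string instead of raising when the marker is missing.
    """
    _, marker, tail = raw_text.partition(RESPONSE_MARKER)
    if not marker:
        return ""
    # Drop everything from the end-of-text token onward, then trim whitespace.
    return tail.split(EOS_MARKER, 1)[0].strip()
```

With these helpers, `pipe` could tokenize `build_prompt(prompt)` and return `extract_response(raw_text)`. Unlike the original, this version also strips surrounding whitespace from the result.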
Uploaded model
- Developed by: Hinglish-Project
- License: apache-2.0
- Finetuned from model: unsloth/llama-8b-bnb-4bit
This Llama 3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.