Uploaded model
- Developed by: deshanksuman
- License: apache-2.0
- Finetuned from model: meta-llama/Llama-3.1-8B-Instruct
Dataset
FEWS training data arranged in the Instruction, Input, and Output format: https://huggingface.co/datasets/deshanksuman/WSD_DATASET_FEWS
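A minimal sketch of loading the dataset with the datasets library is shown below; the split name and the exact column names are assumptions based on the Instruction/Input/Output description above and may differ from the actual dataset schema.

from datasets import load_dataset

# Load the FEWS WSD training data from the Hub (split name is an assumption)
ds = load_dataset("deshanksuman/WSD_DATASET_FEWS", split="train")

# Inspect one record; expected fields are roughly instruction / input / output
print(ds[0])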
Code
%%capture
import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from pydantic import BaseModel
import json
# Load the base model and attach the fine-tuned LoRA adapter
local_directory = "meta-llama/Llama-3.1-8B-Instruct"
adapter_repo = "deshanksuman/finetuned-WSD-llama3-8b-Instruct_all"
access_token = "hfxtoken"  # replace with your Hugging Face access token

tokenizer = AutoTokenizer.from_pretrained(local_directory, token=access_token)
base_model = AutoModelForCausalLM.from_pretrained(
    local_directory,
    token=access_token,
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_repo, token=access_token)
model.eval()  # device_map="auto" already places the model, so no explicit .to() is needed
# Function to generate a structured JSON response for a WSD question
def generate_structured_response(question, context="You are a helpful assistant. Respond only with valid JSON."):
    prompt = (
        f"{context}\n\n"
        f"Question: {question}\n\n"
        f"Respond with valid JSON only in the format: {{\"meaning\":}}"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=256,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        num_beams=3,
        no_repeat_ngram_size=3,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens so the prompt is not echoed back
    response_text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    return response_text
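The response text may still contain stray characters around the JSON object, so a small post-processing step helps. Below is a usage sketch that extracts and validates the JSON with the re, json, and pydantic imports above; the example question, and the assumption that the model returns a single {"meaning": ...} object, are illustrative rather than guaranteed.

# Usage sketch (assumption: the model emits one JSON object with a "meaning" key)
class WSDAnswer(BaseModel):
    meaning: str

question = 'What is the meaning of the word "bank" in: "She sat on the bank of the river."?'
raw = generate_structured_response(question)

# Pull out the first {...} block and validate it
match = re.search(r"\{.*\}", raw, re.DOTALL)
if match:
    answer = WSDAnswer(**json.loads(match.group(0)))
    print(answer.meaning)
else:
    print("No JSON object found in:", raw)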
Model tree for deshanksuman/finetuned-WSD-llama3-8b-Instruct_all
- Base model: meta-llama/Llama-3.1-8B
- Finetuned from: meta-llama/Llama-3.1-8B-Instruct