# Koyna-V2-1b-instruct - F16 GGUF

This repository contains the 16-bit (F16) GGUF quantized version of Govind222/Koyna-V2-1b-instruct.
## Model Details
- Base Model: Govind222/Koyna-V2-1b-instruct
- Quantization: F16 (16-bit floating point)
- File Size: ~2GB
- Use Case: High-quality inference with llama.cpp
## Usage

### Download the model

```bash
huggingface-cli download Govind222/Koyna-V2-1b-instruct-GGUF koyna-v2-1b.F16.gguf --local-dir ./models
```
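The same file can be fetched from Python with the `huggingface_hub` library; the sketch below assumes the repository and filename shown in the CLI command above.

```python
# Minimal sketch: download the F16 GGUF file via huggingface_hub instead of the CLI.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Govind222/Koyna-V2-1b-instruct-GGUF",  # repository from the command above
    filename="koyna-v2-1b.F16.gguf",                # GGUF file from the command above
    local_dir="./models",
)
print(model_path)  # local path to the downloaded file
```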
### With llama.cpp

```bash
./main -m ./models/koyna-v2-1b.F16.gguf -p "Your prompt here" -n 100
```

In recent llama.cpp builds the CLI binary is named `llama-cli` rather than `main`; the flags shown here are unchanged.
### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./models/koyna-v2-1b.F16.gguf",
    n_ctx=2048,    # context length
    n_threads=8,   # number of CPU threads
)

# Generate text
output = llm("Your prompt here", max_tokens=100)
print(output["choices"][0]["text"])
```
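Because the underlying model is instruction-tuned, the chat-completion interface of `llama-cpp-python` is usually a better fit than raw prompting. A minimal sketch, assuming the GGUF metadata carries the model's chat template:

```python
# Minimal sketch: chat-style generation with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="./models/koyna-v2-1b.F16.gguf", n_ctx=2048)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain in one sentence what a GGUF file is."},
    ],
    max_tokens=100,
)
print(response["choices"][0]["message"]["content"])
```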
### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./models/koyna-v2-1b.F16.gguf' > Modelfile

# Create the model
ollama create koyna-v2 -f Modelfile

# Run the model
ollama run koyna-v2
```
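After `ollama create`, the model can also be queried over Ollama's local REST API (port 11434 by default). A minimal sketch using `requests`, assuming the model name `koyna-v2` from the commands above:

```python
# Minimal sketch: one non-streaming completion from the local Ollama server.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "koyna-v2",          # name given to `ollama create` above
        "prompt": "Your prompt here",
        "stream": False,              # return a single JSON object, not a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```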
## Performance

F16 quantization provides:

- Highest quality: minimal precision loss relative to the original weights
- Good compatibility: works with most GGUF-capable inference engines
- Moderate size: a ~2GB file (see the size estimate below)
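The ~2GB figure follows directly from the format: F16 stores each weight in 2 bytes, so a model with roughly one billion parameters takes about 2 GB before GGUF metadata. A back-of-the-envelope sketch (the exact parameter count is an assumption):

```python
# Back-of-the-envelope estimate of the F16 GGUF file size.
params = 1_000_000_000   # assumed parameter count for a "1b" model
bytes_per_param = 2      # F16 = 16 bits = 2 bytes per weight
size_gb = params * bytes_per_param / 1e9
print(f"~{size_gb:.1f} GB of weight data, plus GGUF metadata")  # ~2.0 GB
```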
## Original Model

This is a quantized version of Govind222/Koyna-V2-1b-instruct. Please refer to the original model card for more details about the model's capabilities, training data, and intended use cases.
## Model Tree

- Base model: google/gemma-3-1b-pt
- Finetuned: google/gemma-3-1b-it
- Finetuned: Govind222/Koyna-V2-1b-instruct
- Quantized (this repository): Govind222/Koyna-V2-1b-instruct-GGUF