# Koyna-V2-1b-instruct - F16 GGUF

This repository contains the 16-bit (F16) GGUF quantized version of Govind222/Koyna-V2-1b-instruct.
## Model Details
- Base Model: Govind222/Koyna-V2-1b-instruct
- Quantization: F16 (16-bit floating point)
- File Size: ~2GB
- Use Case: High-quality inference with llama.cpp
## Usage

### Download the model

```bash
huggingface-cli download Govind222/Koyna-V2-1b-instruct-GGUF koyna-v2-1b.F16.gguf --local-dir ./models
```
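The same file can be fetched from Python with the `huggingface_hub` library; the sketch below assumes the repository and filename shown in the CLI command above.

```python
# Minimal sketch: download the F16 GGUF file via huggingface_hub instead of the CLI.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Govind222/Koyna-V2-1b-instruct-GGUF",  # repository from the command above
    filename="koyna-v2-1b.F16.gguf",                # GGUF file from the command above
    local_dir="./models",
)
print(model_path)  # local path to the downloaded file
```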
### With llama.cpp

```bash
./main -m ./models/koyna-v2-1b.F16.gguf -p "Your prompt here" -n 100
```

In recent llama.cpp builds the CLI binary is named `llama-cli` rather than `main`; the flags shown here are unchanged.
### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./models/koyna-v2-1b.F16.gguf",
    n_ctx=2048,    # context length
    n_threads=8,   # number of CPU threads
)

# Generate text
output = llm("Your prompt here", max_tokens=100)
print(output["choices"][0]["text"])
```
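Because the underlying model is instruction-tuned, the chat-completion interface of `llama-cpp-python` is usually a better fit than raw prompting. A minimal sketch, assuming the GGUF metadata carries the model's chat template:

```python
# Minimal sketch: chat-style generation with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="./models/koyna-v2-1b.F16.gguf", n_ctx=2048)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain in one sentence what a GGUF file is."},
    ],
    max_tokens=100,
)
print(response["choices"][0]["message"]["content"])
```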
### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./models/koyna-v2-1b.F16.gguf' > Modelfile

# Create the model
ollama create koyna-v2 -f Modelfile

# Run the model
ollama run koyna-v2
```
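After `ollama create`, the model can also be queried over Ollama's local REST API (port 11434 by default). A minimal sketch using `requests`, assuming the model name `koyna-v2` from the commands above:

```python
# Minimal sketch: one non-streaming completion from the local Ollama server.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "koyna-v2",          # name given to `ollama create` above
        "prompt": "Your prompt here",
        "stream": False,              # return a single JSON object, not a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```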
## Performance

F16 quantization provides:

- Highest quality: minimal precision loss relative to the original weights
- Good compatibility: works with most GGUF-capable inference engines
- Moderate size: a ~2GB file (see the size estimate below)
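The ~2GB figure follows directly from the format: F16 stores each weight in 2 bytes, so a model with roughly one billion parameters takes about 2 GB before GGUF metadata. A back-of-the-envelope sketch (the exact parameter count is an assumption):

```python
# Back-of-the-envelope estimate of the F16 GGUF file size.
params = 1_000_000_000   # assumed parameter count for a "1b" model
bytes_per_param = 2      # F16 = 16 bits = 2 bytes per weight
size_gb = params * bytes_per_param / 1e9
print(f"~{size_gb:.1f} GB of weight data, plus GGUF metadata")  # ~2.0 GB
```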
## Original Model

This is a quantized version of Govind222/Koyna-V2-1b-instruct. Please refer to the original model card for more details about the model's capabilities, training data, and intended use cases.
## Model Tree

- Base model: google/gemma-3-1b-pt
- Finetuned: google/gemma-3-1b-it
- Finetuned: Govind222/Koyna-V2-1b-instruct
- Quantized (this repository): Govind222/Koyna-V2-1b-instruct-GGUF