
Koyna-V2-1b-instruct - F16 GGUF

This repository contains the 16-bit (F16) GGUF quantized version of Govind222/Koyna-V2-1b-instruct.

Model Details

  • Base Model: Govind222/Koyna-V2-1b-instruct
  • Architecture: llama (~1B parameters)
  • Quantization: F16 (16-bit floating point)
  • File Size: ~2GB
  • Use Case: High-quality inference with llama.cpp and compatible runtimes

Usage

Download Model:

huggingface-cli download Govind222/Koyna-V2-1b-instruct-GGUF koyna-v2-1b.F16.gguf --local-dir ./models
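If you prefer to script the download, the huggingface_hub Python library provides the same functionality. The snippet below is a minimal sketch assuming the repository and filename shown in the CLI command above:

from huggingface_hub import hf_hub_download

# Download the F16 GGUF file into ./models and print its local path
model_path = hf_hub_download(
    repo_id="Govind222/Koyna-V2-1b-instruct-GGUF",
    filename="koyna-v2-1b.F16.gguf",
    local_dir="./models",
)
print(model_path)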

With llama.cpp (the CLI binary is named llama-cli in recent builds; older builds call it main):

./llama-cli -m ./models/koyna-v2-1b.F16.gguf -p "Your prompt here" -n 100
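llama.cpp also ships an OpenAI-compatible HTTP server (llama-server). The commands below are a sketch assuming a default build and port 8080:

# Start the server with a 2048-token context window
./llama-server -m ./models/koyna-v2-1b.F16.gguf -c 2048 --port 8080

# Query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Your prompt here"}], "max_tokens": 100}'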

With Python (llama-cpp-python):

from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="./models/koyna-v2-1b.F16.gguf",
    n_ctx=2048,  # Context length
    n_threads=8  # Number of threads
)

# Generate text
output = llm("Your prompt here", max_tokens=100)
print(output['choices'][0]['text'])
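llama-cpp-python also exposes a chat-style API (create_chat_completion), which uses the chat template stored in the GGUF metadata if one is present. This is a sketch assuming the same model path as above:

from llama_cpp import Llama

llm = Llama(model_path="./models/koyna-v2-1b.F16.gguf", n_ctx=2048)

# Chat-style generation; relies on the model's embedded chat template
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=100,
)
print(response["choices"][0]["message"]["content"])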

With Ollama:

# Create Modelfile
echo 'FROM ./models/koyna-v2-1b.F16.gguf' > Modelfile

# Create model
ollama create koyna-v2 -f Modelfile

# Run model
ollama run koyna-v2
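A Modelfile can also set sampling parameters and a system prompt using standard Ollama directives (PARAMETER, SYSTEM). The example below is a sketch; the values are arbitrary and should be tuned for your use case:

# Modelfile
FROM ./models/koyna-v2-1b.F16.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 2048
SYSTEM "You are a helpful assistant."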

Performance

F16 quantization provides:

  • Highest quality: minimal precision loss relative to the original weights
  • Good compatibility: works with most GGUF-aware inference engines
  • Moderate size: roughly 2 GB on disk
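If a smaller file is preferred, this F16 GGUF can be requantized with llama.cpp's quantization tool (named llama-quantize in recent builds, quantize in older ones). The command below is a sketch producing a Q4_K_M variant:

./llama-quantize ./models/koyna-v2-1b.F16.gguf ./models/koyna-v2-1b.Q4_K_M.gguf Q4_K_M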

Original Model

This is a quantized version of Govind222/Koyna-V2-1b-instruct. Please refer to the original model card for more details about the model's capabilities, training data, and intended use cases.
