Update README.md
README.md CHANGED
@@ -55,6 +55,76 @@ To run the Q4_K_M quantized version (smaller and faster, with a slight trade-off
ollama run martain7r/finance-llama-8b:q4_k_m
```

To run the Finance-Llama-8B-q4_k_m-GGUF quantized model, use llama.cpp through the llama-cpp-python library instead of Hugging Face Transformers. Step by step:

1. Install Required Libraries

```bash
pip install llama-cpp-python huggingface-hub
```
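
On most platforms the prebuilt `llama-cpp-python` wheel is CPU-only. If you want the GPU offloading used in step 3 (`n_gpu_layers=-1`), the package has to be compiled with GPU support. A minimal sketch for a CUDA build, assuming a CUDA toolkit is already installed (the CMake flag name has changed across releases, so check the llama-cpp-python docs for your version):

```bash
# Rebuild llama-cpp-python with CUDA enabled (assumes nvcc is on PATH;
# older releases used -DLLAMA_CUBLAS=on instead of -DGGML_CUDA=on)
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```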

2. Download the GGUF Model

Use the Hugging Face Hub to download the quantized model file:

```python
from huggingface_hub import hf_hub_download

model_name = "tarun7r/Finance-Llama-8B-q4_k_m-GGUF"  # check for the correct repository
model_file = "Finance-Llama-8B-GGUF-q4_K_M.gguf"     # exact GGUF filename in the repo

model_path = hf_hub_download(
    repo_id=model_name,
    filename=model_file,
    local_dir="./models"
)
```
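
The same file can also be fetched from a shell with the `huggingface-cli` tool that ships with `huggingface-hub` (same repository and filename as assumed above):

```bash
# Download the GGUF file into ./models, mirroring the Python snippet above
huggingface-cli download tarun7r/Finance-Llama-8B-q4_k_m-GGUF \
  Finance-Llama-8B-GGUF-q4_K_M.gguf --local-dir ./models
```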

3. Run the Quantized Model

```python
from llama_cpp import Llama

# Initialize the model
llm = Llama(
    model_path=model_path,
    n_ctx=8192,        # Context window size
    n_threads=8,       # CPU threads for inference
    n_gpu_layers=-1,   # Offload all layers to the GPU (requires a GPU build)
    verbose=False      # Disable verbose logging
)

# Define the prompt template
finance_prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""

# Format the prompt
system_message = "You are a highly knowledgeable finance chatbot. Your purpose is to provide accurate, insightful, and actionable financial advice."
user_question = "What strategies can an individual investor use to diversify their portfolio effectively in a volatile market?"

prompt = finance_prompt_template.format(
    instruction=system_message,
    input=user_question
)

# Generate the response
output = llm(
    prompt,
    max_tokens=2500,    # Limit response length
    temperature=0.7,    # Creativity control
    top_p=0.9,          # Nucleus sampling
    echo=False,         # Return only the completion, not the prompt
    stop=["###"]        # Stop before the template's next section marker
)

# Extract and print the response
response = output["choices"][0]["text"].strip()
print("\n--- Response ---")
print(response)
```
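
For interactive use, llama-cpp-python can also stream tokens as they are generated instead of returning the full completion at once. A minimal sketch reusing the `llm` and `prompt` objects from the snippet above:

```python
# Stream the completion: with stream=True the call yields chunks,
# each carrying a partial "text" field, instead of one final dict
for chunk in llm(
    prompt,
    max_tokens=2500,
    temperature=0.7,
    top_p=0.9,
    stop=["###"],
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```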

**Citation 📌**
````