Update README.md
README.md CHANGED
@@ -55,6 +55,76 @@ To run the Q4_K_M quantized version (smaller and faster, with a slight trade-off
ollama run martain7r/finance-llama-8b:q4_k_m
```

To run the Finance-Llama-8B-q4_k_m-GGUF quantized model, use llama.cpp through the llama-cpp-python library instead of Hugging Face Transformers. Step by step:

1. Install Required Libraries

```bash
pip install llama-cpp-python huggingface-hub
```
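
On most platforms the prebuilt `llama-cpp-python` wheel is CPU-only. If you want the GPU offloading used in step 3 (`n_gpu_layers=-1`), the package has to be compiled with GPU support. A minimal sketch for a CUDA build, assuming a CUDA toolkit is already installed (the CMake flag name has changed across releases, so check the llama-cpp-python docs for your version):

```bash
# Rebuild llama-cpp-python with CUDA enabled (assumes nvcc is on PATH;
# older releases used -DLLAMA_CUBLAS=on instead of -DGGML_CUDA=on)
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```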

2. Download the GGUF Model

Use the Hugging Face Hub to download the quantized model file:

```python
from huggingface_hub import hf_hub_download

model_name = "tarun7r/Finance-Llama-8B-q4_k_m-GGUF"  # check for the correct repository
model_file = "Finance-Llama-8B-GGUF-q4_K_M.gguf"     # exact GGUF filename in the repo

model_path = hf_hub_download(
    repo_id=model_name,
    filename=model_file,
    local_dir="./models"
)
```
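
The same file can also be fetched from a shell with the `huggingface-cli` tool that ships with `huggingface-hub` (same repository and filename as assumed above):

```bash
# Download the GGUF file into ./models, mirroring the Python snippet above
huggingface-cli download tarun7r/Finance-Llama-8B-q4_k_m-GGUF \
  Finance-Llama-8B-GGUF-q4_K_M.gguf --local-dir ./models
```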

3. Run the Quantized Model

```python
from llama_cpp import Llama

# Initialize the model
llm = Llama(
    model_path=model_path,
    n_ctx=8192,        # Context window size
    n_threads=8,       # CPU threads for inference
    n_gpu_layers=-1,   # Offload all layers to the GPU (requires a GPU build)
    verbose=False      # Disable verbose logging
)

# Define the prompt template
finance_prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""

# Format the prompt
system_message = "You are a highly knowledgeable finance chatbot. Your purpose is to provide accurate, insightful, and actionable financial advice."
user_question = "What strategies can an individual investor use to diversify their portfolio effectively in a volatile market?"

prompt = finance_prompt_template.format(
    instruction=system_message,
    input=user_question
)

# Generate the response
output = llm(
    prompt,
    max_tokens=2500,    # Limit response length
    temperature=0.7,    # Creativity control
    top_p=0.9,          # Nucleus sampling
    echo=False,         # Return only the completion, not the prompt
    stop=["###"]        # Stop before the template's next section marker
)

# Extract and print the response
response = output["choices"][0]["text"].strip()
print("\n--- Response ---")
print(response)
```
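
For interactive use, llama-cpp-python can also stream tokens as they are generated instead of returning the full completion at once. A minimal sketch reusing the `llm` and `prompt` objects from the snippet above:

```python
# Stream the completion: with stream=True the call yields chunks,
# each carrying a partial "text" field, instead of one final dict
for chunk in llm(
    prompt,
    max_tokens=2500,
    temperature=0.7,
    top_p=0.9,
    stop=["###"],
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```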

**Citation 📌**
````