tarun7r committed
Commit 912cf75 · verified · 1 Parent(s): 965a974

Update README.md

Files changed (1): README.md (+70 -0)
README.md CHANGED
@@ -55,6 +55,76 @@ To run the Q4_K_M quantized version (smaller and faster, with a slight trade-off
 
  ollama run martain7r/finance-llama-8b:q4_k_m
  ```
 
+ To run the Finance-Llama-8B-q4_k_m-GGUF quantized model, use llama.cpp via the llama-cpp-python library instead of Hugging Face Transformers. The steps are:
+
+ 1. Install Required Libraries
+ ```bash
+ pip install llama-cpp-python huggingface-hub
+ ```
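+
+ Optionally, to use the GPU offload shown in step 3 (n_gpu_layers), llama-cpp-python must be built with GPU support. A minimal sketch, assuming a CUDA toolchain is installed (the exact CMake flag can vary between llama-cpp-python versions):
+ ```bash
+ # Assumes a CUDA toolchain is available; rebuilds the wheel with GPU offload enabled
+ CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
+ ```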
+
+ 2. Download the GGUF Model
+ Use the Hugging Face Hub to download the quantized model file:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ model_name = "tarun7r/Finance-Llama-8B-q4_k_m-GGUF"  # Check for the correct repository
+ model_file = "Finance-Llama-8B-GGUF-q4_K_M.gguf"     # Exact GGUF filename
+
+ # Download the .gguf file into ./models and return its local path
+ model_path = hf_hub_download(
+     repo_id=model_name,
+     filename=model_file,
+     local_dir="./models"
+ )
+ ```
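+
+ Alternatively, recent llama-cpp-python releases can fetch a GGUF from the Hub and load it in one call. A minimal sketch, reusing the repo and filename assumed above:
+ ```python
+ from llama_cpp import Llama
+
+ # Downloads and caches the GGUF from the Hub, then loads it;
+ # repo_id/filename are the same assumptions as in the snippet above
+ llm = Llama.from_pretrained(
+     repo_id="tarun7r/Finance-Llama-8B-q4_k_m-GGUF",
+     filename="Finance-Llama-8B-GGUF-q4_K_M.gguf",
+     n_ctx=8192,
+ )
+ ```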
+
+ 3. Run the Quantized Model
+
+ ```python
+ from llama_cpp import Llama
+
+ # Initialize the model
+ llm = Llama(
+     model_path=model_path,
+     n_ctx=8192,        # Context window size
+     n_threads=8,       # CPU threads for inference
+     n_gpu_layers=-1,   # Offload all layers to GPU (set to 0 for CPU-only)
+     verbose=False      # Disable verbose logging
+ )
+
+ # Define the prompt template
+ finance_prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+ ### Instruction:
+ {instruction}
+ ### Input:
+ {input}
+ ### Response:
+ """
+
+ # Format the prompt
+ system_message = "You are a highly knowledgeable finance chatbot. Your purpose is to provide accurate, insightful, and actionable financial advice."
+ user_question = "What strategies can an individual investor use to diversify their portfolio effectively in a volatile market?"
+
+ prompt = finance_prompt_template.format(
+     instruction=system_message,
+     input=user_question
+ )
+
+ # Generate response
+ output = llm(
+     prompt,
+     max_tokens=2500,   # Limit response length
+     temperature=0.7,   # Sampling temperature (creativity control)
+     top_p=0.9,         # Nucleus sampling
+     echo=False,        # Return only the completion (not the prompt)
+     stop=["###"]       # Stop at "###" to avoid generating extra sections
+ )
+
+ # Extract and print the response
+ response = output["choices"][0]["text"].strip()
+ print("\n--- Response ---")
+ print(response)
+ ```
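+
+ As a variant, llama-cpp-python also exposes an OpenAI-style chat API that applies the model's built-in chat template instead of the manual prompt string above. A minimal sketch with the same sampling settings, reusing llm, system_message, and user_question from the script:
+ ```python
+ # Chat-style call: the library formats the conversation using the model's chat template
+ messages = [
+     {"role": "system", "content": system_message},
+     {"role": "user", "content": user_question},
+ ]
+
+ chat_output = llm.create_chat_completion(
+     messages=messages,
+     max_tokens=2500,
+     temperature=0.7,
+     top_p=0.9,
+ )
+
+ print(chat_output["choices"][0]["message"]["content"])
+ ```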
 
  **Citation 📌**
  ````