JacopoAbate committed (verified) · Commit 3209b0d · Parent(s): d59e948

Update README.md

Files changed (1): README.md (+111, −0)
---
license: mit
language:
- it
- en
library_name: transformers
tags:
- sft
- it
- mistral
- chatml
---

# Model Information

VolareQuantized is a compact iteration of the model [Volare](https://huggingface.co/MoxoffSpA/Volare), optimized for efficiency.

It is offered in two distinct configurations: a 4-bit version and an 8-bit version, each designed to preserve the model's effectiveness while significantly reducing its size
and computational requirements.

- It is trained both on publicly available datasets, such as [SQUAD-it](https://huggingface.co/datasets/squad_it), and on datasets we've created in-house.
- It is designed to understand and maintain context, making it well suited to Retrieval Augmented Generation (RAG) tasks and other applications requiring contextual awareness.
- It is quantized into a 4-bit version and an 8-bit version following the procedure described [here](https://github.com/ggerganov/llama.cpp).

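The size reduction from quantization can be estimated with simple arithmetic: file size is roughly parameter count times bits per weight. The figures below are assumptions for illustration (a ~7B-parameter model and typical bits-per-weight values for llama.cpp quantization types), not published numbers for Volare:

```python
# Rough, back-of-the-envelope file-size estimates. Both the parameter count
# and the bits-per-weight figures are illustrative assumptions.
N_PARAMS = 7e9  # assumed ~7B parameters

BITS_PER_WEIGHT = {
    "fp16": 16.0,    # unquantized half precision
    "Q8_0": 8.5,     # typical effective bpw for 8-bit llama.cpp quantization
    "Q4_K_M": 4.85,  # typical effective bpw for 4-bit llama.cpp quantization
}

def est_size_gb(bits: float, n_params: float = N_PARAMS) -> float:
    """Estimated file size in gigabytes: n_params * bits / 8 bytes."""
    return n_params * bits / 8 / 1e9

for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>7}: ~{est_size_gb(bits):.1f} GB")
```

Under these assumptions the 4-bit file is roughly a third the size of the unquantized fp16 weights, which is what makes the quantized versions practical on consumer hardware.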
# Evaluation

We evaluated the model using the same test sets as the Open Ita LLM Leaderboard.

| hellaswag_it acc_norm | arc_it acc_norm | m_mmlu_it 5-shot acc | Average        |
|:----------------------|:----------------|:---------------------|:---------------|
| 0.6474                | 0.4671          | to be computed       | to be computed |

| f1     | Exact Match |
|:-------|:------------|
| 0.6982 | 0.0         |

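The f1 and Exact Match columns follow the usual SQuAD-style question-answering scoring. A minimal sketch of those two metrics (simplified: whitespace tokenization, a single reference answer, and no article stripping):

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

This explains why f1 can be high while Exact Match is 0.0: answers that overlap the reference ("circa 56 metri" vs. "56 metri") earn partial F1 credit but fail the exact string match.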
## Usage

You first need to download the .gguf model.

If you want to run on the CPU, install these dependencies:

```shell
pip install llama-cpp-python huggingface_hub
```

If you want to use the GPU instead:

```shell
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install huggingface_hub llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```

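`hf_hub_download` needs the exact .gguf filename, which differs between the 4-bit and 8-bit versions. A small sketch for discovering the published quantizations; the repo id here is assumed, and the live call is left commented out because it requires network access:

```python
from huggingface_hub import HfApi

def gguf_files(file_list):
    """Pick out the .gguf artifacts from a repository file listing."""
    return sorted(f for f in file_list if f.endswith(".gguf"))

# Against the live repository (requires network access; repo id assumed):
# files = HfApi().list_repo_files("MoxoffSpA/VolareQuantized")
# print(gguf_files(files))
```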
And then use this code to see a response to the prompt:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="MoxoffSpA/VolareQuantized",
    filename="Volare-ggml-Q4_K_M.gguf"  # adjust to the exact .gguf filename published in the repo
)

# Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no
# GPU acceleration is available on your system.
llm = Llama(
    model_path=model_path,
    n_ctx=2048,     # The max sequence length to use - longer sequences require more resources
    n_threads=8,    # The number of CPU threads to use; tailor to your system
    n_gpu_layers=0  # The number of layers to offload to GPU, if acceleration is available
)

# Simple inference example
question = """Quanto è alta la torre di Pisa?"""
context = """
La Torre di Pisa è un campanile del XII secolo, famoso per la sua inclinazione. Alta circa 56 metri.
"""

prompt = f"Domanda: {question}, contesto: {context}"

output = llm(
    f"[INST] {prompt} [/INST]",  # Prompt in the Mistral instruction format
    max_tokens=128,
    stop=["\n"],
    echo=True,
    temperature=0.1,
    top_p=0.95
)

# Print the generated text
print(output['choices'][0]['text'])
```

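The inline prompt construction above can be factored into a small helper so the same instruction template is applied consistently. `build_prompt` is illustrative only, not part of llama-cpp-python:

```python
def build_prompt(question: str, context: str) -> str:
    """Wrap an Italian question and its supporting context in the
    Mistral-style [INST] instruction template used in the example above."""
    user_message = f"Domanda: {question}, contesto: {context.strip()}"
    return f"[INST] {user_message} [/INST]"

prompt = build_prompt(
    "Quanto è alta la torre di Pisa?",
    "La Torre di Pisa è un campanile del XII secolo, famoso per la sua inclinazione. Alta circa 56 metri.",
)
```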
## Bias, Risks and Limitations

VolareQuantized and its original model have not been aligned to human preferences for safety via RLHF, nor deployed with in-the-loop filtering of
responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus
used to train the base model are unknown; however, it is likely to have included a mix of web data and technical sources
like books and code.

## Links to resources

- SQUAD-it dataset: https://huggingface.co/datasets/squad_it
- Gemma-7b model: https://huggingface.co/google/gemma-7b
- Open Ita LLM Leaderboard: https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard

## Quantized versions

The non-quantized version is available here:
https://huggingface.co/MoxoffSpA/Volare

## The Moxoff Team

Jacopo Abate, Marco D'Ambra, Luigi Simeone, Gianpaolo Francesco Trotta