h4shy committed · Commit 1f4c44a · verified · Parent(s): 2c393c4

Update README.md

Files changed (1): README.md (+10 -1)
README.md CHANGED
@@ -4,4 +4,13 @@ base_model:
  - google/gemma-3-1b-it
  base_model_relation: quantized
  pipeline_tag: text-generation
- ---
+ ---
+ This version was quantized by h4shy for production use on old or inexpensive hardware and on CPU-only setups. The goal is an inference-ready configuration for production environments with considerable resource constraints: these quantization choices suit deployments with medium to heavy CPU constraints and low to medium RAM constraints, while keeping production efficiency in mind. Example invocations are sketched after the list below.
+
+ Q5_0: medium-to-fast inference with the lower RAM footprint of the two.
+ Q8_0: faster inference at the cost of higher RAM usage.
+
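+ A minimal sketch of a CPU-only test run with llama.cpp's `llama-cli` (the GGUF filename is an assumed placeholder; substitute the actual file shipped in this repo):
+
+ ```sh
+ # Quick CPU-only smoke test: -t sets the thread count, -c the context window.
+ llama-cli -m gemma-3-1b-it-Q5_0.gguf \
+   -p "Explain quantization in one sentence." \
+   -n 128 -t 4 -c 2048
+ ```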
+ Evaluations and more detailed analysis are coming soon.
+
+ Original model: [gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
+ Software used for quantization: [llama.cpp](https://github.com/ggml-org/llama.cpp)
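+
+ For serving under the same constraints, a hedged sketch using llama.cpp's `llama-server` (filename and port are illustrative assumptions):
+
+ ```sh
+ # OpenAI-compatible HTTP server; clients can call /v1/chat/completions on port 8080.
+ llama-server -m gemma-3-1b-it-Q8_0.gguf -t 4 -c 2048 --port 8080
+ ```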