# 4-bit Quantized Model: Mistral-7B-Instruct-v0.3

This is a 4-bit quantized variant of mistralai/Mistral-7B-Instruct-v0.3, optimized to reduce memory footprint and accelerate inference while maintaining high output similarity.
## Overview
Mistral-7B-Instruct-v0.3 is an instruction fine-tuned model derived from Mistral-7B-v0.3, featuring:
- An extended vocabulary of 32,768 tokens.
- Support for the v3 tokenizer.
- Built-in function calling capabilities.
This quantized checkpoint was produced with BitsAndBytes and evaluated using standard text similarity metrics.
## Model Architecture
| Attribute | Value |
|---|---|
| Model class | `MistralForCausalLM` |
| Number of parameters | 3,758,362,624 |
| Hidden size | 4096 |
| Number of layers | 32 |
| Attention heads | 32 |
| Vocabulary size | 32,768 |
| Compute dtype | `torch.bfloat16` |
## Quantization Configuration
The following configuration dictionary was used during quantization:
```python
{
    'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>,
    '_load_in_8bit': False,
    '_load_in_4bit': True,
    'llm_int8_threshold': 6.0,
    'llm_int8_skip_modules': None,
    'llm_int8_enable_fp32_cpu_offload': False,
    'llm_int8_has_fp16_weight': False,
    'bnb_4bit_quant_type': 'fp4',
    'bnb_4bit_use_double_quant': False,
    'bnb_4bit_compute_dtype': 'bfloat16',
    'bnb_4bit_quant_storage': 'uint8',
    'load_in_4bit': True,
    'load_in_8bit': False
}
```
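For reference, the dictionary above corresponds to the following `transformers.BitsAndBytesConfig`. This is a minimal sketch of how an equivalent 4-bit checkpoint could be reproduced from the base model, not the exact script used here:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the dump above: 4-bit FP4 weights, no double quantization,
# bfloat16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
)
```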
## Intended Use
- Research and experimentation with instruction-following tasks.
- Demonstrations of quantized model capabilities in resource-constrained environments.
- Prototyping workflows requiring extended vocabulary and function calling support (v3 tokenizer).
## Limitations
- May reproduce biases and factual inaccuracies present in the original model.
- This instruct variant does not include any moderation or safety guardrails by default.
- Quantization can reduce generation diversity and precision.
- Not intended for production without thorough evaluation and alignment testing.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PJEDeveloper/Mistral-7B-Instruct-v0.3-4bit-20250716_003938")
model = AutoModelForCausalLM.from_pretrained(
    "PJEDeveloper/Mistral-7B-Instruct-v0.3-4bit-20250716_003938",
    device_map="auto",  # the saved 4-bit quantization config is applied automatically
)

prompt = "Explain the concept of reinforcement learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
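Since this is an instruct model, prompts generally work best when routed through the chat template rather than passed as raw text. A minimal sketch (the question string is just an example):

```python
# Build the prompt with the model's chat template instead of raw text.
messages = [{"role": "user", "content": "Explain the concept of reinforcement learning."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```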
## Function Calling
For function calling workflows, please see the Transformers Function Calling Guide and the original Mistral examples.
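Below is a minimal, hedged sketch of one such workflow using the Transformers tool-use API; the `get_weather` function, its schema, and the prompt are illustrative assumptions, and the model's tool-call output still has to be parsed and executed by your own code:

```python
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The city to look up.
    """
    return "sunny"  # stub; a real tool would call a weather API

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],  # the JSON schema is inferred from the signature and docstring
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# The newly generated tokens should contain a tool-call payload to parse and execute.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```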
## Prompt Used for Evaluation
```
You are an expert assistant helping a user plan a themed event.
Please complete the following tasks:

Event Summary: Write a concise paragraph (3-5 sentences) describing a steampunk-themed anniversary party in a unique style, incorporating at least two uncommon English words from the extended vocabulary (for example, pulchritudinous, vicissitude, susurration, ephemeral, etc.).

Guest List JSON: Create a JSON array listing five fictional guests, each with the following fields:
  "name": A full name.
  "role": e.g., "Inventor", "Airship Captain".
  "special_item": A unique steampunk artifact.

Checklist Function Call: Format the following checklist as a function call named create_event_checklist with a single argument items, which should be a list of strings. The checklist items are:
  "Order brass goggles"
  "Prepare mechanical owl centerpiece"
  "Send invitations"
  "Arrange clockwork music ensemble"
  "Book airship docking slot"

Respond in three sections labeled Event Summary, Guest List JSON, and Checklist Function Call.
Make sure all JSON is properly formatted and the function call syntax is valid.
```
## Reference Output (Full-weight Model)
Event Summary:
Amidst the ephemeral whimsy of a gaslit evening, the anniversary of our esteemed society shall be celebrated in a grand Steampunk extravaganza. The Victorian-esque manor, adorned with intricate brass machinery and gears, shall echo with the susurration of time's relentless march, as we revel in the pulchritudinous splendor of our pulchritudinous past.
Guest List JSON:
```json
[
  {
    "name": "Professor Archibald F. Brassbolt III",
    "role": "Inventor",
    "special_item": "Aether-infused Time-Turner"
  },
  {
    "name": "Captain Amelia Cogsworth",
    "role": "Airship Captain",
    "special_item": "The Leviathan - a clockwork airship"
  },
  {
    "name": "Lady Isabella Gearsby",
    "role": "Society Matron",
    "special_item": "A diamond-encrusted Steam-Pendulum"
  },
  {
    "name": "Sir Percival Automaton",
    "role": "Mechanical Knight",
    "special_item": "A steam-powered automaton suit"
  },
  {
    "name": "Dr. Eleanor Clockwork",
    "role": "Mad Scientist",
    "special_item": "A living, breathing steam-powered rose"
  }
]
```
Checklist Function Call:
```python
def create_event_checklist(items):
    return "\n".join(items)

checklist = create_event_checklist(["Order brass goggles", "Prepare mechanical owl centerpiece", "Send invitations", "Arrange clockwork music ensemble", "Book airship docking slot"])
```
With this checklist, we can ensure that our Steampunk anniversary party will be an unforgettable event, filled with the marvels of a bygone era.
## Quantized Model Output
Event Summary:
In the heart of the industrial ether, where the air is thick with the scent of brass and steam, a grand celebration of marital vicissitude shall unfold. The ephemeral spectacle of a steampunk anniversary party shall be a veritable cornucopia of pulchritudinous contraptions, where the susurration of gears and the hiss of steam shall serenade the guests in a symphony of mechanical harmony.
Guest List JSON:
```json
[
  {
    "name": "Professor Abigail Edison-Smythe",
    "role": "Inventor",
    "special_item": "Aether-infused Tesla Coil"
  },
  {
    "name": "Captain Amelia Brassbright",
    "role": "Airship Captain",
    "special_item": "Steampowered Parasol"
  },
  {
    "name": "Lord Percival Cogsworth",
    "role": "Steampunk Poet",
    "special_item": "Mechanical Quill and Inkwell"
  },
  {
    "name": "Miss Isabella Gearheart",
    "role": "Steampunk Fashion Designer",
    "special_item": "Steam-powered Dress with Built-in Fan"
  },
  {
    "name": "Sir Archibald Clockwork",
    "role": "Clockwork Mechanic",
    "special_item": "Mechanical Hand with Built-in Compass"
  }
]
```
Checklist Function Call:
```python
def create_event_checklist(items): return "\n".join(items)

create_event_checklist(["Order brass goggles", "Prepare mechanical owl centerpiece", "Send invitations", "Arrange clockwork music ensemble", "Book airship docking slot"])
```
## Evaluation Metrics

| Metric | Value |
|---|---|
| ROUGE-L F1 | 0.4581 |
| BLEU | 0.2442 |
| Cosine Similarity | 0.9141 |
| BERTScore F1 | 0.6955 |
- Higher ROUGE-L and BLEU scores indicate closer lexical alignment with the reference output; cosine similarity and BERTScore capture semantic similarity.
- Interpretation: The quantized model output exhibits moderate lexical similarity but high semantic similarity to the full-weight model.
- **Warning:** The quantized output has 3 sentences, while the reference has 6. This may indicate structural divergence.
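For readers who want to reproduce scores like these, here is a hedged sketch using the `evaluate` and `sentence-transformers` packages; the exact tooling and embedding model behind the reported numbers are not documented here, so `all-MiniLM-L6-v2` is an assumption:

```python
import evaluate
from sentence_transformers import SentenceTransformer, util

reference = "..."  # full-weight model output
candidate = "..."  # quantized model output

# Lexical overlap metrics.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=[candidate], references=[reference])["rougeL"])
bleu = evaluate.load("bleu")
print(bleu.compute(predictions=[candidate], references=[reference])["bleu"])

# Semantic similarity metrics.
bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=[candidate], references=[reference], lang="en")["f1"][0])

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
emb = embedder.encode([reference, candidate], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())
```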
## Generation Settings

This model produces its best results with the following generation settings:

```python
max_new_tokens=1024,
do_sample=False,
temperature=0.3,
top_p=0.9,
pad_token_id=tokenizer.eos_token_id
```
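For example, passed to `generate` as in the usage snippet above (note that with `do_sample=False` decoding is greedy, so `temperature` and `top_p` have no effect and recent Transformers versions may warn about them):

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,                      # greedy decoding
    temperature=0.3,                      # ignored when do_sample=False
    top_p=0.9,                            # ignored when do_sample=False
    pad_token_id=tokenizer.eos_token_id,
)
```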
## Model Files Metadata

| Filename | Size (bytes) | SHA-256 |
|---|---|---|
| quant_config.txt | 446 | f7a08f6dc4b46a4803dce152c536ceed2ee802755840db11231fb5a895b2e022 |
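A small sketch for verifying the checksum above, assuming the file has been downloaded locally as `quant_config.txt`:

```python
import hashlib

# Hash the downloaded file and compare against the table above.
with open("quant_config.txt", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

expected = "f7a08f6dc4b46a4803dce152c536ceed2ee802755840db11231fb5a895b2e022"
print("OK" if digest == expected else "Checksum mismatch!")
```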
## Notes
- Produced on 2025-07-16T00:43:52.476070.
- Quantized automatically using BitsAndBytes.
- Base model: mistralai/Mistral-7B-Instruct-v0.3 with extended 32,768-token vocabulary and function calling capabilities.
- Intended primarily for research and experimentation.
## Citation

[mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
## License
This model is distributed under the Apache 2.0 license, consistent with the original Mistral-7B-Instruct-v0.3.
## Model Card Authors
This quantized model was prepared by PJEDeveloper.