4-bit Quantized Model: Mistral-7B-Instruct-v0.3

This is a 4-bit quantized variant of mistralai/Mistral-7B-Instruct-v0.3, intended to reduce the memory footprint and accelerate inference while maintaining high output similarity to the full-weight model.

Overview

Mistral-7B-Instruct-v0.3 is an instruction fine-tuned model derived from Mistral-7B-v0.3, featuring:

  • An extended vocabulary of 32,768 tokens.
  • Support for the v3 tokenizer.
  • Built-in function calling capabilities.

This quantized checkpoint was produced with BitsAndBytes and evaluated using standard text similarity metrics.


Model Architecture

| Attribute | Value |
|---|---|
| Model class | MistralForCausalLM |
| Number of parameters | 3,758,362,624 |
| Hidden size | 4096 |
| Number of layers | 32 |
| Attention heads | 32 |
| Vocabulary size | 32768 |
| Compute dtype | torch.bfloat16 |
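
The reported parameter count is roughly half the base model's ~7.25B because bitsandbytes packs two 4-bit weights into each stored uint8 element (unquantized modules such as embeddings account for the difference). As a back-of-the-envelope sketch, assuming an approximate full-precision count of 7.25B parameters and ignoring overhead such as quantization constants and activations:

```python
# Rough memory-footprint estimate for Mistral-7B weights at different precisions.
# FULL_PARAMS is an approximation of the base model's parameter count; exact
# on-disk size also includes quantization metadata, so treat these as estimates.
FULL_PARAMS = 7_248_000_000  # approximate

def weight_bytes(n_params: int, bits_per_weight: int) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8

gib = 1024 ** 3
fp16_gib = weight_bytes(FULL_PARAMS, 16) / gib
int4_gib = weight_bytes(FULL_PARAMS, 4) / gib

print(f"fp16 weights: ~{fp16_gib:.1f} GiB")   # roughly 13.5 GiB
print(f"4-bit weights: ~{int4_gib:.1f} GiB")  # roughly 3.4 GiB
```

This 4x reduction in weight storage is what allows the model to fit on consumer GPUs.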

Quantization Configuration

The following configuration dictionary was used during quantization:

```python
{'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>,
 '_load_in_8bit': False,
 '_load_in_4bit': True,
 'llm_int8_threshold': 6.0,
 'llm_int8_skip_modules': None,
 'llm_int8_enable_fp32_cpu_offload': False,
 'llm_int8_has_fp16_weight': False,
 'bnb_4bit_quant_type': 'fp4',
 'bnb_4bit_use_double_quant': False,
 'bnb_4bit_compute_dtype': 'bfloat16',
 'bnb_4bit_quant_storage': 'uint8',
 'load_in_4bit': True,
 'load_in_8bit': False}
```
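
The same settings can be expressed programmatically when quantizing; a minimal sketch assuming the transformers and bitsandbytes packages are installed (note that this card's configuration uses the fp4 quantization type rather than the more common nf4, and double quantization is disabled):

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the configuration dictionary above; pass this as
# quantization_config= when calling from_pretrained on the base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```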

Intended Use

  • Research and experimentation with instruction-following tasks.
  • Demonstrations of quantized model capabilities in resource-constrained environments.
  • Prototyping workflows requiring extended vocabulary and function calling support (v3 tokenizer).

Limitations

  • May reproduce biases and factual inaccuracies present in the original model.
  • This instruct variant does not include any moderation or safety guardrails by default.
  • Quantization can reduce generation diversity and precision.
  • Not intended for production without thorough evaluation and alignment testing.

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PJEDeveloper/Mistral-7B-Instruct-v0.3-4bit-20250716_003938")
model = AutoModelForCausalLM.from_pretrained(
    "PJEDeveloper/Mistral-7B-Instruct-v0.3-4bit-20250716_003938",
    device_map="auto",
)

prompt = "Explain the concept of reinforcement learning."
# Move inputs to the device the model was dispatched to, rather than
# hard-coding "cuda" (device_map="auto" decides the placement).
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Function Calling

For function calling workflows, please see the Transformers Function Calling Guide and the original Mistral examples.
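
As a rough illustration of the shape of a v3 tool-calling prompt (the tool schema and helper below are my own; in practice, prefer `tokenizer.apply_chat_template(..., tools=...)`, which applies the exact template shipped with the tokenizer):

```python
import json

# Hypothetical tool definition in the JSON-schema style used for function calling.
tool = {
    "type": "function",
    "function": {
        "name": "create_event_checklist",
        "description": "Create a checklist for an event.",
        "parameters": {
            "type": "object",
            "properties": {
                "items": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["items"],
        },
    },
}

def build_tool_prompt(tools, user_message: str) -> str:
    """Hand-rolled approximation of the v3 tool-calling prompt: tool schemas
    wrapped in [AVAILABLE_TOOLS] control tokens ahead of the [INST] block."""
    return (
        f"[AVAILABLE_TOOLS]{json.dumps(tools)}[/AVAILABLE_TOOLS]"
        f"[INST]{user_message}[/INST]"
    )

prompt = build_tool_prompt([tool], "Make a checklist with one item: Send invitations")
print(prompt[:60])
```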


Prompt Used for Evaluation

You are an expert assistant helping a user plan a themed event.

Please complete the following tasks:

Event Summary: Write a concise paragraph (3-5 sentences) describing a steampunk-themed anniversary party in a unique style, incorporating at least two uncommon English words from the extended vocabulary (for example, pulchritudinous, vicissitude, susurration, ephemeral, etc.).

Guest List JSON: Create a JSON array listing five fictional guests, each with the following fields:

  • "name": A full name.
  • "role": e.g., "Inventor", "Airship Captain".
  • "special_item": A unique steampunk artifact.

Checklist Function Call: Format the following checklist as a function call named create_event_checklist with a single argument items, which should be a list of strings. The checklist items are:

  • "Order brass goggles"
  • "Prepare mechanical owl centerpiece"
  • "Send invitations"
  • "Arrange clockwork music ensemble"
  • "Book airship docking slot"

Respond in three sections labeled Event Summary, Guest List JSON, and Checklist Function Call.

Make sure all JSON is properly formatted and the function call syntax is valid.
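
Outputs in this format can be checked mechanically. A minimal sketch (the helper name is my own) that uses Python's ast module to confirm a model emitted a syntactically valid create_event_checklist call whose argument is a list of strings:

```python
import ast

def parse_checklist_call(source: str) -> list[str]:
    """Parse a `create_event_checklist([...])` call and return its items.

    Raises ValueError if the source is not a single call to that function
    with one list-of-strings argument.
    """
    tree = ast.parse(source.strip(), mode="eval")
    call = tree.body
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "create_event_checklist"
            and len(call.args) == 1):
        raise ValueError("not a create_event_checklist(...) call")
    items = ast.literal_eval(call.args[0])
    if not (isinstance(items, list) and all(isinstance(i, str) for i in items)):
        raise ValueError("argument is not a list of strings")
    return items

sample = 'create_event_checklist(["Order brass goggles", "Send invitations"])'
print(parse_checklist_call(sample))  # ['Order brass goggles', 'Send invitations']
```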

Reference Output (Full-weight Model)

Event Summary:
Amidst the ephemeral whimsy of a gaslit evening, the anniversary of our esteemed society shall be celebrated in a grand Steampunk extravaganza. The Victorian-esque manor, adorned with intricate brass machinery and gears, shall echo with the susurration of time's relentless march, as we revel in the pulchritudinous splendor of our pulchritudinous past.

Guest List JSON:

```json
[
  {
    "name": "Professor Archibald F. Brassbolt III",
    "role": "Inventor",
    "special_item": "Aether-infused Time-Turner"
  },
  {
    "name": "Captain Amelia Cogsworth",
    "role": "Airship Captain",
    "special_item": "The Leviathan - a clockwork airship"
  },
  {
    "name": "Lady Isabella Gearsby",
    "role": "Society Matron",
    "special_item": "A diamond-encrusted Steam-Pendulum"
  },
  {
    "name": "Sir Percival Automaton",
    "role": "Mechanical Knight",
    "special_item": "A steam-powered automaton suit"
  },
  {
    "name": "Dr. Eleanor Clockwork",
    "role": "Mad Scientist",
    "special_item": "A living, breathing steam-powered rose"
  }
]
```

Checklist Function Call:

```python
def create_event_checklist(items):
    return "\n".join(items)

checklist = create_event_checklist(["Order brass goggles", "Prepare mechanical owl centerpiece", "Send invitations", "Arrange clockwork music ensemble", "Book airship docking slot"])
```
With this checklist, we can ensure that our Steampunk anniversary party will be an unforgettable event, filled with the marvels of a bygone era.

Quantized Model Output

---

Event Summary:
In the heart of the industrial ether, where the air is thick with the scent of brass and steam, a grand celebration of marital vicissitude shall unfold. The ephemeral spectacle of a steampunk anniversary party shall be a veritable cornucopia of pulchritudinous contraptions, where the susurration of gears and the hiss of steam shall serenade the guests in a symphony of mechanical harmony.

Guest List JSON:

```json
[
  {
    "name": "Professor Abigail Edison-Smythe",
    "role": "Inventor",
    "special_item": "Aether-infused Tesla Coil"
  },
  {
    "name": "Captain Amelia Brassbright",
    "role": "Airship Captain",
    "special_item": "Steampowered Parasol"
  },
  {
    "name": "Lord Percival Cogsworth",
    "role": "Steampunk Poet",
    "special_item": "Mechanical Quill and Inkwell"
  },
  {
    "name": "Miss Isabella Gearheart",
    "role": "Steampunk Fashion Designer",
    "special_item": "Steam-powered Dress with Built-in Fan"
  },
  {
    "name": "Sir Archibald Clockwork",
    "role": "Clockwork Mechanic",
    "special_item": "Mechanical Hand with Built-in Compass"
  }
]
```


Checklist Function Call:

```python
def create_event_checklist(items):
    return "\n".join(items)

create_event_checklist(["Order brass goggles", "Prepare mechanical owl centerpiece", "Send invitations", "Arrange clockwork music ensemble", "Book airship docking slot"])
```


Evaluation Metrics

| Metric | Value |
|---|---|
| ROUGE-L F1 | 0.4581 |
| BLEU | 0.2442 |
| Cosine Similarity | 0.9141 |
| BERTScore F1 | 0.6955 |

  • Higher scores on all four metrics indicate closer alignment with the full-weight model's output.

Interpretation: The quantized model output exhibits moderate similarity to the full-weight model.

Warning: The quantized output has 3 sentences, while the reference has 6. This may indicate structural divergence.
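
ROUGE-L scores like the one above can be reproduced in a few lines. Below is a minimal sketch of LCS-based ROUGE-L F1 over whitespace tokens; production evaluations typically use the rouge-score package, which adds stemming and proper tokenization:

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l_f1("the cat sat on the mat", "the cat lay on the mat"), 3))  # 0.833
```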

Generation Settings

This model produces its best results with the following generation settings:

```python
max_new_tokens=1024,
do_sample=False,
temperature=0.3,
top_p=0.9,
pad_token_id=tokenizer.eos_token_id
```

Note that with do_sample=False (greedy decoding), temperature and top_p have no effect; they apply only when sampling is enabled.

Model Files Metadata

| Filename | Size (bytes) | SHA-256 |
|---|---|---|
| quant_config.txt | 446 | f7a08f6dc4b46a4803dce152c536ceed2ee802755840db11231fb5a895b2e022 |

Notes

  • Produced on 2025-07-16T00:43:52.476070.
  • Quantized automatically using BitsAndBytes.
  • Base model: mistralai/Mistral-7B-Instruct-v0.3 with extended 32,768-token vocabulary and function calling capabilities.

Intended primarily for research and experimentation.

Citation

mistralai/Mistral-7B-Instruct-v0.3

Mistral 7B Announcement

License

This model is distributed under the Apache 2.0 license, consistent with the original Mistral-7B-Instruct-v0.3.

Model Card Authors

This quantized model was prepared by PJEDeveloper.
