---
library_name: transformers
tags:
- mistral
- instruct
- quantization
- 4bit
- bitsandbytes
- causal-lm
---
# 4-bit Quantized Model: Mistral-Nemo-Instruct-2407
This is a 4-bit quantized variant of [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407), quantized to reduce memory footprint and accelerate inference while maintaining high output similarity to the full-weight model.
## Overview
This checkpoint was quantized using BitsAndBytes and evaluated with standard text similarity metrics.
## Model Architecture
| Attribute | Value |
|---|---|
| Model class | MistralForCausalLM |
| Number of parameters | 6,795,187,200 |
| Hidden size | 5120 |
| Number of layers | 40 |
| Attention heads | 32 |
| Vocabulary size | 131072 |
| Compute dtype | torch.bfloat16 |
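The values above can be checked against the checkpoint's configuration; a minimal sketch using the repo id from the Usage section below:

```python
from transformers import AutoConfig

# Read the configuration shipped with the quantized checkpoint.
config = AutoConfig.from_pretrained(
    "PJEDeveloper/Mistral-Nemo-Instruct-2407-4bit-20250714_213418"
)
print(config.architectures)        # ['MistralForCausalLM']
print(config.hidden_size)          # 5120
print(config.num_hidden_layers)    # 40
print(config.num_attention_heads)  # 32
print(config.vocab_size)           # 131072
```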
## Quantization Configuration
The following configuration dictionary was used during quantization:
```python
{'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>, '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'fp4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}
```
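For reference, the same settings can be expressed through transformers' `BitsAndBytesConfig` when quantizing the base model. This is a sketch of an equivalent load, not the exact script used to produce this checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the dictionary above: 4-bit FP4 weights, bfloat16 compute,
# no double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",  # base model, per the Overview
    quantization_config=bnb_config,
    device_map="auto",
)
```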
## Intended Use
- Research and experimentation.
- Instruction-following tasks in resource-constrained environments.
- Demonstrations of quantized model capabilities.
## Limitations
- May reproduce biases from the original model.
- Quantization may reduce generation diversity and factual accuracy.
- Not intended for production without additional evaluation.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# The saved BitsAndBytes config is applied automatically when loading.
tokenizer = AutoTokenizer.from_pretrained("PJEDeveloper/Mistral-Nemo-Instruct-2407-4bit-20250714_213418")
model = AutoModelForCausalLM.from_pretrained("PJEDeveloper/Mistral-Nemo-Instruct-2407-4bit-20250714_213418", device_map="auto")

prompt = "Explain the concept of reinforcement learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
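Since this is an instruction-tuned checkpoint, the tokenizer's chat template generally produces better-formatted responses than a raw prompt; a minimal sketch reusing the `tokenizer` and `model` loaded above:

```python
# Wrap the prompt in the model's chat template before generating.
messages = [{"role": "user", "content": "Explain the concept of reinforcement learning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```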
## Prompt Used for Evaluation
> Summarize the Apollo 11 mission in five sentences. Use clear, factual language, name the astronauts, state the date of the Moon landing, and describe the mission sequence from launch to return.
## Reference Output (Full-weight Model)
> On July 16, 1969, NASA launched Apollo 11 from Kennedy Space Center, carrying astronauts Neil Armstrong, Buzz Aldrin, and Michael Collins. After a successful lunar orbit insertion, Armstrong and Aldrin descended to the moon's surface on July 20 in the lunar module, named 'Eagle'. Armstrong became the first human to step onto the moon at 02:56 UTC, followed by Aldrin approximately 20 minutes later. After collecting samples and planting the American flag, they spent about two and a half hours exploring the moon's surface before rendezvousing with Collins in lunar orbit and returning to Earth on July 24.
## Quantized Model Output
> On July 16, 1969, NASA launched the Apollo 11 mission from Kennedy Space Center in Florida, carrying astronauts Neil Armstrong, Buzz Aldrin, and Michael Collins. After a successful lunar orbit insertion, Armstrong and Aldrin descended to the lunar surface in the Lunar Module (LM) named "Eagle" on July 20, while Collins remained in lunar orbit aboard the Command Module (CM) "Columbia". Armstrong became the first person to step on the Moon at 02:56 UTC, followed by Aldrin about 20 minutes later. After spending approximately two and a half hours outside the spacecraft, collecting lunar samples, and deploying experiments, Armstrong and Aldrin lifted off from the lunar surface and rejoined Collins in lunar orbit. The three astronauts then returned to Earth and splashed down in the Pacific Ocean on July 24, successfully completing the first manned mission to the Moon.
## Evaluation Metrics
| Metric | Value |
|---|---|
| ROUGE-L F1 | 0.6307 |
| BLEU | 0.3282 |
| Cosine Similarity | 0.959 |
| BERTScore F1 | 0.7098 |
- Higher values on all four metrics indicate closer alignment with the full-weight model's output.
**Interpretation:** The quantized model's output remains substantially similar to that of the full-weight model.
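Scores like these can be reproduced with the `evaluate` library; a sketch assuming the two outputs above are stored as plain strings (the exact evaluation script is not published with this card):

```python
import evaluate

# The two outputs shown above (truncated here for brevity).
reference = "On July 16, 1969, NASA launched Apollo 11 from Kennedy Space Center, ..."
candidate = "On July 16, 1969, NASA launched the Apollo 11 mission from Kennedy Space Center in Florida, ..."

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=[candidate], references=[reference])["rougeL"])
print(bleu.compute(predictions=[candidate], references=[[reference]])["bleu"])
print(bertscore.compute(predictions=[candidate], references=[reference], lang="en")["f1"][0])
```

Cosine similarity is typically computed over sentence embeddings (for example with sentence-transformers) and is not included in this sketch.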
## Generation Settings

This model produces its best results with the following generation settings:
- temperature: 0.3
- top_p: 0.9
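These settings map directly onto `model.generate`; a sketch reusing the `inputs` from the Usage section (note that `do_sample=True` is required for `temperature` and `top_p` to take effect):

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,    # enables sampling so temperature/top_p apply
    temperature=0.3,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```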
## Model Files Metadata
| Filename | Size (bytes) | SHA-256 |
|---|---|---|
| quant_config.txt | 446 | f7a08f6dc4b46a4803dce152c536ceed2ee802755840db11231fb5a895b2e022 |
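To verify the checksum locally, the file can be downloaded and hashed; a minimal sketch using `huggingface_hub` and the standard library:

```python
import hashlib
from huggingface_hub import hf_hub_download

# Download quant_config.txt from the model repo and hash its contents.
path = hf_hub_download(
    repo_id="PJEDeveloper/Mistral-Nemo-Instruct-2407-4bit-20250714_213418",
    filename="quant_config.txt",
)
with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print(digest)  # should match the SHA-256 listed above
```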
## Notes

- Produced on 2025-07-14T21:39:55.939163.
- Quantized automatically using BitsAndBytes.
- Intended primarily for research and experimentation.
## Citation

Please cite the original model: [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
## License
This model is distributed under the Apache 2.0 license, consistent with the original Mistral-Nemo-Instruct-2407.
## Model Card Authors
This quantized model was prepared by PJEDeveloper.