---
tags:
- causal-lm
- qwen
- fine-tuned
- quantized
- mnlp
---

# Qwen3-0.6B Full-Precision + W8A8 Quantized MCQA Model

Repository: [Kikinoking/MNLP_M2_quantized_model](https://huggingface.co/Kikinoking/MNLP_M2_quantized_model)

This is a Qwen3-0.6B causal language model fine-tuned on a concatenation of several multiple-choice QA (MCQA) datasets and then quantized to 8-bit weights and 8-bit activations (W8A8), stored in the `compressed-tensors` format. It is intended for multiple-choice question answering and was evaluated with the LightEval EPFL MNLP suite.

---

## Model Details

- Base architecture: Qwen3 (0.6B parameters)
- Base checkpoint: `Qwen/Qwen3-0.6B-Base`
- Fine-tuning data sources:
  - ScienceQA
  - QASC
  - OpenBookQA
  - MathQA
  - CommonsenseQA
  - MCQA prompts generated with ChatGPT (labeled `M1_chatgpt`)
- Dataset split: 95% train / 5% validation
- Tokenization:
  - `AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")`
  - Left padding, with the EOS token used as `pad_token`
  - Sequence length capped at 2048 tokens
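
The split and tokenizer settings above can be pictured with a small, library-free sketch. It is illustrative only: the helper names `split_train_val` and `left_pad_batch`, the `EOS_ID` value, the shuffle seed, and the toy token IDs are all hypothetical, and the actual pipeline used the Hugging Face tokenizer rather than this code. It shows what a shuffled 95/5 split produces and how left padding with the EOS token as pad token, plus truncation to 2048 tokens, lays out a batch and its attention mask.

```python
import random

MAX_LEN = 2048  # sequence cap stated in the model card
EOS_ID = 0      # hypothetical EOS id (reused as pad id); the real id comes from the tokenizer


def split_train_val(examples, val_frac=0.05, seed=0):
    """Shuffle a copy of the examples and hold out `val_frac` of them for validation.

    The seed is an assumption; the card does not say how the split was drawn.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_frac)
    return items[n_val:], items[:n_val]


def left_pad_batch(seqs, pad_id=EOS_ID, max_len=MAX_LEN):
    """Truncate each sequence to max_len, then left-pad all of them to the longest.

    Returns (input_ids, attention_mask); padded positions get mask 0.
    """
    # Drop tokens from the right end (Hugging Face's default truncation side).
    seqs = [s[:max_len] for s in seqs]
    width = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        n_pad = width - len(s)
        input_ids.append([pad_id] * n_pad + s)
        attention_mask.append([0] * n_pad + [1] * len(s))
    return input_ids, attention_mask


if __name__ == "__main__":
    train, val = split_train_val(range(100))
    print(len(train), len(val))  # 95 5
    ids, mask = left_pad_batch([[5, 6, 7], [8]])
    print(ids)   # [[5, 6, 7], [0, 0, 8]]
    print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

Left padding keeps every sequence's last real token in the final column, which is what batched next-token scoring and generation with a decoder-only model rely on.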

---

## Quantization

- Method: `compressed-tensors` (`naive-quantized` format)
- Precision: 8-bit weights and 8-bit activations (W8A8)
- Layers kept in FP32: the language-modeling head (`lm_head`)
- Checkpoint: loadable for inference on both CPU and GPU
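
To make "8-bit weights and 8-bit activations" concrete, here is a minimal pure-Python sketch of symmetric int8 quantization applied to a small linear layer. The card does not state the quantization granularity or calibration scheme, so this assumes per-output-channel scales for weights and per-token dynamic scales for activations (a common W8A8 setup, not necessarily the one used for this checkpoint); the helper names are hypothetical. Products are accumulated in integers and rescaled to float only at the end.

```python
def quantize_row(row, n_bits=8):
    """Symmetric quantization of one row to signed integers; returns (q, scale)."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for int8
    amax = max(abs(v) for v in row)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return q, scale


def w8a8_linear(x, w):
    """Compute y = x @ w.T with int8-quantized inputs and integer accumulation.

    x: list of token vectors (tokens x in_features)
    w: list of weight rows (out_features x in_features)
    """
    wq = [quantize_row(r) for r in w]  # one scale per output channel (assumed)
    out = []
    for tok in x:
        xq, xs = quantize_row(tok)     # one dynamic scale per token (assumed)
        row = []
        for q_w, ws in wq:
            acc = sum(a * b for a, b in zip(xq, q_w))  # pure integer dot product
            row.append(acc * xs * ws)                  # rescale back to float
        out.append(row)
    return out


def float_linear(x, w):
    """Reference full-precision y = x @ w.T."""
    return [[sum(a * b for a, b in zip(tok, r)) for r in w] for tok in x]


if __name__ == "__main__":
    x = [[0.5, -1.2, 3.0], [2.0, 0.1, -0.7]]
    w = [[0.2, 0.4, -0.1], [-1.5, 0.3, 0.8]]
    print(float_linear(x, w))
    print(w8a8_linear(x, w))  # close to the float result, up to rounding error
```

Giving each weight row its own scale keeps one large value in a single row from reducing the resolution available to every other row.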

---

## Evaluation

Tested with the LightEval EPFL MNLP suite on the MCQA task:

```bash
lighteval accelerate \
  --eval-mode lighteval \
  --save-details \
  --override-batch-size 8 \
  --custom-tasks community_tasks/mnlp_mcqa_evals.py \
  --output-dir out/lighteval_quant \
  model_configs/quantized_model.yaml \
  "community|mnlp_mcqa_evals|0|0"
```

Results:

- Accuracy: 0.30 ± 0.15
- Normalized accuracy: 0.30 ± 0.15
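
The card does not define the two metrics. Assuming the suite follows LightEval's usual log-likelihood definitions, accuracy picks the choice with the highest summed log-probability, while normalized accuracy first divides each choice's log-probability by its length in characters. The sketch below (hypothetical helper names, toy log-probabilities) computes both, along with the standard error of the mean over per-question 0/1 scores, which is one plausible reading of the `±` values.

```python
import math
import statistics


def pick(logprobs, choices, normalize=False):
    """Index of the predicted choice; optionally divide each score by the choice's character length."""
    scores = [lp / len(c) if normalize else lp for lp, c in zip(logprobs, choices)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(questions, normalize=False):
    """Per-question 0/1 correctness and the mean over all questions."""
    hits = [int(pick(q["logprobs"], q["choices"], normalize) == q["gold"]) for q in questions]
    return hits, sum(hits) / len(hits)


def stderr(hits):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(hits) / math.sqrt(len(hits))


if __name__ == "__main__":
    # Toy data: summed log-probabilities of each full-text answer choice.
    questions = [
        {"choices": ["a cat", "an elephant"], "logprobs": [-4.0, -6.0], "gold": 1},
        {"choices": ["red", "blue"], "logprobs": [-2.0, -3.0], "gold": 0},
    ]
    print(accuracy(questions))                  # ([0, 1], 0.5)
    print(accuracy(questions, normalize=True))  # ([1, 1], 1.0)
```

If every choice has the same character length (for example single letters `A` to `D`), normalization divides every score by the same constant and the two metrics coincide. Under the standard-error reading, `stderr([1] * 3 + [0] * 7)` is about 0.153, so a `±` of 0.15 at 30% accuracy would be consistent with an evaluation set of roughly ten questions; this is an inference from the numbers, not something the card states.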

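## How to Use

```python
# Requires: pip install transformers compressed-tensors accelerate
# (compressed-tensors decodes the W8A8 checkpoint; accelerate handles device_map="auto")
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "Kikinoking/MNLP_M2_quantized_model"

# Load the tokenizer and the quantized model weights from the Hub repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",  # place layers on available GPU(s), falling back to CPU
)
```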

## Limitations

- As a 0.6B-parameter model, it may struggle with very long or ambiguous questions.
- Quantization can cause a slight drop in accuracy (roughly 5–10%).

## License

This model is released under CC BY-NC 4.0. The base checkpoint, `Qwen/Qwen3-0.6B-Base`, is distributed under the Apache 2.0 license.