Model Card for ethicalabs/Kurtis-E1.1-Qwen3-4B

Kurtis E1.1, fine-tuned with Flower.

Eval Results

Evaluations were run with the LM Evaluation Harness (lm_eval) on a Mac mini with an M4 Pro chip, using the command below.

mmlu

lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen3-4B --tasks mmlu --device mps --batch_size 4

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
mmlu	2	none		acc	↑	0.6849	±	0.0037
- humanities	2	none		acc	↑	0.5951	±	0.0067
  - formal_logic	1	none	0	acc	↑	0.5952	±	0.0439
  - high_school_european_history	1	none	0	acc	↑	0.7879	±	0.0319
  - high_school_us_history	1	none	0	acc	↑	0.8333	±	0.0262
  - high_school_world_history	1	none	0	acc	↑	0.8439	±	0.0236
  - international_law	1	none	0	acc	↑	0.7686	±	0.0385
  - jurisprudence	1	none	0	acc	↑	0.7685	±	0.0408
  - logical_fallacies	1	none	0	acc	↑	0.8037	±	0.0312
  - moral_disputes	1	none	0	acc	↑	0.7081	±	0.0245
  - moral_scenarios	1	none	0	acc	↑	0.3754	±	0.0162
  - philosophy	1	none	0	acc	↑	0.7170	±	0.0256
  - prehistory	1	none	0	acc	↑	0.7346	±	0.0246
  - professional_law	1	none	0	acc	↑	0.4844	±	0.0128
  - world_religions	1	none	0	acc	↑	0.7778	±	0.0319
- other	2	none		acc	↑	0.7161	±	0.0078
  - business_ethics	1	none	0	acc	↑	0.7300	±	0.0446
  - clinical_knowledge	1	none	0	acc	↑	0.7396	±	0.0270
  - college_medicine	1	none	0	acc	↑	0.7168	±	0.0344
  - global_facts	1	none	0	acc	↑	0.3300	±	0.0473
  - human_aging	1	none	0	acc	↑	0.6771	±	0.0314
  - management	1	none	0	acc	↑	0.8155	±	0.0384
  - marketing	1	none	0	acc	↑	0.8675	±	0.0222
  - medical_genetics	1	none	0	acc	↑	0.7600	±	0.0429
  - miscellaneous	1	none	0	acc	↑	0.8008	±	0.0143
  - nutrition	1	none	0	acc	↑	0.7255	±	0.0256
  - professional_accounting	1	none	0	acc	↑	0.5390	±	0.0297
  - professional_medicine	1	none	0	acc	↑	0.7390	±	0.0267
  - virology	1	none	0	acc	↑	0.5000	±	0.0389
- social sciences	2	none		acc	↑	0.7813	±	0.0074
  - econometrics	1	none	0	acc	↑	0.6228	±	0.0456
  - high_school_geography	1	none	0	acc	↑	0.8283	±	0.0269
  - high_school_government_and_politics	1	none	0	acc	↑	0.8756	±	0.0238
  - high_school_macroeconomics	1	none	0	acc	↑	0.7590	±	0.0217
  - high_school_microeconomics	1	none	0	acc	↑	0.8151	±	0.0252
  - high_school_psychology	1	none	0	acc	↑	0.8679	±	0.0145
  - human_sexuality	1	none	0	acc	↑	0.7405	±	0.0384
  - professional_psychology	1	none	0	acc	↑	0.7173	±	0.0182
  - public_relations	1	none	0	acc	↑	0.6818	±	0.0446
  - security_studies	1	none	0	acc	↑	0.7265	±	0.0285
  - sociology	1	none	0	acc	↑	0.8308	±	0.0265
  - us_foreign_policy	1	none	0	acc	↑	0.8100	±	0.0394
- stem	2	none		acc	↑	0.6943	±	0.0079
  - abstract_algebra	1	none	0	acc	↑	0.5700	±	0.0498
  - anatomy	1	none	0	acc	↑	0.6370	±	0.0415
  - astronomy	1	none	0	acc	↑	0.8092	±	0.0320
  - college_biology	1	none	0	acc	↑	0.8333	±	0.0312
  - college_chemistry	1	none	0	acc	↑	0.5400	±	0.0501
  - college_computer_science	1	none	0	acc	↑	0.6600	±	0.0476
  - college_mathematics	1	none	0	acc	↑	0.5700	±	0.0498
  - college_physics	1	none	0	acc	↑	0.5784	±	0.0491
  - computer_security	1	none	0	acc	↑	0.7800	±	0.0416
  - conceptual_physics	1	none	0	acc	↑	0.7787	±	0.0271
  - electrical_engineering	1	none	0	acc	↑	0.7586	±	0.0357
  - elementary_mathematics	1	none	0	acc	↑	0.6878	±	0.0239
  - high_school_biology	1	none	0	acc	↑	0.8742	±	0.0189
  - high_school_chemistry	1	none	0	acc	↑	0.7192	±	0.0316
  - high_school_computer_science	1	none	0	acc	↑	0.8500	±	0.0359
  - high_school_mathematics	1	none	0	acc	↑	0.4741	±	0.0304
  - high_school_physics	1	none	0	acc	↑	0.6225	±	0.0396
  - high_school_statistics	1	none	0	acc	↑	0.7083	±	0.0310
  - machine_learning	1	none	0	acc	↑	0.5268	±	0.0474
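The Stderr column helps when judging how far apart two subtask scores really are. The sketch below assumes that lm_eval reports the sample standard error of a 0/1 accuracy, which reduces to sqrt(acc * (1 - acc) / (n - 1)). It also assumes the standard MMLU test-split sizes (126 questions for formal_logic, 100 for abstract_algebra, 1534 for professional_law). Those sizes are not listed in this card. The function name binomial_stderr is hypothetical and is not part of lm_eval.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Sample standard error of an accuracy measured on n questions.

    For 0/1 scores the sample variance is n * acc * (1 - acc) / (n - 1),
    so the standard error of the mean is sqrt(acc * (1 - acc) / (n - 1)).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Reported accuracies from the table; question counts are ASSUMED
# standard MMLU test-split sizes, not values stated in this card.
SUBTASKS = {
    "formal_logic": (0.5952, 126),
    "abstract_algebra": (0.5700, 100),
    "professional_law": (0.4844, 1534),
}

for name, (acc, n) in SUBTASKS.items():
    se = binomial_stderr(acc, n)
    # Rough 95% interval using a normal approximation.
    low, high = acc - 1.96 * se, acc + 1.96 * se
    print(f"{name}: stderr={se:.4f}, 95% CI=[{low:.3f}, {high:.3f}]")
```

With these assumed sizes, the function reproduces the reported Stderr values (0.0439, 0.0498, 0.0128). The intervals also show why small subtasks such as formal_logic should be read cautiously: a 95% interval of about ±0.086 is wide enough to cover many of the neighbouring scores in the table.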

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.6849	±	0.0037
- humanities	2	none	acc	↑	0.5951	±	0.0067
- other	2	none	acc	↑	0.7161	±	0.0078
- social sciences	2	none	acc	↑	0.7813	±	0.0074
- stem	2	none	acc	↑	0.6943	±	0.0079
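The overall mmlu score is not a plain average of the four group scores. The sketch below assumes that lm_eval's MMLU group weights each category by its number of test questions. It also assumes the standard category sizes: 4705 humanities, 3107 other, 3077 social sciences, 3153 STEM, for 14042 in total. These sizes are not stated in this card, and the helper names are hypothetical.

```python
# Group accuracies from the table above; question counts are ASSUMED
# standard MMLU test-split category sizes, not values from this card.
GROUPS = {
    "humanities": (0.5951, 4705),
    "other": (0.7161, 3107),
    "social sciences": (0.7813, 3077),
    "stem": (0.6943, 3153),
}


def weighted_accuracy(groups: dict) -> float:
    """Mean accuracy weighted by the number of questions in each group."""
    total = sum(n for _, n in groups.values())
    return sum(acc * n for acc, n in groups.values()) / total


def unweighted_accuracy(groups: dict) -> float:
    """Plain average of the group accuracies, ignoring group size."""
    return sum(acc for acc, _ in groups.values()) / len(groups)


print(f"size-weighted: {weighted_accuracy(GROUPS):.4f}")  # 0.6849
print(f"unweighted:    {unweighted_accuracy(GROUPS):.4f}")  # 0.6967
```

Only the size-weighted mean matches the reported 0.6849. The same weighting explains why the humanities score (0.5951) sits well below most of its subtasks: the two largest humanities subtasks, professional_law and moral_scenarios, are also the weakest.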

This model is part of the Kurtis E1 collection.