---
license: other
license_name: qwen
license_link: https://huggingface.co/skt/A.X-4.0/blob/main/LICENSE
language:
- en
- ko
pipeline_tag: text-generation
library_name: transformers
model_id: skt/A.X-4.0
developers: SKT AI Model Lab
model-index:
- name: A.X-4.0
  results:
  - task:
      type: generate_until
      name: mmlu
    dataset:
      name: mmlu (chat CoT)
      type: hails/mmlu_no_train
    metrics:
    - type: exact_match
      value: 86.62
      name: exact_match
  - task:
      type: generate_until
      name: kmmlu
    dataset:
      name: kmmlu (chat CoT)
      type: HAERAE-HUB/KMMLU
    metrics:
    - type: exact_match
      value: 78.32
      name: exact_match
---

# A.X 4.0

πŸ€— Models | πŸ’¬ Chat | πŸ“¬ APIs (FREE!) | πŸ–₯️ GitHub

## A.X 4.0 Family Highlights

SK Telecom released **A.X 4.0** (pronounced "A dot X"), a large language model (LLM) optimized for Korean-language understanding and enterprise deployment, on July 03, 2025. Built on the open-source [Qwen2.5](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) model, A.X 4.0 has been further trained on large-scale Korean datasets to deliver outstanding performance in real-world business environments.

- **Superior Korean Proficiency**: Achieved a score of 78.3 on [KMMLU](https://huggingface.co/datasets/HAERAE-HUB/KMMLU), the leading benchmark for Korean-language evaluation and a Korean-specific adaptation of MMLU, outperforming GPT-4o (72.5).
- **Deep Cultural Understanding**: Scored 83.5 on [CLIcK](https://huggingface.co/datasets/EunsuKim/CLIcK), a benchmark for Korean cultural and contextual comprehension, surpassing GPT-4o (80.2).
- **Efficient Token Usage**: A.X 4.0 uses approximately 33% fewer tokens than GPT-4o for the same Korean input, enabling more cost-effective and efficient processing (see the tokenizer sketch below).
- **Deployment Flexibility**: Offered in both a 72B-parameter standard model (A.X 4.0) and a 7B lightweight version (A.X 4.0 Light).
- **Long Context Handling**: Supports up to 131,072 tokens, allowing comprehension of lengthy documents and conversations (the lightweight model supports up to 16,384 tokens).
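The token-efficiency claim above can be checked directly by comparing tokenizers. The sketch below assumes the `tiktoken` package for the GPT-4o side; the exact ratio varies with the input text, and the sample sentence here is illustrative only:

```python
from transformers import AutoTokenizer
import tiktoken  # OpenAI tokenizer library; provides GPT-4o's o200k encoding

ax_tokenizer = AutoTokenizer.from_pretrained("skt/A.X-4.0")
gpt4o_encoding = tiktoken.encoding_for_model("gpt-4o")

# Sample sentence: "SK Telecom has developed a large language model optimized for Korean."
text = "SK ν…”λ ˆμ½€μ€ ν•œκ΅­μ–΄μ— μ΅œμ ν™”λœ λŒ€κ·œλͺ¨ μ–Έμ–΄ λͺ¨λΈμ„ κ°œλ°œν–ˆμŠ΅λ‹ˆλ‹€."
n_ax = len(ax_tokenizer.encode(text))
n_gpt4o = len(gpt4o_encoding.encode(text))
print(f"A.X 4.0: {n_ax} tokens | GPT-4o: {n_gpt4o} tokens")
```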
## Performance

### Model Performance

| Category | Benchmark | A.X 4.0 | Qwen3-235B-A22B (w/o reasoning) | Qwen2.5-72B | GPT-4o |
|---|---|---|---|---|---|
| Knowledge | KMMLU | 78.32 | 73.64 | 66.44 | 72.51 |
| | CLIcK | 83.51 | 74.55 | 72.59 | 80.22 |
| | KoBALT | 47.30 | 41.57 | 37.00 | 44.00 |
| | MMLU | 86.62 | 87.37 | 85.70 | 88.70 |
| General | Ko-MT-Bench | 86.69 | 88.00 | 82.69 | 88.44 |
| | MT-Bench | 83.25 | 86.56 | 93.50 | 88.19 |
| | LiveBench (2024.11) | 52.30 | 64.50 | 54.20 | 52.19 |
| Instruction Following | Ko-IFEval | 77.96 | 77.53 | 77.07 | 75.38 |
| | IFEval | 86.05 | 85.77 | 86.54 | 83.86 |
| Math | HRM8K | 48.55 | 54.52 | 46.37 | 43.27 |
| | MATH | 74.28 | 72.72 | 77.00 | 72.38 |
| Code | HumanEval+ | 79.27 | 79.27 | 81.71 | 86.00 |
| | MBPP+ | 73.28 | 70.11 | 75.66 | 75.10 |
| | LiveCodeBench (2024.10~2025.04) | 26.07 | 33.09 | 27.58 | 29.30 |
| Long Context | LongBench (<128K) | 56.70 | 49.40 | 45.60 | 47.50 |
| Tool-use | FunctionChatBench | 85.96 | 82.43 | 88.30 | 95.70 |
### Lightweight Model Performance
| Category | Benchmark | A.X 4.0 Light | Qwen3-8B (w/o reasoning) | Qwen2.5-7B | EXAONE-3.5-7.8B | Kanana-1.5-8B |
|---|---|---|---|---|---|---|
| Knowledge | KMMLU | 64.15 | 63.53 | 49.56 | 53.76 | 48.28 |
| | CLIcK | 68.05 | 62.71 | 60.56 | 64.30 | 61.30 |
| | KoBALT | 30.29 | 26.57 | 21.57 | 21.71 | 23.14 |
| | MMLU | 75.43 | 82.89 | 75.40 | 72.20 | 68.82 |
| General | Ko-MT-Bench | 79.50 | 64.06 | 61.31 | 81.06 | 76.30 |
| | MT-Bench | 81.56 | 65.69 | 79.37 | 83.50 | 77.60 |
| | LiveBench | 37.10 | 50.20 | 37.00 | 40.20 | 29.40 |
| Instruction Following | Ko-IFEval | 72.99 | 73.39 | 60.73 | 65.01 | 69.96 |
| | IFEval | 84.68 | 85.38 | 76.73 | 82.61 | 80.11 |
| Math | HRM8K | 40.12 | 52.50 | 35.13 | 31.88 | 30.87 |
| | MATH | 68.88 | 71.48 | 65.58 | 63.20 | 59.28 |
| Code | HumanEval+ | 75.61 | 77.44 | 74.39 | 76.83 | 76.83 |
| | MBPP+ | 67.20 | 62.17 | 68.50 | 64.29 | 67.99 |
| | LiveCodeBench | 18.03 | 23.93 | 16.62 | 17.98 | 16.52 |
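The card does not state the exact harness or prompting setup behind these tables (the metadata above lists `generate_until`-style chat CoT runs over `hails/mmlu_no_train` and `HAERAE-HUB/KMMLU`). As a rough starting point only, a stock [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) run might look like the sketch below; default task configs will not necessarily reproduce the chat-CoT numbers reported here:

```python
import lm_eval

# Evaluate with the harness's default KMMLU/MMLU task configs; expect
# deviations from the tables above, which used a chat-style CoT setup.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=skt/A.X-4.0,dtype=bfloat16",
    tasks=["kmmlu", "mmlu"],
    batch_size=8,
)
print(results["results"])
```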
## πŸš€ Quickstart

### with HuggingFace Transformers

- `transformers>=4.46.0` or the latest version is required to use `skt/A.X-4.0`

```bash
pip install "transformers>=4.46.0"
```

#### Example Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "skt/A.X-4.0"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# System prompt: "You are an AI expert that translates English sentences provided by the user into Korean."
messages = [
    {"role": "system", "content": "당신은 μ‚¬μš©μžκ°€ μ œκ³΅ν•˜λŠ” μ˜μ–΄ λ¬Έμž₯듀을 ν•œκ΅­μ–΄λ‘œ λ²ˆμ—­ν•˜λŠ” AI μ „λ¬Έκ°€μž…λ‹ˆλ‹€."},
    {"role": "user", "content": "The first human went into space and orbited the Earth on April 12, 1961."},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=128,
        do_sample=False,
    )

len_input_prompt = len(input_ids[0])
response = tokenizer.decode(output[0][len_input_prompt:], skip_special_tokens=True)
print(response)
# Output:
# 졜초의 인간이 1961λ…„ 4μ›” 12일에 우주둜 κ°€μ„œ 지ꡬ ꢀ도λ₯Ό λŒμ•˜μŠ΅λ‹ˆλ‹€.
```
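For interactive use, generation can also be streamed token by token. Below is a minimal sketch using the `TextStreamer` utility built into `transformers`, reusing `model`, `tokenizer`, and `input_ids` from the example above:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        input_ids,
        max_new_tokens=128,
        do_sample=False,
        streamer=streamer,
    )
```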
### with vLLM

- `vllm>=0.6.4.post1` or the latest version is required to use the tool-use function

```bash
pip install "vllm>=0.6.4.post1"

# if you don't want to activate the tool-use function, just comment out the vLLM option below
VLLM_OPTION="--enable-auto-tool-choice --tool-call-parser hermes"
vllm serve skt/A.X-4.0 $VLLM_OPTION
```

#### Example Usage

```python
from openai import OpenAI

def call(messages, model):
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    print(completion.choices[0].message)

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="api_key"
)
model = "skt/A.X-4.0"

# Prompt: "What is the appropriate air-conditioner temperature in summer? Answer in one line."
messages = [{"role": "user", "content": "에어컨 여름철 적정 μ˜¨λ„λŠ”? ν•œμ€„λ‘œ λ‹΅λ³€ν•΄μ€˜"}]
call(messages, model)
# Output:
# ChatCompletionMessage(content='여름철 에어컨 적정 μ˜¨λ„λŠ” 24~26λ„μž…λ‹ˆλ‹€.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], reasoning_content=None)

messages = [{"role": "user", "content": "What is the appropriate temperature for air conditioning in summer? Response in a single sentence."}]
call(messages, model)
# Output:
# ChatCompletionMessage(content='A comfortable and energy-efficient temperature for air conditioning in summer is typically around 24-26Β°C (75-79Β°F).', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], reasoning_content=None)
```

#### Examples for tool-use

```python
from openai import OpenAI

def call(messages, model):
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools
    )
    print(completion.choices[0].message)

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="api_key"
)
model = "skt/A.X-4.0"

# Description: "Computes the discounted price from the original price and a discount rate (in percent)."
calculate_discount = {
    "type": "function",
    "function": {
        "name": "calculate_discount",
        "description": "원가격과 ν• μΈμœ¨(νΌμ„ΌνŠΈ λ‹¨μœ„)을 μž…λ ₯λ°›μ•„ ν• μΈλœ 가격을 κ³„μ‚°ν•œλ‹€.",
        "parameters": {
            "type": "object",
            "properties": {
                "original_price": {
                    "type": "number",
                    "description": "μƒν’ˆμ˜ μ›λž˜ 가격"
                },
                "discount_percentage": {
                    "type": "number",
                    "description": "μ μš©ν•  ν• μΈμœ¨(예: 20% ν• μΈμ˜ 경우 20을 μž…λ ₯)"
                }
            },
            "required": ["original_price", "discount_percentage"]
        }
    }
}

# Description: "Fetches the exchange rate between two currencies."
get_exchange_rate = {
    "type": "function",
    "function": {
        "name": "get_exchange_rate",
        "description": "두 톡화 κ°„μ˜ ν™˜μœ¨μ„ κ°€μ Έμ˜¨λ‹€.",
        "parameters": {
            "type": "object",
            "properties": {
                "base_currency": {
                    "type": "string",
                    "description": "The currency to convert from."
                },
                "target_currency": {
                    "type": "string",
                    "description": "The currency to convert to."
                }
            },
            "required": ["base_currency", "target_currency"]
        }
    }
}

tools = [calculate_discount, get_exchange_rate]

### Slot filling ###
# User: "We need to buy something that's originally 57,600 won, and I can get an employee discount. Calculate the discounted price for me."
messages = [{"role": "user", "content": "μš°λ¦¬κ°€ 뭘 μ‚¬μ•Όλ˜λŠ”λ° μ›λž˜ 57600원인데 직원할인 받을 수 μžˆκ±°λ“ ? 할인가쒀 κ³„μ‚°ν•΄μ€˜"}]
call(messages, model)
# Output (the model asks what percentage the employee discount is):
# ChatCompletionMessage(content='직원 할인을 λͺ‡ νΌμ„ΌνŠΈ 받을 수 μžˆλŠ”μ§€ μ•Œλ €μ£Όμ‹œκ² μ–΄μš”?', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], reasoning_content=None)

### Function calling ###
messages = [
    {"role": "user", "content": "μš°λ¦¬κ°€ 뭘 μ‚¬μ•Όλ˜λŠ”λ° μ›λž˜ 57600원인데 직원할인 받을 수 μžˆκ±°λ“ ? 할인가쒀 κ³„μ‚°ν•΄μ€˜"},
    {"role": "assistant", "content": "직원 할인을 λͺ‡ νΌμ„ΌνŠΈ 받을 수 μžˆλŠ”μ§€ μ•Œλ €μ£Όμ‹œκ² μ–΄μš”?"},
    {"role": "user", "content": "15% 할인 받을 수 μžˆμ–΄."},  # "I can get a 15% discount."
]
call(messages, model)
# Output:
# ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-8634f423e5494ebfa7a428b8385b4981', function=Function(arguments='{"original_price": 57600, "discount_percentage": 15}', name='calculate_discount'), type='function')], reasoning_content=None)
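
# Client-side step between the function call above and the "Completion" turn
# below: execute the requested tool locally, then return its result in a
# "tool" message. This local implementation is a hypothetical stand-in;
# the model only ever sees the JSON schema above, never this code.
def run_calculate_discount(original_price: float, discount_percentage: float) -> dict:
    discounted_price = original_price * (1 - discount_percentage / 100)
    return {
        "original_price": original_price,
        "discount_percentage": discount_percentage,
        "discounted_price": discounted_price,
    }

# If `call` returned the completion, the arguments could be recovered with
# json.loads(completion.choices[0].message.tool_calls[0].function.arguments)
# and passed straight in:
print(run_calculate_discount(original_price=57600, discount_percentage=15))
# {'original_price': 57600, 'discount_percentage': 15, 'discounted_price': 48960.0}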

### Completion ###
messages = [
    {"role": "user", "content": "μš°λ¦¬κ°€ 뭘 μ‚¬μ•Όλ˜λŠ”λ° μ›λž˜ 57600원인데 직원할인 받을 수 μžˆκ±°λ“ ? 할인가쒀 κ³„μ‚°ν•΄μ€˜"},
    {"role": "assistant", "content": "직원 할인을 λͺ‡ νΌμ„ΌνŠΈ 받을 수 μžˆλŠ”μ§€ μ•Œλ €μ£Όμ‹œκ² μ–΄μš”?"},
    {"role": "user", "content": "15% 할인 받을 수 μžˆμ–΄."},
    {"role": "tool", "tool_call_id": "random_id", "name": "calculate_discount", "content": "{\"original_price\": 57600, \"discount_percentage\": 15, \"discounted_price\": 48960.0}"}
]
call(messages, model)
# Output ("With a 15% discount, the price of the item comes to 48,960 won."):
# ChatCompletionMessage(content='15% 할인을 μ μš©ν•˜λ©΄ μƒν’ˆμ˜ 가격은 48,960원이 λ©λ‹ˆλ‹€.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], reasoning_content=None)
```

## Citation

```
@article{SKTAdotX4,
  title={A.X 4.0},
  author={SKT AI Model Lab},
  year={2025},
  url={https://huggingface.co/skt/A.X-4.0}
}
```

## Contact

- Business & Partnership Contact: [a.x@sk.com](mailto:a.x@sk.com)