Qwen2.5-7B-Instruct-R1-forfinance

模型简介 / Model Description

Qwen2.5-7B-Instruct-R1-forfinance 是一个专门针对金融领域进行微调的大语言模型。该模型基于 Qwen2.5-7B-Instruct 进行全量微调,结合了开源金融问答数据集和高质量的思维链推理数据。

Qwen2.5-7B-Instruct-R1-forfinance is a large language model fine-tuned specifically for the financial domain. It is built on Qwen2.5-7B-Instruct via full-parameter fine-tuning, combining open-source financial Q&A datasets with high-quality chain-of-thought reasoning data.

数据集 / Training Data

数据来源 / Data Sources

  1. 开源金融问答数据集 / Open-source financial Q&A datasets
  2. DeepSeek-R1 生成的思维链数据 / Chain-of-thought data generated by DeepSeek-R1
    • 使用 DeepSeek-R1 推理生成思维链数据 / Generate chain-of-thought data by running inference with DeepSeek-R1
    • 通过 GPT-5 对生成的回答进行质量评分 / Score the generated responses for quality with GPT-5
    • 筛选高质量回答作为训练数据 / Keep only the high-quality responses as training data

数据内容 / Data Content

  • 基础金融知识问答 / Basic financial knowledge Q&A
  • 金融计算题 / Financial calculation problems
  • 金融概念解释 / Financial concept explanations
  • 思维链推理 / Chain-of-thought reasoning

数据质量控制:使用 GPT-5 对 DeepSeek-R1 的回答进行评分,只选择高质量的回答作为 SFT 训练数据。

Quality control: GPT-5 was used to score DeepSeek-R1's responses, and only high-quality answers were selected as SFT training data.
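A minimal sketch of this filtering step, for illustration only: the threshold, the record fields, and the score_with_gpt5 helper are hypothetical, since the actual pipeline details are not published.

import json

SCORE_THRESHOLD = 8  # hypothetical cutoff on a 0-10 quality scale

def filter_sft_data(records, score_fn, threshold=SCORE_THRESHOLD):
    """Keep only responses whose quality score passes the cutoff.

    records: list of {"question": ..., "response": ...} dicts from DeepSeek-R1.
    score_fn: wraps the GPT-5 judging call (not shown here).
    """
    kept = []
    for rec in records:
        score = score_fn(rec["question"], rec["response"])
        if score >= threshold:
            kept.append({**rec, "score": score})
    return kept

# Example: persist the filtered set as JSONL for SFT.
# with open("sft_data.jsonl", "w", encoding="utf-8") as f:
#     for rec in filter_sft_data(records, score_with_gpt5):
#         f.write(json.dumps(rec, ensure_ascii=False) + "\n")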

训练详情 / Training Details

基础模型 / Base Model

  • 模型 / Model: Qwen2.5-7B-Instruct
  • 微调方式 / Fine-tuning Method: 全量微调 (Full Fine-tuning)
  • 训练类型 / Training Type: 监督微调 (Supervised Fine-Tuning, SFT; 见下方示例 / see the sketch below)
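For reference, SFT on a chat model is normally trained with next-token cross-entropy over the assistant response only, with prompt tokens masked out of the loss. A minimal sketch of that masking, assuming the standard -100 ignore index used by PyTorch and Transformers (the exact recipe for this model is not published):

import torch

IGNORE_INDEX = -100  # labels with this value are skipped by the cross-entropy loss

def build_sft_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Mask prompt tokens so the loss is computed on response tokens only."""
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX
    return labels

# input_ids = [prompt tokens ... response tokens]; the model is then trained
# with the usual causal-LM objective on (input_ids, labels).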

训练环境 / Training Environment

  • 硬件 / Hardware: 8 × NVIDIA A100 GPU
  • 分布式训练 / Distributed Training: 多GPU并行训练 (Multi-GPU parallel training)

训练超参数 / Training Hyperparameters

  • 学习率 / Learning Rate: 1e-05
  • 训练批次大小 / Train Batch Size: 1
  • 评估批次大小 / Eval Batch Size: 8
  • 随机种子 / Seed: 42
  • 分布式类型 / Distributed Type: multi-GPU
  • 设备数量 / Number of Devices: 8
  • 梯度累积步数 / Gradient Accumulation Steps: 16
  • 总训练批次大小 / Total Train Batch Size: 128 (1 × 8 GPUs × 16 accumulation steps; see the sketch after this list / 见下方示例)
  • 总评估批次大小 / Total Eval Batch Size: 64
  • 优化器 / Optimizer: AdamW (betas=(0.9,0.999), epsilon=1e-08)
  • 学习率调度器 / LR Scheduler: Linear
  • 预热比例 / Warmup Ratio: 0.03
  • 训练轮数 / Epochs: 2.0
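The total train batch size follows from 1 sample per device × 8 GPUs × 16 gradient-accumulation steps = 128 (and 8 × 8 = 64 for eval). A minimal transformers.TrainingArguments sketch reproducing the settings above; the output path and the bf16 flag are assumptions, as they are not stated explicitly:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./qwen2.5-7b-finance-sft",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # x 8 GPUs x 16 accumulation steps = 128 total
    per_device_eval_batch_size=8,    # x 8 GPUs = 64 total
    gradient_accumulation_steps=16,
    num_train_epochs=2.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    seed=42,
    bf16=True,                       # assumption: matches the BF16 checkpoint
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-8
)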

训练结果 / Training Results

  • 最终训练损失 / Final Training Loss: 0.7332
  • 训练步数 / Training Steps: 312
  • 训练时长 / Training Runtime: 6450.97 秒 (seconds)
  • 训练样本处理速度 / Samples per Second: 6.168
  • 训练步骤处理速度 / Steps per Second: 0.048
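As a rough consistency check on the figures above (rounding explains the small gaps): steps per second times the total batch size should approximate samples per second, and samples per second times the runtime recovers the total samples seen over the two epochs.

# Sanity-check the reported throughput numbers (values copied from the list above).
steps_per_second = 0.048
total_batch_size = 128
samples_per_second = 6.168
runtime_seconds = 6450.97

print(steps_per_second * total_batch_size)       # ~6.14, close to 6.168
print(samples_per_second * runtime_seconds)      # ~39,790 samples over 2 epochs
print(samples_per_second * runtime_seconds / 2)  # ~19,900 samples per epoch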

快速开始 / Quick Start

模型推理 / Model Inference

我们提供了一个简单的推理脚本 inference.py,可以直接使用模型进行金融问答。

We provide a simple inference script inference.py for direct financial Q&A using the model.

使用方法 / Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 使用你本地的检查点路径 / Use your local checkpoint path
model_path = "/root/Qwen2.5-7B-Instruct-R1-forfinance/"

# 加载模型和分词器 / Load model and tokenizer
print("正在加载模型... / Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # 与 config.json 中的 torch_dtype 一致 / matches torch_dtype in config.json
    device_map="auto",
    trust_remote_code=True  # 如果需要的话 / if needed (recent Transformers versions support Qwen2.5 natively)
)

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

print("模型加载完成!/ Model loaded successfully!")

# 准备输入 / Prepare input (the Chinese example below asks: "As a financial-industry
# expert, which curve in macro analysis describes product-market equilibrium at a
# given interest rate? Think step by step.")
prompt = "假设你是一位金融行业专家,请回答下列问题。\n在宏观分析中,描述在既定利率水平下产品市场达到均衡状态的曲线是什么?\n请一步步思考。"

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# 应用聊天模板 / Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# 编码输入 / Encode input
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# 生成回答 / Generate response
print("正在生成回答... / Generating response...")
with torch.no_grad():  # 节省显存 / Save GPU memory
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.7,
        top_p=0.8,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id
    )

# 解码生成的tokens / Decode generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# 输出结果 / Output result
print("模型回答 / Model Response:")
print(response)

运行推理脚本 / Run Inference Script

# 确保模型路径正确 / Ensure the model path is correct
python inference.py

环境要求 / Requirements

  • Python: ≥ 3.8
  • PyTorch: ≥ 2.0
  • Transformers: ≥ 4.55.0
  • GPU: 建议使用支持 CUDA 的 NVIDIA GPU / NVIDIA GPU with CUDA support recommended
  • 显存 / GPU Memory: ≥ 16GB,推荐 24GB+ / at least 16GB, 24GB+ recommended
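A minimal sketch for checking these requirements before loading the model (the 16GB threshold mirrors the recommendation above):

import torch
import transformers

print("PyTorch:", torch.__version__)              # want >= 2.0
print("Transformers:", transformers.__version__)  # want >= 4.55.0
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()       # bytes on the current device
    print(f"GPU memory: {total / 1024**3:.1f} GiB")
    if total < 16 * 1024**3:
        print("Warning: below the recommended 16GB of GPU memory.")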

后续计划 / Future Plans

强化学习训练 / Reinforcement Learning Training

  • 计划使用 GRPO (Group Relative Policy Optimization) 进行强化学习训练 / Plan to use GRPO for reinforcement learning training
  • 进一步提升模型在金融领域的表现和安全性 / Further improve model performance and safety in the financial domain

We plan to conduct reinforcement learning training using GRPO (Group Relative Policy Optimization) to further improve the model's performance and safety in the financial domain.
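For context, the core of GRPO is a group-relative advantage: several responses are sampled per prompt, each receives a scalar reward, and each advantage is that reward's deviation from the group mean, normalized by the group standard deviation. A minimal sketch of this computation, independent of any RL framework (how rewards will be defined for this model is not yet specified):

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages for one group of responses to the same prompt.

    rewards: shape (group_size,), one scalar reward per sampled response.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four sampled responses scored by a reward model or verifier.
print(group_relative_advantages(torch.tensor([0.2, 0.9, 0.5, 0.4])))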

使用场景 / Use Cases

  • 金融知识问答 / Financial knowledge Q&A
  • 金融计算和分析 / Financial calculations and analysis
  • 投资建议咨询 / Investment advice consultation
  • 金融概念解释 / Financial concept explanations
  • 风险评估 / Risk assessment

限制和注意事项 / Limitations and Disclaimers

⚠️ 重要提醒 / Important Notice:

  • 本模型仅供学习和研究使用,不构成投资建议 / This model is for educational and research purposes only and does not constitute investment advice
  • 在实际应用中请谨慎使用,并结合专业判断 / Please use with caution in practical applications and combine with professional judgment
  • 模型可能存在幻觉和错误,请对输出进行事实核查 / The model may hallucinate or make errors; please fact-check its outputs

⚠️ This model is for educational and research purposes only and does not constitute investment advice. Use it with caution in practical applications and combine it with professional judgment. The model may hallucinate or make errors; please fact-check its outputs.

技术框架版本 / Framework Versions

  • Transformers: 4.55.0
  • PyTorch: 2.6.0+cu124
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1