
Doge 160M Reason Distill

Doge uses Dynamic Mask Attention for sequence transformation and can use either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts for state transformation. Dynamic Mask Attention lets the Transformer use self-attention during training and a state-space formulation during inference, and the Cross Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model was trained by the SmallDoge community. For the detailed algorithm and model architecture, please refer to Wonderful Matrices; all training details and code are publicly available in the small-doge repository.
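
To make the idea of a content-dependent attention mask concrete, below is a minimal, self-contained PyTorch sketch. It is not the SmallDoge implementation: the function name, tensor shapes, and top-k gating rule are illustrative assumptions; the actual Dynamic Mask Attention is defined in Wonderful Matrices and in this model's custom code.

import torch
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, mask_logits, keep_ratio=0.5):
    # Toy content-dependent attention masking (illustrative only).
    # q, k, v: (batch, heads, seq, head_dim); mask_logits: (batch, heads, seq)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)      # (b, h, s, s)
    # Keep only the top-scoring key positions per head according to the
    # learned per-position mask logits; mask the rest before softmax.
    n_keep = max(1, int(keep_ratio * k.size(-2)))
    top = mask_logits.topk(n_keep, dim=-1).indices               # (b, h, n_keep)
    keep = torch.zeros_like(mask_logits, dtype=torch.bool).scatter_(-1, top, True)
    scores = scores.masked_fill(~keep[:, :, None, :], float("-inf"))
    return F.softmax(scores, dim=-1) @ v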

Uses

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-160M-Reason-Distill")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-160M-Reason-Distill", trust_remote_code=True)

generation_config = GenerationConfig(
      max_new_tokens=100, 
      use_cache=True, 
      do_sample=True, 
      temperature=0.8, 
      top_p=0.9,
      repetition_penalty=1.0
)
streamer = TextStreamer(
      tokenizer=tokenizer, 
      skip_prompt=True
)

system_prompt = """
Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop a well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analyzing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail the necessary steps needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:
""".strip()
prompt = "Which number is bigger, 3.9 or 3.11?"
conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]
inputs = tokenizer.apply_chat_template(
    conversation=conversation,
    tokenize=True,
    add_generation_prompt=True,  # open the assistant turn so the model replies instead of continuing the user message
    return_tensors="pt",
)

outputs = model.generate(
    inputs, 
    tokenizer=tokenizer,
    generation_config=generation_config, 
    streamer=streamer
)

Model Details

We build the Doge-Reason-Distill models by supervised fine-tuning (SFT) on the Reason-Distill dataset.

TODO: The larger model is under training and will be uploaded soon.

SFT:

| Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
|---|---|---|---|---|---|---|
| Doge-160M-Reason-Distill | SmallDoge/Reason-Distill | 2 | 4096 | 4e-4 | 0.5M | bfloat16 |
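
For orientation, here is a minimal sketch of how an SFT run with these hyperparameters might look using TRL's SFTTrainer. The base checkpoint, batch-size breakdown, and data handling are assumptions; the actual training script lives in the small-doge repository.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base = "SmallDoge/Doge-160M"  # assumed starting checkpoint; the actual one may differ
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
dataset = load_dataset("SmallDoge/Reason-Distill", split="train")  # split name assumed

training_args = SFTConfig(
    output_dir="./doge-160m-reason-distill",
    num_train_epochs=2,               # from the table above
    learning_rate=4e-4,               # from the table above
    max_seq_length=4096,              # content length from the table above
    bf16=True,                        # bfloat16 precision
    per_device_train_batch_size=1,    # assumption; the 0.5M-token batch is reached via accumulation
    gradient_accumulation_steps=128,  # assumption
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # use tokenizer=... on older TRL versions
)
trainer.train()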

Procedure:

SFT: Visualize in Weights & Biases

Environment:

  • Image: nvcr.io/nvidia/pytorch:24.12-py3
  • Hardware: 1x NVIDIA RTX 4090
  • Software: Transformers, TRL

Citation

@misc{shi2024wonderfulmatrices,
      title={Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture}, 
      author={Jingze Shi and Bingheng Wu},
      year={2024},
      eprint={2412.11834},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.11834}, 
}