---
library_name: transformers
tags: []
---

# Model Card for Model ID

## Model Details

This model was fine-tuned with the following SLURM script:

```bash
#!/bin/bash
#SBATCH --job-name="finetune"
#SBATCH --account=bckr-dtai-gh
#SBATCH --partition=ghx4
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --tasks=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=512g
#SBATCH --time=23:59:00
#SBATCH --output="run.log"
#SBATCH --error="run.err"

set -e
export WANDB_API_KEY='<your-wandb-api-key>'

# set up exp1 or exp3!!!!!
# launch this script after bilevel weighting and preparing data
# this script is for exp1 and exp3

# 1. finetune on bilevel and baseline
CUDA_VISIBLE=0,1
hf_ds=pxyyy/NuminaMath-CoT-smp10k
hf_val_ds=pxyyy/NuminaMath-CoT-smp10k
model_and_tok=Qwen/Qwen2.5-7B
# conv_template=llama3
conv_template=qwen2_5_math

hf_ds_str=$(echo ${hf_ds}|sed 's/\//-/g')
tmp_data_dir=./tmp_data/${hf_ds_str}/
val_data_dir=./tmp_data/${hf_ds_str}_val/
mkdir -p ${tmp_data_dir}
mkdir -p ${val_data_dir}

# convert the Hub datasets into local JSON files expected by finetune.py
python3 hf2lmflow.py --ds_name ${hf_ds} --save ${tmp_data_dir}/data.json --split train
python3 hf2lmflow.py --ds_name ${hf_val_ds} --save ${val_data_dir}/data.json --split test

model_str=$(echo ${model_and_tok}|sed 's/\//-/g')
gradient_accumulation_steps=4
per_device_train_batch_size=8
epoch=1
project_dir=/u/xpan2/projects/mp-llm/MATH/finetune

for lr in 2e-5
do
    # Finetune
    exp_id=finetune-${model_str}-${hf_ds_str}-${epoch}-$lr
    # project_dir=$(cd "$(dirname $0)"; pwd)
    log_dir=${project_dir}/log/${exp_id}
    output_dir=${project_dir}/output_models/${exp_id}
    echo $exp_id

    mkdir -p ${output_dir} ${log_dir}

    export TRANSFORMERS_VERBOSITY=info

    # full fine-tune with DeepSpeed ZeRO-3 on the two local GPUs listed above
    deepspeed --master_port=7964 --include=localhost:${CUDA_VISIBLE} finetune.py \
        --model_name_or_path ${model_and_tok} \
        --trust_remote_code 1 \
        --dataset_path ${tmp_data_dir}/ \
        --eval_dataset_path ${val_data_dir}/ \
        --output_dir ${output_dir} --overwrite_output_dir \
        --conversation_template ${conv_template} \
        --num_train_epochs $epoch \
        --learning_rate $lr \
        --disable_group_texts 1 \
        --block_size 512 \
        --per_device_train_batch_size ${per_device_train_batch_size} \
        --per_device_eval_batch_size 1 \
        --bf16 \
        --deepspeed configs/ds_config_zero3_no_offload.json \
        --torch_dtype bfloat16 \
        --run_name ${exp_id} \
        --optim adamw_torch_fused \
        --logging_steps 1 \
        --do_train \
        --do_eval \
        --ddp_timeout 72000 \
        --save_total_limit 1 \
        --load_best_model_at_end False \
        --eval_steps 10 \
        --save_only_model \
        --evaluation_strategy "steps" \
        --dataloader_num_workers 1 \
        --lr_scheduler_type cosine \
        --warmup_ratio 0.03 \
        --gradient_checkpointing True \
        --use_flash_attention 1 \
        --gradient_accumulation_steps ${gradient_accumulation_steps} \
        | tee ${log_dir}/train.log \
        2> ${log_dir}/train.err
done
```

Weights & Biases training run: https://wandb.ai/llm_infoscore/huggingface/runs/58ronlvj

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
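A minimal usage sketch with 🤗 transformers is shown below. The repository id is a placeholder (the actual Hub path is not stated in this card), and the example assumes the saved tokenizer ships a chat template; otherwise, format prompts to match the `qwen2_5_math` conversation template used during fine-tuning.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a solution.
# "path/to/finetuned-model" is a placeholder; substitute the real output
# directory or Hub repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/finetuned-model"  # placeholder, not the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=512)

# strip the prompt tokens and print only the generated continuation
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```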
- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Causal language model fine-tuned from Qwen2.5-7B
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

See the usage sketch under [Model Description](#model-description) above for one way to load the checkpoint with 🤗 transformers; the exact Hub repository id is not stated in this card.

[More Information Needed]

## Training Details

### Training Data

The model was fine-tuned on the `train` split of [pxyyy/NuminaMath-CoT-smp10k](https://huggingface.co/datasets/pxyyy/NuminaMath-CoT-smp10k); the `test` split of the same dataset was used for evaluation during training (see the script above).

### Training Procedure

#### Preprocessing [optional]

Per the training script, the Hub dataset is first converted to a local JSON file with `hf2lmflow.py` (a hedged sketch of this step appears at the end of this card) and formatted with the `qwen2_5_math` conversation template, using a block size of 512 tokens and `--disable_group_texts 1`.

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision (`--bf16`, `--torch_dtype bfloat16`)
- **Learning rate:** 2e-5 with a cosine schedule and warmup ratio 0.03
- **Epochs:** 1
- **Batch size:** 8 per device with gradient accumulation of 4 (with the two GPUs used in the DeepSpeed launch, an effective batch size of 64)
- **Optimizer:** `adamw_torch_fused`
- **Other:** DeepSpeed ZeRO-3 (no offload), gradient checkpointing, flash attention

#### Speeds, Sizes, Times [optional]

See the [Weights & Biases run](https://wandb.ai/llm_infoscore/huggingface/runs/58ronlvj) linked above for step timing and loss curves.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

Decoder-only transformer (Qwen2.5-7B) fine-tuned with a causal language-modeling objective.

### Compute Infrastructure

#### Hardware

One SLURM node (partition `ghx4`) with 4 GPUs requested, 20 CPUs per task, and 512 GB of RAM; the DeepSpeed launch uses two of the GPUs (`--include=localhost:0,1`).

#### Software

🤗 Transformers with DeepSpeed ZeRO-3 (no offload) and flash attention, launched through the `finetune.py` script shown above.

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
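For reference, the training script converts the Hub dataset to a local JSON file with `hf2lmflow.py`, which is not included in this card. The sketch below shows roughly what such a conversion could look like; the `messages` column name and the `"conversation"` output schema are assumptions based on the NuminaMath-CoT dataset and common LMFlow conventions, not the actual script.

```python
# Hypothetical sketch of hf2lmflow.py (the real script is not included here).
# It downloads one split of a Hub dataset and writes it as a single JSON file
# for finetune.py. Column and schema names below are assumptions.
import argparse
import json

from datasets import load_dataset


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--ds_name", required=True)  # e.g. pxyyy/NuminaMath-CoT-smp10k
    parser.add_argument("--save", required=True)     # output JSON path
    parser.add_argument("--split", default="train")
    args = parser.parse_args()

    ds = load_dataset(args.ds_name, split=args.split)

    instances = []
    for example in ds:
        # Assumed column: a list of {"role": ..., "content": ...} chat turns.
        instances.append({"messages": example["messages"]})

    with open(args.save, "w") as f:
        json.dump({"type": "conversation", "instances": instances}, f)


if __name__ == "__main__":
    main()
```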