Safety Pretraining Artifacts
SafeLM is a 1.7B-parameter model family trained with Safety Pretraining: rather than bolting safety on after the fact, we incorporate it directly into the pretraining pipeline, yielding a natively safe base model. Our safety data curation scores harmful content, rephrases and contextualizes potentially harmful examples, and applies refusal training throughout pretraining. Please check out our paper and website for more details!
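As a rough illustration (this is not the released pipeline), the scoring step of such a curation pass might look like the sketch below. The classifier id, label name, and threshold are all placeholders; the actual classifier and curation criteria are described in the paper.

```python
from transformers import pipeline

# Placeholder model id: substitute a real safety classifier.
scorer = pipeline("text-classification", model="safety-classifier")

def curate(documents, threshold=0.5):
    """Split documents into safe ones and ones routed to rephrasing."""
    safe, to_rephrase = [], []
    for doc in documents:
        result = scorer(doc, truncation=True)[0]
        # Assumed label name "harmful"; depends on the classifier used.
        if result["label"] == "harmful" and result["score"] > threshold:
            to_rephrase.append(doc)
        else:
            safe.append(doc)
    return safe, to_rephrase
```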
Training configuration (YAML):

```yaml
optimizer:
  class_path: torch.optim.AdamW
  init_args:
    lr: 0.0005
    weight_decay: 0.01
precision: bf16-mixed
seed: 42
train:
  global_batch_size: 1024
  max_seq_length: 2048
  max_tokens: 600000000000
  micro_batch_size: 8
```
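A config in this `class_path`/`init_args` style can be instantiated generically by resolving the dotted path with `importlib`. The sketch below is illustrative, not our training code; the config file name is hypothetical:

```python
import importlib

import yaml

# Load the YAML config shown above (file name is hypothetical).
with open("pretrain_config.yaml") as f:
    config = yaml.safe_load(f)

def build_optimizer(params, opt_cfg):
    # Resolve "torch.optim.AdamW" into the class object itself ...
    module_name, class_name = opt_cfg["class_path"].rsplit(".", 1)
    optimizer_cls = getattr(importlib.import_module(module_name), class_name)
    # ... and construct it with the configured hyperparameters.
    return optimizer_cls(params, **opt_cfg["init_args"])

# optimizer = build_optimizer(model.parameters(), config["optimizer"])
```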
You can use the model with the Hugging Face Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the natively safe 1.7B base model and its tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("locuslab/safelm-1.7b_rephrase_refusal_moral_ed_600B")
tokenizer = AutoTokenizer.from_pretrained("locuslab/safelm-1.7b_rephrase_refusal_moral_ed_600B")
```
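For example, to generate a completion (the prompt is illustrative):

```python
import torch

prompt = "How do I stay safe online?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```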
If you find our work helpful, please cite it as:
```bibtex
@article{maini2025safety,
  title={Safety Pretraining: Toward the Next Generation of Safe {AI}},
  author={Maini, Pratyush and Goyal, Sachin and Sam, Dylan and Robey, Alex and Savani, Yash and Jiang, Yiding and Zou, Andy and Lipton, Zachary C and Kolter, J Zico},
  journal={arXiv preprint arXiv:2504.16980},
  year={2025}
}
```