nielsr (HF Staff) committed
Commit c423b79 · verified · 1 Parent(s): 3fba893

Add library name and pipeline tag


This PR adds the `library_name` and `pipeline_tag` to the model card's YAML metadata. The `library_name` is set to `transformers` as the model is compatible with the Transformers library. The `pipeline_tag` is set to `text-generation` because the model is a language model used for text generation.
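For reference, a minimal usage sketch consistent with these tags: `library_name: transformers` means the checkpoint loads with the Transformers library, and `pipeline_tag: text-generation` maps to the `"text-generation"` pipeline task. The repo id below is an assumption for illustration, not something confirmed by this PR.

```python
# Minimal sketch, assuming the model is published under the repo id below.
from transformers import pipeline

# pipeline_tag: text-generation corresponds to the "text-generation" task.
generator = pipeline("text-generation", model="common-pile/comma-v0.1-2t")

out = generator("The Common Pile is", max_new_tokens=32)
print(out[0]["generated_text"])
```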

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -1,10 +1,13 @@
 ---
-license: apache-2.0
 datasets:
 - common-pile/comma_v0.1_training_dataset
 language:
 - en
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 # Comma v0.1-2T

 Comma v0.1-2T is a 7 billion parameter language model trained on 2 trillion tokens from [the Comma v0.1 dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset), comprising openly licensed text from [the Common Pile](https://huggingface.co/collections/common-pile/common-pile-v01-68307d37df48e36f02717f21).
@@ -38,7 +41,7 @@ Finally, please note that Comma v0.1-2T is a base model that has not undergone a

 ## Citation

-```bibtext
+```bibtex
 @article{kandpal2025common,
   title={{The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text}},
   author={Nikhil Kandpal and Brian Lester and Colin Raffel and Sebastian Majstorovic and Stella Biderman and Baber Abbasi and Luca Soldaini and Enrico Shippole and A. Feder Cooper and Aviya Skowron and Shayne Longpre and Lintang Sutawika and Alon Albalak and Zhenlin Xu and Guilherme Penedo and Loubna Ben Allal and Elie Bakouch and John David Pressman and Honglu Fan and Dashiell Stander and Guangyu Song and Aaron Gokaslan and John Kirchenbauer and Tom Goldstein and Brian R. Bartoldson and Bhavya Kailkhura and Tyler Murray},