stanford-crfm/BioMedLM

@@ -15,7 +15,6 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
 - [Model Details](#model-details)
   - [Model Description](#model-description)
 - [Uses](#uses)
-  - [Direct Use](#direct-use)
   - [Downstream Use](#downstream-use)
   - [Out-of-Scope Use](#out-of-scope-use)
 - [Bias, Risks, and Limitations](#bias-risks-and-limitations)
@@ -24,23 +23,11 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
   - [Training Data](#training-data)
   - [Training Procedure](#training-procedure)
     - [Preprocessing](#preprocessing)
-    - [Speeds, Sizes, Times](#speeds-sizes-times)
-- [Evaluation](#evaluation)
-  - [Testing Data, Factors & Metrics](#testing-data-factors--metrics)
-    - [Testing Data](#testing-data)
-    - [Factors](#factors)
-    - [Metrics](#metrics)
-  - [Results](#results)
-- [Model Examination](#model-examination)
 - [Environmental Impact](#environmental-impact)
 - [Technical Specifications](#technical-specifications)
   - [Model Architecture and Objective](#model-architecture-and-objective)
   - [Compute Infrastructure](#compute-infrastructure)
-    - [Hardware](#hardware)
-    - [Software](#software)
-- [Citation](#citation)
-- [Model Card Contact](#model-card-contact)
-- [How to Get Started with the Model](#how-to-get-started-with-the-model)
 # Model Details
@@ -61,6 +48,8 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
 - **Language(s) (NLP):** en
 - **License:** openrail
+# Uses
 ## Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
@@ -83,8 +72,6 @@ The main way we have used this model is finetuning for downstream question answe
 We do not recommend using this model for natural language generation in a production environment, finetuned or otherwise.
 # Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
@@ -155,19 +142,12 @@ This allows the model to encode information about these concepts in their indivi
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** More information needed
-- **Hours used:** More information needed
-- **Cloud Provider:** More information needed
-- **Compute Region:** More information needed
-- **Carbon Emitted:** More information needed
 # Technical Specifications
 ## Model Architecture and Objective
 Pubmed GPT 2.7B is a standard GPT-2 implementation (trained with Flash Attention) with the following hyperparameters:
 |             |       |
 | ----------- | ----- |
 | hidden size | 2560  |
@@ -176,7 +156,6 @@ Pubmed GPT 2.7B is a standard GPT-2 implementation (trained with Flash Attention
 | vocab size  | 28896 |
 | sequence length| 1024      |
 ## Compute Infrastructure
 The model was trained on [MosaicML Cloud](https://www.mosaicml.com/cloud), a platform designed for large workloads such as LLMs. Using the [Composer](https://github.com/mosaicml/composer) training library and [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html), multi-node training ran across 128 A100-40GB GPUs, and the full run completed in ~6.25 days.
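
The card points to the Machine Learning Impact calculator for carbon estimates but leaves the emission fields unfilled. As a rough illustration of the arithmetic involved, the sketch below derives GPU-hours from the two figures the card does state (128 GPUs, ~6.25 days) and turns them into a back-of-the-envelope emissions number. The per-GPU power draw, PUE, and grid carbon intensity are placeholder assumptions that do not come from the model card, and this is not necessarily the calculator's exact method.

```python
# Figures stated in the model card.
NUM_GPUS = 128
RUN_DAYS = 6.25

# Assumptions for illustration only; none of these come from the model card.
ASSUMED_GPU_POWER_KW = 0.4       # assumed average draw per GPU, in kW
ASSUMED_PUE = 1.1                # assumed data-center power usage effectiveness
ASSUMED_KG_CO2_PER_KWH = 0.4     # assumed grid carbon intensity


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total accelerator time: GPUs x days x 24 hours."""
    return num_gpus * days * 24


def estimated_emissions_kg(hours: float, power_kw: float,
                           pue: float, kg_co2_per_kwh: float) -> float:
    """Energy (kWh) = hours x power x PUE; emissions = energy x intensity."""
    energy_kwh = hours * power_kw * pue
    return energy_kwh * kg_co2_per_kwh


if __name__ == "__main__":
    hours = gpu_hours(NUM_GPUS, RUN_DAYS)
    print(f"GPU-hours: {hours:,.0f}")  # 128 * 6.25 * 24 = 19,200
    kg = estimated_emissions_kg(hours, ASSUMED_GPU_POWER_KW,
                                ASSUMED_PUE, ASSUMED_KG_CO2_PER_KWH)
    print(f"Estimated emissions under the stated assumptions: {kg:,.0f} kg CO2eq")
```

Only the GPU-hours figure follows directly from the card. The emissions value scales linearly with each assumed constant, so it should be recomputed with the actual hardware power and compute region if those become known.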
