Commit 48843fe · Jae-Won Chung
Parent(s): 91c65f8

Install lm-evaluation-harness in Dockerfile

Files changed:
- Dockerfile        +8 -0
- LEADERBOARD.md    +1 -1
- pegasus/README.md +1 -1
Dockerfile CHANGED

@@ -26,6 +26,14 @@ ADD . /workspace/leaderboard
 RUN cd /workspace/leaderboard \
     && pip install -r requirements-benchmark.txt
 
+# Clone lm-evaluation-harness and install
+RUN cd /workspace \
+    && git clone https://github.com/EleutherAI/lm-evaluation-harness.git \
+    && cd lm-evaluation-harness \
+    && git checkout 72b7f0c00a6ff94632c5b873fc24e093ae74fa47 \
+    && rm -r .git \
+    && pip install -e .
+
 # Where all the weights downloaded from Hugging Face Hub will go to
 ENV TRANSFORMERS_CACHE=/data/leaderboard/hfcache
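The harness is pinned to commit 72b7f0c and installed as an editable package inside the image. As a quick sanity check (a sketch, not part of this commit; the image tag is hypothetical), one could build the image and confirm the package resolves:

```sh
# Build the image from the repository root (tag "ml-energy-leaderboard" is hypothetical).
docker build -t ml-energy-leaderboard .

# The editable install registers the package as "lm_eval"; confirm it imports.
docker run --rm ml-energy-leaderboard \
    python -c "import lm_eval; print(lm_eval.__file__)"
```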
LEADERBOARD.md CHANGED

@@ -42,7 +42,7 @@ Find our benchmark script for one model [here](https://github.com/ml-energy/lead
 - PyTorch 2.0.1
 - [Zeus](https://ml.energy/zeus) -- For GPU time and energy measurement
 - [FastChat](https://github.com/lm-sys/fastchat) -- For running inference on various models
-- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/
+- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/72b7f0c00a6ff94632c5b873fc24e093ae74fa47) -- For NLP evaluation metrics
 
 ### Hardware
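For a non-Docker environment, the same pinned revision can be installed directly from GitHub (an equivalent alternative to the Dockerfile steps above, not part of the commit itself):

```sh
# Install lm-evaluation-harness pinned to the same commit the leaderboard uses.
pip install "git+https://github.com/EleutherAI/lm-evaluation-harness.git@72b7f0c00a6ff94632c5b873fc24e093ae74fa47"
```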
pegasus/README.md CHANGED

@@ -65,7 +65,7 @@ After all the tasks finish, aggregate all the data into one node and run [`compu
 
 ## NLP benchmark
 
-We'll use [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/
+We'll use [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/72b7f0c00a6ff94632c5b873fc24e093ae74fa47) to run models through three NLP datasets: ARC challenge (`arc`), HellaSwag (`hellaswag`), and TruthfulQA (`truthfulqa`).
 
 Use Pegasus to run benchmarks for all the models across all nodes.
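For reference, the pinned revision predates the harness's 0.4 rewrite and is driven through `main.py`. A minimal single-task sketch (model name, few-shot count, and output path are illustrative assumptions; verify task names and flags against the pinned commit):

```sh
cd /workspace/lm-evaluation-harness
# Evaluate one Hugging Face causal LM on ARC-Challenge (the harness task
# behind the `arc` entry above); 25-shot is a common choice for this task.
python main.py \
    --model hf-causal \
    --model_args pretrained=lmsys/vicuna-7b-v1.3 \
    --tasks arc_challenge \
    --num_fewshot 25 \
    --device cuda:0 \
    --output_path results/arc_challenge.json
```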