---
library_name: transformers
language:

extra_gated_button_content: Submit
license: llama3.3
---

# Quantization

Created with [lambda-quant](https://github.com/LambdaLabsML/lambda-quant/tree/f97108fe4a9ee061a7b969b23a9605a6d561863d) on `Python 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]`

Base Model: [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)

Quantized using [llmcompressor==0.4.1](https://github.com/vllm-project/llm-compressor)
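In llm-compressor, a quantization run is driven by a recipe. A dynamic-FP8 recipe typically looks something like the sketch below (an illustration of the recipe format, not necessarily the exact recipe lambda-quant applies; leaving `lm_head` unquantized is a common default):

```yaml
# Sketch of an llm-compressor recipe for dynamic FP8 quantization.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]     # quantize all Linear layers...
      ignore: ["lm_head"]     # ...except the output head
      scheme: "FP8_DYNAMIC"   # FP8 weights, dynamic per-token activation scales
```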

Steps to create:
1. `git clone https://github.com/LambdaLabsML/lambda-quant`
2. `git checkout f97108fe4a9ee061a7b969b23a9605a6d561863d`
3. `python quantize.py -m meta-llama/Llama-3.3-70B-Instruct -q Dynamic-F8`
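`Dynamic-F8` here refers to FP8 (E4M3) quantization with scales computed on the fly from each tensor, rather than from a calibration dataset. A minimal NumPy sketch of the idea (illustration only, not lambda-quant's implementation; a real FP8 kernel would additionally round values onto the E4M3 grid):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_dynamic_quantize(x: np.ndarray):
    """Per-tensor dynamic quantization: the scale is derived from the
    tensor's own max at runtime, so no calibration data is needed."""
    scale = np.abs(x).max() / FP8_E4M3_MAX
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # A real kernel would also round q onto the E4M3 grid
    # (4 exponent bits, 3 mantissa bits); that step is omitted here.
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

x = np.array([0.1, -2.5, 7.0, -0.03])
q, scale = fp8_dynamic_quantize(x)
x_hat = dequantize(q, scale)
```

Because the scale maps the tensor's max exactly onto the FP8 range, the clip is a no-op for the tensor it was computed from; all real precision loss comes from the E4M3 rounding step omitted above.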

## Evaluation

TODO

## Benchmarks

TODO

# Base Model README.md

## Model Information

The Meta Llama 3.3 multilingual large language model (LLM) is an instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.