---
library_name: transformers
tags:
- bitnet
- falcon-e
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/KVAEDoch-o0HgA0e2L4HL.png)

# Table of Contents

0. [TL;DR](#TL;DR)
1. [Model Details](#model-details)
2. [Training Details](#training-details)
3. [Usage](#usage)
4. [Evaluation](#evaluation)
5. [Citation](#citation)

# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only / Base version
- **Architecture:** Pure transformer - 1.58bit version
- **Language(s) (NLP):** English
- **License:** Falcon-LLM License

# Training Details

For more details about the training protocol of this model, please refer to the [Falcon-E technical blogpost](https://falcon-lm.github.io/blog/falcon-edge/).

# Usage

You can currently use this model with either the Hugging Face `transformers` library or the [BitNet](https://github.com/microsoft/BitNet) library; there are multiple ways to interact with the model depending on your target usage. Each model of the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the `bfloat16` version of the BitNet model.

### Inference

#### 🤗 transformers

To perform inference with the BitNet checkpoint, run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Perform text generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you would rather use the classic `bfloat16` version, run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"
revision = "bfloat16"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision=revision,
).to("cuda")

# Perform text generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### BitNet

To run the model with Microsoft's BitNet inference framework:

```bash
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon-E-1B-Base -q i2_s
python run_inference.py -m models/Falcon-E-1B-Base/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```

### Fine-tuning

To fine-tune the model, load the `prequantized` revision of the model and use the `onebitllms` Python package:

```diff
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
+ from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
+   revision="prequantized"
)
+ model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...
)

trainer.train()

+ quantize_to_1bit(output_directory)
```

Here `output_directory` refers to the directory in which the fine-tuned checkpoint was saved.

# Evaluation

We report our internal pipeline benchmark results in the tables below.

**Note: evaluation results are normalized scores from the former Hugging Face Open LLM Leaderboard v2 tasks.**
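For reference, the snippet below is a minimal sketch of how scores on these tasks could be obtained with EleutherAI's `lm-evaluation-harness`; this is not our internal pipeline, the `leaderboard` task group name depends on the installed harness version, and the leaderboard's score normalization is not applied by the harness itself.

```python
# Hypothetical sketch (not the internal evaluation pipeline): running the
# former Open LLM Leaderboard v2 task group with EleutherAI's
# lm-evaluation-harness (`pip install lm-eval`). Task/group names depend on
# the installed harness version, and the leaderboard's score normalization
# still has to be applied on top of the raw results.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon-E-1B-Base,dtype=bfloat16",
    tasks=["leaderboard"],  # IFEval, MATH-Hard, GPQA, MuSR, BBH, MMLU-Pro
    batch_size="auto",
)
print(results["results"])
```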
For 1B-scale models and below:

| Model | # Params | Memory Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | **635MB** | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |
For 3B-scale models:

| Model | # Params | Memory Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | **999MB** | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |
Below are the results for instruction fine-tuned models:
For 1B-scale models and below:

| Model | # Params | Memory Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | **635MB** | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |
For 3B-scale models:

| Model | # Params | Memory Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | **999MB** | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |
## Useful links

- View [our release blogpost](https://falcon-lm.github.io/blog/falcon-edge/).
- Learn more about the [`onebitllms` library](https://github.com/tiiuae/onebitllms).
- Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.

## Citation

If the Falcon-E family of models was helpful to your work, please consider citing it:

```
@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}
```