---
license: apache-2.0
base_model: swiss-ai/Apertus-8B-2509
tags:
- text-embeddings
- multilingual
- encoder
- apertus
- experimental
language:
- multilingual
library_name: transformers
pipeline_tag: feature-extraction
model_type: apertus
---

# Apertus-8B-2509-Encoder

## Model Overview

**Apertus-8B-2509-Encoder** is an experimental bidirectional encoder model derived from the decoder-only swiss-ai/Apertus-8B-2509 model. It represents the first attempt to create a native Apertus-based encoder for text embedding generation and semantic similarity tasks.

**⚠️ Experimental Notice**: This model is at an experimental stage and may not perform well enough for production embedding tasks. See the Performance & Limitations section for details.

## Model Details

- **Model Type**: Bidirectional Transformer Encoder
- **Base Model**: swiss-ai/Apertus-8B-2509
- **Parameters**: 8.053 billion
- **Architecture**: 32-layer transformer with xIELU activation
- **Embedding Dimension**: 4096
- **Supported Languages**: 1811 (inherited from base model)
- **License**: Apache 2.0

## Intended Use

### Primary Use Cases

- Text embedding generation for research purposes
- Cross-lingual semantic analysis experiments
- Proof of concept for decoder-to-encoder conversion
- Base model for further fine-tuning on embedding tasks

### Downstream Tasks

- Semantic similarity analysis
- Information retrieval systems
- Cross-lingual text comparison
- Vector database integration

## How to Use

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Generate sentence embeddings via mean pooling over non-padding tokens
def get_embeddings(texts):
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512,
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Mask out padding positions so they do not dilute the averaged representation
    mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return embeddings

# Example usage
texts = ["Hello world", "Hallo Welt", "Bonjour monde"]
embeddings = get_embeddings(texts)
print(f"Embeddings shape: {embeddings.shape}")  # torch.Size([3, 4096])
```

## Model Architecture

The model keeps the original Apertus-8B-2509 architecture with the following modifications:

- **Attention Mechanism**: Converted from causal (decoder) to bidirectional (encoder)
- **Configuration Changes**:
  - `is_decoder = False`
  - `is_causal = False`
  - `architectures = ['ApertusModel']`
- **Pooling Strategy**: Mean pooling over the last hidden states

## Training Details

### Conversion Process

1. Loaded the pre-trained swiss-ai/Apertus-8B-2509 model
2. Disabled causal masking in all attention layers
3. Updated the model configuration for encoder usage
4. Performed no additional training

An illustrative code sketch of these steps is provided in the Conversion Sketch subsection below.

### Training Data

The model inherits its training data from the base model swiss-ai/Apertus-8B-2509. Refer to the base model documentation for detailed data information.
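### Conversion Sketch

The snippet below is a minimal, illustrative sketch of the conversion steps listed above, not the exact script used to produce this checkpoint. It assumes that the Apertus attention modules expose an `is_causal` attribute (as in recent Transformers decoder implementations), and the local output path is a hypothetical placeholder.

```python
from transformers import AutoModel, AutoTokenizer
import torch

SOURCE = "swiss-ai/Apertus-8B-2509"     # decoder-only base model
TARGET = "./Apertus-8B-2509-Encoder"    # hypothetical local output directory

# 1. Load the pre-trained decoder-only model
model = AutoModel.from_pretrained(SOURCE, trust_remote_code=True, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(SOURCE, trust_remote_code=True)

# 2. Disable causal masking in every attention layer
#    (assumes each attention module exposes an `is_causal` flag)
for module in model.modules():
    if hasattr(module, "is_causal"):
        module.is_causal = False

# 3. Update the configuration for encoder usage
model.config.is_decoder = False
model.config.is_causal = False
model.config.architectures = ["ApertusModel"]

# 4. No additional training; persist the converted weights and tokenizer
model.save_pretrained(TARGET)
tokenizer.save_pretrained(TARGET)
```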
## Performance & Limitations

### Known Limitations

**⚠️ Important Performance Notice**:

- Initial testing revealed suboptimal embedding quality
- Semantic similarity scores appear inconsistent with expected behavior
- Model may produce embeddings that do not accurately reflect semantic relationships
- Performance significantly below specialized embedding models

### Technical Limitations

- **Resource Requirements**: 16GB+ GPU memory for inference
- **Speed**: Significantly slower than specialized embedding models
- **Optimization**: Not fine-tuned for embedding tasks
- **Pooling**: Uses simple mean pooling strategy

### Benchmark Results

Preliminary testing on basic similarity tasks showed:

- Cross-lingual similarity detection: Inconsistent
- Direct translation pairs: Below expected performance
- Semantic relationship recognition: Requires improvement

## System Requirements

### Hardware

- **GPU**: 16GB+ VRAM recommended (A100, H100, or equivalent)
- **CPU**: High-memory alternative possible but significantly slower
- **RAM**: 32GB+ system RAM recommended

### Software

- Python 3.12+
- PyTorch 2.8.0+cu126
- Transformers >= 4.56.1
- `trust_remote_code=True` required

## Ethical Considerations & Biases

### Inherited Considerations

This model inherits all ethical considerations and potential biases from the base swiss-ai/Apertus-8B-2509 model. Users should:

- Review the base model documentation for bias analysis
- Conduct appropriate bias testing for their specific use cases
- Consider potential cultural and linguistic biases across the 1811 supported languages

### EU AI Act Compliance

This model is developed in compliance with EU AI Act requirements:

- Comprehensive documentation provided
- Risk assessment conducted
- Transparency obligations fulfilled
- Technical documentation available

## Environmental Impact

- **Energy Consumption**: High due to the 8B parameter size
- **Carbon Footprint**: Significant computational requirements
- **Efficiency**: Substantially less efficient than specialized embedding models

## Future Development

Potential improvements for future versions:

- Fine-tuning on embedding-specific datasets
- Implementation of advanced pooling strategies
- Model distillation for efficiency improvements
- Comprehensive evaluation on standard embedding benchmarks

## Citation

```
@misc{apertus8b2509encoder,
  title={Apertus-8B-2509-Encoder: Experimental Bidirectional Encoder},
  author={speakdatawith},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/speakdatawith/Apertus-8B-2509-Encoder}
}
```

## Acknowledgments

- Base model: swiss-ai/Apertus-8B-2509
- Architecture: Transformer-based encoder conversion
- Framework: Hugging Face Transformers

## Contact

For questions regarding this model or its implementation, please open an issue in the model repository.

---

**Disclaimer**: This is an experimental model. Production use is not recommended without thorough evaluation and potential fine-tuning for specific embedding tasks.
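## Appendix: Similarity Sanity Check

As a starting point for the evaluation recommended above, the sketch below computes cosine similarities for a translation pair and an unrelated pair, reusing the `get_embeddings` helper from the usage example. The sentence pairs are illustrative placeholders, not an established benchmark; a well-behaved encoder should score the translation pair clearly higher than the unrelated pair.

```python
import torch.nn.functional as F

# Illustrative sentence pairs (placeholders, not a benchmark)
pairs = [
    ("The cat sits on the mat.", "Die Katze sitzt auf der Matte."),    # EN/DE translation pair
    ("The cat sits on the mat.", "Stock prices fell sharply today."),  # unrelated pair
]

for text_a, text_b in pairs:
    emb = get_embeddings([text_a, text_b]).float()
    score = F.cosine_similarity(emb[0], emb[1], dim=0).item()
    print(f"cosine similarity = {score:.3f}  |  {text_a!r} vs. {text_b!r}")
```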