---
base_model: codellama/CodeLlama-13b-Instruct-hf
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:codellama/CodeLlama-13b-Instruct-hf
- lora
- transformers
- configuration-management
- secrets-management
- devops
- multi-cloud
- codellama
license: mit
language:
- en
model_size: 13B
---

# AnySecret Assistant - 13B Model

The **largest and most capable** model in the AnySecret Assistant collection. Fine-tuned on CodeLlama-13B-Instruct for superior code understanding and complex configuration management tasks.

## 🎯 Model Overview

- **Base Model:** codellama/CodeLlama-13b-Instruct-hf
- **Parameters:** 13 billion
- **Model Type:** LoRA Adapter
- **Specialization:** Code-focused AnySecret configuration management
- **Memory Requirements:** 16-24GB (FP16), 7.8GB (GGUF Q4_K_M)

## 🚀 Best Use Cases

This model excels at:

- ✅ **Complex Configuration Scenarios** - Multi-step, multi-cloud setups
- ✅ **Advanced Troubleshooting** - Debugging configuration issues
- ✅ **Code Generation** - Python SDK integration, custom scripts
- ✅ **Production Guidance** - Enterprise-grade deployment patterns
- ✅ **Architecture Design** - Comprehensive secrets management strategies

## 📦 Quick Start

### Option 1: Using Transformers + PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the 13B base model in FP16 (see Option 2 for 4-bit loading on consumer GPUs)
base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-13b-Instruct-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# The adapter lives in the "13B" subfolder of the collection repo
model = PeftModel.from_pretrained(
    base_model, "anysecret-io/anysecret-assistant", subfolder="13B"
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-13b-Instruct-hf")

def ask_anysecret_13b(question):
    prompt = f"### Instruction:\n{question}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,  # more tokens for detailed responses
            temperature=0.1,
            do_sample=True,
            top_p=0.9,
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:\n")[-1].strip()

# Example: complex multi-cloud setup
question = """
I need to set up AnySecret for a microservices architecture that spans:
- AWS EKS cluster with Secrets Manager
- GCP Cloud Run services with Secret Manager
- Azure Container Instances with Key Vault
- CI/CD pipeline that can deploy to all three

Can you provide a comprehensive configuration strategy?
"""

print(ask_anysecret_13b(question))
```

### Option 2: Using 4-bit Quantization (Recommended)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization for efficient memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-13b-Instruct-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Continue with PeftModel loading as in Option 1...
```
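
To finish Option 2, a minimal sketch of the adapter load on top of the quantized base model, with a memory-footprint sanity check (the `subfolder` layout is assumed from the repo paths listed under Related Models):

```python
from peft import PeftModel

# Attach the LoRA adapter to the 4-bit base model (subfolder layout assumed)
model = PeftModel.from_pretrained(
    base_model, "anysecret-io/anysecret-assistant", subfolder="13B"
)

# Sanity-check memory usage; expect roughly 8 GB with NF4 double quantization
print(f"Footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```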

## 💡 Example Use Cases

Each of the prompts below can be passed directly to the `ask_anysecret_13b()` helper from the Quick Start.

### 1. Complex Multi-Cloud Architecture

```python
question = """
Design a secrets management strategy for a fintech application with:
- Microservices on AWS EKS
- Data pipeline on GCP Dataflow
- ML models on Azure ML
- Strict compliance requirements (SOC2, PCI-DSS)
- Automatic secret rotation every 30 days
"""
```

### 2. Advanced Python SDK Integration

```python
question = """
Show me how to implement a custom AnySecret provider that:
1. Integrates with HashiCorp Vault
2. Supports dynamic secret generation
3. Implements automatic retry with exponential backoff
4. Includes comprehensive error handling and logging
5. Is compatible with asyncio applications
"""
```

### 3. Enterprise CI/CD Pipeline

```python
question = """
Create a comprehensive CI/CD pipeline configuration that:
- Uses AnySecret across GitHub Actions, Jenkins, and GitLab CI
- Implements environment-specific secret promotion
- Includes automated testing of secret configurations
- Supports blue-green deployments with secret validation
- Has rollback capabilities for failed deployments
"""
```
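
Prompts like these tend to produce long, multi-part answers. For interactive use, streaming tokens as they are generated is often nicer than waiting for the full completion; a minimal sketch using transformers' `TextStreamer`, reusing `model`, `tokenizer`, and a `question` from above:

```python
from transformers import TextStreamer

# Stream tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

prompt = f"### Instruction:\n{question}\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.1,
    do_sample=True,
    top_p=0.9,
    streamer=streamer,
)
```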

## 🔧 Model Performance

### Benchmark Results (RTX 3090)

| Metric | Performance |
|--------|-------------|
| **Inference Speed** | ~15 tokens/sec (FP16) |
| **Quality Score** | 9.1/10 |
| **Memory Usage** | 24GB (FP16), 8GB (4-bit) |
| **Context Length** | 4096 tokens |
| **Response Quality** | Excellent for complex queries |

### Comparison with Other Sizes

| Feature | 3B | 7B | **13B** |
|---------|----|----|---------|
| Speed | ⭐⭐⭐ | ⭐⭐ | ⭐ |
| Quality | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Code Understanding | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Complex Reasoning | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Memory Requirement | Low | Medium | High |

## 🎯 Training Details

### Specialized Training Data

The 13B model was trained on additional complex scenarios:

- **Enterprise Patterns** (15 examples) - Large-scale deployment patterns
- **Advanced Troubleshooting** (10 examples) - Complex error scenarios
- **Custom Integration** (10 examples) - Building custom providers
- **Performance Optimization** (8 examples) - Scaling and optimization
- **Security Hardening** (7 examples) - Advanced security configurations

### Training Configuration

- **LoRA Rank:** 16 (optimized for 13B parameters)
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning Rate:** 2e-4 (with warm-up)
- **Training Epochs:** 3
- **Batch Size:** 1 (with 16 gradient accumulation steps)
- **Precision:** 4-bit quantization during training
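
For reference, a PEFT `LoraConfig` matching these hyperparameters might look like the following sketch (the `lora_dropout` and `bias` values are assumptions; the card does not state them):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=32,      # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,  # assumption: not stated in the card
    bias="none",        # assumption: PEFT default
    task_type="CAUSAL_LM",
)
```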

## 🚀 Deployment Recommendations

### For Development

Use 4-bit quantization (Option 2 above). A quick sanity check that the CUDA and bitsandbytes stack is in place:

```bash
python -c "
import torch
from transformers import BitsAndBytesConfig

print('CUDA available:', torch.cuda.is_available())
print(BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type='nf4'))
"
```

### For Production

```dockerfile
# Docker deployment with optimizations
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

# The CUDA runtime image ships without Python; install it first
RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*

# Install dependencies
RUN pip3 install torch transformers peft bitsandbytes accelerate

# Load model with optimizations
COPY model_loader.py /app/
CMD ["python3", "/app/model_loader.py"]
```
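
The `model_loader.py` the Dockerfile copies is not part of this card; a minimal sketch of what it might contain, combining the 4-bit load from Option 2 with the adapter (repo subfolder layout assumed as above):

```python
# model_loader.py - hypothetical entrypoint for the container above
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-13b-Instruct-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    base_model, "anysecret-io/anysecret-assistant", subfolder="13B"
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-13b-Instruct-hf")

print("Model loaded; ready to serve.")  # replace with your serving loop
```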

### Hardware Requirements

| Deployment | GPU Memory | CPU Memory | Storage |
|------------|------------|------------|---------|
| **Development** | 8GB+ (quantized) | 16GB+ | 50GB |
| **Production** | 24GB+ (full precision) | 32GB+ | 100GB |
| **GGUF (CPU)** | Optional | 16GB+ | 8GB |
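
For the CPU-only row, the GGUF build runs under llama.cpp without a GPU. A minimal sketch using `llama-cpp-python` (the GGUF filename is hypothetical; check the 13B-GGUF repo listed below for the actual file):

```python
from llama_cpp import Llama

# Load the Q4_K_M quantized model on CPU; the filename is a placeholder
llm = Llama(model_path="./anysecret-assistant-13b.Q4_K_M.gguf", n_ctx=4096)

prompt = "### Instruction:\nHow do I configure AnySecret for AWS Secrets Manager?\n\n### Response:\n"
output = llm(prompt, max_tokens=512, temperature=0.1, stop=["### Instruction:"])
print(output["choices"][0]["text"].strip())
```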

## 🔗 Related Models

- **7B Model:** `anysecret-io/anysecret-assistant/7B` - Faster, still excellent quality
- **3B Model:** `anysecret-io/anysecret-assistant/3B` - Fastest inference
- **GGUF Version:** `anysecret-io/anysecret-assistant/13B-GGUF` - Optimized for CPU/edge

## 📚 Resources

- **Documentation:** https://docs.anysecret.io
- **GitHub:** https://github.com/anysecret-io/anysecret-lib
- **Training Code:** https://github.com/anysecret-io/anysecret-llm
- **Issues:** https://github.com/anysecret-io/anysecret-lib/issues

## ⚖️ License

MIT License - free for commercial and non-commercial use.

---

**Note:** This model requires significant compute resources. For lighter workloads, consider the 7B or 3B variants.