Auto-Completer-0.1 is a fine-tuned version of SmolLM2-360M, optimized for long-range dependency modeling and state-of-the-art auto-completion performance. Trained on an additional 4.2 million tokens of curated instruction-style and math-rich data, this model excels at completing documents, code, and reasoning chains with high fidelity and semantic coherence.
| ✅ Appropriate Uses | 🚫 Out-of-Scope Uses |
|---|---|
| Auto-completion in IDEs | Real-time dialogue agents |
| Math and logic reasoning | Sensitive medical inference |
| Document drafting | Unfiltered open-domain chat |
| Code continuation | Offensive or biased content |
Fine-tuned with TRL's `SFTTrainer` (Unsloth backend), using the `adamw_8bit` optimizer.
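For reference, a minimal sketch of what that setup could look like, assuming a TRL `SFTTrainer` run on an Unsloth-loaded base model; the dataset path, sequence length, and hyperparameters below are placeholders, not the actual training recipe:

```python
# Hypothetical training sketch (illustrative only; corpus and hyperparameters are assumed)
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "HuggingFaceTB/SmolLM2-360M",
    max_seq_length=2048,  # assumed context window
)

# Placeholder corpus with a "text" column of instruction-style / math-rich documents
dataset = load_dataset("json", data_files="completion_corpus.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="auto-completer-0.1",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        dataset_text_field="text",
        optim="adamw_8bit",  # 8-bit AdamW, as noted above
    ),
)
trainer.train()
```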
| Metric | Score |
|---|---|
| Completion Accuracy | 94.2% |
| Semantic Retention | 91.8% |
| Math Reasoning F1 | 88.6 |
| Code Continuation BLEU | 87.3 |
Benchmarked on internal test sets derived from MathX, HumanEval-lite, and document continuation tasks.
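Those test sets are not public, but a code-continuation BLEU check along the same lines can be sketched with the `evaluate` library; the metric choice and placeholder strings below are assumptions, not the actual harness:

```python
# Hypothetical BLEU check for code continuation (not the actual internal benchmark)
import evaluate

bleu = evaluate.load("bleu")
predictions = ["    return a + b"]   # model continuation (placeholder)
references = [["    return a + b"]]  # reference continuation (placeholder)
print(bleu.compute(predictions=predictions, references=references)["bleu"])
```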
```bash
pip install transformers
```
Don't try to use it as a chat model; it's not meant for that.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"  # or "cpu"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    repetition_penalty=1.2,  # increase if the model gets stuck in loops after completing the sentence
    max_new_tokens=10,       # keep this low for auto-completion; the model generates up to this cap
    do_sample=True,          # enable sampling for diversity
    eos_token_id=tokenizer.eos_token_id,  # optional: stop at end-of-text
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
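For IDE-style completion you may want several alternative continuations for the same prefix; one possible extension of the snippet above uses `num_return_sequences` (the value here is illustrative):

```python
# Sample three candidate completions for the same prefix (count is illustrative)
candidates = model.generate(
    inputs,
    max_new_tokens=10,
    do_sample=True,
    num_return_sequences=3,  # number of alternative completions
    repetition_penalty=1.2,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in candidates:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```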
To reduce the memory footprint, load the model in `torch.bfloat16` precision:
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # or torch.float16 for fp16
)

# Encode prompt
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

# Generate with sampling and token control
outputs = model.generate(
    inputs,
    max_new_tokens=10,       # keep this low for auto-completion; the model generates up to this cap
    do_sample=True,          # enable sampling for diversity
    temperature=0.7,         # controls randomness (lower = more deterministic)
    top_p=0.9,               # nucleus sampling (keep the top 90% of probability mass)
    repetition_penalty=1.2,  # increase if the model gets stuck in loops after completing the sentence
    eos_token_id=tokenizer.eos_token_id,  # optional: stop at end-of-text
)

# Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 723.56 MB
If you use this model, please cite:
```bibtex
@misc{rawal2025autocompleter,
  title={Auto-Completer-0.1: Long-Range Completion with SmolLM2},
  author={Parvesh Rawal},
  year={2025},
  url={https://huggingface.co/Parveshiiii/Auto-Completer-0.1}
}
```
Base model: HuggingFaceTB/SmolLM2-360M