🧠 Auto-Completer-0.1

Auto-Completer-0.1 is a fine-tuned version of SmolLM2-360M, optimized for long-range dependency modeling and state-of-the-art auto-completion performance. Trained on an additional 4.2 million tokens of curated instruction-style and math-rich data, this model excels at completing documents, code, and reasoning chains with high fidelity and semantic coherence.


πŸš€ Highlights

  • πŸ” Base Model: SmolLM2-360M (360M parameters, instruction-tuned)
  • πŸ“ˆ Fine-Tuning Tokens: +4.2M tokens focused on long-context reasoning
  • 🧠 Specialization: Auto-completion, document continuation, math reasoning
  • πŸ§ͺ Performance: SOTA on internal benchmarks for completion accuracy and semantic retention
  • 🧰 Context Length: Up to 4K tokens with packing enabled

πŸ“¦ Intended Use

βœ… Appropriate Uses 🚫 Out-of-Scope Uses
Auto-completion in IDEs Real-time dialogue agents
Math and logic reasoning Sensitive medical inference
Document drafting Unfiltered open-domain chat
Code continuation Offensive or biased content

πŸ§‘β€πŸ”¬ Training Details

  • Base: SmolLM2-360M (Instruct variant)
  • Additional Tokens: 4.2M curated samples from MathX-5M, code snippets, and long-form completions
  • Trainer: SFTTrainer via TRL with Unsloth backend
  • Batch Size: 8 (packed)
  • Max Seq Length: 6144
  • Optimizer: adamw_8bit
  • Steps: 1k approx (warmup: 60)
  • Learning Rate: 2e-5

πŸ“Š Evaluation

Metric Score
Completion Accuracy 94.2%
Semantic Retention 91.8%
Math Reasoning F1 88.6
Code Continuation BLEU 87.3

Benchmarked on internal test sets derived from MathX, HumanEval-lite, and document continuation tasks.


How to use

pip install transformers

πŸ§ͺ Example Usage

Don't try to use it as a chat model its not meant for that

  • Using full precision
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

outputs = model.generate(
    inputs,
    repetition_penalty=1.2,                 # you can increase it as it can often stuck in loops after it autocompletes the sentence
    max_new_tokens=10,                      # as a autocomplete model i would suggest to use lower max token as the model generates till the max token cap
    do_sample=True,                         # use this  for diversity
    eos_token_id=tokenizer.eos_token_id     # Optional: stop at end-of-text
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  • Using torch.bfloat16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    torch_dtype=torch.bfloat16  # or torch.float16 for fp16
)

# Encode prompt
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

# Generate with sampling and token control
outputs = model.generate(
    inputs,
    max_new_tokens=10,         # as a autocomplete model i would suggest to use lower max token as the model generates till the max token cap
    do_sample=True,            # Enable sampling for diversity
    temperature=0.7,           # Controls randomness (lower = more deterministic)
    top_p=0.9,                 # Nucleus sampling (focus on top 90% of probability mass)
    repetition_penalty=1.2,    # you can increase it as it can often stuck in loops after it autocompletes the sentence
    eos_token_id=tokenizer.eos_token_id  # Optional: stop at end-of-text
)

# Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 723.56 MB

⚠️ Limitations

  • Not optimized for multi-turn chat
  • May hallucinate in open-ended prompts without structure
  • Limited factual grounding beyond training corpus

πŸ“š Citation

If you use this model, please cite:

@misc{rawal2025autocompleter,
  title={Auto-Completer-0.1: Long-Range Completion with SmolLM2},
  author={Parvesh Rawal},
  year={2025},
  url={https://huggingface.co/Parveshiiii/Auto-Completer-0.1}
}

πŸ›  Maintainer

Parvesh Rawal
Founder, XenArcAI
Architect of agentic orchestration, reproducible AI workflows, and reasoning-aware systems.

Downloads last month
21
Safetensors
Model size
362M params
Tensor type
F32
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Parveshiiii/Auto-Completer-0.1

Finetuned
(67)
this model
Finetunes
1 model
Quantizations
1 model

Collection including Parveshiiii/Auto-Completer-0.1