Auto-Completer-0.1 is a fine-tuned version of SmolLM2-360M, optimized for long-range dependency modeling and state-of-the-art auto-completion performance. Trained on an additional 4.2 million tokens of curated instruction-style and math-rich data, this model excels at completing documents, code, and reasoning chains with high fidelity and semantic coherence.
| ✅ Appropriate Uses | 🚫 Out-of-Scope Uses |
|---|---|
| Auto-completion in IDEs | Real-time dialogue agents |
| Math and logic reasoning | Sensitive medical inference |
| Document drafting | Unfiltered open-domain chat |
| Code continuation | Offensive or biased content |
Fine-tuned with TRL's `SFTTrainer` (Unsloth backend), using the `adamw_8bit` optimizer.
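For reference, a minimal sketch of what that setup could look like, assuming a TRL `SFTTrainer` run on an Unsloth-loaded base model; the dataset path, sequence length, and hyperparameters below are placeholders, not the actual training recipe:

```python
# Hypothetical training sketch (illustrative only; corpus and hyperparameters are assumed)
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "HuggingFaceTB/SmolLM2-360M",
    max_seq_length=2048,  # assumed context window
)

# Placeholder corpus with a "text" column of instruction-style / math-rich documents
dataset = load_dataset("json", data_files="completion_corpus.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="auto-completer-0.1",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        dataset_text_field="text",
        optim="adamw_8bit",  # 8-bit AdamW, as noted above
    ),
)
trainer.train()
```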
| Metric | Score |
|---|---|
| Completion Accuracy | 94.2% |
| Semantic Retention | 91.8% |
| Math Reasoning F1 | 88.6 |
| Code Continuation BLEU | 87.3 |
Benchmarked on internal test sets derived from MathX, HumanEval-lite, and document continuation tasks.
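Those test sets are not public, but a code-continuation BLEU check along the same lines can be sketched with the `evaluate` library; the metric choice and placeholder strings below are assumptions, not the actual harness:

```python
# Hypothetical BLEU check for code continuation (not the actual internal benchmark)
import evaluate

bleu = evaluate.load("bleu")
predictions = ["    return a + b"]   # model continuation (placeholder)
references = [["    return a + b"]]  # reference continuation (placeholder)
print(bleu.compute(predictions=predictions, references=references)["bleu"])
```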
```bash
pip install transformers
```
Don't try to use it as a chat model; it's not meant for that.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"  # or "cpu"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    repetition_penalty=1.2,  # increase if the model gets stuck in loops after completing the sentence
    max_new_tokens=10,       # keep this low for auto-completion; the model generates up to this cap
    do_sample=True,          # enable sampling for diversity
    eos_token_id=tokenizer.eos_token_id,  # optional: stop at end-of-text
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
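For IDE-style completion you may want several alternative continuations for the same prefix; one possible extension of the snippet above uses `num_return_sequences` (the value here is illustrative):

```python
# Sample three candidate completions for the same prefix (count is illustrative)
candidates = model.generate(
    inputs,
    max_new_tokens=10,
    do_sample=True,
    num_return_sequences=3,  # number of alternative completions
    repetition_penalty=1.2,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in candidates:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```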
To reduce the memory footprint, load the model in `torch.bfloat16` precision:
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # or torch.float16 for fp16
)

# Encode prompt
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

# Generate with sampling and token control
outputs = model.generate(
    inputs,
    max_new_tokens=10,       # keep this low for auto-completion; the model generates up to this cap
    do_sample=True,          # enable sampling for diversity
    temperature=0.7,         # controls randomness (lower = more deterministic)
    top_p=0.9,               # nucleus sampling (keep the top 90% of probability mass)
    repetition_penalty=1.2,  # increase if the model gets stuck in loops after completing the sentence
    eos_token_id=tokenizer.eos_token_id,  # optional: stop at end-of-text
)

# Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 723.56 MB
If you use this model, please cite:
```bibtex
@misc{rawal2025autocompleter,
  title={Auto-Completer-0.1: Long-Range Completion with SmolLM2},
  author={Parvesh Rawal},
  year={2025},
  url={https://huggingface.co/Parveshiiii/Auto-Completer-0.1}
}
```
Base model: HuggingFaceTB/SmolLM2-360M