Real-Time On-Device AI Agent with Polaris-4B: Run It Yourself, No Cloud, No Cost
We just deployed a real-time on-device AI agent using the Polaris-4B-Preview model, one of the top-performing open LLMs under 6B parameters on Hugging Face.
What's remarkable? This model runs entirely on a mobile device, with no cloud and no manual optimization. It was built using ZETIC.MLange, and the best part?
It's totally automated, free to use, and anyone can do it. You don't need to write deployment code, tweak backends, or touch device-specific SDKs. Just upload your model, and ZETIC.MLange handles the rest.
About the Model
- Model: Polaris-4B-Preview
- Size: ~4B parameters
- Ranking: top 3 on the Hugging Face LLM Leaderboard (<6B)
- Tokenizer: token-incremental inference supported
- Modifications: none; stock weights, only optimized for mobile
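If you want to sanity-check the stock checkpoint locally before handing it off for on-device conversion, a minimal Transformers sketch is enough. The repo id POLARIS-Project/Polaris-4B-Preview below is an assumption; substitute the exact Hugging Face model id you are using.

```python
# Minimal local check of the stock weights with streamed (token-incremental) output.
# The repo id is an assumption; replace it with the actual model id.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "POLARIS-Project/Polaris-4B-Preview"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain on-device inference in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# TextStreamer prints tokens as they are generated, mirroring token-incremental inference.
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(**inputs, max_new_tokens=64, streamer=streamer)
```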
What ZETIC.MLange Does
ZETIC.MLange is a fully automated deployment framework for on-device AI, built for AI engineers who want to focus on models, not infrastructure.
Here's what it does in minutes:
- Analyzes the model structure
- Converts it to a mobile-optimized format (e.g., GGUF, ONNX)
- Generates a runnable runtime environment with pre- and post-processing
- Targets real mobile hardware (CPU, GPU, NPU, including Qualcomm, MediaTek, and Apple)
- Gives you a downloadable SDK or mobile app component, ready to run
And yes, this is available now, for free, at https://mlange.zetic.ai
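For a feel of what the conversion step involves in general, here is a generic PyTorch-to-ONNX export of a toy module. It only illustrates the kind of format conversion the framework automates; it is not ZETIC.MLange code, and the module and file names are made up.

```python
# Generic illustration of a model-format conversion step (PyTorch -> ONNX).
# Not ZETIC.MLange internals; the toy module and file names are placeholders.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    def forward(self, x):
        return self.net(x)

model = TinyClassifier().eval()
dummy_input = torch.randn(1, 128)  # example input that fixes the exported graph's shapes
torch.onnx.export(
    model, dummy_input, "tiny_classifier.onnx",
    input_names=["features"], output_names=["logits"],
)
```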
For AI Engineers Like You
If you want to:
- Test LLMs directly on-device
- Run models offline with no latency
- Avoid cloud GPU costs
- Deploy to mobile without writing app-side inference code
Then this is your moment. You can do exactly what we did, using your own models, all in a few clicks.
We (@osma, @MonaLehtinen and I, i.e. the Annif team at the National Library of Finland) recently took part in the LLMs4Subjects challenge at the SemEval-2025 workshop. The task was to use large language models (LLMs) to generate good-quality subject indexing for bibliographic records, i.e. titles and abstracts.
We are glad to report that our system performed well; it was ranked:
- 1st in the category where the full vocabulary was used
- 2nd in the smaller-vocabulary category
- 4th in the qualitative evaluations
Fourteen participating teams developed their own solutions for generating subject headings, and the output of each system was assessed using both quantitative and qualitative evaluations. Research papers about most of the systems will be published around the time of the workshop in late July, and many preprints are already available.
We applied Annif together with several LLMs that we used to preprocess the data sets: translating the GND vocabulary terms into English, translating bibliographic records into English and German as required, and generating additional synthetic training data. After the preprocessing, we used the traditional machine learning algorithms in Annif as well as the experimental XTransformer algorithm, which is based on language models. We also combined the subject suggestions generated from the English- and German-language records in a novel way.
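As a rough illustration of that last step, the sketch below merges two score lists over the same vocabulary by weighted averaging. It is only meant to convey the idea of combining suggestions from two languages; it is not the exact ensemble method from our submission, and the subject URIs are placeholders.

```python
# Illustrative combination of subject suggestions from English- and German-language
# inputs. Weighted averaging is an assumption, not the submission's actual method.
from collections import defaultdict

def combine_suggestions(en_scores, de_scores, en_weight=0.5):
    """en_scores / de_scores map GND subject URIs to scores in [0, 1]."""
    combined = defaultdict(float)
    for uri, score in en_scores.items():
        combined[uri] += en_weight * score
    for uri, score in de_scores.items():
        combined[uri] += (1.0 - en_weight) * score
    # Best-scoring subjects first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

# Placeholder URIs and scores, just to show the call.
en = {"gnd:subject-A": 0.9, "gnd:subject-B": 0.4}
de = {"gnd:subject-A": 0.7, "gnd:subject-C": 0.5}
print(combine_suggestions(en, de)[:3])
```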
I'm excited to announce a major update to VisionScout, my interactive vision tool that now supports VIDEO PROCESSING, in addition to powerful object detection and scene understanding!
NEW: Video Analysis Is Here!
- Upload any video file to detect and track objects using YOLOv8.
- Customize the processing interval to balance speed and thoroughness.
- Get comprehensive statistics and summaries showing object appearances across the entire video.
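For a feel of how interval-based video analysis works, here is a minimal sketch using the ultralytics and opencv-python packages. The function and parameter names are illustrative, not VisionScout's internal API.

```python
# Minimal sketch: run YOLOv8 on every Nth frame of a video and count detections.
import cv2
from collections import Counter
from ultralytics import YOLO

def analyze_video(path, frame_interval=30, model_name="yolov8n.pt"):
    model = YOLO(model_name)
    counts = Counter()
    cap = cv2.VideoCapture(path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Detect only every `frame_interval` frames to trade thoroughness for speed.
        if frame_idx % frame_interval == 0:
            result = model(frame, verbose=False)[0]
            for cls_id in result.boxes.cls.tolist():
                counts[model.names[int(cls_id)]] += 1
        frame_idx += 1
    cap.release()
    return counts  # e.g. Counter({'person': 42, 'car': 17})

print(analyze_video("sample.mp4", frame_interval=15))
```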
What else can VisionScout do?
- Analyze any image and detect 80 object types with YOLOv8.
- Switch between Nano, Medium, and XLarge models for speed or accuracy.
- Filter by object class (people, vehicles, animals, etc.) to focus on what matters.
- View detailed statistics on detections, confidence levels, and distributions.
- Understand scenes, interpreting environments and potential activities.
- Automatically identify possible safety concerns based on detected objects.
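Model switching and class filtering map directly onto standard YOLOv8 options; the short sketch below shows the general pattern with the ultralytics package. The file name and class ids are examples, and this is not VisionScout's own code.

```python
# Single-image detection with a chosen model size and a class filter.
from ultralytics import YOLO

# Speed/accuracy trade-off: yolov8n (Nano), yolov8m (Medium), yolov8x (XLarge).
model = YOLO("yolov8m.pt")

# Keep only selected COCO classes (0 = person, 2 = car) above a confidence threshold.
results = model("street.jpg", classes=[0, 2], conf=0.25)

for box in results[0].boxes:
    label = model.names[int(box.cls)]
    print(f"{label}: confidence {float(box.conf):.2f}")
```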
My goal: to bridge the gap between raw detection and meaningful interpretation. I'm constantly exploring ways to help machines not just "see" but truly understand context, and to make these advanced tools accessible to everyone, regardless of technical background.
This time, Gemini was very quick to offer API support for its 2.5 Pro May release. The performance is impressive too; it is now among top contenders like o4, R1, and Claude.