
Francesco Fugazzi

franzipol

AI & ML interests

None yet

Recent Activity

reacted to codelion's post with ❤️ 19 days ago
I recently worked on a LoRA that improves tool use in LLMs. Thought the approach might interest folks here.

The issue I have had when trying to use some of the local LLMs with coding agents is this:

Me: "Find all API endpoints with authentication in this codebase"
LLM: "You should look for @app.route decorators and check if they have auth middleware..."

But I often want it to actually search the files and show me, and the LLM never triggers a tool call.

To fine-tune it for tool use I combined two data sources:
1. Magpie scenarios - 5000+ diverse tasks (bug hunting, refactoring, security audits)
2. Real execution - ran these on actual repos (FastAPI, Django, React) to get authentic tool responses

This ensures the model learns both breadth (many scenarios) and depth (real tool behavior).

Tools We Taught:
- `read_file` - actually read file contents
- `search_files` - regex/pattern search across codebases
- `find_definition` - locate classes/functions
- `analyze_imports` - dependency tracking
- `list_directory` - explore structure
- `run_tests` - execute test suites

Improvements:
- Tool calling accuracy: 12% → 80%
- Correct parameters: 8% → 87%
- Multi-step tasks: 3% → 78%
- End-to-end completion: 5% → 80%
- Tools per task: 0.2 → 3.8

The LoRA really improves intentional tool calling. As an example, consider the query "Find ValueError in payment module". The response proceeds as follows:
1. Calls `search_files` with pattern "ValueError"
2. Gets 4 matches across 3 files
3. Calls `read_file` on each match
4. Analyzes context
5. Reports: "Found 3 ValueError instances: payment/processor.py:47 for invalid amount, payment/validator.py:23 for unsupported currency..."

Resources:
- Colab notebook: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_3_Enhanced_Tool_Calling_and_Code_Understanding.ipynb
- Model: https://huggingface.co/codelion/Llama-3.2-1B-Instruct-tool-calling-lora
- GitHub: https://github.com/codelion/ellora
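For anyone who wants to try the published adapter, here is a minimal sketch (my own assumptions, not the notebook's exact code) that loads the LoRA on top of what I take to be its base model, meta-llama/Llama-3.2-1B-Instruct, with transformers and peft:

```python
# Sketch only: attach the tool-calling LoRA to its (assumed) base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"          # assumed base model
adapter_id = "codelion/Llama-3.2-1B-Instruct-tool-calling-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # load the LoRA weights

# Ask a question that should trigger tool calls such as search_files / read_file.
messages = [{"role": "user", "content": "Find all API endpoints with authentication in this codebase"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The exact tool-call format the adapter emits (and how an agent harness should parse it) is documented in the Colab notebook linked above.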
reacted to MonsterMMORPG's post with ❤️ 20 days ago
Nano Banana (Gemini 2.5 Flash Image) Full Tutorial — 27 Unique Cases vs Qwen Image Edit — Free 2 Use: https://youtu.be/qPUreQxB8zQ

Nano Banana, the AI image editing model, was published by Google today. It is officially named the Google Gemini 2.5 Flash Image model, and it is the most advanced zero-shot image editing model ever made. I have conducted a thorough, in-depth review of this model with 27 unique cases. All prompts, images used, and results are demonstrated live in this tutorial. Moreover, I have compared each result with the state-of-the-art (SOTA), best open-source, locally available, and free-to-use Qwen Image Edit model, so we can see which model performs better at which tasks.

Video Chapters
0:00 Introduction to Google's "Nano Banana" (Gemini 2.5 Flash)
0:28 Comparing Gemini vs. Qwen Image Edit Model (27 Test Cases)
1:33 Solving Gemini's Low Resolution with SUPIR Upscaling
2:28 Teaser: Upcoming Qwen Image LoRA Training Application
2:41 How to Access Gemini 2.5 Flash in Google AI Studio
2:55 Test Case 1: Text Conversion
3:31 Test Case 2: Photorealism Test (Portrait)
4:36 Test Case 3: Adding Sunglasses
5:44 Test Case 4: Adding Iron Man to a Surfer (Gemini Wins)
6:38 Test Case 5: Adding a Cat (Qwen Wins)
7:20 Test Case 6: Clothing Extraction (Gemini Fails)
8:02 Test Case 7: Character Back View (Qwen Wins on Accuracy)
9:24 Test Case 8: Photo to Anime Style (Gemini Wins on Resemblance)
10:18 Test Case 9: Changing Background to Night
11:37 Test Case 10: Outpainting a Portrait (Qwen Wins on Proportions)
13:22 Test Case 11: Adding a Lion to a Scene (Gemini Wins)
13:59 Test Cases 12 & 13: Stylization Failures (Pixel Art & Claymation)
15:44 Test Case 14: Adding a Knight's Helmet
16:47 Test Case 15: Adding Reflections (Qwen is More Accurate)
18:00 Test Case 16: Changing Day to Night (Window View)
19:33 Test Case 17: Adding a Wooden Sign
20:22 Test Case 18: Old Photo Restoration
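The video accesses the model through the Google AI Studio UI; for readers who prefer a script, here is a rough sketch of an image-edit call with the google-genai Python SDK. The model identifier and the response handling are my assumptions, not something covered in the video, so check AI Studio for the current model name:

```python
# Hypothetical sketch: editing an image with Gemini 2.5 Flash Image via the
# google-genai SDK (model id assumed; verify the current identifier in AI Studio).
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")
source = Image.open("portrait.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",   # assumed preview identifier
    contents=[source, "Add sunglasses to the person, keep everything else unchanged"],
)

# Save any returned image parts; text parts hold the model's commentary.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("edited.png", "wb") as f:
            f.write(part.inline_data.data)
    elif part.text:
        print(part.text)
```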
reacted to codelion's post with ❤️ 21 days ago
I wanted to share a technique that's been working really well for recovering performance after INT4 quantization. Quantizing an LLM to INT4 (unlike, say, INT8) for inference typically incurs some accuracy loss. Instead of accepting the quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model.

The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.

Last year Apple's foundation models paper (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found that "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GMS8K accuracy is boosted by 5-10%." (page 47).

We saw similar results on Qwen3-0.6B:
- Perplexity: 2.40 → 2.09 (only 5.7% degradation from the FP16 baseline)
- Memory: only 0.28GB vs 1.0GB for FP16 (75% reduction)
- Speed: 3.0x faster inference than FP16
- Quality: generates correct, optimized code solutions

Resources:
- Pre-trained adapter: https://huggingface.co/codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora

Happy to answer questions about the implementation or to help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable: a small adapter can bridge the gap without negating the benefits of quantization. Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!
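For anyone curious what the training step looks like, here is a minimal sketch of the self-distillation idea under my own assumptions (FP16 teacher, INT4 student carrying a rank-16 LoRA, KL loss against the teacher's logits); it is not the repo's exact recipe:

```python
# Sketch of one accuracy-recovery distillation step (assumptions, not ellora's exact code):
# teacher = full-precision FP16 model, student = INT4-quantized model + rank-16 LoRA.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, input_ids, attention_mask, temperature=2.0):
    # Teacher runs without gradients; only the student's LoRA weights are trainable.
    with torch.no_grad():
        teacher_logits = teacher(input_ids=input_ids, attention_mask=attention_mask).logits
    student_logits = student(input_ids=input_ids, attention_mask=attention_mask).logits

    # KL divergence between teacher and student next-token distributions,
    # computed on self-generated (Magpie-style) prompts so the training data
    # stays inside the model's own response distribution.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return loss
```

Backpropagating this loss through the LoRA parameters (with the quantized base weights frozen) is what lets the small adapter absorb the systematic quantization error.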

Organizations

None yet