MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System

arXiv Paper Apache 2.0 License

MoC was obtained by fully fine-tuning Qwen2.5-1.5B-Instruct on 20K data entries from the CRUD benchmark, which were prepared with GPT-4o. Using the segmented data generated by GPT-4o, we assigned each text a granularity label from 0 to 3 according to its average chunk length: label 0 for (0, 120], label 1 for (120, 150], label 2 for (150, 180], and label 3 for (180, +∞).
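The mapping from average chunk length to granularity label can be sketched as follows. This is a minimal illustration, not the authors' labeling code: the names `average_chunk_length`, `granularity_label`, and `UPPER_BOUNDS` are hypothetical, and measuring length in characters with `len()` is an assumption, since the unit is not stated here.

```python
from bisect import bisect_left

# Upper bounds of the first three intervals: (0, 120], (120, 150], (150, 180].
# Anything above 180 falls into the open-ended last interval and gets label 3.
UPPER_BOUNDS = [120, 150, 180]


def average_chunk_length(chunks):
    """Mean chunk length; measuring in characters via len() is an assumption."""
    if not chunks:
        raise ValueError("chunks must be non-empty")
    return sum(len(c) for c in chunks) / len(chunks)


def granularity_label(avg_length):
    """Map an average chunk length to a granularity label in {0, 1, 2, 3}."""
    if avg_length <= 0:
        raise ValueError("average chunk length must be positive")
    # bisect_left keeps each upper bound inside its own right-closed interval.
    return bisect_left(UPPER_BOUNDS, avg_length)


chunks = ["a" * 100, "b" * 160]  # toy segmentation, average length 130
print(granularity_label(average_chunk_length(chunks)))
```

With the toy segmentation above, the average length is 130, which falls in (120, 150], so the script prints 1.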