MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System
MoC was obtained by fully fine-tuning Qwen2.5-1.5B-Instruct on 20K data entries from the CRUD benchmark, which were prepared with GPT-4o. Using the segmented data generated by GPT-4o, we assigned each text a granularity label from 0 to 3, corresponding to the average chunk length intervals (0, 120], (120, 150], (150, 180], and (180, +∞), respectively.
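The labeling rule can be illustrated with a minimal sketch. It is not the authors' code: the function name `granularity_label` is hypothetical, and measuring chunk length in characters is an assumption, since the unit is not stated here.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first three intervals:
# (0, 120] -> 0, (120, 150] -> 1, (150, 180] -> 2, (180, +inf) -> 3
UPPER_BOUNDS = [120, 150, 180]


def granularity_label(chunks):
    """Map a segmented text (list of chunk strings) to a label in 0..3.

    Assumption: chunk length is counted in characters.
    """
    if not chunks:
        raise ValueError("at least one chunk is required")
    avg_len = sum(len(chunk) for chunk in chunks) / len(chunks)
    if avg_len <= 0:
        raise ValueError("average chunk length must be positive")
    # bisect_left returns the index of the first bound >= avg_len,
    # so a value equal to a bound stays in the lower interval.
    return bisect_left(UPPER_BOUNDS, avg_len)
```

Because the intervals are closed on the right, an average length of exactly 120 receives label 0, while any value above 180 receives label 3.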