Enhance model card: Add metadata, paper/code links, and Transformers usage
This PR enhances the model card for HRWKV7-Reka-Flash3-Preview by:

- Adding `pipeline_tag: text-generation` to the metadata, which ensures the model appears in relevant searches on the Hugging Face Hub and enables the interactive inference widget.
- Adding `library_name: transformers` to the metadata, indicating compatibility with the Hugging Face Transformers library and enabling the "Use in Transformers" widget with associated code snippets.
- Adding relevant `tags` such as `linear-attention`, `reka`, `rwkv`, and `knowledge-distillation`, and specifying `language: ['mul']` to reflect the model's multilingual nature, improving discoverability. A sketch of the combined metadata changes follows this list.
- Introducing a prominent "Paper and Project Details" section at the top, linking directly to the Hugging Face Papers page for *RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale* and to the main project's GitHub repository (https://github.com/recursal/RADLADS-paper).
- Including a standard `transformers` code snippet for text generation (sketched below), making it easier for users to get started with the model. The original `RWKV-Infer` usage is retained for completeness.
- Adding the BibTeX citation for the RADLADS paper to ensure proper attribution.
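For reference, a minimal sketch of the resulting metadata block (YAML front matter at the top of the model card `README.md`; exact values should match the actual diff):

```yaml
pipeline_tag: text-generation
library_name: transformers
language:
  - mul
tags:
  - linear-attention
  - reka
  - rwkv
  - knowledge-distillation
```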
These changes collectively make the model card more informative, discoverable, and user-friendly on the Hugging Face Hub.
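The added snippet follows the usual `transformers` text-generation pattern. The sketch below is illustrative only: the repo id and the `trust_remote_code=True` requirement are assumptions, and whether this checkpoint loads through stock Transformers at all is not guaranteed.

```python
# Minimal sketch of the standard Transformers text-generation flow.
# The repo id and trust_remote_code=True are assumptions, not confirmed
# details of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenMOSE/HRWKV7-Reka-Flash3-Preview"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain linear attention in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```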
Thank you for pointing that out.
I'm currently writing HF-compatible inference code.
Thanks! Feel free to remove `library_name: transformers`
and the Transformers code snippet, as those seem wrong for this particular checkpoint, which does not appear to be Transformers-compatible.
I apologize for any misunderstanding.
This model is based on the RADLADS distillation method,
but the training code and model architecture differ:

- RADLADS (v1): modified RWKV v6 (gated linear attention kernel)
- This model: modified RWKV v7 (RWKV kernel) + no-position-embedding (NoPE) GQA hybrid (a toy sketch of the interleaving is shown below)
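Purely as an illustration of what such a hybrid stack can look like (every name and the interleaving ratio below are hypothetical, not taken from the actual training code):

```python
# Toy sketch of the hybrid layer layout described above: mostly
# RWKV-v7-style blocks with an occasional NoPE GQA attention block.
# The function name, the every-8th-layer pattern, and the layer-type
# strings are all illustrative assumptions.
def build_layer_types(n_layers: int, gqa_every: int = 8) -> list[str]:
    return [
        "gqa_nope" if (i + 1) % gqa_every == 0 else "rwkv7"
        for i in range(n_layers)
    ]

print(build_layer_types(16))
# ['rwkv7', 'rwkv7', ..., 'gqa_nope', 'rwkv7', ..., 'gqa_nope']
```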
Please feel free to point out any issues. :)