---
license: apache-2.0
---

## Model Details

* Model type: RWKV7 SigLIP2 is an open-source chatbot trained using the RWKV7 architecture and the SigLIP2 encoder.
* Model date: February 2025
* Paper or resources for more information: https://github.com/JL-er/WorldRWKV
* Where to send questions or comments about the model: https://github.com/JL-er/WorldRWKV/issues

## Training Datasets

* Pretrain: LLaVA 595k
* Fine-tune: LLaVA 665k

## Evaluation Datasets

We currently evaluate RWKV7 SigLIP2 on four benchmarks proposed for instruction-following LMMs. Results on more benchmarks will be released soon.

### Benchmarks

| **Encoder** | **LLM** | **VQAv2** | **TextVQA** | **GQA** | **ScienceQA** |
|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
| [**SigLIP2**](https://huggingface.co/google/siglip2-base-patch16-384) | RWKV7-3B | 78.30 | 51.09 | 60.75 | 70.93 |

### Inference

```python
from infer.worldmodel import Worldinfer
from PIL import Image

llm_path = 'WorldRWKV/RWKV7-3B-siglip2/rwkv-0'  # local model path
encoder_path = 'google/siglip2-base-patch16-384'
encoder_type = 'siglip'

model = Worldinfer(model_path=llm_path, encoder_type=encoder_type, encoder_path=encoder_path)

img_path = './docs/03-Confusing-Pictures.jpg'
image = Image.open(img_path).convert('RGB')

# \x16 and \x17 are control characters delimiting the user turn
text = '\x16User: What is unusual about this image?\x17Assistant:'

result = model.generate(text, image)
print(result)
```
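The prompt format above wraps the user turn in the `\x16` and `\x17` control characters. If you ask multiple questions, a small helper (hypothetical, not part of the WorldRWKV API) keeps the format consistent:

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the chat format used by the inference example.

    \x16 opens the user turn and \x17 closes it before the Assistant tag;
    this mirrors the literal string in the example above.
    """
    return f'\x16User: {question}\x17Assistant:'


if __name__ == '__main__':
    prompt = build_prompt('What is unusual about this image?')
    print(repr(prompt))
```

The model's reply is then whatever `model.generate` returns for this prompt and an image.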