---
license: apache-2.0
---

## Model Details

* Model type: RWKV7 SigLIP2 is an open-source chatbot trained using the RWKV7 architecture and the SigLIP2 encoder.
* Model date: February 2025
* Paper or resources for more information: https://github.com/JL-er/WorldRWKV
* Where to send questions or comments about the model: https://github.com/JL-er/WorldRWKV/issues

## Training Datasets

* Pretrain: LLaVA 595k
* Fine-tune: LLaVA 665k

## Evaluation Datasets

We currently evaluate RWKV7 SigLIP2 on four benchmarks proposed for instruction-following LMMs. Results on more benchmarks will be released soon.

### Benchmarks

| **Encoder** | **LLM** | **VQAv2** | **TextVQA** | **GQA** | **ScienceQA** |
|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
| [**SigLIP2**](https://huggingface.co/google/siglip2-base-patch16-384) | RWKV7-3B | 78.30 | 51.09 | 60.75 | 70.93 |

### Inference

```python
from infer.worldmodel import Worldinfer
from PIL import Image

llm_path = 'WorldRWKV/RWKV7-3B-siglip2/rwkv-0'  # local model path
encoder_path = 'google/siglip2-base-patch16-384'
encoder_type = 'siglip'

model = Worldinfer(model_path=llm_path, encoder_type=encoder_type, encoder_path=encoder_path)

img_path = './docs/03-Confusing-Pictures.jpg'
image = Image.open(img_path).convert('RGB')

# \x16 and \x17 are control characters delimiting the user turn
text = '\x16User: What is unusual about this image?\x17Assistant:'

result = model.generate(text, image)
print(result)
```
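The prompt format above wraps the user turn in the `\x16` and `\x17` control characters. If you ask multiple questions, a small helper (hypothetical, not part of the WorldRWKV API) keeps the format consistent:

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the chat format used by the inference example.

    \x16 opens the user turn and \x17 closes it before the Assistant tag;
    this mirrors the literal string in the example above.
    """
    return f'\x16User: {question}\x17Assistant:'


if __name__ == '__main__':
    prompt = build_prompt('What is unusual about this image?')
    print(repr(prompt))
```

The model's reply is then whatever `model.generate` returns for this prompt and an image.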