WorldRWKV/RWKV7-3B-siglip2


SupYumm committed on Mar 4

Commit 4c45e80 · verified · 1 Parent(s): b578bf6

Update README.md

Files changed (1)

README.md +53 -3

README.md CHANGED

@@ -1,3 +1,53 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+** Model details
+* Model type:
+RWKV7 SigLIP2 is an open-source chatbot trained using the RWKV7 architecture and the SigLIP2 encoder.
+* Model date: Feb 2025
+* Paper or resources for more information: https://github.com/JL-er/WorldRWKV
+* Where to send questions or comments about the model: https://github.com/JL-er/WorldRWKV/issues
+** Training datasets:
+* Pretrain: LLaVA 595k
+* Fine-tune: LLaVA 665k
+** Evaluation dataset
+So far, RWKV7 SigLIP2 has been evaluated on four benchmarks proposed for instruction-following LMMs. Results on more benchmarks will be released soon.
+* Benchmarks
+
+| **Encoder** | **LLM** | **VQAv2** | **TextVQA** | **GQA** | **ScienceQA** |
+|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
+| [**SigLIP2**](https://huggingface.co/google/siglip2-base-patch16-384) | RWKV7-3B | 78.30 | 51.09 | 60.75 | 70.93 |
+* Inference
+
+```python
+# Worldinfer comes from the WorldRWKV repository (https://github.com/JL-er/WorldRWKV);
+# this script is assumed to run from the root of a checkout of that repository.
+from infer.worldmodel import Worldinfer
+from PIL import Image
+
+llm_path = 'WorldRWKV/RWKV7-3B-siglip2/rwkv-0'  # local model path
+encoder_path = 'google/siglip2-base-patch16-384'
+encoder_type = 'siglip'
+
+model = Worldinfer(model_path=llm_path, encoder_type=encoder_type, encoder_path=encoder_path)
+
+img_path = './docs/03-Confusing-Pictures.jpg'
+image = Image.open(img_path).convert('RGB')
+
+# The user turn is wrapped in the \x16 and \x17 control characters.
+text = '\x16User: What is unusual about this image?\x17Assistant:'
+
+result = model.generate(text, image)
+print(result)
+```
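
The prompt in the inference example wraps the user turn in two control characters, `\x16` and `\x17`, which are easy to mistype by hand. A minimal sketch of a helper that builds the same string is shown below. `build_prompt` is a hypothetical name, not part of WorldRWKV, and the format is inferred only from the single example in this README, so other prompt layouts the model may accept are not covered.

```python
# Hypothetical helper (not part of WorldRWKV): rebuilds the prompt format
# used in the README's inference example.
USER_START = "\x16"  # control character placed before "User:" in the README example
USER_END = "\x17"    # control character placed before "Assistant:" in the README example


def build_prompt(question: str) -> str:
    """Wrap a single user question in the single-turn format shown in the README."""
    return f"{USER_START}User: {question.strip()}{USER_END}Assistant:"


if __name__ == "__main__":
    # repr() makes the otherwise invisible control characters visible.
    print(repr(build_prompt("What is unusual about this image?")))
```

The resulting string can be passed as the `text` argument of `model.generate(text, image)` from the example above.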