arxiv:2412.02611
Shijia Yang
shijiay
AI & ML interests
None yet
Recent Activity
upvoted a paper about 1 month ago
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
authored a paper about 1 month ago
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
commented on a paper 4 months ago
Law of Vision Representation in MLLMs
Organizations
None yet
models 27
shijiay/llava_clip224_stage1 • Image-Text-to-Text • Updated • 20
shijiay/llava_clip224_stage2 • Image-Text-to-Text • Updated • 45
shijiay/llava_dinov2_stage2 • Image-Text-to-Text • Updated • 23 • 1
shijiay/llava_clip_stage1 • Image-Text-to-Text • Updated • 16
shijiay/llava_clip_stage2 • Image-Text-to-Text • Updated • 45
shijiay/llava_openclip_stage1 • Image-Text-to-Text • Updated • 10
shijiay/llava_openclip_stage2 • Image-Text-to-Text • Updated • 12
shijiay/llava_siglip_stage1 • Image-Text-to-Text • Updated • 17
shijiay/llava_siglip_stage2 • Image-Text-to-Text • Updated • 19
shijiay/llava_sdim_stage1 • Image-Text-to-Text • Updated • 8
datasets
None public yet