timm==0.4.12
transformers>=4.25.1
fairscale==0.4.4
pycocoevalcap
torch
torchvision
Pillow
scipy
clip @ git+https://github.com/openai/CLIP.git