Why does it generate "arafed" so much?
I have the same problem, and it also gets the grammar wrong. For an image of chunks of ice floating in the sea, I got:
"arafed ice floess are floating in the water near the shore"
I have no idea what 'arafed' or 'floess' mean.
Googling 'arafed' led me to this dataset: https://huggingface.co/datasets/multimodalart/facesyntheticsspigacaptioned
I'm assuming they might have trained on it for some reason?
Yeah, I believe it's mainly because of that dataset, which contains "arafed" a lot for whatever reason. I just made a little function to remove the word. It can usually be dropped and the caption stays grammatically correct, since "arafed" is basically just prepended to the caption. You could also fine-tune the model, which would probably help.
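A minimal Python sketch of that kind of cleanup (not the actual function from this post, which wasn't shared), assuming 'arafed' always shows up as a standalone word. The names SPURIOUS_WORDS and strip_spurious_words are made up for illustration:

```python
import re

# Hypothetical list of spurious tokens; "arafed" is the one reported in this thread.
SPURIOUS_WORDS = ("arafed",)

# Match any spurious word as a whole word, plus the whitespace that follows it.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, SPURIOUS_WORDS)) + r")\b\s*",
    flags=re.IGNORECASE,
)


def strip_spurious_words(caption: str) -> str:
    """Remove spurious tokens such as 'arafed' and tidy the leftover whitespace."""
    cleaned = _PATTERN.sub("", caption)
    # Collapse any doubled spaces left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_spurious_words("arafed ice floess are floating in the water near the shore"))
```

This only deletes the word; it doesn't re-capitalize the new first word or fix other odd tokens like "floess".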
The dataset linked by Sof22 uses BLIP-generated captions, so I doubt it was used to train BLIP itself ;)
I came here looking for an answer to this so thank you all.
I keep seeing 'arafed' and 'araffe'. From context, in my testing at least, these words seem like they should simply be the word 'a'.
I have a script that strips these words out and replaces them with 'a', and I no longer have any issues. I thought they were nonsense words, but if you say they're Arabic/Urdu then I'll believe you!
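Something along these lines should do the replacement. It's a hypothetical sketch, not the script mentioned above. On top of the plain swap to 'a', it assumes two extra heuristics: it uses 'an' before a vowel, and it absorbs an article that already precedes the word so you don't end up with "a a man". The vowel check is only a rough rule (it gets "an hour" or "a unicorn" wrong), and a spurious word at the very end of a caption is left alone.

```python
import re

# Tokens reported in this thread; extend as needed.
SPURIOUS = ("arafed", "araffe")

# Match an optional preceding article ("a"/"an") plus the spurious word,
# and capture the whitespace and first character of the following word.
_PATTERN = re.compile(
    r"\b(?:an?\s+)?(?:" + "|".join(map(re.escape, SPURIOUS)) + r")\b(\s+)(\w)",
    flags=re.IGNORECASE,
)


def _article(match: re.Match) -> str:
    space, next_char = match.group(1), match.group(2)
    # Rough heuristic: "an" before a vowel letter, "a" otherwise.
    article = "an" if next_char.lower() in "aeiou" else "a"
    # Keep capitalization when the matched text started with an uppercase letter.
    if match.group(0)[0].isupper():
        article = article.capitalize()
    return article + space + next_char


def replace_with_article(caption: str) -> str:
    """Replace 'arafed'/'araffe' (and any article before them) with 'a' or 'an'."""
    return _PATTERN.sub(_article, caption)


print(replace_with_article("arafed ice floess are floating in the water near the shore"))
```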
Thanks again.
Thanks, chrispie, I'll give that a watch.