bootstrapping the captions
the noisy ones
Vision-Language Pre-training (VLP)
Bootstrapping Language-Image Pre-training
a suboptimal source of supervision.
both vision-language understanding and generation tasks.
a wide range of