Multimodal Finetuning with CLIP