Computer Vision

Tutorials for image and vision-language model workloads.

Fine-tuning a VLM
Adapt Qwen2.5-VL to occluded image classification by training a 10K-parameter adapter with multi-node DeepSpeed, automatic recovery, and live training dashboards.
Multimodal retrieval evaluation
Benchmark ColPali, SigLIP, and OCR+BM25 for visual document retrieval on ViDoRe, with warm GPU containers, dynamic batching, and an interactive report.