Vision-Language Models for Whole Slide Image Classification

Basic Information

Project Title / Domain: Vision-Language Models for Whole Slide Image Classification
Name: Shounak Das
Guide: Amit Sethi
Project Type: Implementation-driven

Short Description of Project

Working on vision-language models for whole slide images (WSIs), including multi-resolution pipelines, prompt tuning with CLIP/LLaVA, and improving alignment for better zero-shot performance.

Whom did you work with?

PG students and the prof directly

Tools / Simulation / Software / Hardware

PyTorch, Docker, remote GPU servers for training and experimentation

Expectations from Guide

Expectations for 8th Sem & Summer

No, I did an internship. Had a few meetings with my guide and a PhD student.

Load: 8th Sem vs 9th (Placement) Sem

In the 9th sem, the prof allowed a slightly lighter load in November, after the DDP1 presentation, to accommodate placements.

Summer on Campus

Didn’t stay on campus; worked on the project during the internship with occasional discussions, so progress was moderate.

Is DDP Guide Same as SRE/RnD Guide?

Yes

How did your SRE help with your DDP?

Provided initial exposure to the problem and to the relevant literature on WSIs.

End Deliverables

A working pipeline, experiments and analysis, and a novel contribution.