Basic Information

  • Project Title: Generalizable Data Subset Selection
  • Name: Tushar Nandy
  • Project: BTP
  • Semester(s): 7, 8
  • Guide: Prof. Abir De

Abstract

Existing subset selection methods for efficient learning predominantly employ discrete combinatorial and model-specific approaches, which lack generalizability: for each new model, the algorithm has to be executed from the beginning. Therefore, for an unseen architecture, one cannot use the subset chosen for a different model. In this work, we propose SubSelNet, a non-adaptive subset selection framework, which tackles these problems. Here, we first introduce an attention-based neural gadget that leverages the graph structure of architectures and acts as a surrogate to trained deep neural networks for quick model prediction. Then, we use these predictions to build subset samplers. This naturally gives us two variants of SubSelNet. The first variant is transductive (called Transductive-SubSelNet), which computes the subset separately for each model by solving a small optimization problem. Such an optimization is still very fast, because the model approximator replaces explicit model training. The second variant is inductive (called Inductive-SubSelNet), which computes the subset using a trained subset selector, without any optimization. Our experiments show that our model outperforms several methods across several real datasets.

Any courses you completed relevant to the project

None

Describe your experience on the project

Motivation: I initially took a different project under Prof. Abir De, but he then onboarded me onto this one. The idea we were working on was quite novel, and I saw myself learning a lot through the project.

Type of work: 95% coding + 5% paper reading. Prof. Abir De wants students to quickly implement ideas and get results as soon as possible.

Workload: Absolutely insane. I don’t remember how much time I spent doing the project, but I can’t even remember doing anything other than the project.

Publication potential: Prof. De sets only one goal for every project he works on: publication. My paper was submitted to NeurIPS. Results awaited.

Describe your experience with the guide

Guide’s Involvement: 100%. Ping him anytime and he’ll respond as soon as he can.

Frequency and structure: Twice a week. Meetings last a maximum of 30 minutes.