Basic Information
- Course Code: DH 602
- Course Name: Machine Learning and Statistical Methods in Healthcare
- Course Offered In: 2023-2024
- Semester Season: Spring
- Instructors: Prof. Kshitij Jadhav
- Prerequisites: DH302 or some basic course in applied Machine Learning is recommended, though there is no official hard prerequisite. Hands-on experience with practical ML really helps. A good understanding of basic theoretical ML concepts is required for the midsem and endsem.
- Difficulty (1 being easy and 5 being tough): 4
Course Content
Biostatistics Medical Data Acquisition and Pre-processing Medical Data Efficient Machine Learning Feature Extraction Whole slide imaging: Breast cancer detection. Image processing for Brain tumor detection, Handling long-tailed problems in Healthcare Sickle Cell Disease Management: A Machine Learning Approach Detection of Pulmonary Diseases, Longitudinal predictive analysis Dialysis patient data Federated Learning in Healthcare, Large Language models in Healthcare Normative aging Model using MRI data Optical Character Recognition for data extraction from medical records, Multimodal data analysis Utilizing Autoencoders in Healthcare, Generative AI in Healthcare
All of these topics are covered through discussion of research papers during the classes. There is very little biology content: just enough to understand how to apply ML in the particular domain. Apart from that, each group (5 to 7 students) creates a presentation based on their project, and these presentations are also part of the course content for evaluations.
Feedback on Lectures
In the first few classes, the professor introduces the course and what it entails. He begins by explaining some relevant research papers. In the meantime, students are expected to form groups of 5-7 and choose a topic for the project. The project is supposed to be novel and should be aimed at publishing a research paper. Students can approach the professor for guidance in choosing a good project topic.
Once students had chosen their project topics (after around 2 weeks), the course followed this format: 2 lectures a week, 1.5 hours each.
Wednesday class: The professor would pick a research paper and explain all the biological and ML concepts involved in it. Then, he would discuss the entire paper in detail, its results, and the implications it holds. The lectures were intended to be interactive. The professor explained using slides, but also talked a lot about things that were not written on the slides, so taking notes would have helped. He doesn’t go deep into mathematical proofs; it’s enough to know what each term in a loss function “represents”.
Friday class: 3 teams presented their progress on the project. The rest of the students were expected to ask questions.
Feedback on Evaluations
- Midsem: 20%
- Endsem: 30%
- Project: 50%
The midsem and endsem were MCQs with a single correct answer for each question. Questions were based on theory covered in class as well as on concepts used by all the teams (other teams as well as your own) in their projects. The project-related questions were mostly general ML questions about keywords mentioned in the students' presentation slides. Some questions were also based on general ML concepts that may or may not have been covered in class. There were around 80 to 100 questions, and some felt pretty ambiguous. Going through 100 questions and raising cribs one week after the endsems would require a lot of effort.
The project is supposed to be novel and should be an honest effort towards solving a real problem. Each team gave a total of three presentations over the semester, and the professor and the TAs provided feedback during each presentation. There is no particular format or rubric for grading the project; an honest effort is expected, regardless of whether the novel idea brings about significant results or not.
Study Material and Resources
Slides are provided for each Wednesday class, and they contain all the necessary keywords and concepts covered in the class. However, taking notes helps you follow the flow of the concepts. Slides from other teams' presentations also need to be studied.
I’m not sure how relevant these might be, but these were the official references provided:
- Singh, B. K., & Sinha, G. R. (2022). Machine Learning in Healthcare: Fundamentals and Recent Applications of Machine Learning.
- Hartzband, D. (2019). Information Technology and Data in Healthcare: Using and Understanding Data.
- Natarajan, P., Frenzel, J. C., & Smaltz, D. H. (2023). Demystifying Big Data and Machine Learning for Healthcare.
- Kumar, N., Gupta, R., & Gupta, S. Whole Slide Imaging (WSI) in Pathology: Current Perspectives and Future Directions.
Follow-up Courses
There is no particular follow-up course to this, but you can check out this website for related courses: https://www.kcdh.iitb.ac.in/interdisciplinary-dual-degree-program
Final Takeaway
Take this course only if you have decent experience in practical ML (for example, you have used some data analysis techniques and trained different kinds of models). You will learn a lot about the challenges of applying different ML methods in the medical domain. You will read some interesting research papers and get a good understanding of the field and where things currently stand. You can continue working on your project over the summer with the aim of publishing a research paper.