Week 07 – Embedding Paper Techniques into Models

Dates: 2025-07-15 – 2025-07-21
Internship: AI/ML Intern at SynerSense Pvt. Ltd.
Mentor: Praveen Kulkarni Sir


Focus

This week was dedicated to applying techniques from the research papers we reviewed to our internal pipeline. We experimented with model slicing, feature transfer, and modular network design.


Goals for the Week

  • Adapt pretrained CLIP visual encoder with custom classifier
  • Test inference pipeline with sample dataset
  • Build scripts to visualize intermediate embeddings
  • Clean up the learnings section with proper structure
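For the embedding-visualization goal above, a common first step is projecting high-dimensional features to 2-D before plotting. A minimal sketch, assuming PyTorch and a simple PCA projection (the actual internal script is not shown here; `pca_2d` is an illustrative helper name):

```python
import torch

def pca_2d(embeddings: torch.Tensor) -> torch.Tensor:
    """Project (N, D) embeddings to (N, 2) via PCA for quick scatter plots."""
    x = embeddings - embeddings.mean(dim=0, keepdim=True)  # center features
    _, _, v = torch.pca_lowrank(x, q=2, center=False)      # top-2 components
    return x @ v                                           # (N, 2) coordinates

# e.g. points = pca_2d(batch_of_embeddings), then scatter-plot `points`
```

The 2-D coordinates can then be colored by class label to eyeball cluster separation in the intermediate embeddings.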

Tasks Completed

| Task | Status | Notes |
| --- | --- | --- |
| Integrated CLIP vision encoder with MLP head | ✅ Completed | Used a frozen encoder; trained the classifier from scratch |
| Evaluated classification pipeline on sample images | ✅ Completed | Training was stable; loss converged to ~0.87 |
| Documented working code for the VLM-to-MLP architecture | ✅ Completed | Recorded notebook in the private repo |
| Added two research summaries to the Learnings section | ✅ Completed | Each paper has callouts, code examples, and a reflection |
| Maintained NDA compliance throughout documentation | ✅ Completed | Only shared approved portions publicly |
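The frozen-encoder-plus-MLP-head setup from the first task can be sketched as follows. This is a minimal illustration, not the internal code: `FrozenEncoderClassifier` and the layer sizes are assumed, and any encoder exposing a `.pooler_output` (such as HuggingFace's `CLIPVisionModel`) can be plugged in:

```python
import torch
import torch.nn as nn

class FrozenEncoderClassifier(nn.Module):
    """Trainable MLP head on top of a frozen pretrained vision encoder."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep pretrained weights fixed
        self.head = nn.Sequential(   # classifier trained from scratch
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no gradients through the frozen encoder
            feats = self.encoder(pixel_values=pixel_values).pooler_output
        return self.head(feats)

# Usage with HuggingFace CLIP (assumed model name):
#   from transformers import CLIPVisionModel
#   enc = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
#   model = FrozenEncoderClassifier(enc, enc.config.hidden_size, num_classes=10)
```

Freezing the encoder keeps training cheap and stable, since only the small head receives gradient updates.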

Key Learnings

  • Pretrained vision encoders can significantly speed up convergence
  • Modular design makes it easy to attach heads and slice blocks flexibly
  • Accurate logs and markdown docs improve reproducibility
  • Reviewing loss trends helps identify class imbalance

Problems Faced & Solutions

| Problem | Solution |
| --- | --- |
| Feature shape mismatch from the encoder | Used `.pooler_output` and reshaped inputs |
| Classifier not converging initially | Tuned the learning rate; applied `CrossEntropyLoss` |
| Keeping learnings NDA-compliant | Used tags and disclaimers in the markdown docs |

Goals for Next Week

  • Finalize the slicing strategy for extracting the vision module from the full VLM
  • Extend the MLP head with dropout and batch normalization
  • Prepare a shareable demo (NDA-compliant)
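The planned head extension with dropout and batch normalization could look like this. A minimal sketch only: the hidden size, dropout rate, and `make_mlp_head` name are assumptions, not the finalized design:

```python
import torch
import torch.nn as nn

def make_mlp_head(in_dim: int, num_classes: int,
                  hidden: int = 256, p_drop: float = 0.3) -> nn.Module:
    """MLP head with BatchNorm for stable activations and Dropout for
    regularization (sizes are placeholders)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.BatchNorm1d(hidden),
        nn.ReLU(),
        nn.Dropout(p_drop),
        nn.Linear(hidden, num_classes),
    )

# e.g. a head for 768-dim CLIP features and 10 classes:
head = make_mlp_head(768, 10)
```

Note that `BatchNorm1d` behaves differently in `train()` and `eval()` modes, so the demo should call `model.eval()` before inference.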

“Week 7 was all about bridging theory and practice — papers to working code. Each experiment took us closer to a modular, trainable, and explainable VLM classifier stack.”