Work Report - 18 February 2026

1. Objective

Make progress toward deploying the trained CLIP + DNN model on Hugging Face infrastructure (Space / Job / Endpoint) and resolve inference-time model loading issues.


2. Key Activities Completed

A. Analyzed Embedding Pipeline

  • Verified embedding shape: (2864, 1, 1024)
  • Confirmed backbone: timm/vit_large_patch14_clip_quickgelu_336.openai
  • Identified that embeddings are 1024-dimensional transformer features.
  • Reviewed dataset loading logic for:
    • Consolidated clip.npy
    • clip_metadata.json mapping
    • Label extraction and filtering logic

Outcome:

  • Clear understanding of how embeddings were generated and consumed during training.
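The loading logic reviewed above can be sketched as follows. File names (`clip.npy`, `clip_metadata.json`) come from the report; the axis-1 squeeze is an assumption based on the reported `(2864, 1, 1024)` shape:

```python
import json
import numpy as np

def load_embeddings(npy_path, meta_path):
    """Load consolidated CLIP embeddings plus their metadata mapping."""
    emb = np.load(npy_path)            # stored shape (N, 1, 1024)
    emb = np.squeeze(emb, axis=1)      # -> (N, 1024), one row per image
    with open(meta_path) as f:
        meta = json.load(f)            # maps rows to image ids / labels
    return emb, meta

# demo with a synthetic array mirroring the reported shape
demo = np.zeros((2864, 1, 1024), dtype=np.float32)
squeezed = np.squeeze(demo, axis=1)
```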

B. Designed Initial Inference Architecture

Created initial inference structure for:

  • Image → CLIP (timm) → 1024-d embedding
  • Embedding + input_biases → DNN
  • DNN → multi-label predictions

Defined:

  • Project structure for Hugging Face Space
  • requirements.txt
  • app.py structure
  • Production-style wrapper class
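The requirements.txt for the Space might look like the following. The package list is an assumption based on the stack described in this report (timm CLIP backbone, PyTorch DNN, Gradio app); versions are deliberately left unpinned here:

```
torch
timm
numpy
pillow
gradio
```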

Outcome:

  • A working deployment blueprint ready for HF Space.
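The production-style wrapper can be sketched as below. This is a minimal sketch, not the actual app.py: the class name, the injected `embed_fn`, and the threshold are all illustrative, and the real backbone would be the timm CLIP model rather than the stub used in the demo:

```python
import torch
import torch.nn as nn

class InferencePipeline:
    """Image -> CLIP embedding -> DNN -> multi-label predictions.

    The embedder and head are injected so the wrapper stays testable
    without downloading the timm backbone.
    """
    def __init__(self, embed_fn, head: nn.Module, threshold: float = 0.5):
        self.embed_fn = embed_fn
        self.head = head.eval()
        self.threshold = threshold

    @torch.no_grad()
    def predict(self, image):
        emb = self.embed_fn(image)          # (1, 1024) CLIP features
        logits = self.head(emb)             # (1, num_labels)
        probs = torch.sigmoid(logits)       # multi-label probabilities
        return (probs > self.threshold).int()

# demo with a stub embedder and a tiny head (real code would use timm)
head = nn.Linear(1024, 5)
pipe = InferencePipeline(lambda img: torch.zeros(1, 1024), head)
preds = pipe.predict(None)
```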

C. Investigated state_dict Loading Error

Encountered a runtime error when loading the checkpoint:

  Missing key(s) in state_dict
  Unexpected key(s) in state_dict

Root Cause Identified:

  • The weights file (dnnV1_model_best.pth) was trained using a different architecture (likely MultiLabelRegressionNN)
  • Current inference app used a simplified CustomDNN
  • Architecture mismatch caused parameter key inconsistency

Conclusion:

  • Inference architecture must exactly match training architecture.
  • Cannot reuse weights with a different model definition.
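A quick way to confirm this kind of mismatch before calling `load_state_dict` is to diff the checkpoint keys against the model's keys. The helper below is illustrative (the toy models stand in for `MultiLabelRegressionNN` and the simplified `CustomDNN`); note that identical keys with different tensor shapes would also fail to load:

```python
import torch
import torch.nn as nn

def diff_state_dicts(model, state_dict):
    """Return (unexpected, missing) keys relative to the model."""
    model_keys = set(model.state_dict())
    ckpt_keys = set(state_dict)
    return sorted(ckpt_keys - model_keys), sorted(model_keys - ckpt_keys)

# demo: a deeper "training" model vs a simplified "inference" model
train_model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
infer_model = nn.Sequential(nn.Linear(4, 2))
unexpected, missing = diff_state_dicts(infer_model, train_model.state_dict())
```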

D. Clarified Hugging Face Infrastructure Options

Reviewed Hugging Face:

  • Jobs (batch compute, offline)
  • Spaces (interactive app)
  • Inference Endpoints (production API)

Decision:

  • Start with HF Space (Gradio) for rapid validation.
  • Move to Inference Endpoint after architecture alignment.

3. Technical Decisions Made Today

  • Use timm CLIP backbone directly (avoid HF CLIPModel mismatch risk).
  • Do not use strict=False for weight loading.
  • Align inference architecture exactly with training architecture.
  • Avoid deploying until embedding parity is confirmed.

4. Key Learnings

  • PyTorch weight loading requires exact architectural parity
  • Even minor architecture changes break state_dict compatibility
  • Embedding extraction method (e.g., forward vs forward_features) can cause silent inference drift
  • Deployment should only proceed after confirming full training–inference parity
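The `forward` vs `forward_features` learning can be illustrated with a toy module that mimics timm's split between the two methods. This is a sketch, not the actual ViT: in timm, `forward_features` returns backbone features while `forward` additionally applies pooling and the classifier head, so picking the wrong one silently changes what the DNN receives:

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Toy model mimicking timm's forward_features / forward split."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)   # stands in for the ViT backbone
        self.head = nn.Linear(4, 2)      # stands in for the classifier head

    def forward_features(self, x):
        return self.encoder(x)           # raw features, used as embeddings

    def forward(self, x):
        return self.head(self.forward_features(x))  # classifier logits

model = TinyViT().eval()
x = torch.ones(1, 4)
feats = model.forward_features(x)   # embedding-style output
logits = model(x)                   # head output: different shape, meaning
```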

5. Risks Identified & Mitigation

Risks:

  • Hidden normalization or preprocessing steps may exist in the original training pipeline
  • Bias embedding layers in the trained model may not be fully replicated
  • Embedding extraction logic may differ from training (forward vs forward_features)
  • Silent performance degradation if the embedding pipeline or architecture drifts from training

Mitigation:

  • Reconstruct the exact MultiLabelRegressionNN training architecture in the inference application before redeploying
  • Verify weight loading succeeds with strict key matching (no strict=False workarounds)
  • Validate prediction parity against stored training outputs before exposing the model
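The parity validation step can be as simple as comparing stored training-time predictions against fresh inference outputs. A minimal sketch (function name and tolerance are illustrative):

```python
import torch

def predictions_match(ref, new, atol=1e-5):
    """True when fresh inference reproduces stored training-time outputs."""
    return torch.allclose(ref, new, atol=atol)

ref = torch.tensor([0.1, 0.9, 0.3])
same = predictions_match(ref, ref.clone())      # identical pipeline
drifted = predictions_match(ref, ref + 1e-2)    # simulated silent drift
```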

6. Next Steps (Planned)

  1. Extract and review MultiLabelRegressionNN definition
  2. Port full architecture into HF Space
  3. Ensure successful weight loading
  4. Validate prediction consistency against training outputs
  5. Benchmark inference speed (CPU vs GPU)

7. Overall Progress Status

✔ Architecture clarified
✔ Deployment path defined
✔ Root cause of model loading issue identified
⏳ Final alignment of training and inference architecture pending

