Work Report - 18 February 2026
1. Objective
Progress toward deploying the trained CLIP + DNN model on Hugging Face infrastructure (Space / Job / Inference Endpoint), and resolving inference-time model-loading issues.
2. Key Activities Completed
A. Analyzed Embedding Pipeline
- Verified embedding shape: (2864, 1, 1024)
- Confirmed backbone: timm/vit_large_patch14_clip_quickgelu_336.openai
- Identified that embeddings are 1024-dimensional transformer features
- Reviewed dataset loading logic:
  - Consolidated clip.npy ↔ clip_metadata.json mapping
  - Label extraction and filtering logic
Outcome:
- Clear understanding of how embeddings were generated and consumed during training.
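The shape check above can be sketched as follows. The array here is a small random stand-in for the real clip.npy, whose actual shape is (2864, 1, 1024):

```python
import numpy as np

# Stand-in for np.load("clip.npy"); the real array is (2864, 1, 1024).
emb = np.random.rand(4, 1, 1024).astype(np.float32)

# Drop the singleton middle axis to get the (N, 1024) matrix the DNN consumes.
feat = np.squeeze(emb, axis=1)
print(feat.shape)  # (4, 1024)
```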
B. Designed Initial Inference Architecture
Created initial inference structure for:
- Image → CLIP (timm) → 1024-d embedding
- Embedding + input_biases → DNN
- DNN → multi-label predictions
Defined:
- Project structure for Hugging Face Space
- requirements.txt / app.py structure
- Production-style wrapper class
Outcome:
- A working deployment blueprint ready for HF Space.
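As a rough illustration of the "Embedding + input_biases → DNN" step, a minimal head might look like the sketch below. The layer sizes, bias dimension, and label count are placeholders; only the 1024-d embedding input comes from the report, and the real MultiLabelRegressionNN must be ported verbatim, not re-invented:

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Hypothetical stand-in for the trained head; hidden size, bias_dim,
    and num_labels are illustrative assumptions, not the project's values."""

    def __init__(self, emb_dim: int = 1024, bias_dim: int = 8, num_labels: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim + bias_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_labels),
        )

    def forward(self, emb: torch.Tensor, input_biases: torch.Tensor) -> torch.Tensor:
        # Concatenate CLIP embedding with the input biases, then score labels.
        x = torch.cat([emb, input_biases], dim=-1)
        return torch.sigmoid(self.net(x))

head = MultiLabelHead()
probs = head(torch.randn(2, 1024), torch.randn(2, 8))
print(probs.shape)  # (batch, num_labels)
```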
C. Investigated state_dict Loading Error
Encountered runtime error:
Missing key(s) in state_dict
Unexpected key(s) in state_dict
Root Cause Identified:
- The weights file (dnnV1_model_best.pth) was trained with a different architecture (likely MultiLabelRegressionNN)
- The current inference app used a simplified CustomDNN
- The architecture mismatch caused parameter-key inconsistency
Conclusion:
- Inference architecture must exactly match training architecture.
- Cannot reuse weights with a different model definition.
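The mismatch can be reproduced in miniature: two modules containing the same layer but with different attribute layouts produce disjoint state_dict keys (the class names below are illustrative, not the project's actual classes):

```python
import torch.nn as nn

# "Training-time" layout: an anonymous Sequential -> keys "0.weight", "0.bias".
trained = nn.Sequential(nn.Linear(4, 3))

# "Inference-time" layout: same layer stored as an attribute -> "fc.weight", "fc.bias".
class SimplifiedDNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3)

# strict=False is used here only to *inspect* the mismatch, never to deploy with it.
result = SimplifiedDNN().load_state_dict(trained.state_dict(), strict=False)
print(result.missing_keys)     # keys the model expects but the file lacks
print(result.unexpected_keys)  # keys the file has but the model lacks
```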
D. Clarified Hugging Face Infrastructure Options
Reviewed Hugging Face:
- Jobs (batch compute, offline)
- Spaces (interactive app)
- Inference Endpoints (production API)
Decision:
- Start with HF Space (Gradio) for rapid validation.
- Move to Inference Endpoint after architecture alignment.
3. Technical Decisions Made Today
- Use the timm CLIP backbone directly (avoids the HF CLIPModel mismatch risk)
- Do not use strict=False for weight loading
- Align the inference architecture exactly with the training architecture
- Do not deploy until embedding parity is confirmed
4. Key Learnings
- PyTorch weight loading requires exact architectural parity
- Even minor architecture changes break state_dict compatibility
- The embedding extraction method (e.g., forward vs forward_features) can cause silent inference drift
- Deployment should only proceed after confirming full training–inference parity
5. Risks Identified & Mitigation
Risks:
- Embedding extraction logic may differ from training (forward vs forward_features)
- Hidden normalization or preprocessing steps may exist in the training pipeline
- Bias embedding layers in the trained model may not be fully replicated
- Silent performance degradation if the embedding pipeline or architecture differs
Mitigation:
- Reconstruct the exact MultiLabelRegressionNN training architecture in the inference application before redeploying (concrete tasks listed under Next Steps)
6. Next Steps (Planned)
- Extract and review the MultiLabelRegressionNN definition
- Port the full architecture into the HF Space
- Ensure successful weight loading
- Validate prediction consistency against training outputs
- Benchmark inference speed (CPU vs GPU)
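The consistency-validation step could start as simply as the hypothetical helper below (the function name and tolerance are assumptions; real validation would compare saved training-time predictions against the reconstructed inference head):

```python
import torch

def predictions_match(inference_out: torch.Tensor,
                      training_out: torch.Tensor,
                      atol: float = 1e-5) -> bool:
    """Return True when inference predictions reproduce the saved
    training-time predictions within an absolute tolerance."""
    return torch.allclose(inference_out, training_out, atol=atol)

# Toy reference predictions standing in for saved training outputs.
reference = torch.tensor([[0.12, 0.88], [0.40, 0.60]])
print(predictions_match(reference.clone(), reference))   # identical -> True
print(predictions_match(reference + 1e-3, reference))    # drifted   -> False
```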
7. Overall Progress Status
✔ Architecture clarified
✔ Deployment path defined
✔ Root cause of model loading issue identified
⏳ Final alignment of training and inference architecture pending