Day 37 - March 18, 2026

Internship diary

  • Role: AI Engineer - SynerSense
  • Project: AnanaCare ML Control Plane (Phase 4: Promotion & Optimization)
  • Hours: 8

Executive summary

  • Implemented a production-grade model promotion workflow.
  • Improved training log observability and metrics extraction.
  • Hardened process management with psutil and improved local/Hugging Face result sync.

Work summary

Backend: Model promotion

  • Added endpoint: POST /api/promote/{trial_id}.
  • Flow:
    1. Reads best hyperparameters from trial_*.json.
    2. Updates consolidated training config.
    3. Triggers final training run with prediction enabled.
  • Write path: .anana-results/train/ (structured, separate from tuning outputs).
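The first two steps of the flow can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the function name `promote_trial`, the `hyperparameters` key, the `predict` and `output_dir` config fields, and the JSON config format are all assumptions made for the example. Launching the final training run (step 3) is left out.

```python
import json
from pathlib import Path


def promote_trial(trial_id: str, tuning_dir: Path, config_path: Path) -> dict:
    """Merge one trial's hyperparameters into the consolidated training config."""
    # Assumed layout: one trial_<id>.json file per tuning trial.
    trial_file = tuning_dir / f"trial_{trial_id}.json"
    trial = json.loads(trial_file.read_text())

    # Assumed key name; the real trial files may store these differently.
    best_params = trial["hyperparameters"]

    # Start from the existing config so untouched settings are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.update(best_params)

    # Hypothetical fields: enable prediction for the final run and keep
    # its outputs separate from tuning outputs.
    config["predict"] = True
    config["output_dir"] = ".anana-results/train/"

    config_path.write_text(json.dumps(config, indent=2))
    return config
```

A handler for POST /api/promote/{trial_id} would presumably call a function like this and then start the training job with the updated config.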

Backend: Process management

  • Integrated psutil to manage process trees and clean up terminated jobs.
  • Ensures child processes are terminated when a job is stopped from the UI.
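A common psutil pattern for this kind of cleanup is sketched below. The helper name `terminate_tree` and the 5-second grace period are assumptions for illustration; the actual implementation may differ.

```python
import psutil


def terminate_tree(pid: int, timeout: float = 5.0) -> int:
    """Terminate a process and its descendants; return how many were signalled."""
    try:
        parent = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return 0  # the job has already exited

    # Collect descendants before signalling, so none are orphaned
    # once the parent exits.
    procs = parent.children(recursive=True) + [parent]
    for proc in procs:
        try:
            proc.terminate()  # polite request (SIGTERM on POSIX)
        except psutil.NoSuchProcess:
            pass

    # Give the processes a grace period, then force-kill any survivors.
    _, alive = psutil.wait_procs(procs, timeout=timeout)
    for proc in alive:
        try:
            proc.kill()
        except psutil.NoSuchProcess:
            pass
    return len(procs)
```

Terminating first and killing only after a timeout gives the training script a chance to flush logs and checkpoints before it is forced down.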

Frontend: log monitoring and metric UX

  • Added structured parsing layer to log viewer (beyond raw text).
  • Extracted metrics: validation loss, epoch progression, learning rate.
  • Real-time visualization via lightweight charts.
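The parsing idea can be illustrated with a small regex-based sketch. The real parser lives in the SvelteKit frontend; Python is used here only to show the logic. The log line format (`epoch=3 val_loss=0.4123 lr=1e-4`) and the name `parse_metrics` are hypothetical, since the actual training output format is not recorded in this entry.

```python
import re
from typing import Optional

# Hypothetical log line format, e.g. "[train] epoch=3 val_loss=0.4123 lr=1e-4".
METRIC_LINE = re.compile(
    r"epoch=(?P<epoch>\d+)\s+"
    r"val_loss=(?P<val_loss>[0-9.eE+-]+)\s+"
    r"lr=(?P<lr>[0-9.eE+-]+)"
)


def parse_metrics(line: str) -> Optional[dict]:
    """Return the metrics found in one log line, or None for non-metric lines."""
    match = METRIC_LINE.search(line)
    if match is None:
        return None
    try:
        return {
            "epoch": int(match["epoch"]),
            "val_loss": float(match["val_loss"]),
            "lr": float(match["lr"]),
        }
    except ValueError:
        # The line looked like a metric line but held a malformed number.
        return None
```

Lines that do not match are returned as `None`, so the viewer can still show them as raw text while only matched lines feed the charts.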

Sync improvements and consistency

  • Refined local/Hugging Face result sync logic.
  • Added manual sync to reconcile split-run environments.
  • Ensures leaderboard reflects latest trials across local + cloud runs.
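One way to reconcile the two sources is sketched below, assuming each trial record carries a `trial_id`, an ISO 8601 `updated_at` timestamp, and a `val_loss`. These field names and the `merge_trials` helper are hypothetical; the actual sync logic is not shown in this entry.

```python
def merge_trials(local: list[dict], remote: list[dict]) -> list[dict]:
    """Union local and remote trials, keeping the newest copy of each trial."""
    merged: dict[str, dict] = {}
    for trial in [*local, *remote]:
        current = merged.get(trial["trial_id"])
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if current is None or trial["updated_at"] > current["updated_at"]:
            merged[trial["trial_id"]] = trial
    # Leaderboard order: lowest validation loss first.
    return sorted(merged.values(), key=lambda t: t["val_loss"])
```

Keying on the trial identifier keeps a trial that exists in both places from appearing twice on the leaderboard.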

Impact: completed the loop from experiment run → monitoring → promotion into reproducible production model runs.


Technical stack & skills used

| Component | Technology | Applied skill |
| --- | --- | --- |
| Backend | FastAPI | API design, workflow automation, process management |
| Process control | psutil | process tree handling, resource cleanup |
| Frontend | SvelteKit | real-time UI updates, log visualization |
| MLOps | Hugging Face | model tracking, remote sync, experiment management |

Learnings & outcomes

  • Designed a model promotion pipeline from experimentation to reproducible production outputs.
  • Validated the importance of process lifecycle management for long-running tasks.
  • Converted streaming logs into actionable insights with structured parsing.
  • Strengthened consistency across distributed environments (local + cloud).

Blockers & risks

  • Potential race condition if multiple promotions run during sync; UI lock in place, backend-level lock advised.
  • Log parsing depends on consistent output format; standardization needed for reliable metric extraction.
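For the first risk, a backend-level lock could look roughly like the sketch below. It assumes a single backend process; the names `promotion_guard` and `PromotionInProgress` are hypothetical. A deployment with several worker processes would need a cross-process mechanism such as a lock file instead.

```python
import threading
from contextlib import contextmanager

# One lock shared by promotion and sync within this process.
_promotion_lock = threading.Lock()


class PromotionInProgress(RuntimeError):
    """Raised when another promotion or sync already holds the lock."""


@contextmanager
def promotion_guard():
    # Non-blocking acquire: fail fast instead of queueing a second promotion.
    if not _promotion_lock.acquire(blocking=False):
        raise PromotionInProgress("another promotion or sync is already running")
    try:
        yield
    finally:
        _promotion_lock.release()
```

Wrapping both the promotion handler and the sync routine in `with promotion_guard():` would let the backend reject an overlapping request, for example with an HTTP 409 response, instead of relying on the UI lock alone.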

Next steps

  1. Dockerize the full backend + frontend stack for production readiness.
  2. Document the promotion workflow in user-facing docs.
  3. Add backend safeguards to prevent conflicts between concurrent promotions.
