Daily Work Report 2026-03-16

Summary

Implemented an MVP Training Control Plane (backend + Svelte dashboard): job lifecycle APIs (start/list/status/logs/stop), an in-memory job registry with local-run log streaming, and a connected frontend. Added README and completed smoke validation.

Highlights

  • Backend: FastAPI lifecycle endpoints, background job runner, live log capture.
  • Frontend: SvelteKit dashboard with job submission, live logs, and stop control.
  • Validation: Frontend type checks (svelte-check) and backend syntax/run smoke tests.

Completed (changes made)

Backend

  • main.py - FastAPI endpoints: POST /jobs/start, GET /jobs, GET /jobs/{id}, GET /jobs/{id}/logs, POST /jobs/{id}/stop, GET /trials, GET /health.
  • job_runner.py - in-memory registry, local background runs, live log capture, duplicate protection, stop/list/log helpers.
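
The registry behaviour described for job_runner.py can be sketched roughly as below. This is a minimal illustration, not the actual module: the names Job, JobRegistry, start, stop and logs, the "failed" and "stopped" states, and the rule that a duplicate is "same command while still active" are all assumptions made for the example.

```python
from __future__ import annotations

import subprocess
import sys
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class Job:
    """One tracked run. Field names are hypothetical, not taken from job_runner.py."""
    id: str
    command: list[str]
    status: str = "queued"  # assumed flow: queued -> running -> completed | failed | stopped
    return_code: int | None = None
    logs: list[str] = field(default_factory=list)
    process: subprocess.Popen | None = None
    thread: threading.Thread | None = None


class JobRegistry:
    """In-memory registry: everything is lost when the process restarts."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        self._lock = threading.Lock()

    def start(self, command: list[str]) -> Job:
        with self._lock:
            # Duplicate protection (assumed rule): same command while still active.
            for existing in self._jobs.values():
                if existing.command == command and existing.status in ("queued", "running"):
                    raise ValueError(f"duplicate of active job {existing.id}")
            job = Job(id=uuid.uuid4().hex, command=command)
            self._jobs[job.id] = job
        job.thread = threading.Thread(target=self._run, args=(job,), daemon=True)
        job.thread.start()
        return job

    def _run(self, job: Job) -> None:
        try:
            proc = subprocess.Popen(
                job.command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
            )
        except OSError as err:
            with self._lock:
                job.status = "failed"
                job.logs.append(f"could not start: {err}")
            return
        with self._lock:
            job.process = proc
            job.status = "running"
        # Live log capture: append each line as the child process emits it.
        for line in proc.stdout:
            with self._lock:
                job.logs.append(line.rstrip("\n"))
        code = proc.wait()
        with self._lock:
            job.return_code = code
            if job.status != "stopped":
                job.status = "completed" if code == 0 else "failed"

    def logs(self, job_id: str, offset: int = 0) -> list[str]:
        """Return a copy of the captured lines from offset onward (for polling clients)."""
        with self._lock:
            return list(self._jobs[job_id].logs[offset:])

    def stop(self, job_id: str) -> bool:
        """Terminate a running job; return False when there is nothing to stop."""
        with self._lock:
            job = self._jobs[job_id]
            if job.status != "running" or job.process is None:
                return False
            job.status = "stopped"
            job.process.terminate()
            return True


if __name__ == "__main__":
    registry = JobRegistry()
    demo = registry.start([sys.executable, "-c", "print('hello from job')"])
    demo.thread.join()
    print(demo.status, demo.return_code, registry.logs(demo.id))
```

A single lock guards both the registry and each job's log list, which keeps the sketch simple; a per-job lock may scale better if many jobs produce heavy output.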

Frontend

  • +page.svelte - job start form, jobs list, live log polling, stop button, trials view.
  • +page.ts - prerendering disabled so the page fetches from the API at runtime.

Docs

  • README.md - quick start and run instructions.

Tests & Validation

  • Installed frontend deps and ran svelte-check (0 errors; addressed minor warnings).
  • Compiled backend Python files - no syntax errors.
  • Exercised API endpoints manually:
    • GET /health → 200 OK
    • GET /trials → 200 OK (no trials yet)
    • POST /jobs/start (local run) → returned a job id in the queued state; a background thread then executed run.ps1 locally
    • GET /jobs/{id} → status moved to completed after the run, with the return code captured
    • GET /jobs/{id}/logs → returned the captured output lines
    • POST /jobs/{id}/stop → returned a conflict response when the job was not stoppable
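
The stop behaviour checked above can be pictured as a small decision function. The sketch below is an assumption-laden illustration, not the code in main.py: it assumes the conflict is reported as HTTP 409, that only "running" jobs are stoppable, and that unknown ids map to 404. The function name stop_response and the response bodies are hypothetical.

```python
from __future__ import annotations

from http import HTTPStatus

# Assumed set of stoppable states; extend it if queued jobs can also be cancelled.
STOPPABLE = {"running"}


def stop_response(job_id: str, status: str | None) -> tuple[int, dict]:
    """Map a job's current state to the (status_code, body) a stop request might return."""
    if status is None:
        # No such job in the registry.
        return HTTPStatus.NOT_FOUND, {"detail": f"job {job_id} not found"}
    if status not in STOPPABLE:
        # The job exists but has already finished or never started running.
        return HTTPStatus.CONFLICT, {"detail": f"job {job_id} is {status}; nothing to stop"}
    return HTTPStatus.OK, {"id": job_id, "status": "stopping"}
```

Keeping this decision in a pure function makes it easy to unit test separately from the HTTP layer.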

Current limitations

  • Job registry is in-memory (no persistence); restart clears history.
  • Remote Hugging Face Jobs: only the submission output is captured; there is no status polling or remote log streaming yet.
  • train.py must write trial_*.json into .anana-results/anana_v3/tune/ for trials to appear; UI-driven hyperparam config requires train.py to accept a config file.
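
For the trials limitation, a loader along the following lines would pick up whatever train.py writes. This is a hedged sketch: the function name load_trials and the returned shape are hypothetical, and nothing is assumed about the fields inside each trial file beyond it being valid JSON.

```python
from __future__ import annotations

import json
from pathlib import Path

# Directory named in the report; adjust if the layout differs.
DEFAULT_RESULTS_DIR = Path(".anana-results/anana_v3/tune")


def load_trials(results_dir: Path = DEFAULT_RESULTS_DIR) -> list[dict]:
    """Return one entry per readable trial_*.json file, sorted by file name."""
    if not results_dir.is_dir():
        return []  # no trials yet
    trials = []
    for path in sorted(results_dir.glob("trial_*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Skip files that are unreadable or still being written.
            continue
        trials.append({"file": path.name, "data": data})
    return trials
```

Returning an empty list for a missing directory matches the "no trials yet" result seen during validation.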

Next steps

  1. Add Hugging Face Jobs polling & remote log streaming (HF Jobs SDK).
  2. Replace polling with SSE/WebSocket for live logs.
  3. Persist job metadata (SQLite/Postgres) so history survives restarts.
  4. Add a --config/config.json input for train.py so the UI can set search spaces.
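
As one possible shape for the persistence step, job metadata could be written to SQLite using only the standard library. The sketch below is an assumption, not a design decision already made: the table layout and the helper names open_store, save_job and list_jobs are hypothetical, and the upsert syntax needs SQLite 3.24 or newer.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    command TEXT NOT NULL,
    status TEXT NOT NULL,
    return_code INTEGER,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the job database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def save_job(conn: sqlite3.Connection, job_id: str, command: str,
             status: str, return_code: int | None = None) -> None:
    """Insert a job, or update its status and return code if it already exists."""
    conn.execute(
        "INSERT INTO jobs (id, command, status, return_code) VALUES (?, ?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET "
        "status = excluded.status, return_code = excluded.return_code",
        (job_id, command, status, return_code),
    )
    conn.commit()


def list_jobs(conn: sqlite3.Connection) -> list[dict]:
    """Return all jobs in insertion order as plain dictionaries."""
    rows = conn.execute(
        "SELECT id, command, status, return_code FROM jobs ORDER BY rowid"
    ).fetchall()
    return [dict(row) for row in rows]
```

Calling save_job on each state change (queued, running, completed) would let GET /jobs rebuild its history after a restart; captured log lines would still need their own storage.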

How to run locally (quick)

Backend

cd backend
uvicorn main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev -- --port 5173

Health check (PowerShell)

Invoke-RestMethod http://127.0.0.1:8000/health

Internship Diary

Role: AI Engineer - SynerSense
Date: 16 Mar 2026
Hours: 8

Work summary

Built the first MVP of a Training Control Plane to manage ML job lifecycles from a web dashboard. Implemented backend APIs and a job runner with live log capture, connected a SvelteKit dashboard, and validated functionality end-to-end.

Learnings & Blockers

  • Learned patterns for background job execution and log streaming in FastAPI.
  • Blocker: job registry persistence and remote-job polling still needed for production-grade monitoring.

References

  • FastAPI - https://fastapi.tiangolo.com
  • SvelteKit - https://kit.svelte.dev/docs
  • Uvicorn - https://www.uvicorn.org
  • Node.js - https://nodejs.org
