Daily Work Report - 2026-03-17
Summary
- Implemented live-fetching of remote Hugging Face job logs and integrated it into the backend poller.
- Polished documentation (root, backend, frontend) and updated `.env.example`.
Files changed
- Docs: `README.md` (root, backend, frontend), `.env.example`
- Backend: `job_runner.py` - added `_fetch_hf_job_logs()` and integrated it into `_poll_hf_jobs_loop()`.
- Project plan: updated internal todo list to track HF live-log polling.
What I implemented
- HF live-log polling: `_fetch_hf_job_logs()` tries `huggingface_hub` first, then falls back to the `hf jobs logs <id>` CLI. New lines are appended to per-job logs for incremental reads.
- Docs: rewrote and cross-linked the top-level, backend, and frontend READMEs; clarified `.env.example` placeholders and formats.
- Tracking: added and completed a todo item for HF live-log polling.
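The incremental-read step above can be sketched as a small pure function. This is a minimal sketch, not the actual `job_runner.py` code; the helper name `append_new_lines` and the append-only assumption about the remote log are mine:

```python
def append_new_lines(known: list[str], fetched: list[str]) -> list[str]:
    """Return only the lines in `fetched` that extend `known`.

    Assumes the remote log is append-only, so the lines we already
    have form a prefix of the freshly fetched log.
    """
    if fetched[: len(known)] == known:
        return fetched[len(known):]
    # Prefix mismatch: the remote log was truncated or rotated,
    # so treat the whole fetch as new content.
    return fetched
```

Each poll then appends only the returned suffix to the per-job log file, which keeps `?since=` reads cheap.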
How to verify locally
- Ensure `HF_TOKEN` is set (repo `.env` or environment).
- Start the backend (PowerShell):

  ```powershell
  cd backend
  .\.venv\Scripts\Activate.ps1  # if using a venv
  uv run uvicorn main:app --reload --port 8000
  ```
- (Optional) Start the frontend:

  ```powershell
  cd frontend
  npm install
  npm run dev -- --port 5173
  ```
- Submit a Hugging Face job (example):

  ```bash
  curl -X POST http://localhost:8000/api/jobs/start \
    -H "Content-Type: application/json" \
    -d '{"command":"tune","is_local":false,"flavor":"cpu-basic","timeout":"3h","script":"train.py","args":{"n_trials":3}}'
  ```
- Check job registry and logs:
  - Poll the job list: `GET http://localhost:8000/jobs`
  - Poll incremental logs: `GET http://localhost:8000/jobs/<job_id>/logs?since=0`
  - Tail the per-job file: `backend/.anana-results/logs/<job_id>.log`
- Observe HF logs: after submission you should see the submission output (including the HF job URL). The backend poller fetches remote logs every `HF_POLL_INTERVAL_SECONDS` (default 30s) and appends new lines to the job log.
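Tailing the per-job file from a script mirrors the `?since=` cursor on the logs endpoint. A minimal sketch, assuming a byte-offset cursor (the backend's actual cursor semantics may differ, e.g. line counts):

```python
from pathlib import Path

def read_since(log_path: Path, offset: int) -> tuple[list[str], int]:
    """Read new log lines starting at byte `offset`.

    Returns the decoded lines plus the new offset to pass on the
    next call, so repeated calls only ever see fresh output.
    """
    with log_path.open("rb") as f:
        f.seek(offset)
        chunk = f.read()
    lines = chunk.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(chunk)
```

Calling this in a loop with a short sleep gives a simple local tail of `backend/.anana-results/logs/<job_id>.log`.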
Notes & recommendations
- Requirements: the `huggingface_hub` Python package or the `hf` CLI must be available, and `HF_TOKEN` must be valid.
- The default `HF_POLL_INTERVAL_SECONDS` is 30s. Lower it to 5–10s for near-real-time logs, but be mindful of rate limits.
- Option: add a dedicated log-only poller with backoff for more frequent updates.
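The log-only poller with backoff could compute its sleep interval with capped exponential backoff after idle fetches. A sketch; the base and cap values here are illustrative, not taken from the backend config:

```python
def backoff_delay(idle_polls: int, base: float = 5.0, cap: float = 60.0) -> float:
    """Delay before the next fetch, given how many consecutive polls
    returned no new lines: base * 2^idle_polls, capped at `cap` seconds."""
    return min(base * (2 ** idle_polls), cap)
```

Resetting `idle_polls` to 0 whenever new lines arrive keeps active jobs near-real-time while quiet jobs back off toward the cap.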
Context & diary (concise)
You’ve transitioned from local scripts to a control-plane architecture: the FastAPI backend now orchestrates long-running training jobs and surfaces logs/metadata to the SvelteKit frontend.
Internship - Mar 17, 2026
- Role: AI Engineer - SynerSense
- Project: AnanaCare ML Control Plane
- Hours: 8
Work summary
- Migrated execution behavior into a backend control plane and added HF live-log polling.
- Implemented streaming of training `stdout` into a live UI console.
- Prepared `train.py` to accept external `config.json` inputs for reproducible runs.
Tech stack
- Backend: FastAPI / Python - async background tasks, subprocess management, WebSockets
- Frontend: SvelteKit / Tailwind - live console and dashboard
- MLOps: Hugging Face Hub / Jobs - remote GPU orchestration and metadata
- Infra: PowerShell / Docker - environment management
Learnings
- Focus on job lifecycle management rather than single-run scripts.
- HF Hub can be used as a lightweight experiment store to simplify deployment.
Blockers & risks
- Zombie processes: child training tasks may outlive killed parent shells; plan to add `psutil` process-tree termination.
- HF Job latency: the HF Jobs API can be slow; consider optimizations for eventual consistency in the leaderboard.
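One stdlib-only mitigation on POSIX is to launch each training task in its own process group and signal the whole group on shutdown; `psutil` would generalize the same idea to Windows. This is a hypothetical sketch, not the current `job_runner.py` code:

```python
import os
import signal
import subprocess

def start_job(cmd: list[str]) -> subprocess.Popen:
    # start_new_session=True puts the child (and any grandchildren it
    # spawns) into a fresh process group, so the whole tree can be
    # signalled at once. POSIX-only.
    return subprocess.Popen(cmd, start_new_session=True)

def kill_job_tree(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Terminate the job's entire process group, escalating to SIGKILL."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
```

This avoids zombies when only the parent shell dies, since the group signal reaches every descendant rather than just the direct child.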
If you’d like, I can extract an HF live-logs troubleshooting checklist into backend/README.md or add an explicit .env variables section.