Daily Work Report - 2026-03-17

Summary

  • Implemented live-fetching of remote Hugging Face job logs and integrated it into the backend poller.
  • Polished documentation (root, backend, frontend) and updated .env.example.

Files changed

  • Docs: README.md (root, backend, frontend), .env.example
  • Backend: job_runner.py - added _fetch_hf_job_logs() and integrated it into _poll_hf_jobs_loop().
  • Project plan: updated internal todo list to track HF live-log polling.

What I implemented

  • HF live-log polling: _fetch_hf_job_logs() tries huggingface_hub first, then falls back to hf jobs logs <id> CLI. New lines are appended to per-job logs for incremental reads.
  • Docs: rewrote and cross-linked top-level, backend, and frontend READMEs; clarified .env.example placeholders and formats.
  • Tracking: added and completed a todo item for HF live-log polling.
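The incremental-append behavior described above can be sketched as follows. This is a minimal illustration, not the actual code: the helper name append_new_lines and the in-memory store are hypothetical, and the real _fetch_hf_job_logs() also handles fetching via huggingface_hub or the hf jobs logs CLI, which is omitted here.

```python
# Hypothetical sketch of the incremental-append step used by the HF log
# poller. The remote log is treated as append-only, so new content is the
# slice past the number of lines already stored for that job.

job_logs: dict[str, list[str]] = {}  # job_id -> lines already stored

def append_new_lines(job_id: str, fetched_text: str) -> list[str]:
    """Append only lines not yet stored for this job; return the new lines."""
    stored = job_logs.setdefault(job_id, [])
    fetched = fetched_text.splitlines()
    new = fetched[len(stored):]  # append-only log: slice by stored count
    stored.extend(new)
    return new
```

Clients can then read incrementally by remembering how many lines they have already consumed.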

How to verify locally

  1. Ensure HF_TOKEN is set (repo .env or environment).

  2. Start the backend (PowerShell):

cd backend
.\.venv\Scripts\Activate.ps1   # if using venv
uv run uvicorn main:app --reload --port 8000
  3. (Optional) Start the frontend:
cd frontend
npm install
npm run dev -- --port 5173
  4. Submit a Hugging Face job (example):
curl -X POST http://localhost:8000/api/jobs/start \
  -H "Content-Type: application/json" \
  -d '{"command":"tune","is_local":false,"flavor":"cpu-basic","timeout":"3h","script":"train.py","args":{ "n_trials":3 }}'
  5. Check job registry and logs:
  • Poll job list: GET http://localhost:8000/jobs
  • Poll incremental logs: GET http://localhost:8000/jobs/<job_id>/logs?since=0
  • Tail per-job file: backend/.anana-results/logs/<job_id>.log
  6. Observe HF logs: after submission you should see submission output (including the HF job URL). The backend poller will fetch remote logs every HF_POLL_INTERVAL_SECONDS (default 30s) and append new lines to the job log.
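The incremental-logs endpoint from step 5 can be consumed with a small polling client along these lines. The response shape ({"lines": [...], "next_since": N}) is an assumption for illustration; check the backend's actual /jobs/<job_id>/logs handler for the real schema.

```python
# Hypothetical client for GET /jobs/<job_id>/logs?since=N.
# The payload fields "lines" and "next_since" are assumptions.
import json
import time
import urllib.request

def advance_cursor(payload: dict, since: int) -> tuple[list[str], int]:
    """Extract new lines and the next cursor from one poll response."""
    return payload.get("lines", []), payload.get("next_since", since)

def tail_job_logs(base_url: str, job_id: str, interval: float = 5.0) -> None:
    """Print new log lines as they arrive, advancing the since cursor."""
    since = 0
    while True:
        url = f"{base_url}/jobs/{job_id}/logs?since={since}"
        with urllib.request.urlopen(url) as resp:
            payload = json.load(resp)
        lines, since = advance_cursor(payload, since)
        for line in lines:
            print(line)
        time.sleep(interval)
```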

Notes & recommendations

  • Requirements: huggingface_hub Python package or the hf CLI must be available, and HF_TOKEN must be valid.
  • Default HF_POLL_INTERVAL_SECONDS is 30s. Lower to 5–10s for near-real-time logs but be mindful of rate limits.
  • Option: add a dedicated log-only poller with backoff for more frequent updates.
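The dedicated log-only poller with backoff mentioned above could schedule its next poll roughly like this. Names and defaults are illustrative, not existing configuration: poll quickly while lines are arriving, back off geometrically toward a ceiling while idle.

```python
# Hypothetical backoff schedule for a log-only poller: reset to the floor
# whenever new lines arrive, otherwise double the interval up to the cap
# (e.g. the existing HF_POLL_INTERVAL_SECONDS value).

def next_interval(current: float, got_new_lines: bool,
                  floor: float = 5.0, cap: float = 30.0,
                  factor: float = 2.0) -> float:
    if got_new_lines:
        return floor                       # activity: poll quickly again
    return min(current * factor, cap)      # idle: back off up to the cap
```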

Context & diary (concise)

You’ve transitioned from local scripts to a control-plane architecture: the FastAPI backend now orchestrates long-running training jobs and surfaces logs/metadata to the SvelteKit frontend.

Internship - Mar 17, 2026

  • Role: AI Engineer - SynerSense
  • Project: AnanaCare ML Control Plane
  • Hours: 8

Work summary

  • Migrated execution behavior into a backend control plane and added HF live-log polling.
  • Implemented streaming of training stdout into a live UI console.
  • Prepared train.py to accept external config.json inputs for reproducible runs.

Tech stack

  • Backend: FastAPI / Python - async background tasks, subprocess management, WebSockets
  • Frontend: SvelteKit / Tailwind - live console and dashboard
  • MLOps: Hugging Face Hub / Jobs - remote GPU orchestration and metadata
  • Infra: PowerShell / Docker - environment management

Learnings

  • Focus on job lifecycle management rather than single-run scripts.
  • HF Hub can be used as a lightweight experiment store to simplify deployment.

Blockers & risks

  • Zombie processes: child training tasks may outlive killed parent shells - plan to add psutil process-tree termination.
  • HF Job latency: the HF Jobs API can be slow to surface logs and status; the leaderboard should be designed to tolerate eventual consistency (entries may lag until the next poll).
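The planned psutil-based fix for the zombie-process risk could look roughly like this (a sketch assuming psutil is installed; the function name is hypothetical, not current code):

```python
# Hypothetical process-tree termination using psutil: SIGTERM the parent
# and all descendants, wait briefly, then SIGKILL anything still alive.
import psutil

def kill_process_tree(pid: int, timeout: float = 5.0) -> None:
    """Terminate a process and all of its descendants, escalating to kill."""
    try:
        parent = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return
    procs = parent.children(recursive=True) + [parent]
    for p in procs:
        try:
            p.terminate()                  # polite termination first
        except psutil.NoSuchProcess:
            pass
    _, alive = psutil.wait_procs(procs, timeout=timeout)
    for p in alive:                        # force-kill any stragglers
        p.kill()
```

Collecting children before terminating the parent matters: once the parent dies, its children may be reparented and become unreachable via the tree walk.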


