Internship Diary Entry: Mar 24, 2026

Role: AI Engineer, SynerSense
Project: AnanaCare Training Control Plane (Job Orchestration Phase)
Hours Worked: 8


Work Summary

Today’s work focused on building a dedicated Job Control Sub-App, turning the platform into a practical command center for managing machine learning workflows from the frontend. The main goal was to connect the SvelteKit UI directly with backend execution logic and make job operations observable in real time.


Key Work Done

1. Remote Job Execution Pipeline

  • Implemented backend request handling to trigger jobs from the UI using structured payloads (CLI args like trials, quick mode, hardware flavor).
  • Wired the flow end to end: frontend form → API → backend execution (train.py launched via subprocess).
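
As an illustration of the payload-to-process step, here is a minimal sketch. The `JobRequest` fields and the flag names (`--trials`, `--quick`, `--flavor`) are assumptions that mirror the CLI args listed above; they are not the project's actual schema.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass
class JobRequest:
    # Hypothetical payload fields mirroring the CLI args mentioned above.
    trials: int = 10
    quick: bool = False
    flavor: str = "cpu"


def build_command(req: JobRequest, script: str = "train.py") -> list[str]:
    """Translate the payload into an argv list (flag names are assumed)."""
    cmd = [sys.executable, script, "--trials", str(req.trials), "--flavor", req.flavor]
    if req.quick:
        cmd.append("--quick")
    return cmd


def launch(req: JobRequest, cwd: Path) -> subprocess.Popen:
    """Start the job without a shell, merging stderr into stdout."""
    return subprocess.Popen(
        build_command(req),
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
```

Passing an argv list instead of a shell string keeps user-supplied values from being interpreted by a shell and removes the intermediate shell process from the tree.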

2. Root-Level Path Integration

  • Configured backend to correctly operate relative to the project root instead of the API subdirectory.
  • Ensured all reads/writes (logs, results, configs) correctly target .anana-results/.
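
One way to anchor paths at the project root is sketched below. It assumes the root can be recognised by a marker file; using `train.py` as that marker is a guess, and both helper names are hypothetical.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "train.py") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no directory containing {marker!r} above {start}")


def results_dir(root: Path) -> Path:
    """Return (and create if needed) the shared results directory under the root."""
    path = root / ".anana-results"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Resolving the root once at startup and deriving every other path from it avoids depending on the server's current working directory.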

3. Real-Time Log Streaming (WebSockets)

  • Built a live log pipeline using WebSockets (/ws/logs/{job_id}).
  • Captured both stdout and stderr and streamed them directly to the frontend console.
  • Enabled near real-time visibility into training progress.
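
The core of such a log pipeline can be sketched with asyncio alone. In this sketch `send` is a stand-in for whatever pushes a line to the client; in a FastAPI WebSocket handler it could be the connection's `send_text` method. The function name and the decision to keep the job running after a client disconnect are assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def stream_process_output(
    argv: list[str],
    send: Callable[[str], Awaitable[None]],
) -> int:
    """Run argv, forward each output line to `send`, and return the exit code."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # interleave stderr with stdout
    )
    assert proc.stdout is not None
    client_alive = True
    async for raw in proc.stdout:
        if not client_alive:
            continue  # keep draining so the child never blocks on a full pipe
        try:
            await send(raw.decode(errors="replace").rstrip("\r\n"))
        except Exception:
            # The client went away; the job itself keeps running.
            client_alive = False
    return await proc.wait()
```

Draining the pipe even after the client disconnects matters: if nobody reads, the child eventually blocks on a full pipe buffer.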

4. Process Lifecycle Management

  • Improved job termination logic using psutil.
  • Ensured full cleanup of parent + child processes to prevent zombie training jobs.
  • Handled edge cases where the shell exits but its child Python processes keep running.
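
A minimal sketch of tree-wide termination with psutil follows. The function name, the timeout, and the terminate-then-kill escalation are assumptions, not necessarily what the project does.

```python
import psutil


def kill_process_tree(pid: int, timeout: float = 5.0) -> list[int]:
    """Terminate `pid` and all descendants; return PIDs that had to be force-killed."""
    try:
        parent = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return []
    # Snapshot children first: once the parent dies they are re-parented
    # and can no longer be discovered through it.
    procs = parent.children(recursive=True) + [parent]
    for proc in procs:
        try:
            proc.terminate()  # polite request (SIGTERM on POSIX)
        except psutil.NoSuchProcess:
            pass
    _, alive = psutil.wait_procs(procs, timeout=timeout)
    for proc in alive:
        try:
            proc.kill()  # escalate (SIGKILL on POSIX)
        except psutil.NoSuchProcess:
            pass
    psutil.wait_procs(alive, timeout=timeout)
    return [proc.pid for proc in alive]
```

Collecting the descendants before signalling the parent is the important ordering here; killing the parent first is what typically leaves training processes orphaned.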

5. Persistent Logging System

  • Implemented file-based log storage under .anana-results/logs/.
  • Enabled:

    • Log replay after refresh
    • Late session join visibility
    • Debug trace retention independent of UI
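
Log replay and late join can both be served from the same file if readers track a byte offset. The sketch below assumes one `<job_id>.log` file per job; the class name and the file naming are hypothetical.

```python
from pathlib import Path


class JobLog:
    """Append-only log file for one job, readable independently of the UI."""

    def __init__(self, logs_dir: Path, job_id: str) -> None:
        logs_dir.mkdir(parents=True, exist_ok=True)
        self.path = logs_dir / f"{job_id}.log"  # file naming is an assumption

    def append(self, line: str) -> None:
        with self.path.open("a", encoding="utf-8", newline="\n") as fh:
            fh.write(line.rstrip("\n") + "\n")

    def replay(self, offset: int = 0) -> tuple[list[str], int]:
        """Return lines written after byte `offset` plus the new offset."""
        if not self.path.exists():
            return [], offset
        with self.path.open("rb") as fh:
            fh.seek(offset)
            data = fh.read()
        # Only consume complete lines; a partial tail is left for the next call.
        end = data.rfind(b"\n") + 1
        lines = data[:end].decode("utf-8", errors="replace").splitlines()
        return lines, offset + end
```

A client that reconnects calls `replay(0)` for the full history, then keeps passing back the returned offset to pick up only new lines.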

System Behavior (Current State)

  • Jobs can be:

    • Started from UI
    • Monitored live (logs streaming)
    • Stopped safely (full process cleanup)
  • Logs are:

    • Streamed in real time
    • Persisted to disk
    • Re-readable after reload
  • Backend correctly:

    • Executes in root context
    • Tracks job states (queued, running, completed, failed)
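
The four job states can be modelled as a small state machine. The transition table below is an assumption (for example, that a queued job may fail before it starts), as are the helper names.

```python
from enum import Enum


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions (assumed); terminal states have no outgoing edges.
TRANSITIONS = {
    JobState.QUEUED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


def state_from_exit_code(code: int) -> JobState:
    """Map a finished process's exit code to a terminal state."""
    return JobState.COMPLETED if code == 0 else JobState.FAILED
```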

Technical Stack Used

Area        | Technology                  | What was applied
Backend     | FastAPI, subprocess, psutil | Job execution + lifecycle management
Real-time   | WebSockets                  | Live log streaming
Frontend    | SvelteKit                   | Reactive job UI + console
File System | pathlib                     | Cross-directory path handling

Key Learnings

  • Managing subprocesses in a web server requires explicit control of process trees, not just parent processes.
  • WebSockets need defensive handling (disconnects, retries) to avoid backend instability.
  • Treating the filesystem (.anana-results) as the source of truth simplifies the architecture but requires strict path discipline.
  • Real-time visibility (logs) significantly improves debugging speed compared to batch outputs.

Challenges / Risks

  • WebSocket reliability: abrupt client disconnects can break streams if not handled properly.
  • Process leak risk: improper termination can still leave orphan processes in edge cases.
  • Concurrency: multiple jobs running simultaneously may need resource throttling later.

Next Steps

  1. Add log filtering (INFO / DEBUG / ERROR) in the UI for better readability.
  2. Implement smart auto-scroll in log console (stick to bottom unless user scrolls up).
  3. Introduce job queueing or limits to manage concurrent executions.
  4. Optional: add SSE fallback in case WebSockets fail in some environments.
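
For the job-limit item, one possible starting point is a semaphore around job execution. This is only a sketch of the idea: `JobLimiter` and its counters are hypothetical, and a real queue would also need persistence and cancellation.

```python
import asyncio


class JobLimiter:
    """Cap how many jobs run at once; extra callers wait for a free slot."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self.running = 0
        self.peak = 0  # highest concurrency observed, useful for monitoring

    async def run(self, job):
        """Await the coroutine function `job` once a slot is available."""
        async with self._slots:
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                return await job()
            finally:
                self.running -= 1
```

Jobs that cannot get a slot simply wait inside `run`, which would correspond to the queued state until they start.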

Overall Progress

This was a significant step forward: the system no longer just runs jobs. It can now start, observe, and stop them interactively from the UI, which is a core capability of any MLOps platform.

