Day 19 - Version Control System for Model Training

Summary: Today we move the platform from a simple “single-run” workflow to a version-controlled system for model training and evaluation. Each training cycle now lives in its own folder (e.g., anana_v1, anana_v2, …), and a central versions.yaml manifest tracks every version and which one is currently active. This makes runs reproducible, rollback easy, and model results comparable side by side.


Goals for Today

  • Implement a VersionManager utility for managing model versions.
  • Update the QualityApp to allow users to select and compare different model versions.
  • Update the TrainApp to automate version creation and registration.
  • Establish a clear Backup & Relabel Loop for safe, iterative model improvement.

1. VersionManager Utility (versions.yaml Manager)

To enable version control, we need a utility class that manages the versions.yaml manifest. This manifest records all model versions, their metadata, and which version is currently active. The utility will be placed in the utils directory for shared access by both the Training and Quality apps.

Key Features:

  • Ensures versions.yaml exists and is initialized.
  • Loads and saves version data.
  • Registers new versions with timestamp and description.
  • Tracks the active version.

How it works:

The VersionManager is a Python class that wraps the versions.yaml manifest:

  • On startup, it creates and initializes versions.yaml if the file is missing.
  • It loads and saves the full version history together with the currently active version.
  • Registering a new model version appends its metadata (version ID, timestamp, and description) to the manifest and marks that version as active.
  • It returns the list of all available version IDs for use in the UI or other logic.

Because both the Training and Quality applications import the same class from the utils directory, version tracking stays consistent across the platform.
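The responsibilities above can be sketched roughly as follows. This is a minimal illustration, not the project's actual class: the manifest keys (`active_version`, `versions`, `id`, `timestamp`, `description`) and the `register_version` name are assumptions, while `get_all_version_ids` and `get_active_version` match the calls used in the QualityApp snippet. The sketch reads and writes the manifest with PyYAML when it is installed and otherwise falls back to JSON, which is also valid YAML.

```python
from datetime import datetime, timezone
from pathlib import Path

try:
    import yaml  # PyYAML (third-party)

    def _read(text):
        return yaml.safe_load(text)

    def _write(data):
        return yaml.safe_dump(data)
except ImportError:
    import json  # fallback: JSON output is also valid YAML

    def _read(text):
        return json.loads(text)

    def _write(data):
        return json.dumps(data, indent=2)


class VersionManager:
    """Manage a versions.yaml manifest (the key layout here is an assumption)."""

    def __init__(self, manifest_path="versions.yaml"):
        self.path = Path(manifest_path)
        if not self.path.exists():
            # Initialize an empty manifest on first use.
            self.save({"active_version": None, "versions": []})

    def load(self):
        return _read(self.path.read_text(encoding="utf-8"))

    def save(self, data):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(_write(data), encoding="utf-8")

    def register_version(self, version_id, description=""):
        """Append a version's metadata and mark it as the active version."""
        data = self.load()
        data["versions"].append({
            "id": version_id,
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "description": description,
        })
        data["active_version"] = version_id
        self.save(data)

    def get_all_version_ids(self):
        return [v["id"] for v in self.load()["versions"]]

    def get_active_version(self):
        return self.load()["active_version"]
```

Re-reading the manifest on every call keeps the two apps in sync at the cost of a little extra file I/O, which seems a reasonable trade-off for a small manifest.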


2. QualityApp: Version Selection & Comparison

The QualityApp now features a dropdown menu for “time travel” between model versions. This allows users to select any version and instantly view its results, enabling side-by-side comparison and historical analysis.

How it works:

  • The dropdown is populated with all available version IDs from versions.yaml.
  • When a version is selected, the app loads data from the corresponding folder (e.g., anana-results/anana_v2/train/).
  • The active version is selected by default.

Sample UI Component:

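import gradio as gr

# vm is a VersionManager instance backed by versions.yaml
version_select = gr.Dropdown(
    label="Select Model Version",
    choices=vm.get_all_version_ids(),  # every registered version ID
    value=vm.get_active_version(),     # preselect the active version
)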

Implementation Notes:

  • The load_data function should use the selected version to dynamically set the data path.
  • This enables instant switching between model results for comparison and validation.
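One way load_data could derive its path from the selected version is sketched below. The anana-results/<version>/train/ layout and the actual.csv and predicted.csv file names come from this post. The `version_dir` helper, the function signature, and the returned dictionary shape are assumptions made for illustration, and the standard-library csv module stands in for whatever loader the app really uses.

```python
import csv
from pathlib import Path

RESULTS_ROOT = Path("anana-results")  # root folder taken from the example path


def version_dir(version_id, root=RESULTS_ROOT):
    """Return the folder holding one version's training outputs."""
    return Path(root) / version_id / "train"


def load_data(version_id, root=RESULTS_ROOT):
    """Read actual.csv and predicted.csv for the selected version.

    Returns a dict mapping "actual" and "predicted" to lists of row dicts.
    """
    folder = version_dir(version_id, root)
    data = {}
    for name in ("actual", "predicted"):
        csv_path = folder / f"{name}.csv"
        with csv_path.open(newline="", encoding="utf-8") as f:
            data[name] = list(csv.DictReader(f))
    return data
```

In a Gradio app, a function like this would typically be attached to the dropdown's change event so that picking a version reloads the displayed results.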

3. TrainApp: Automated Versioning Workflow

The TrainApp is responsible for creating new version folders and updating the manifest automatically during each training cycle.

Workflow Steps:

  1. Start Training: User initiates a new training run.
  2. Determine Next Version: The app checks existing versions and generates the next version ID (e.g., anana_v3).
  3. Save Results: Training outputs (actual.csv, predicted.csv, etc.) are saved in the new version’s folder (anana-results/anana_v3/train/).
  4. Register Version: The VersionManager registers the new version as the active one, updating versions.yaml.
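Steps 2 and 3 can be sketched as below. The anana_vN naming and the anana-results/<version>/train/ layout follow the examples in this post, while the function names `next_version_id` and `create_version_dir` are hypothetical. Creating the folder with exist_ok=False is one simple way to get the "no accidental overwrites" guarantee.

```python
import re
from pathlib import Path


def next_version_id(existing_ids, prefix="anana"):
    """Return the next ID after the highest existing <prefix>_vN."""
    pattern = re.compile(rf"^{re.escape(prefix)}_v(\d+)$")
    # IDs that do not match the naming scheme are ignored.
    numbers = [int(m.group(1)) for m in map(pattern.match, existing_ids) if m]
    return f"{prefix}_v{max(numbers, default=0) + 1}"


def create_version_dir(version_id, root="anana-results"):
    """Create <root>/<version_id>/train, failing if it already exists."""
    out_dir = Path(root) / version_id / "train"
    out_dir.mkdir(parents=True, exist_ok=False)  # raise rather than overwrite
    return out_dir


print(next_version_id(["anana_v1", "anana_v2"]))  # anana_v3
```

Taking the maximum existing number instead of counting entries means a deleted or skipped version never causes an ID to be reused.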

Benefits:

  • Ensures all training runs are tracked and reproducible.
  • Prevents accidental overwrites.
  • Enables easy rollback and comparison.

4. Backup & Relabel Loop: Safe Iterative Improvement

This flow ensures that every relabeling and retraining step is safely versioned and recoverable:

  1. Backup Current: Before relabeling, make sure the current version’s CSVs are preserved, either by copying them to a /backups subfolder or by leaving them untouched in their versioned folder (e.g., anana_v1).
  2. Relabel: Run relabel.py to modify the data as needed.
  3. Train: Execute train.py to generate a new version (e.g., anana_v2).
  4. Verify: Open the QualityApp and use the dropdown to compare the old and new versions side-by-side, visually confirming improvements.
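The backup step could look something like the sketch below. It assumes the backups live at anana-results/backups/<version>/ and that the CSVs to protect sit in the version's train folder. Both the layout and the `backup_version_csvs` name are illustrative guesses, since the post only mentions a /backups subfolder.

```python
import shutil
from pathlib import Path


def backup_version_csvs(version_id, root="anana-results"):
    """Copy a version's CSV files into <root>/backups/<version_id>/.

    Returns the names of the files that were copied.
    """
    src = Path(root) / version_id / "train"
    dest = Path(root) / "backups" / version_id
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_file in sorted(src.glob("*.csv")):
        shutil.copy2(csv_file, dest / csv_file.name)  # copy2 also keeps timestamps
        copied.append(csv_file.name)
    return copied
```

Calling this before relabel.py runs gives a known-good copy to fall back on, even if the relabeling step edits files in place.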

Advantages:

  • No data loss: every step is preserved.
  • Easy to audit and reproduce any result.
  • Facilitates rapid experimentation and safe iteration.

📌 Summary & Next Steps

  • Version control is now at the heart of our training workflow.
  • Both Training and Quality apps are integrated with the new versioning system.
  • The backup and relabel loop ensures safe, iterative model improvement.

Next: Continue refining the UI for better usability, and expand the manifest to track additional metadata (e.g., metrics, notes, hyperparameters) for each version.


