Command Line Interface (CLI)

The AutoMIL Command Line Interface (CLI) provides a user-friendly way to interact with the framework directly from the terminal. It allows users to perform various tasks such as data preprocessing, model training, evaluation, and prediction without needing to write any code.

run_pipeline

run_pipeline(
    slide_dir: str | Path,
    annotation_file: str | Path,
    project_dir: str | Path,
    patient_column: str,
    label_column: str,
    slide_column: str | None,
    resolutions: str,
    model: str,
    k: int,
    split_file: str | None,
    transform_labels: bool,
    is_pretiled: bool,
    verbose: bool,
)

Execute the complete AutoMIL pipeline for whole slide image analysis.

This command runs the full AutoMIL workflow, including project setup, dataset preparation, model training with k-fold cross-validation, evaluation, and result visualization.

Pipeline stages:

1. Project setup and configuration
2. Dataset preparation and tile extraction
3. Model training with k-fold cross-validation
4. Model evaluation and ensemble creation
5. Result visualization

Parameters:

Name	Type	Description	Default
`slide_dir`	`str | Path`	Directory containing whole-slide images or pre-extracted tiles.	required
`annotation_file`	`str | Path`	CSV file containing slide- or patient-level annotations and labels.	required
`project_dir`	`str | Path`	Output directory where trained models and intermediate files will be written.	required
`patient_column`	`str`	Name of the column containing patient identifiers (`-pc`, `--patient_column`).	`patient`
`label_column`	`str`	Name of the column containing class labels (`-lc`, `--label_column`).	`label`
`slide_column`	`str | None`	Name of the column containing slide identifiers (`-sc`, `--slide_column`).	`None`
`resolutions`	`str`	Comma-separated list of resolution presets to train on (`-r`, `--resolutions`).	`Low`
`model`	`str`	Model architecture to train (`-m`, `--model`).	first entry of `MODEL_CHOICES`
`k`	`int`	Number of folds used for k-fold cross-validation (`-k`).	`3`
`split_file`	`str | None`	JSON file defining the train-test split (`--split-file`).	`split.json`
`transform_labels`	`bool`	If set, transforms labels to floating-point values (`-t`, `--transform_labels`).	off
`is_pretiled`	`bool`	Indicates that the input slides are already tiled (`-p`, `--is-pretiled`).	off
`verbose`	`bool`	Enables verbose logging output (`-v`, `--verbose`).	off
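The `resolutions` value is a single string that the command splits on commas and looks up name by name in the `RESOLUTION_PRESETS` enum (see the source code below). The sketch below reproduces that parsing step on its own so it can be tried without AutoMIL installed. `Preset` is a hypothetical stand-in whose members are only the two presets named in this document (`Low`, `High`); unlike the source, the sketch skips empty entries and reports every unknown name in one error instead of raising a bare `KeyError`.

```python
from enum import Enum, auto


class Preset(Enum):
    """Hypothetical stand-in for AutoMIL's RESOLUTION_PRESETS enum."""

    Low = auto()
    High = auto()


def parse_resolutions(value: str) -> list[Preset]:
    """Turn a string such as "Low, High" into presets, keeping the given order."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in Preset.__members__]
    if unknown:
        available = ", ".join(Preset.__members__)
        raise ValueError(f"Unknown resolution preset(s) {unknown}; available: {available}")
    return [Preset[name] for name in names]


print([preset.name for preset in parse_resolutions("Low, High")])  # ['Low', 'High']
```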

Examples

Basic usage with default settings:

automil run-pipeline /data/slides /data/annotations.csv ./results

Multi-resolution training with verbose output:

automil run-pipeline -r "Low,High" -v /data/slides /data/annotations.csv ./results

Custom model and k-fold settings:

automil run-pipeline -m TransMIL -k 5 /data/slides /data/annotations.csv ./results

Skip tiling if tiles are pre-extracted:

automil run-pipeline -p /data/slides /data/annotations.csv ./results

Custom column names in the annotation file:

automil run-pipeline -pc "patient_name" -lc "diagnosis" -sc "slide_name" /data/slides /data/annotations.csv ./results

Provide a predefined train-test split:

automil run-pipeline --split-file /data/split.json /data/slides /data/annotations.csv ./results

Annotation file requirements

The annotation file must be a CSV file containing at least the following columns:

- Patient identifiers (default column name: patient)
- Slide identifiers (default column name: slide; optional)
- Class labels (default column name: label)

By default, AutoMIL looks for columns named patient, slide, and label. These defaults can be overridden using the --patient_column, --slide_column, and --label_column options.

Minimal annotation file example

patient,slide,label
001,001_1,0
001,001_2,0
002,002,1
003,003,1
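Because a missing or misspelled column only surfaces once the project is being set up, it can be worth checking the annotation file beforehand. A minimal standard-library sketch, assuming the default column names listed above; `check_annotations` is a hypothetical helper, not part of AutoMIL, and its checks (required columns present, no empty labels) are a reasonable minimum rather than AutoMIL's own validation.

```python
from __future__ import annotations

import csv
import io


def check_annotations(text: str, patient_column: str = "patient",
                      label_column: str = "label",
                      slide_column: str | None = "slide") -> list[dict[str, str]]:
    """Parse annotation CSV text and check for the columns described above."""
    reader = csv.DictReader(io.StringIO(text))
    columns = set(reader.fieldnames or [])
    missing = {patient_column, label_column} - columns
    if missing:
        raise ValueError(f"Missing required column(s): {sorted(missing)}")
    if slide_column is not None and slide_column not in columns:
        # The slide column is optional, so this is informational only.
        print(f"No '{slide_column}' column found")
    rows = list(reader)
    # Line 1 is the header, so the first data row is line 2.
    unlabeled = [line for line, row in enumerate(rows, start=2) if not row[label_column]]
    if unlabeled:
        raise ValueError(f"Rows without a label on line(s): {unlabeled}")
    return rows


example = """patient,slide,label
001,001_1,0
001,001_2,0
002,002,1
003,003,1
"""
rows = check_annotations(example)
print(len(rows), sorted({row["patient"] for row in rows}))  # 4 ['001', '002', '003']
```

For a file on disk, pass its contents (for example `Path("/data/annotations.csv").read_text()`) and set the column arguments to whatever you pass via `-pc`, `-lc`, and `-sc`.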

Expected slide directory structure

SLIDE_DIR should contain whole slide images in supported formats such as .svs, .tiff, or .png. Example structure:

/data/slides/
|-- slide1.svs
|-- slide2.tiff
|-- slide3.tiff

PNG Slide Handling

If slides are in PNG format, AutoMIL first converts them to TIFF for easier processing.
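To see which files in SLIDE_DIR look like slides, and whether the PNG conversion will be triggered, a short directory scan is enough. This is a sketch: `scan_slide_dir` is hypothetical (AutoMIL's own check is `has_png_slides`, whose implementation is not shown here), the extension set contains only the formats named above rather than the full list of supported formats, and slide names are assumed to be file names without their extension.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Only the formats named in this document; the supported list may be longer.
SLIDE_EXTENSIONS = {".svs", ".tiff", ".png"}


def scan_slide_dir(slide_dir: str | Path) -> tuple[dict[str, Path], bool]:
    """Map slide names (file stems, assumed) to files and flag any PNG slides."""
    slides = {
        path.stem: path
        for path in sorted(Path(slide_dir).iterdir())
        if path.is_file() and path.suffix.lower() in SLIDE_EXTENSIONS
    }
    has_png = any(path.suffix.lower() == ".png" for path in slides.values())
    return slides, has_png


# Demo on a throwaway directory that mimics the example layout plus a PNG.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("slide1.svs", "slide2.tiff", "slide3.png", "notes.txt"):
        (Path(tmp) / name).touch()
    slides, has_png = scan_slide_dir(tmp)
    print(sorted(slides), has_png)  # ['slide1', 'slide2', 'slide3'] True
```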

Using pretiled data

If tiles have already been extracted from the slides, use the --is-pretiled (-p) flag; if the flag is not set, AutoMIL attempts to detect pretiled input automatically. For pretiled data, AutoMIL expects the following directory structure for SLIDE_DIR:

/data/slides/
|-- slide1/
|    |-- tile_0_0.png
|    |-- tile_0_1.png
|    |-- ...
|-- slide2/
|    |-- tile_0_0.png
|    |-- tile_0_1.png
|    |-- ...

Slide name matching

Tile names are arbitrary, but slide subdirectory names must match the slide names in ANNOTATION_FILE.
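Since tile names are free-form, the check that matters for pretiled input is that every slide in the annotation file has a matching, non-empty subdirectory. A pre-flight sketch under two assumptions, both labeled in the code: tiles are image files with one of the listed extensions, and slide names come from the annotation file. `check_pretiled_layout` is hypothetical and separate from AutoMIL's `is_input_pretiled` detection.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumed tile formats; adjust to whatever your tiles use.
TILE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def check_pretiled_layout(slide_dir: str | Path, slide_names: list[str]) -> dict[str, int]:
    """Count tiles per slide folder; fail if an annotated slide has no folder or no tiles."""
    counts: dict[str, int] = {}
    problems: list[str] = []
    for name in slide_names:  # slide names as listed in the annotation file
        folder = Path(slide_dir) / name
        if not folder.is_dir():
            problems.append(f"{name}: no subdirectory")
            continue
        n_tiles = sum(
            1 for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in TILE_EXTENSIONS
        )
        if n_tiles == 0:
            problems.append(f"{name}: subdirectory contains no tiles")
        counts[name] = n_tiles
    if problems:
        raise ValueError("; ".join(problems))
    return counts


# Demo: recreate the two-slide example layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for slide in ("slide1", "slide2"):
        (Path(tmp) / slide).mkdir()
        for tile in ("tile_0_0.png", "tile_0_1.png"):
            (Path(tmp) / slide / tile).touch()
    counts = check_pretiled_layout(tmp, ["slide1", "slide2"])
    print(counts)  # {'slide1': 2, 'slide2': 2}
```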

Providing a train-test split

Use the --split-file option to provide a JSON file defining the train-test split. The file should have one of the following structures:

    {
        "train": ["slide1", "slide2", ...],
        "test": ["slide3", "slide4", ...]
    }

or:

    {
        "train": ["slide1", "slide2", ...],
        "validation": ["slide3", "slide4", ...]
    }
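A split file in either layout can be written and read back with the standard `json` module. In this sketch, `write_split` and `read_split` are hypothetical helpers; they only produce and parse the structure documented above and refuse slides that appear in both subsets. How AutoMIL itself consumes the file is not covered here.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def write_split(path: str | Path, train: list[str], held_out: list[str],
                key: str = "test") -> None:
    """Write {"train": [...], key: [...]} where key is "test" or "validation"."""
    if key not in ("test", "validation"):
        raise ValueError("key must be 'test' or 'validation'")
    overlap = set(train) & set(held_out)
    if overlap:
        raise ValueError(f"Slides listed in both subsets: {sorted(overlap)}")
    Path(path).write_text(json.dumps({"train": train, key: held_out}, indent=4))


def read_split(path: str | Path) -> tuple[list[str], list[str]]:
    """Return (train, held_out) from either documented layout."""
    data = json.loads(Path(path).read_text())
    held_out = data.get("test", data.get("validation"))
    if "train" not in data or held_out is None:
        raise ValueError("Expected 'train' plus either 'test' or 'validation'")
    return data["train"], held_out


with tempfile.TemporaryDirectory() as tmp:
    split_path = Path(tmp) / "split.json"
    write_split(split_path, ["slide1", "slide2"], ["slide3", "slide4"])
    train_slides, test_slides = read_split(split_path)
    print(train_slides, test_slides)  # ['slide1', 'slide2'] ['slide3', 'slide4']
```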

Output structure

project_dir/
├── bags/           # Extracted tile features
├── models/         # Trained model checkpoints  
├── ensemble/       # Ensemble predictions
├── annotations.csv # Processed annotations
└── results.json    # Performance metrics

Source code in automil/cli.py

@AutoMIL.command(
    name="run-pipeline", 
    context_settings=CONTEXT_SETTINGS,
    no_args_is_help=True,
    help=RUN_PIPELINE_HELP
)
@click.argument("slide_dir",        type=click.Path(exists=True, file_okay=False))
@click.argument("annotation_file",  type=click.Path(exists=True, file_okay=True))
@click.argument("project_dir",      type=click.Path(file_okay=False))
@click.option(
    "-pc", "--patient_column", type=str, default="patient",
    help="Name of the column containing patient IDs"
)
@click.option(
    "-lc", "--label_column", type=str, default="label",
    help="Name of the column containing labels"
)
@click.option(
    "-sc", "--slide_column", type=str, default=None,
    help="Name of the column containing slide names"
)
@click.option(
    "-r", "--resolutions",
    type=str,
    default="Low",
    help=f"Comma-separated list of resolution presets to train on. "
         f"Available: {', '.join([choice for choice in RESOLUTION_CHOICES])} "
         f"(e.g., 'Low,High')"
)
@click.option(
    "-m", "--model",
    type=(model_choice := click.Choice([choice for choice in MODEL_CHOICES])),
    default=model_choice.choices[0],
    help="Model type to train and evaluate"
)
@click.option(
    "-k", type=int, default=3,
    help="Number of folds to train per resolution level"
)
@click.option(
    "--split-file", type=click.Path(file_okay=True), default="split.json",
    help="Path to a .json file defining train-test splits"
)
@click.option("-t", "--transform_labels", is_flag=True, help="Transforms labels to float values (0.0, 1.0, ...)")
@click.option("-p", "--is-pretiled",      is_flag=True, help="Indicates that the input slides are already tiled")
@click.option("-v", "--verbose",          is_flag=True, help="Enables additional logging messages")
def run_pipeline(
    slide_dir:       str | Path,
    annotation_file: str | Path,
    project_dir:     str | Path,
    patient_column:  str,
    label_column:    str,
    slide_column:    str | None,
    resolutions:     str,
    model:           str,
    k:               int,
    split_file:      str | None,
    transform_labels: bool,
    is_pretiled:      bool,
    verbose:          bool
    ):
    """
    Execute the complete AutoMIL pipeline for whole slide image analysis.

    This command runs the full AutoMIL workflow, including project setup,
    dataset preparation, model training with k-fold cross-validation,
    evaluation, and result visualization.

    Pipeline stages:

    1. Project setup and configuration
    2. Dataset preparation and tile extraction
    3. Model training with k-fold cross-validation
    4. Model evaluation and ensemble creation
    5. Result visualization

    Args:
        slide_dir (str | Path):
            Directory containing whole-slide images or pre-extracted tiles.

        annotation_file (str | Path):
            CSV file containing slide- or patient-level annotations and labels.

        project_dir (str | Path):
            Output directory where trained models and intermediate files
            will be written.

        patient_column (str):
            Name of the column containing patient identifiers.

        label_column (str):
            Name of the column containing class labels.

        slide_column (str | None):
            Name of the column containing slide identifiers.

        resolutions (str):
            Comma-separated list of resolution presets to train on.

        model (str):
            Model architecture to train.

        k (int):
            Number of folds used for k-fold cross-validation.

        split_file (str | None):
            JSON file defining the train-test split.

        is_pretiled (bool):
            Indicates that the input slides are already tiled.

        transform_labels (bool):
            If enabled, transforms labels to floating-point values.

        verbose (bool):
            Enables verbose logging output.

    ### Examples

      Basic usage with default settings:

        automil run-pipeline /data/slides /data/annotations.csv ./results

      Multi-resolution training with verbose output:

        automil run-pipeline -r "Low,High" -v /data/slides /data/annotations.csv ./results

      Custom model and k-fold settings:

        automil run-pipeline -m TransMIL -k 5 /data/slides /data/annotations.csv ./results

      Skip tiling if tiles are pre-extracted:

        automil run-pipeline -p /data/slides /data/annotations.csv ./results

      Custom column names in the annotation file:

        automil run-pipeline -pc "patient_name" -lc "diagnosis" -sc "slide_name" /data/slides /data/annotations.csv ./results

      Provide a predefined train-test split:

        automil run-pipeline --split-file /data/split.json /data/slides /data/annotations.csv ./results

    ### Annotation file requirements

    The annotation file must be a CSV file containing at least the following columns:

    - Patient identifiers (default column name: `patient`)
    - Slide identifiers (default column name: `slide`; optional)
    - Class labels (default column name: `label`)

    By default, AutoMIL looks for columns named `patient`, `slide`, and `label`.
    These defaults can be overridden using the `--patient_column`,
    `--slide_column`, and `--label_column` options.

    ### Minimal annotation file example
        patient,slide,label
        001,001_1,0
        001,001_2,0
        002,002,1
        003,003,1

    ### Expected slide directory structure
      `SLIDE_DIR` should contain whole slide images in supported formats
      such as .svs, .tiff, or .png.
      Example structure:

        /data/slides/
        |-- slide1.svs
        |-- slide2.tiff
        |-- slide3.tiff

    ??? Note "PNG Slide Handling"
        If slides are in PNG, AutoMIL will first convert them to TIFF for easier processing.

    ### Using pretiled data
      If tiles have already been extracted from the slides, use the `--is-pretiled` (`-p`) flag.
      If the flag is not set, AutoMIL attempts to detect pretiled input automatically.
      In the case of pretiled data, AutoMIL expects the following directory structure for `SLIDE_DIR`:

        /data/slides/
        |-- slide1/
        |    |-- tile_0_0.png
        |    |-- tile_0_1.png
        |    |-- ...
        |-- slide2/
        |    |-- tile_0_0.png
        |    |-- tile_0_1.png
        |    |-- ...

    ??? Note "Slide name matching"
        Tile names are arbitrary but slide subdirectories must match the slide names in ANNOTATION_FILE.

    ### Providing a train-test split
      Use the `--split-file` option to provide a JSON file defining train-test splits.
      The JSON file will have the following structure:

            {
            "train": ["slide1", "slide2", ...],
            "test":  ["slide3", "slide4", ...]
            }

      or:

            {
            "train": ["slide1", "slide2", ...],
            "validation":  ["slide3", "slide4", ...]
            }

    ### Output structure

        project_dir/
        ├── bags/           # Extracted tile features
        ├── models/         # Trained model checkpoints  
        ├── ensemble/       # Ensemble predictions
        ├── annotations.csv # Processed annotations
        └── results.json    # Performance metrics

    """
    import slideflow as sf

    from .dataset import Dataset
    from .evaluation import Evaluator
    from .project import Project
    from .trainer import Trainer
    from .util import (INFO_CLR, RESOLUTION_PRESETS, LogLevel, ModelType,
                       get_vlog)
    from .util.backend import configure_image_backend, has_png_slides
    from .util.pretiled import is_input_pretiled

    # Getting a verbose logger
    vlog = get_vlog(verbose)
    sf.setLoggingLevel(20) # INFO: 20, DEBUG: 10

    # Logging the executed command
    command = " ".join(sys.argv)
    vlog(f"Executing command: [{INFO_CLR}]{command}[/]")

    # Define some paths
    bags_dir = Path(project_dir) / "bags"
    models_dir = Path(project_dir) / "models"
    ensemble_dir = Path(project_dir) / "ensemble"

    # Some type coercion
    slide_dir = Path(slide_dir)
    annotation_file = Path(annotation_file)
    project_dir = Path(project_dir)

    try:

        # === 1. Parsing === #
        # Parse given string resolutions into list of RESOLUTION_PRESETS
        resolution_presets: list[RESOLUTION_PRESETS] = [
            RESOLUTION_PRESETS[res.strip()] for res in resolutions.split(",")
        ]
        vlog(f"Using resolution presets: [{INFO_CLR}]{[preset.name for preset in resolution_presets]}[/]")

        # Parse the model type
        model_type = ModelType[model]
        vlog(f"Using model type: [{INFO_CLR}]{model_type.name}[/]")

        # === 2. Image Backend Configuration === #
        png_slides_present = has_png_slides(slide_dir)

        tiff_conversion = configure_image_backend(
            slide_dir=slide_dir,
            needs_png_conversion=png_slides_present,
            verbose=verbose,
        )

        # === 3. Project Creation And Setup === #
        project_setup = Project(
            Path(project_dir),
            Path(annotation_file),
            Path(slide_dir),
            patient_column,
            label_column,
            slide_column,
            transform_labels,
            verbose,
        )

        # Prepare slideflow project object
        project = project_setup.prepare_project()
        # We'll need the label map and slide ids for the dataset setup
        label_map = project_setup.label_map
        slide_ids = project_setup.slide_ids

        project_setup.summary()

        # === 4. Setup Dataset Sources ===
        # Determine if the slide_dir has pretiled slides
        if not is_pretiled: # is_pretiled == False means the flag was not set
            is_pretiled = is_input_pretiled(
                slide_dir,
                slide_ids
            )

        datasets: dict[str, sf.Dataset] = {}
        for preset in resolution_presets:
            vlog(f"Setting up dataset for resolution preset: [{INFO_CLR}]{preset.name}[/]")

            dataset = Dataset(
                project,
                preset,
                label_map,
                slide_dir=Path(slide_dir),
                bags_dir=Path(project_dir) / "bags",
                is_pretiled=is_pretiled,
                tiff_conversion=tiff_conversion,
                verbose=verbose
            )
            dataset.summary()
            datasets[preset.name] = dataset.prepare_dataset_source()
            vlog(f"Dataset setup complete for resolution preset: [{INFO_CLR}]{preset.name}[/]")

        # === 5. Prepare (or Load) Train/Test Split === #
        dataset = datasets[resolution_presets[0].name]
        train, test = dataset.split(
            labels="label",
            val_fraction=0.2,
            splits=split_file
        )
        # Save base train split
        base_train = train

        # === 6. Model Training === #
        for resolution in resolution_presets:
            vlog(f"Train/Test split for resolution preset [{INFO_CLR}]{resolution.name}[/]: "
                 f"[{INFO_CLR}]{len(base_train.slides())}[/] train slides"
            )

            train, val = base_train.split(
                labels="label",
                val_fraction=0.2
            )

            trainer = Trainer(
                bags_dir,
                project,
                train,
                val,
                model=model_type,
                k=k,
                epochs=300
            )
            trainer.train_k_fold()
            trainer.summary()

        # === 7. Model Evaluation === #
        evaluator = Evaluator(
            test,
            models_dir,
            ensemble_dir,
            bags_dir,
            verbose=verbose
        )

        evaluator.evaluate_models(generate_attention_heatmaps=True)
        evaluator.create_ensemble_predictions(
            output_path=Path(project.root) / "ensemble_predictions.csv"
        )

        evaluator.compare_predictions()
        evaluator.generate_plots(
            save_path=Path(project.root) / "figures",
            model_paths=None
        )

    except Exception as e:
        tb = traceback.format_exc()
        vlog(tb, LogLevel.ERROR)
        vlog(f"Error: {e}", LogLevel.ERROR)
        return

train

train(
    slide_dir: str | Path,
    annotation_file: str | Path,
    project_dir: str | Path,
    patient_column: str,
    label_column: str,
    slide_column: str | None,
    resolutions: str,
    model: str,
    k: int,
    is_pretiled: bool,
    transform_labels: bool,
    verbose: bool,
)

Train one or more MIL models on a given dataset.

This command initializes an AutoMIL project, prepares the dataset, and trains MIL models using k-fold cross-validation. Training can be performed at one or multiple resolution presets.

Pipeline stages:

1. Project setup and configuration
2. Dataset preparation and tile extraction
3. Model training with k-fold cross-validation
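For readers new to the third stage: k-fold cross-validation partitions the training data into k folds and trains k models, each validated on a different fold. The sketch below shows only the partitioning step, grouping slides by patient so that slides from one patient never appear on both sides of a fold. Patient grouping is a common safeguard assumed here, not documented AutoMIL behavior, and `patient_k_folds` is hypothetical; it is not `Trainer.train_k_fold`, whose internals are not shown in this document.

```python
from __future__ import annotations

import random


def patient_k_folds(slide_to_patient: dict[str, str], k: int,
                    seed: int = 0) -> list[list[str]]:
    """Split slides into k folds, keeping every slide of a patient in one fold."""
    patients = sorted(set(slide_to_patient.values()))
    if not 2 <= k <= len(patients):
        raise ValueError("k must be between 2 and the number of patients")
    random.Random(seed).shuffle(patients)  # reproducible shuffle
    fold_of = {patient: i % k for i, patient in enumerate(patients)}  # round-robin
    folds: list[list[str]] = [[] for _ in range(k)]
    for slide, patient in sorted(slide_to_patient.items()):
        folds[fold_of[patient]].append(slide)
    return folds


# Slide -> patient mapping taken from the minimal annotation example.
slides = {"001_1": "001", "001_2": "001", "002": "002", "003": "003"}
folds = patient_k_folds(slides, k=3)
for i, val_fold in enumerate(folds):
    train_fold = sorted(s for s in slides if s not in val_fold)
    print(f"fold {i}: train={train_fold} val={val_fold}")
```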

Parameters:

Name	Type	Description	Default
`slide_dir`	`str | Path`	Directory containing whole-slide images or pre-extracted tiles.	required
`annotation_file`	`str | Path`	CSV file containing slide- or patient-level annotations and labels.	required
`project_dir`	`str | Path`	Output directory where trained models and intermediate files will be written.	required
`patient_column`	`str`	Name of the column containing patient identifiers (`-pc`, `--patient_column`).	`patient`
`label_column`	`str`	Name of the column containing class labels (`-lc`, `--label_column`).	`label`
`slide_column`	`str | None`	Name of the column containing slide identifiers (`-sc`, `--slide_column`).	`None`
`resolutions`	`str`	Comma-separated list of resolution presets to train on (`-r`, `--resolutions`).	`Low`
`model`	`str`	Model architecture to train (`-m`, `--model`).	first entry of `MODEL_CHOICES`
`k`	`int`	Number of folds used for k-fold cross-validation (`-k`).	`3`
`is_pretiled`	`bool`	Indicates that the input slides are already tiled (`-p`, `--is-pretiled`).	off
`transform_labels`	`bool`	If set, transforms labels to floating-point values (`-t`, `--transform_labels`).	off
`verbose`	`bool`	Enables verbose logging output (`-v`, `--verbose`).	off
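The `-t/--transform_labels` help text says labels become float values (0.0, 1.0, ...). The exact mapping is not shown in this document; the sketch below assumes the simplest reading, where the distinct labels are sorted (as strings) and numbered in order. `labels_to_floats` is hypothetical and may not match how AutoMIL builds its `label_map`.

```python
from __future__ import annotations


def labels_to_floats(labels: list[str]) -> tuple[list[float], dict[str, float]]:
    """Number the distinct labels 0.0, 1.0, ... in sorted (string) order."""
    label_map = {label: float(i) for i, label in enumerate(sorted(set(labels)))}
    return [label_map[label] for label in labels], label_map


values, label_map = labels_to_floats(["benign", "malignant", "benign"])
print(values)     # [0.0, 1.0, 0.0]
print(label_map)  # {'benign': 0.0, 'malignant': 1.0}
```

Note that string sorting orders numeric-looking labels lexically ("10" before "2"), which is one reason the real mapping may differ.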

Examples

Basic usage with default settings:

automil train /data/slides /data/annotations.csv ./results

Multi-resolution training with verbose output:

automil train -r "Low,High" -v /data/slides /data/annotations.csv ./results

Custom model and 5-fold configuration:

automil train -m TransMIL -k 5 /data/slides /data/annotations.csv ./results

Using pre-tiled slides:

automil train -p /data/slides /data/annotations.csv ./results

Annotation file requirements

The annotation file must be a CSV file containing at least the following columns:

- Patient identifiers (default column name: patient)
- Slide identifiers (default column name: slide; optional)
- Class labels (default column name: label)

By default, AutoMIL looks for columns named patient, slide, and label. These defaults can be overridden using the --patient_column, --slide_column, and --label_column options.

Minimal annotation file example

patient,slide,label
001,001_1,0
001,001_2,0
002,002,1
003,003,1

Expected slide directory structure

SLIDE_DIR should contain whole slide images in supported formats such as .svs, .tiff, or .png. Example structure:

/data/slides/
|-- slide1.svs
|-- slide2.tiff
|-- slide3.tiff

PNG Slide Handling

If slides are in PNG format, AutoMIL first converts them to TIFF for easier processing.

Using pretiled data

If tiles have already been extracted from the slides, use the --is-pretiled (-p) flag; if the flag is not set, AutoMIL attempts to detect pretiled input automatically. For pretiled data, AutoMIL expects the following directory structure for SLIDE_DIR:

/data/slides/
|-- slide1/
|    |-- tile_0_0.png
|    |-- tile_0_1.png
|    |-- ...
|-- slide2/
|    |-- tile_0_0.png
|    |-- tile_0_1.png
|    |-- ...

Slide name matching

Tile names are arbitrary, but slide subdirectory names must match the slide names in ANNOTATION_FILE.

Providing a train-test split

The train command does not accept a --split-file option. For each resolution preset, it holds out about 20% of the dataset for validation (val_fraction=0.2 in the source code below) and passes the training and validation sets to the Trainer. To use a predefined train-test split, run run-pipeline with --split-file instead.
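In the `train` source code below, each resolution's dataset is divided with `dataset.split(labels="label", val_fraction=0.2)`, so about a fifth of the data is held out for validation. The following is a generic, label-stratified hold-out split for illustration only; slideflow's own `split` logic (including any patient-level grouping) is not reproduced, and `stratified_holdout` is a hypothetical name.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_holdout(labels: dict[str, str], val_fraction: float = 0.2,
                       seed: int = 0) -> tuple[list[str], list[str]]:
    """Hold out about val_fraction of the slides of every label (at least one each)."""
    by_label: dict[str, list[str]] = defaultdict(list)
    for slide, label in sorted(labels.items()):
        by_label[label].append(slide)
    rng = random.Random(seed)
    train: list[str] = []
    val: list[str] = []
    for group in by_label.values():
        rng.shuffle(group)
        n_val = max(1, round(len(group) * val_fraction))
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return sorted(train), sorted(val)


# 20 hypothetical slides, 10 per class.
labels = {f"slide{i}": ("0" if i < 10 else "1") for i in range(20)}
train, val = stratified_holdout(labels)
print(len(train), len(val))  # 16 4
```

Stratifying by label keeps the class balance of the validation set close to that of the full dataset, which matters when one class is rare.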

Output structure

project_dir/
├── bags/           # Extracted tile features
├── models/         # Trained model checkpoints  
├── ensemble/       # Ensemble predictions
├── annotations.csv # Processed annotations
└── results.json    # Performance metrics

Source code in automil/cli.py

@AutoMIL.command(
    name="train",
    context_settings=CONTEXT_SETTINGS,
    no_args_is_help=True,
    help=TRAIN_HELP
)
@click.argument("slide_dir",        type=click.Path(exists=True, file_okay=False))
@click.argument("annotation_file",  type=click.Path(exists=True, file_okay=True))
@click.argument("project_dir",      type=click.Path(file_okay=False))
@click.option(
    "-pc", "--patient_column", type=str, default="patient",
    help="Name of the column containing patient IDs"
)
@click.option(
    "-lc", "--label_column", type=str, default="label",
    help="Name of the column containing labels"
)
@click.option(
    "-sc", "--slide_column", type=str, default=None,
    help="Name of the column containing slide names"
)
@click.option(
    "-r", "--resolutions",
    type=str,
    default="Low",
    help=f"Comma-separated list of resolution presets to train on. "
         f"Available: {', '.join([choice for choice in RESOLUTION_CHOICES])} "
         f"(e.g., 'Low,High')"
)
@click.option(
    "-m", "--model",
    type=(model_choice := click.Choice([choice for choice in MODEL_CHOICES])),
    default=model_choice.choices[0],
    help="Model type to train and evaluate"
)
@click.option(
    "-k", type=int, default=3,
    help="Number of folds to train per resolution level"
)
@click.option("-p", "--is-pretiled",      is_flag=True, help="Indicates that the input slides are already tiled")
@click.option("-t", "--transform_labels", is_flag=True, help="Transforms labels to float values (0.0, 1.0, ...)")
@click.option("-v", "--verbose",          is_flag=True, help="Enables additional logging messages")
def train(
    slide_dir:       str | Path,
    annotation_file: str | Path,
    project_dir:     str | Path,
    patient_column:  str,
    label_column:    str,
    slide_column:    str | None,
    resolutions:     str,
    model:           str,
    k:               int,
    is_pretiled:      bool,
    transform_labels: bool,
    verbose:          bool
):
    """
    Train one or more MIL models on a given dataset.

    This command initializes an AutoMIL project, prepares the dataset,
    and trains MIL models using k-fold cross-validation. Training can be
    performed at one or multiple resolution presets.

    Pipeline stages:

    1. Project setup and configuration
    2. Dataset preparation and tile extraction
    3. Model training with k-fold cross-validation

    Args:
        slide_dir (str | Path):
            Directory containing whole-slide images or pre-extracted tiles.

        annotation_file (str | Path):
            CSV file containing slide- or patient-level annotations and labels.

        project_dir (str | Path):
            Output directory where trained models and intermediate files
            will be written.

        patient_column (str):
            Name of the column containing patient identifiers.

        label_column (str):
            Name of the column containing class labels.

        slide_column (str | None):
            Name of the column containing slide identifiers.

        resolutions (str):
            Comma-separated list of resolution presets to train on.

        model (str):
            Model architecture to train.

        k (int):
            Number of folds used for k-fold cross-validation.

        is_pretiled (bool):
            Indicates that the input slides are already tiled.

        transform_labels (bool):
            If enabled, transforms labels to floating-point values.

        verbose (bool):
            Enables verbose logging output.

    ### Examples
      Basic usage with default settings:

        automil train /data/slides /data/annotations.csv ./results

      Multi-resolution training with verbose output:

        automil train -r "Low,High" -v /data/slides /data/annotations.csv ./results

      Custom model and 5-fold configuration:

        automil train -m TransMIL -k 5 /data/slides /data/annotations.csv ./results

      Using pre-tiled slides:

        automil train -p /data/slides /data/annotations.csv ./results

    ### Annotation file requirements

    The annotation file must be a CSV file containing at least the following columns:

    - Patient identifiers (default column name: `patient`)
    - Slide identifiers (default column name: `slide`; optional)
    - Class labels (default column name: `label`)

    By default, AutoMIL looks for columns named `patient`, `slide`, and `label`.
    These defaults can be overridden using the `--patient_column`,
    `--slide_column`, and `--label_column` options.

    ### Minimal annotation file example
        patient,slide,label
        001,001_1,0
        001,001_2,0
        002,002,1
        003,003,1

    ### Expected slide directory structure
      `SLIDE_DIR` should contain whole slide images in supported formats
      such as .svs, .tiff, or .png.
      Example structure:

        /data/slides/
        |-- slide1.svs
        |-- slide2.tiff
        |-- slide3.tiff

    ??? Note "PNG Slide Handling"
        If slides are in PNG, AutoMIL will first convert them to TIFF for easier processing.

    ### Using pretiled data
      If tiles have already been extracted from the slides, use the `--is-pretiled` (`-p`) flag.
      If the flag is not set, AutoMIL attempts to detect pretiled input automatically.
      In the case of pretiled data, AutoMIL expects the following directory structure for `SLIDE_DIR`:

        /data/slides/
        |-- slide1/
        |    |-- tile_0_0.png
        |    |-- tile_0_1.png
        |    |-- ...
        |-- slide2/
        |    |-- tile_0_0.png
        |    |-- tile_0_1.png
        |    |-- ...

    ??? Note "Slide name matching"
        Tile names are arbitrary but slide subdirectories must match the slide names in ANNOTATION_FILE.

    ### Providing a train-test split
      The `train` command does not accept a `--split-file` option. For each
      resolution preset, it holds out about 20% of the dataset for validation
      (`val_fraction=0.2`) and passes the training and validation sets to
      the `Trainer`. To use a predefined train-test split, run `run-pipeline`
      with `--split-file` instead.

    ### Output structure

        project_dir/
        ├── bags/           # Extracted tile features
        ├── models/         # Trained model checkpoints  
        ├── ensemble/       # Ensemble predictions
        ├── annotations.csv # Processed annotations
        └── results.json    # Performance metrics

    """

    import slideflow as sf

    from .dataset import Dataset
    from .project import Project
    from .trainer import Trainer
    from .util import (INFO_CLR, RESOLUTION_PRESETS, LogLevel, ModelType,
                       get_vlog)
    from .util.backend import configure_image_backend, has_png_slides
    from .util.pretiled import is_input_pretiled

    # Getting a verbose logger
    vlog = get_vlog(verbose)
    sf.setLoggingLevel(20) # INFO: 20, DEBUG: 10

    # Logging the executed command
    command = " ".join(sys.argv)
    vlog(f"Executing command: [{INFO_CLR}]{command}[/]")

    # Define some paths
    bags_dir = Path(project_dir) / "bags"

    # Some type coercion
    slide_dir = Path(slide_dir)
    annotation_file = Path(annotation_file)
    project_dir = Path(project_dir)

    try:

        # === 1. Parsing === #
        # Parse given string resolutions into list of RESOLUTION_PRESETS
        resolution_presets: list[RESOLUTION_PRESETS] = [
            RESOLUTION_PRESETS[res.strip()] for res in resolutions.split(",")
        ]
        vlog(f"Using resolution presets: [{INFO_CLR}]{[preset.name for preset in resolution_presets]}[/]")

        # Parse the model type
        model_type = ModelType[model]
        vlog(f"Using model type: [{INFO_CLR}]{model_type.name}[/]")

        # === 2. Image Backend Configuration === #
        png_slides_present = has_png_slides(slide_dir)

        tiff_conversion = configure_image_backend(
            slide_dir=slide_dir,
            needs_png_conversion=png_slides_present,
            verbose=verbose,
        )

        # === 3. Project Creation And Setup === #
        project_setup = Project(
            Path(project_dir),
            Path(annotation_file),
            Path(slide_dir),
            patient_column,
            label_column,
            slide_column,
            transform_labels=transform_labels,
            verbose=verbose,
        )
        # Prepare slideflow project object
        project = project_setup.prepare_project()
        # We'll need the label map and slide ids for the dataset setup
        label_map = project_setup.label_map
        slide_ids = project_setup.slide_ids

        project_setup.summary()

        # === 4. Setup Dataset Sources ===
        # Determine if the slide_dir has pretiled slides
        if not is_pretiled: # is_pretiled == False means the flag was not set
            is_pretiled = is_input_pretiled(
                slide_dir,
                slide_ids
            )

        datasets: dict[str, sf.Dataset] = {}
        for resolution in resolution_presets:
            vlog(f"Setting up dataset for resolution preset: [{INFO_CLR}]{resolution.name}[/]")

            dataset = Dataset(
                project,
                resolution,
                label_map,
                slide_dir=Path(slide_dir),
                bags_dir=Path(project_dir) / "bags",
                is_pretiled=is_pretiled,
                tiff_conversion=tiff_conversion,
                verbose=verbose
            )
            dataset.summary()
            datasets[resolution.name] = dataset.prepare_dataset_source()
            vlog(f"Dataset setup complete for resolution preset: [{INFO_CLR}]{resolution.name}[/]")

        # === 5. Model Training === #
        for resolution in resolution_presets:
            dataset = datasets[resolution.name]
            vlog(f"Train/Test split for resolution preset '[{INFO_CLR}]{resolution.name}[/]': "
                 f"[{INFO_CLR}]{len(dataset.slides())}[/] train slides"
            )

            train, val = dataset.split(
                labels="label",
                val_fraction=0.2
            )

            trainer = Trainer(
                bags_dir,
                project,
                train,
                val,
                model=model_type,
                k=k,
                epochs=300
            )
            trainer.train_k_fold()
            trainer.summary()

    except Exception as e:
        tb = traceback.format_exc()
        vlog(tb, LogLevel.ERROR)
        vlog(f"Error: {e}", LogLevel.ERROR)
        return

predict

predict(
    slide_dir: str | Path,
    annotation_file: str | Path,
    bags_dir: str | Path,
    model_dir: str | Path,
    output_dir: str | Path,
    patient_column: str,
    label_column: str,
    slide_column: str | None,
    verbose: bool,
)

Generate predictions using one or more trained MIL models.

This command loads trained model checkpoints and generates predictions for the slides in SLIDE_DIR using precomputed tile feature bags from BAGS_DIR. Predictions are written to the specified output directory.

Parameters:

Name	Type	Description	Default
`slide_dir`	`str \| Path`	Directory containing whole-slide images.	required
`annotation_file`	`str \| Path`	CSV file containing slide- or patient-level annotations and labels.	required
`bags_dir`	`str \| Path`	Directory containing extracted tile feature bags.	required
`model_dir`	`str \| Path`	Directory containing trained model checkpoints.	required
`output_dir`	`str \| Path`	Directory to which prediction files will be written.	required
`patient_column`	`str`	Name of the column containing patient identifiers.	required
`label_column`	`str`	Name of the column containing class labels.	required
`slide_column`	`str \| None`	Name of the column containing slide identifiers.	required
`verbose`	`bool`	Enables verbose logging output.	required

Examples

Basic usage with multiple models:

automil predict /data/slides /data/annotations.csv /data/bags /data/models -o ./predictions

Generate predictions with a single model:

automil predict /data/slides /data/annotations.csv /data/bags /data/models/model_1 -v

Override annotation column names:

automil predict -pc "patient_id" -lc "outcome" -sc "slide_id" \
    /data/slides /data/annotations.csv /data/bags /data/models/model_1 \
    -o ./predictions

Expected model directory structure

MODEL_DIR may either point to a single model directory or to a parent directory containing multiple model subdirectories.

Single model example:

/data/models/model_1/
|-- best_valid.pth
|-- ...

Multiple models example:

/data/models/
|-- model_1/
|    |-- best_valid.pth
|-- model_2/
|    |-- best_valid.pth
|    |-- ...
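The two layouts can be told apart by checking whether MODEL_DIR itself contains a checkpoint. The sketch below shows one way to do that. It assumes that each model directory holds a `best_valid.pth` file, as in the examples above. `find_model_dirs` is a hypothetical name and does not necessarily mirror how AutoMIL's `Evaluator` resolves models.

```python
from __future__ import annotations

from pathlib import Path

# Assumed checkpoint file name, taken from the layout examples above
CHECKPOINT_NAME = "best_valid.pth"


def find_model_dirs(model_dir: str | Path) -> list[Path]:
    """Hypothetical helper: return the model directories contained in MODEL_DIR."""
    model_dir = Path(model_dir)
    # Single-model layout: the checkpoint sits directly in MODEL_DIR
    if (model_dir / CHECKPOINT_NAME).is_file():
        return [model_dir]
    # Multi-model layout: every immediate subdirectory with a checkpoint is a model
    return sorted(
        sub for sub in model_dir.iterdir()
        if sub.is_dir() and (sub / CHECKPOINT_NAME).is_file()
    )
```

Subdirectories without a checkpoint are ignored in this sketch rather than treated as errors. Whether AutoMIL does the same is not stated here.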

Multiple models

When multiple models are provided, AutoMIL generates a separate prediction file for each model.

Annotation file requirements

The annotation file must be a CSV file containing at least the patient and label columns listed below; the slide column is optional:

Patient identifiers (default column name: patient)
Slide identifiers (default column name: slide; optional)
Class labels (default column name: label)

By default, AutoMIL looks for columns named patient, slide, and label. These defaults can be overridden using the --patient_column, --slide_column, and --label_column options.

Minimal annotation file example

patient,slide,label
001,001_1,0
001,001_2,0
002,002,1
003,003,1
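The column requirements above can be sketched as a small validation step. The helper below reads an annotation CSV and checks that the required columns exist. It then renames any overridden column names to the defaults (`patient`, `slide`, `label`). It only illustrates the rules. AutoMIL's `Project` writes its own modified annotation file and may differ in details, and `normalize_annotations` is a hypothetical name.

```python
from __future__ import annotations

import csv
import io


def normalize_annotations(
    csv_text: str,
    patient_column: str = "patient",
    label_column: str = "label",
    slide_column: str | None = None,
) -> list[dict[str, str]]:
    """Hypothetical helper: validate annotation columns and map them to default names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []

    # Patient and label columns are mandatory; a slide column is only required if named
    required = [patient_column, label_column] + ([slide_column] if slide_column else [])
    missing = [c for c in required if c not in columns]
    if missing:
        raise ValueError(f"Annotation file is missing column(s): {missing}")

    rename = {patient_column: "patient", label_column: "label"}
    if slide_column:
        rename[slide_column] = "slide"

    # Rename overridden columns and keep every other column unchanged
    return [{rename.get(k, k): v for k, v in row.items()} for row in reader]
```

For the column-override example shown earlier, the equivalent call would be `normalize_annotations(text, "patient_id", "outcome", "slide_id")`.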

Output directory format

OUTPUT_DIR, set with -o/--output-dir (default: predictions), must be a directory path. Prediction results are saved as separate .csv or .parquet files inside this directory.

When multiple models are used, output files include a suffix indicating the corresponding model.

Source code in automil/cli.py

@AutoMIL.command(
    name="predict",
    context_settings=CONTEXT_SETTINGS,
    no_args_is_help=True,
    help=PREDICT_HELP
)
@click.argument("slide_dir",    type=click.Path(exists=True, file_okay=False))
@click.argument("annotation_file",  type=click.Path(exists=True, file_okay=True))
@click.argument("bags_dir",     type=click.Path(exists=True, file_okay=False))
@click.argument("model_dir",    type=click.Path(exists=True, file_okay=False))
@click.option(
    "-o", "--output-dir", 
    type=click.Path(file_okay=True), default="predictions",
    help="Directory in which to save prediction files (.csv or .parquet)"
)
@click.option(
    "-pc", "--patient_column", type=str, default="patient",
    help="Name of the column containing patient IDs"
)
@click.option(
    "-lc", "--label_column", type=str, default="label",
    help="Name of the column containing labels"
)
@click.option(
    "-sc", "--slide_column", type=str, default=None,
    help="Name of the column containing slide names"
)
@click.option("-v", "--verbose", is_flag=True, help="Enables additional logging messages")
def predict(
    slide_dir:   str | Path,
    annotation_file: str | Path,
    bags_dir:    str | Path,
    model_dir:   str | Path,
    output_dir: str | Path,
    patient_column:  str,
    label_column:    str,
    slide_column:    str | None,
    verbose:     bool
):
    """
    Generate predictions using one or more trained MIL models.

    This command loads trained model checkpoints and generates predictions
    for the slides in `SLIDE_DIR` using precomputed tile feature bags from
    `BAGS_DIR`. Predictions are written to the specified output directory.

    Args:
        slide_dir (str | Path):
            Directory containing whole-slide images.

        annotation_file (str | Path):
            CSV file containing slide- or patient-level annotations and labels.

        bags_dir (str | Path):
            Directory containing extracted tile feature bags.

        model_dir (str | Path):
            Directory containing trained model checkpoints.

        output_dir (str | Path):
            Directory to which prediction files will be written.

        patient_column (str):
            Name of the column containing patient identifiers.

        label_column (str):
            Name of the column containing class labels.

        slide_column (str | None):
            Name of the column containing slide identifiers.

        verbose (bool):
            Enables verbose logging output.

    ### Examples

    Basic usage with multiple models:

        automil predict /data/slides /data/annotations.csv /data/bags /data/models -o ./predictions

    Generate predictions with a single model:

        automil predict /data/slides /data/annotations.csv /data/bags /data/models/model_1 -v

    Override annotation column names:

        automil predict -pc "patient_id" -lc "outcome" -sc "slide_id" \
            /data/slides /data/annotations.csv /data/bags /data/models/model_1 \
            -o ./predictions

    ### Expected model directory structure

    `MODEL_DIR` may either point to a single model directory or to a parent
    directory containing multiple model subdirectories.

    Single model example:

        /data/models/model_1/
        |-- best_valid.pth
        |-- ...

    Multiple models example:

        /data/models/
        |-- model_1/
        |    |-- best_valid.pth
        |-- model_2/
        |    |-- best_valid.pth
        |    |-- ...

    ??? Note "Multiple models"
        When multiple models are provided, AutoMIL generates a separate
        prediction file for each model.

    ### Annotation file requirements

    The annotation file must be a CSV file containing at least the following columns:

    - Patient identifiers (default column name: `patient`)
    - Slide identifiers (default column name: `slide`; optional)
    - Class labels (default column name: `label`)

    By default, AutoMIL looks for columns named `patient`, `slide`, and `label`.
    These defaults can be overridden using the `--patient_column`,
    `--slide_column`, and `--label_column` options.

    ### Minimal annotation file example

        patient,slide,label
        001,001_1,0
        001,001_2,0
        002,002,1
        003,003,1

    ### Output directory format

    `OUTPUT_DIR` must be a directory path. Prediction results are saved as
    separate `.csv` or `.parquet` files inside this directory.

    When multiple models are used, output files include a suffix indicating
    the corresponding model.
    """

    import slideflow as sf

    from .evaluation import Evaluator
    from .project import Project
    from .util import INFO_CLR, LogLevel, get_vlog

    # Getting a verbose logger
    vlog = get_vlog(verbose)
    sf.setLoggingLevel(20) # INFO: 20, DEBUG: 10

    # Logging the executed command
    command = " ".join(sys.argv)
    vlog(f"Executing command: [{INFO_CLR}]{command}[/]")

    # Some type coercion
    slide_dir = Path(slide_dir)
    bags_dir =  Path(bags_dir)
    model_dir = Path(model_dir)
    output_dir = Path(output_dir)

    # Setup output folder as project (modifies annotation file)
    project = Project(
        Path(output_dir),
        Path(annotation_file),
        Path(slide_dir),
        patient_column,
        label_column,
        slide_column,
        transform_labels=False,
        verbose=verbose,
    )
    project.setup_project_scaffold()
    annotation_file = project.modified_annotations_file

    # Create a minimal dataset (needed for prediction)
    dataset = sf.Dataset(
        slides=str(slide_dir),
        annotations=str(annotation_file)
    )

    # Generate predictions
    try:
        evaluator = Evaluator(
            dataset,
            model_dir,
            output_dir,
            bags_dir,
            verbose=verbose
        )
        evaluator.generate_predictions()

    except Exception as e:
        tb = traceback.format_exc()
        vlog(tb, LogLevel.ERROR)
        vlog(f"Error: {e}", LogLevel.ERROR)
        return

evaluate

evaluate(
    slide_dir: str | Path,
    annotation_file: str | Path,
    bags_dir: str | Path,
    model_dir: str | Path,
    output_dir: str | Path,
    patient_column: str,
    label_column: str,
    slide_column: str | None,
    verbose: bool,
)

Evaluate one or more trained MIL models on a labeled dataset.

This command generates predictions for the slides in SLIDE_DIR using trained models from MODEL_DIR and the corresponding tile feature bags from BAGS_DIR. The predictions are evaluated against the provided annotations. The command also generates summary metrics, model comparisons, plots, and attention heatmaps.
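The sketch below makes "evaluated against the provided annotations" more concrete. It computes two common binary-classification metrics from ground-truth labels and predicted probabilities: accuracy at a 0.5 threshold and ROC AUC. It is a generic, dependency-free illustration. Which metrics AutoMIL actually reports is determined by its `Evaluator` class, and the function names here are hypothetical.

```python
from __future__ import annotations


def accuracy(labels: list[int], probs: list[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded probability matches the label."""
    correct = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return correct / len(labels)


def roc_auc(labels: list[int], probs: list[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC above is quadratic in the number of slides. That is fine for a small example, but larger evaluations would typically use a rank-based formula or scikit-learn's `roc_auc_score`.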

Parameters:

Name	Type	Description	Default
`slide_dir`	`str \| Path`	Directory containing whole-slide images.	required
`annotation_file`	`str \| Path`	CSV file containing slide- or patient-level annotations and labels.	required
`bags_dir`	`str \| Path`	Directory containing extracted tile feature bags.	required
`model_dir`	`str \| Path`	Directory containing trained model checkpoints.	required
`output_dir`	`str \| Path`	Directory to which evaluation results will be written.	required
`patient_column`	`str`	Name of the column containing patient identifiers.	required
`label_column`	`str`	Name of the column containing class labels.	required
`slide_column`	`str \| None`	Name of the column containing slide identifiers.	required
`verbose`	`bool`	Enables verbose logging output.	required

Examples

Evaluate a single model:

automil evaluate /data/slides /data/annotations.csv /data/bags /data/models/model_1 -o ./results

Evaluate multiple models:

automil evaluate /data/slides /data/annotations.csv /data/bags /data/models -v

Override annotation column names:

automil evaluate -pc "patient_id" -lc "outcome" -sc "slide_id" \
    /data/slides /data/annotations.csv /data/bags /data/models/model_1 \
    -o ./results

Expected model directory structure

MODEL_DIR may refer either to a single model directory or to a parent directory containing multiple model subdirectories.

Single model example:

/data/models/model_1/
|-- best_valid.pth
|-- ...

Multiple models example:

/data/models/
|-- model_1/
|    |-- best_valid.pth
|-- model_2/
|    |-- best_valid.pth
|    |-- ...

Multiple models

When multiple models are evaluated, AutoMIL generates separate evaluation results for each model and compares their performance.

Annotation file requirements

The annotation file must be a CSV file containing at least the patient and label columns listed below; the slide column is optional:

Patient identifiers (default column name: patient)
Slide identifiers (default column name: slide; optional)
Class labels (default column name: label)

By default, AutoMIL looks for columns named patient, slide, and label. These defaults can be overridden using the --patient_column, --slide_column, and --label_column options.

Minimal annotation file example

patient,slide,label
001,001_1,0
001,001_2,0
002,002,1
003,003,1

Output directory format

OUTPUT_DIR, set with -o/--output-dir (default: evaluation), must be a directory path. Evaluation results, metrics, and plots are written to this directory.

When multiple models are evaluated, output files include a suffix indicating the corresponding model.

Source code in automil/cli.py

@AutoMIL.command(
    name="evaluate",
    context_settings=CONTEXT_SETTINGS,
    no_args_is_help=True,
    help=EVALUATE_HELP
)
@click.argument("slide_dir",    type=click.Path(exists=True, file_okay=False))
@click.argument("annotation_file",  type=click.Path(exists=True, file_okay=True))
@click.argument("bags_dir",     type=click.Path(exists=True, file_okay=False))
@click.argument("model_dir",    type=click.Path(exists=True, file_okay=False))
@click.option(
    "-o", "--output-dir", 
    type=click.Path(file_okay=True), default="evaluation",
    help="Directory to which to save evaluation results"
)
@click.option(
    "-pc", "--patient_column", type=str, default="patient",
    help="Name of the column containing patient IDs"
)
@click.option(
    "-lc", "--label_column", type=str, default="label",
    help="Name of the column containing labels"
)
@click.option(
    "-sc", "--slide_column", type=str, default=None,
    help="Name of the column containing slide names"
)
@click.option("-v", "--verbose", is_flag=True, help="Enables additional logging messages")
def evaluate(
    slide_dir:   str | Path,
    annotation_file: str | Path,
    bags_dir:    str | Path,
    model_dir:   str | Path,
    output_dir: str | Path,
    patient_column:  str,
    label_column:    str,
    slide_column:    str | None,
    verbose:     bool
):
    """
    Evaluate one or more trained MIL models on a labeled dataset.

    This command generates predictions for the slides in `SLIDE_DIR` using
    trained models from `MODEL_DIR` and corresponding tile feature bags from
    `BAGS_DIR`. The predictions are evaluated against the provided annotations,
    and summary metrics and plots are generated.

    Args:
        slide_dir (str | Path):
            Directory containing whole-slide images.

        annotation_file (str | Path):
            CSV file containing slide- or patient-level annotations and labels.

        bags_dir (str | Path):
            Directory containing extracted tile feature bags.

        model_dir (str | Path):
            Directory containing trained model checkpoints.

        output_dir (str | Path):
            Directory to which evaluation results will be written.

        patient_column (str):
            Name of the column containing patient identifiers.

        label_column (str):
            Name of the column containing class labels.

        slide_column (str | None):
            Name of the column containing slide identifiers.

        verbose (bool):
            Enables verbose logging output.

    ### Examples

    Evaluate a single model:

        automil evaluate /data/slides /data/annotations.csv /data/bags /data/models/model_1 -o ./results

    Evaluate multiple models:

        automil evaluate /data/slides /data/annotations.csv /data/bags /data/models -v

    Override annotation column names:

        automil evaluate -pc "patient_id" -lc "outcome" -sc "slide_id" \
            /data/slides /data/annotations.csv /data/bags /data/models/model_1 \
            -o ./results

    ### Expected model directory structure

    `MODEL_DIR` may refer either to a single model directory or to a parent
    directory containing multiple model subdirectories.

    Single model example:

        /data/models/model_1/
        |-- best_valid.pth
        |-- ...

    Multiple models example:

        /data/models/
        |-- model_1/
        |    |-- best_valid.pth
        |-- model_2/
        |    |-- best_valid.pth
        |    |-- ...

    ??? Note "Multiple models"
        When multiple models are evaluated, AutoMIL generates separate
        evaluation results for each model and compares their performance.

    ### Annotation file requirements

    The annotation file must be a CSV file containing at least the following columns:

    - Patient identifiers (default column name: `patient`)
    - Slide identifiers (default column name: `slide`; optional)
    - Class labels (default column name: `label`)

    By default, AutoMIL looks for columns named `patient`, `slide`, and `label`.
    These defaults can be overridden using the `--patient_column`,
    `--slide_column`, and `--label_column` options.

    ### Minimal annotation file example

        patient,slide,label
        001,001_1,0
        001,001_2,0
        002,002,1
        003,003,1

    ### Output directory format

    `OUTPUT_DIR` must be a directory path. Evaluation results, metrics, and plots
    are written to this directory.

    When multiple models are evaluated, output files include a suffix indicating
    the corresponding model.
    """
    import slideflow as sf

    from .evaluation import Evaluator
    from .project import Project
    from .util import INFO_CLR, LogLevel, get_vlog

    # Getting a verbose logger
    vlog = get_vlog(verbose)
    sf.setLoggingLevel(20) # INFO: 20, DEBUG: 10

    # Logging the executed command
    command = " ".join(sys.argv)
    vlog(f"Executing command: [{INFO_CLR}]{command}[/]")

    # Some type coercion
    slide_dir =  Path(slide_dir)
    bags_dir =   Path(bags_dir)
    model_dir =  Path(model_dir)
    output_dir = Path(output_dir)

    vlog(f"Evaluating models in: [{INFO_CLR}]{model_dir}[/]")

    # Setup output folder as project (modifies annotation file)
    project = Project(
        Path(output_dir),
        Path(annotation_file),
        Path(slide_dir),
        patient_column,
        label_column,
        slide_column,
        transform_labels=False,
        verbose=verbose,
    )
    project.setup_project_scaffold()
    annotation_file = project.modified_annotations_file

    # Create a minimal dataset (needed for prediction)
    dataset = sf.Dataset(
        slides=str(slide_dir),
        annotations=str(annotation_file)
    )

    # Evaluate models
    try:
        evaluator = Evaluator(
            dataset,
            model_dir,
            output_dir,
            bags_dir,
            verbose=verbose
        )
        evaluator.evaluate_models(generate_attention_heatmaps=True)
        evaluator.compare_predictions()
        evaluator.generate_plots()

    except Exception as e:
        tb = traceback.format_exc()
        vlog(tb, LogLevel.ERROR)
        vlog(f"Error: {e}", LogLevel.ERROR)
        return

create_split

create_split(
    slide_dir: str | Path,
    annotation_file: str | Path,
    output_file: str | Path,
    test_fraction: float,
    read_only: bool,
    verbose: bool,
)

Create a train–test split file from dataset annotations.

This command reads the provided annotation file and generates a train–test split, which is saved as a JSON file. The resulting split can be reused for reproducible training and evaluation.
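The sketch below illustrates the idea behind a train/test split file. It groups slides by patient and assigns a fraction of the patients within each label to the test set. It then writes the result in the `{"train": [...], "test": [...]}` shape described under Output file format. It is not how AutoMIL works internally. The command delegates to slideflow's `Dataset.split`, and the exact file layout is defined by slideflow. `make_split` is a hypothetical name, and the sketch assumes each patient has a single label.

```python
from __future__ import annotations

import json
import random
from collections import defaultdict
from pathlib import Path


def make_split(
    rows: list[dict[str, str]],
    output_file: str | Path,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> dict[str, list[str]]:
    """Hypothetical helper: patient-level, label-stratified train/test split.

    `rows` are annotation records with "patient", "slide", and "label" keys.
    """
    # Collect each patient's slides and label (one label per patient assumed)
    slides_by_patient: dict[str, list[str]] = defaultdict(list)
    label_by_patient: dict[str, str] = {}
    for row in rows:
        slides_by_patient[row["patient"]].append(row["slide"])
        label_by_patient[row["patient"]] = row["label"]

    # Stratify: draw test patients separately within each label
    patients_by_label: dict[str, list[str]] = defaultdict(list)
    for patient, label in label_by_patient.items():
        patients_by_label[label].append(patient)

    rng = random.Random(seed)
    test_patients: set[str] = set()
    for patients in patients_by_label.values():
        patients = sorted(patients)
        rng.shuffle(patients)
        test_patients.update(patients[: round(len(patients) * test_fraction)])

    # A patient's slides all go to the same partition
    split: dict[str, list[str]] = {"train": [], "test": []}
    for patient in sorted(slides_by_patient):
        key = "test" if patient in test_patients else "train"
        split[key].extend(slides_by_patient[patient])

    Path(output_file).write_text(json.dumps(split, indent=2))
    return split
```

Splitting by patient rather than by slide keeps slides from the same patient out of both partitions. Fixing `seed` makes the split reproducible.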

Parameters:

Name	Type	Description	Default
`slide_dir`	`str \| Path`	Directory containing whole-slide images.	required
`annotation_file`	`str \| Path`	CSV file containing slide- or patient-level annotations and labels.	required
`output_file`	`str \| Path`	Path to which the split JSON file will be written.	required
`test_fraction`	`float`	Fraction of samples to assign to the test set.	required
`read_only`	`bool`	If enabled, an existing split file will not be overwritten.	required
`verbose`	`bool`	Enables verbose logging output.	required

Examples

Create a split with default settings:

automil create-split /data/slides /data/annotations.csv -o split.json

Create a split without overwriting an existing file:

automil create-split /data/slides /data/annotations.csv -o split.json --read-only

Output file format

The output JSON file, set with -o/--output-file (default: split.json), contains slide identifiers grouped by split name.

Example structure:

{
    "train": ["slide1", "slide2", ...],
    "test":  ["slide3", "slide4", ...]
}

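The held-out set is created through slideflow's Dataset.split, with its val_fraction set to the test fraction. Depending on the configuration, the file may therefore contain a validation split instead of, or in addition to, a test split.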
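Before reusing a split file, it can be worth checking that no slide appears in more than one partition. The sketch below assumes the flat `{"split_name": [slide, ...]}` layout shown in the example structure. `check_split_file` is a hypothetical name, and a file with a different structure would need a different check.

```python
from __future__ import annotations

import json
from itertools import combinations
from pathlib import Path


def check_split_file(path: str | Path) -> dict[str, list[str]]:
    """Hypothetical helper: load a split JSON and ensure its partitions are disjoint."""
    split = json.loads(Path(path).read_text())
    if not isinstance(split, dict) or not all(isinstance(v, list) for v in split.values()):
        raise ValueError("Expected a JSON object mapping split names to slide lists")
    # A slide listed under two split names would leak between train and test
    for (name_a, a), (name_b, b) in combinations(split.items(), 2):
        overlap = set(a) & set(b)
        if overlap:
            raise ValueError(f"Slides in both '{name_a}' and '{name_b}': {sorted(overlap)}")
    return split
```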

Source code in automil/cli.py

@AutoMIL.command(
    "create-split",
    context_settings=CONTEXT_SETTINGS,
    no_args_is_help=True,
    help=CREATE_SPLIT_HELP
)
@click.argument("slide_dir",        type=click.Path(exists=True, file_okay=False))
@click.argument("annotation_file",  type=click.Path(exists=True, file_okay=True))
@click.option(
    "-o", "--output-file", type=click.Path(file_okay=True), default="split.json",
    help="Path to which to save the split .json file"
)
@click.option("-f", "--test-fraction", type=float, default=0.2, help="Fraction of slides to include in the test set")
@click.option("-r", "--read-only", is_flag=True, help="If set, existing split file will not be overwritten")
@click.option("-v", "--verbose", is_flag=True, help="Enables additional logging messages")
def create_split(
    slide_dir:       str | Path,
    annotation_file: str | Path,
    output_file:     str | Path,
    test_fraction:   float,
    read_only:       bool,
    verbose:         bool
):
    """
    Create a train–test split file from dataset annotations.

    This command reads the provided annotation file and generates a train–test
    split, which is saved as a JSON file. The resulting split can be reused
    for reproducible training and evaluation.

    Args:
        slide_dir (str | Path):
            Directory containing whole-slide images.

        annotation_file (str | Path):
            CSV file containing slide- or patient-level annotations and labels.

        output_file (str | Path):
            Path to which the split JSON file will be written.

        test_fraction (float):
            Fraction of samples to assign to the test set.

        read_only (bool):
            If enabled, an existing split file will not be overwritten.

        verbose (bool):
            Enables verbose logging output.

    ### Examples

    Create a split with default settings:

        automil create-split /data/slides /data/annotations.csv -o split.json

    Create a split without overwriting an existing file:

        automil create-split /data/slides /data/annotations.csv -o split.json --read-only

    ### Output file format

    The output JSON file contains slide identifiers grouped by split name.

    Example structure:

        {
        "train": ["slide1", "slide2", ...],
        "test":  ["slide3", "slide4", ...]
        }

    Depending on the configuration, a `validation` split may be generated
    instead of or in addition to a `test` split.
    """

    import slideflow as sf

    from .util import INFO_CLR, LogLevel, get_vlog

    # Getting a verbose logger
    vlog = get_vlog(verbose)
    sf.setLoggingLevel(20) # INFO: 20, DEBUG: 10

    # Logging the executed command
    command = " ".join(sys.argv)
    vlog(f"Executing command: [{INFO_CLR}]{command}[/]")

    # Some type coercion
    slide_dir = Path(slide_dir)
    annotation_file = Path(annotation_file)
    output_file = Path(output_file)

    try:
        # Minimal dataset for splitting
        dataset = sf.Dataset(
            slides=str(slide_dir),
            annotations=str(annotation_file)
        )
        # Create the split and save it
        _, _ = dataset.split(
            labels="label",
            val_fraction=test_fraction,
            splits=str(output_file),
            read_only=read_only
        )

    except Exception as e:
        tb = traceback.format_exc()
        vlog(tb, LogLevel.ERROR)
        vlog(f"Error: {e}", LogLevel.ERROR)
        return