
Io directory

directory_loader(path_imagedir, allowed_image_formats, training=True)

Data Input Interface for loading a dataset in a directory-based structure.

This internal function allows simple parsing of class annotations encoded in subdirectories.

Input Formats
Format Directory - Training:
    - Class annotations are encoded via subdirectories
    - Images are provided in subdirectories

Format Directory - Testing:
    - All images are provided in the directory
    - No class annotations

Expected structure for training:

images_dir/                     # path_imagedir = "dataset/images_dir"
    class_A/
        sample001.png
        sample002.png
        ...
        sample050.png
    class_B/                    # Directory / class names can be any String
        sample051.png           # like "diabetes", "cancer", ...
        sample052.png
        ...
        sample100.png
    ...
    class_C/
        sample101.png           # Sample names (indices) should be unique!
        sample102.png
        ...
        sample150.png
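As a quick sanity check of this layout, the following sketch builds a tiny training directory in a temporary folder and reports file names that occur in more than one class subdirectory, since the layout asks for unique sample names. The helper `find_duplicate_samples` is hypothetical and not part of AUCMEDI.

```python
import os
import tempfile
from collections import Counter


def find_duplicate_samples(path_imagedir):
    """Return file names found in more than one class subdirectory (hypothetical helper)."""
    counts = Counter()
    for subdirectory in sorted(os.listdir(path_imagedir)):
        path_sd = os.path.join(path_imagedir, subdirectory)
        # Only subdirectories encode classes; skip metadata files
        if not os.path.isdir(path_sd):
            continue
        counts.update(os.listdir(path_sd))
    return sorted(name for name, n in counts.items() if n > 1)


# Build a small training layout: one subdirectory per class
root = tempfile.mkdtemp()
layout = {"class_A": ["sample001.png", "sample002.png"],
          "class_B": ["sample003.png", "sample001.png"]}  # sample001 reused on purpose
for class_name, files in layout.items():
    os.makedirs(os.path.join(root, class_name))
    for name in files:
        open(os.path.join(root, class_name, name), "wb").close()

print(find_duplicate_samples(root))  # ['sample001.png']
```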

Expected structure for testing:

images_dir/                     # path_imagedir = "dataset/images_dir"
    sample001.png
    sample002.png
    ...
    sample100.png
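A minimal sketch of what the testing layout yields, assuming a flat folder of image files: each sample index is the file name without its extension, and the first recognized extension is reported once as the image format. The function `parse_flat_directory` is hypothetical; in AUCMEDI this parsing is done by `directory_loader` itself.

```python
import os
import tempfile


def parse_flat_directory(path_imagedir, allowed_image_formats):
    """Split each file name into an index and a format for a flat testing directory (hypothetical helper)."""
    index_list, image_format = [], None
    for file in sorted(os.listdir(path_imagedir)):
        index, ext = os.path.splitext(file)
        ext = ext.lstrip(".")
        # Remember the first recognized image format (lower- or upper-case match)
        if image_format is None and (ext.lower() in allowed_image_formats
                                     or ext.upper() in allowed_image_formats):
            image_format = ext
        index_list.append(index)
    if image_format is None:
        raise ValueError(f"Unknown image format in {path_imagedir}")
    return index_list, image_format


root = tempfile.mkdtemp()
for name in ["sample002.png", "sample001.png"]:
    open(os.path.join(root, name), "wb").close()

print(parse_flat_directory(root, ["png", "jpg"]))  # (['sample001', 'sample002'], 'png')
```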

Parameters:

- path_imagedir (str, required): Path to the directory containing the images (testing) or the class subdirectories (training).
- allowed_image_formats (list of str, required): List of allowed image formats (provided by IO_Interface).
- training (bool, default: True): Whether class annotations are available, i.e. whether the directory follows the training structure with one subdirectory per class.

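Returns:

- index_list (list of str): List of sample indices encoded as strings. Required in DataGenerator as samples. In training mode each index is the relative path "class subdirectory/file name" including the file extension; in testing mode it is the file name without its extension.
- class_ohe (numpy.ndarray): Class annotations as a one-hot encoded matrix. Required in DataGenerator as labels. None in testing mode.
- class_n (int): Number of classes. Required in NeuralNetwork for architecture design as n_labels. None in testing mode.
- class_names (list of str): Names of the classes, in sorted order of the subdirectory names. Used for later prediction storage or evaluation. None in testing mode.
- image_format (str): Image format to append to the sample index for image loading. Required in DataGenerator. None in training mode, where the indices already include the file extension.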
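To illustrate how these return values fit together, the sketch below rebuilds the on-disk path of a sample. In training mode the index already contains the class subdirectory and the file extension (and `image_format` is `None`); in testing mode the extension has to be appended. `sample_path` is a hypothetical helper; the actual image loading happens inside AUCMEDI's DataGenerator.

```python
import os


def sample_path(path_imagedir, index, image_format=None):
    """Rebuild the file path of one sample from directory_loader-style outputs (hypothetical helper)."""
    # Training indices look like "class_A/sample001.png" and need no suffix
    if image_format is None:
        return os.path.join(path_imagedir, index)
    # Testing indices look like "sample001" and need ".<format>" appended
    return os.path.join(path_imagedir, index + "." + image_format)


print(sample_path("dataset/images_dir", os.path.join("class_A", "sample001.png")))
print(sample_path("dataset/images_dir", "sample001", "png"))
```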

Source code in aucmedi/data_processing/io_interfaces/io_directory.py
import os
import pandas as pd


def directory_loader(path_imagedir, allowed_image_formats, training=True):
    """ Data Input Interface for loading a dataset in a directory-based structure.

    This **internal** function allows simple parsing of class annotations encoded in subdirectories.

    ???+ info "Input Formats"
        ```
        Format Directory - Training:
            - Class annotations are encoded via subdirectories
            - Images are provided in subdirectories

        Format Directory - Testing:
            - All images are provided in the directory
            - No class annotations
        ```

    **Expected structure for training:**
    ```
    images_dir/                     # path_imagedir = "dataset/images_dir"
        class_A/
            sample001.png
            sample002.png
            ...
            sample050.png
        class_B/                    # Directory / class names can be any String
            sample051.png           # like "diabetes", "cancer", ...
            sample052.png
            ...
            sample100.png
        ...
        class_C/
            sample101.png           # Sample names (indices) should be unique!
            sample102.png
            ...
            sample150.png
    ```

    **Expected structure for testing:**
    ```
    images_dir/                     # path_imagedir = "dataset/images_dir"
        sample001.png
        sample002.png
        ...
        sample100.png
    ```

    Args:
        path_imagedir (str):                    Path to the directory containing the images or the class subdirectories.
        allowed_image_formats (list of str):    List of allowed image formats (provided by IO_Interface).
        training (bool):                        Whether class annotations (encoded via subdirectories) are available.

    Returns:
        index_list (list of str):               List of sample indices encoded as strings. Required in DataGenerator as `samples`.
        class_ohe (numpy.ndarray):              Class annotations as one-hot encoding. Required in DataGenerator as `labels`.
                                                `None` in testing mode.
        class_n (int):                          Number of classes. Required in NeuralNetwork for architecture design as `n_labels`.
                                                `None` in testing mode.
        class_names (list of str):              List of names for corresponding classes. Used for later prediction storage or evaluation.
                                                `None` in testing mode.
        image_format (str):                     Image format to append to the sample index for image loading. Required in DataGenerator.
                                                `None` in training mode (indices already include the file extension).
    """
    # Initialize some variables
    image_format = None
    index_list = []
    # Format - including class annotations encoded via subdirectories
    if training:
        class_names = []
        classes_sparse = []
        # Iterate over subdirectories
        for subdirectory in sorted(os.listdir(path_imagedir)):
            path_sd = os.path.join(path_imagedir, subdirectory)
            # Skip items which are not a directory (metadata)
            if not os.path.isdir(path_sd):
                continue
            # Class index = position among class subdirectories (files are not counted)
            c = len(class_names)
            class_names.append(subdirectory)
            # Iterate over each sample
            for file in sorted(os.listdir(path_sd)):
                sample = os.path.join(subdirectory, file)
                index_list.append(sample)
                classes_sparse.append(c)
        # Parse sparse categorical annotations to One-Hot Encoding
        # (fixed categories keep one column per class, even if a class has no samples)
        class_n = len(class_names)
        class_ohe = pd.get_dummies(
            pd.Categorical(classes_sparse, categories=list(range(class_n)))
        ).to_numpy()
        # Return parsing
        return index_list, class_ohe, class_n, class_names, image_format
    # Format - excluding class annotations -> only testing images
    else:
        # Iterate over all images
        for file in sorted(os.listdir(path_imagedir)):
            # Split off the file extension of the current image
            fmt = file.split(".")[-1]
            # Identify image format by peeking at the first recognized image
            if image_format is None and (fmt.lower() in allowed_image_formats
                                         or fmt.upper() in allowed_image_formats):
                image_format = fmt
            # Add sample to list (file name without its extension)
            index_list.append(file[:-(len(fmt)+1)])
        # Raise Exception if image format is unknown
        if image_format is None:
            raise Exception("Unknown image format.", path_imagedir)
        # Return parsing
        return index_list, None, None, None, image_format
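The one-hot step of the training branch can be illustrated without pandas. This sketch assumes the sparse labels are class indices `0 .. class_n - 1`, as produced in training mode, and fixes the number of columns to `class_n`, so a class subdirectory without samples still gets its own all-zero column. Calling `pd.get_dummies` on a plain list of indices only creates columns for labels that actually occur, which is why the column count matters. `sparse_to_one_hot` is a hypothetical helper, not part of AUCMEDI.

```python
def sparse_to_one_hot(classes_sparse, class_n):
    """Turn a list of class indices into one-hot rows with class_n columns (hypothetical helper)."""
    one_hot = []
    for c in classes_sparse:
        if not 0 <= c < class_n:
            raise ValueError(f"class index {c} outside 0..{class_n - 1}")
        row = [0] * class_n
        row[c] = 1
        one_hot.append(row)
    return one_hot


# Three classes, but class 1 (e.g. an empty subdirectory) has no samples
print(sparse_to_one_hot([0, 0, 2], class_n=3))
# [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
```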