Io data

ACCEPTABLE_IMAGE_FORMATS = ['jpeg', 'jpg', 'tif', 'tiff', 'png', 'bmp', 'gif', 'npy', 'nii', 'gz', 'mha'] module-attribute

List of accepted image formats.
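
A minimal sketch of how such a list can be used to filter file names by extension. The helper `has_accepted_format()` is hypothetical and not part of AUCMEDI; the comparison on the last extension only is an assumption, which would also explain why `'gz'` is listed (for files such as `*.nii.gz`).

```python
# Copied from the module attribute documented above.
ACCEPTABLE_IMAGE_FORMATS = ["jpeg", "jpg", "tif", "tiff", "png", "bmp",
                            "gif", "npy", "nii", "gz", "mha"]


def has_accepted_format(filename, allowed=ACCEPTABLE_IMAGE_FORMATS):
    """Hypothetical helper: check the last file extension, ignoring case."""
    if "." not in filename:
        return False
    extension = filename.rsplit(".", 1)[-1].lower()
    return extension in allowed


files = ["scan_01.PNG", "scan_02.nii.gz", "notes.txt", "README"]
accepted = [name for name in files if has_accepted_format(name)]
print(accepted)
```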

input_interface(interface, path_imagedir, path_data=None, training=True, ohe=False, image_format=None, **kwargs)

Data input interface for automatically extracting information from various dataset structures.

Different image file structures and annotation formats are processed by the corresponding format interfaces. The extracted information can be passed to the DataGenerator and the NeuralNetwork.

The input_interface() function is the first of the three pillars of AUCMEDI.

Pillars of AUCMEDI: input_interface(), DataGenerator, NeuralNetwork

input_interface() is a wrapper function that calls the correct format interface, which in turn loads a dataset from disk via the associated format parser.

Possible format interfaces: ["csv", "json", "directory"]

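Format Interfaces

| Interface     | Internal Function | Description                                  |
| ------------- | ----------------- | -------------------------------------------- |
| `"csv"`       | io_csv()          | Storing class annotations in a CSV file.     |
| `"directory"` | io_directory()    | Storing class annotations in subdirectories. |
| `"json"`      | io_json()         | Storing class annotations in a JSON file.    |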
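
The wrapper behaviour described above can be sketched without installing AUCMEDI. Everything in this sketch is a stand-in: the three `*_loader` functions are hypothetical stubs that only report which parameters they received, `dispatch()` is a simplified imitation of `input_interface()`, and `ValueError` is used here for illustration where the library raises a plain `Exception`. Image format handling is left out for brevity.

```python
def csv_loader(**params):
    # Stub for the CSV format interface: report the received parameter names.
    return ("csv", sorted(params))


def json_loader(**params):
    # Stub for the JSON format interface.
    return ("json", sorted(params))


def directory_loader(**params):
    # Stub for the directory format interface.
    return ("directory", sorted(params))


LOADERS = {"csv": csv_loader, "json": json_loader,
           "directory": directory_loader}
# Extra keyword arguments that are forwarded to the CSV interface only.
CSV_ONLY_OPTIONS = ("ohe_range", "col_sample", "col_class")


def dispatch(interface, path_imagedir, path_data=None, training=True,
             ohe=False, **kwargs):
    """Simplified imitation of input_interface(): validate, then delegate."""
    interface = interface.lower()
    if interface not in LOADERS:
        raise ValueError(f"Unknown interface code provided: {interface!r}")
    if interface in ("csv", "json") and path_data is None:
        raise ValueError("No annotation file provided for CSV/JSON interface!")

    params = {"path_imagedir": path_imagedir, "training": training}
    # The directory interface needs neither an annotation file nor 'ohe'.
    if interface != "directory":
        params.update(path_data=path_data, ohe=ohe)
    # Only known CSV options are forwarded; anything else is dropped.
    if interface == "csv":
        params.update({key: value for key, value in kwargs.items()
                       if key in CSV_ONLY_OPTIONS})
    return LOADERS[interface](**params)


result = dispatch("CSV", "dataset/images/", path_data="annotations.csv",
                  col_sample="ID", unused_option=1)
print(result)
```

Keeping the loaders in a dictionary is a design choice of this sketch; it makes the list of valid interface codes and the dispatch target come from a single place.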
Example
# AUCMEDI library
from aucmedi import *

# Initialize input data reader
ds = input_interface(interface="csv",                       # Interface type
                     path_imagedir="dataset/images/",
                     path_data="dataset/annotations.csv",
                     ohe=False, col_sample="ID", col_class="diagnosis")
(index_list, class_ohe, nclasses, class_names, image_format) = ds

# Pass variables to other AUCMEDI pillars like DataGenerator
datagen = DataGenerator(samples=index_list,                 # from input_interface()
                        path_imagedir="dataset/images/",
                        labels=class_ohe,                   # from input_interface()
                        image_format=image_format)          # from input_interface()

Parameters:

| Name          | Type | Description                                                                                  | Default  |
| ------------- | ---- | -------------------------------------------------------------------------------------------- | -------- |
| path_imagedir | str  | Path to the directory containing the images.                                                 | required |
| interface     | str  | String defining format interface for loading/storing data.                                   | required |
| path_data     | str  | Path to the index/class annotation file if required. (csv/json)                              | None     |
| training      | bool | Boolean option whether annotation data is available.                                         | True     |
| ohe           | bool | Boolean option whether annotation data is sparse categorical or one-hot encoded.             | False    |
| image_format  | str  | Force to use a specific image format. By default, image format is determined automatically.  | None     |
| **kwargs      | dict | Additional parameters for the format interfaces.                                             | {}       |

Returns:

| Name         | Type          | Description                                                                                       |
| ------------ | ------------- | ------------------------------------------------------------------------------------------------- |
| index_list   | list of str   | List of sample/index encoded as Strings. Required in DataGenerator as samples.                    |
| class_ohe    | numpy.ndarray | Classification list as One-Hot encoding. Required in DataGenerator as labels.                     |
| class_n      | int           | Number of classes. Required in NeuralNetwork for Architecture design as n_labels.                 |
| class_names  | list of str   | List of names for corresponding classes. Used for later prediction storage or evaluation.         |
| image_format | str           | Image format to add at the end of the sample index for image loading. Required in DataGenerator.  |
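
To illustrate how the returned values relate to each other, here is a small sketch with made-up data. The sample names, class names and labels are hypothetical, plain lists are used instead of a `numpy.ndarray` to avoid dependencies, and the way the file path is composed from sample index and image format is an assumption based on the description above.

```python
import os

# Hypothetical values, shaped like the returned tuple described above.
index_list = ["img_001", "img_002", "img_003"]
class_names = ["benign", "malignant"]
image_format = "png"
sparse_labels = [0, 1, 1]  # sparse categorical: one class index per sample

# Number of classes, as later needed for the model's output layer.
class_n = len(class_names)

# One-hot encoding: one row per sample, a single 1 in the column of its class.
class_ohe = [[1 if column == label else 0 for column in range(class_n)]
             for label in sparse_labels]
print(class_ohe)

# Assumed path composition: sample index plus "." plus image format.
paths = [os.path.join("dataset", "images", f"{index}.{image_format}")
         for index in index_list]
print(paths)

# Map each one-hot row back to its class name.
decoded = [class_names[row.index(1)] for row in class_ohe]
print(decoded)
```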

Source code in aucmedi/data_processing/io_data.py
def input_interface(interface, path_imagedir, path_data=None, training=True,
                    ohe=False, image_format=None, **kwargs):
    """ Data input interface for automatically extracting information from various dataset structures.

    Different image file structures and annotation formats are processed by the
    corresponding format interfaces. The extracted information can be passed to the
    [DataGenerator][aucmedi.data_processing.data_generator.DataGenerator] and the
    [NeuralNetwork][aucmedi.neural_network.model.NeuralNetwork].

    The input_interface() function is the first of the three pillars of AUCMEDI.

    ??? info "Pillars of AUCMEDI"
        - [aucmedi.data_processing.io_data.input_interface][]
        - [aucmedi.data_processing.data_generator.DataGenerator][]
        - [aucmedi.neural_network.model.NeuralNetwork][]

    input_interface() is a wrapper function that calls the correct format interface,
    which in turn loads a dataset from disk via the associated format parser.

    Possible format interfaces: `["csv", "json", "directory"]`

    ???+ info "Format Interfaces"
        | Interface      | Internal Function                                                    | Description                                  |
        | -------------- | -------------------------------------------------------------------- | -------------------------------------------- |
        |  `"csv"`       | [io_csv()][aucmedi.data_processing.io_interfaces.io_csv]             | Storing class annotations in a CSV file.     |
        |  `"directory"` | [io_directory()][aucmedi.data_processing.io_interfaces.io_directory] | Storing class annotations in subdirectories. |
        |  `"json"`      | [io_json()][aucmedi.data_processing.io_interfaces.io_json]           | Storing class annotations in a JSON file.    |

    ???+ example
        ```python
        # AUCMEDI library
        from aucmedi import *

        # Initialize input data reader
        ds = input_interface(interface="csv",                       # Interface type
                             path_imagedir="dataset/images/",
                             path_data="dataset/annotations.csv",
                             ohe=False, col_sample="ID", col_class="diagnosis")
        (index_list, class_ohe, nclasses, class_names, image_format) = ds

        # Pass variables to other AUCMEDI pillars like DataGenerator
        datagen = DataGenerator(samples=index_list,                 # from input_interface()
                                path_imagedir="dataset/images/",
                                labels=class_ohe,                   # from input_interface()
                                image_format=image_format)          # from input_interface()
        ```

    Args:
        path_imagedir (str):            Path to the directory containing the images.
        interface (str):                String defining format interface for loading/storing data.
        path_data (str):                Path to the index/class annotation file if required. (csv/json)
        training (bool):                Boolean option whether annotation data is available.
        ohe (bool):                     Boolean option whether annotation data is sparse categorical or one-hot encoded.
        image_format (str):             Force to use a specific image format. By default, image format is determined automatically.
        **kwargs (dict):                Additional parameters for the format interfaces.

    Returns:
        index_list (list of str):       List of sample/index encoded as Strings. Required in DataGenerator as `samples`.
        class_ohe (numpy.ndarray):      Classification list as One-Hot encoding. Required in DataGenerator as `labels`.
        class_n (int):                  Number of classes. Required in NeuralNetwork for Architecture design as `n_labels`.
        class_names (list of str):      List of names for corresponding classes. Used for later prediction storage or evaluation.
        image_format (str):             Image format to add at the end of the sample index for image loading. Required in DataGenerator.
    """
    # Transform selected interface to lower case
    interface = interface.lower()
    # Restrict the allowed image formats to the requested one, if provided
    if image_format is not None:
        allowed_image_formats = [image_format]
    else:
        allowed_image_formats = ACCEPTABLE_IMAGE_FORMATS
    # Verify if provided interface is valid
    if interface not in ["csv", "json", "directory"]:
        raise Exception("Unknown interface code provided.", interface)
    # Verify that annotation file is available if CSV/JSON interface is used
    if interface in ["csv", "json"] and path_data is None:
        raise Exception("No annotation file provided for CSV/JSON interface!")

    # Initialize parameter dictionary
    parameters = {"path_data": path_data,
                  "path_imagedir": path_imagedir,
                  "allowed_image_formats": allowed_image_formats,
                  "training": training, "ohe": ohe}
    # Identify correct dataset loader and parameters for CSV format
    if interface == "csv":
        ds_loader = io.csv_loader
        # Forward only the CSV-specific options that were actually provided
        additional_parameters = ["ohe_range", "col_sample", "col_class"]
        for para in additional_parameters:
            if para in kwargs:
                parameters[para] = kwargs[para]
    # Identify correct dataset loader and parameters for JSON format
    elif interface == "json":
        ds_loader = io.json_loader
    # Identify correct dataset loader and parameters for directory format
    elif interface == "directory":
        ds_loader = io.directory_loader
        # The directory interface takes neither an annotation file nor 'ohe'
        del parameters["ohe"]
        del parameters["path_data"]

    # Load the dataset with the selected format interface and return results
    return ds_loader(**parameters)