Bagging

A Bagging class providing functionality for cross-validation-based ensemble learning.

Homogeneous model ensembles can be defined as multiple models that share the same algorithm, hyperparameters, and architecture. Bagging is a popular homogeneous ensemble learning technique based on improved sampling of the training dataset. In contrast to a standard single training/validation split, which results in a single model, Bagging trains multiple models on randomly drawn subsets of the dataset.

In AUCMEDI, k-fold cross-validation is applied to the dataset, resulting in k models.

Example
# Import AUCMEDI interfaces
from aucmedi import NeuralNetwork, DataGenerator
from aucmedi.ensemble import Bagging

# Initialize NeuralNetwork model
model = NeuralNetwork(n_labels=4, channels=3, architecture="2D.ResNet50")

# Initialize Bagging object for 3-fold cross-validation
el = Bagging(model, k_fold=3)


# Initialize training DataGenerator for complete training data
datagen = DataGenerator(samples_train, "images_dir/",
                        labels=train_labels_ohe, batch_size=3,
                        resize=model.meta_input,
                        standardize_mode=model.meta_standardize)
# Train models
el.train(datagen, epochs=100)


# Initialize testing DataGenerator for testing data
test_gen = DataGenerator(samples_test, "images_dir/",
                         resize=model.meta_input,
                         standardize_mode=model.meta_standardize)
# Run Inference with majority vote aggregation
preds = el.predict(test_gen, aggregate="majority_vote")

Training Time Increase

Bagging fits multiple models sequentially (commonly k_fold=3 up to k_fold=10), which drastically increases training time.

DataGenerator re-initialization

The DataGenerator passed to the train() and predict() functions of the Bagging class will be re-initialized!

This can result in redundant image preparation if prepare_images=True.

NeuralNetwork re-initialization

The NeuralNetwork passed to the train() and predict() functions of the Bagging class will be re-initialized!

Attention: Metrics are not passed to the processes due to pickling issues.

Technical Details

For the training and inference process, each model is handled in an individual process via the Python multiprocessing package.

This is crucial because TensorFlow does not fully support VRAM garbage collection on GPUs, so redundant data would otherwise pile up as the number of folds increases.

Separate processes make it possible to tear down the TensorFlow environment and rebuild it for the next fold model.
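
A minimal, self-contained sketch of this per-fold process pattern (illustrative only, not the actual AUCMEDI internals; the worker function and its payload are hypothetical):

```python
import multiprocessing as mp

def fit_fold(queue, fold_index):
    # Hypothetical worker: build the model, fit on this fold's data,
    # and push the training logs back to the parent process.
    # In AUCMEDI, the TensorFlow environment lives and dies in here.
    queue.put({"fold": fold_index, "history": {}})

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)   # fresh interpreter per fold
    histories = {}
    for i in range(3):                         # k_fold = 3
        queue = mp.Queue()
        process = mp.Process(target=fit_fold, args=(queue, i))
        process.start()
        process.join()                         # block until the fold process exits
        histories["cv_" + str(i)] = queue.get()
```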

Reference for Ensemble Learning Techniques

Dominik Müller, Iñaki Soto-Rey and Frank Kramer. (2022). An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks. arXiv e-print: https://arxiv.org/abs/2201.11440

Source code in aucmedi/ensemble/bagging.py
class Bagging:
    """ A Bagging class providing functionality for cross-validation based ensemble learning.

    Homogeneous model ensembles can be defined as multiple models that share the same algorithm, hyperparameters,
    and architecture. Bagging is a popular homogeneous ensemble learning technique based on improved sampling
    of the training dataset. In contrast to a standard single training/validation split, which results in a single
    model, Bagging trains multiple models on randomly drawn subsets of the dataset.

    In AUCMEDI, k-fold cross-validation is applied to the dataset, resulting in k models.

    ???+ example
        ```python
        # Import AUCMEDI interfaces
        from aucmedi import NeuralNetwork, DataGenerator
        from aucmedi.ensemble import Bagging

        # Initialize NeuralNetwork model
        model = NeuralNetwork(n_labels=4, channels=3, architecture="2D.ResNet50")

        # Initialize Bagging object for 3-fold cross-validation
        el = Bagging(model, k_fold=3)


        # Initialize training DataGenerator for complete training data
        datagen = DataGenerator(samples_train, "images_dir/",
                                labels=train_labels_ohe, batch_size=3,
                                resize=model.meta_input,
                                standardize_mode=model.meta_standardize)
        # Train models
        el.train(datagen, epochs=100)


        # Initialize testing DataGenerator for testing data
        test_gen = DataGenerator(samples_test, "images_dir/",
                                 resize=model.meta_input,
                                 standardize_mode=model.meta_standardize)
        # Run Inference with majority vote aggregation
        preds = el.predict(test_gen, aggregate="majority_vote")
        ```

    !!! warning "Training Time Increase"
        Bagging fits multiple models sequentially (commonly `k_fold=3` up to `k_fold=10`),
        which drastically increases training time.

    ??? warning "DataGenerator re-initialization"
        The DataGenerator passed to the train() and predict() functions of the Bagging class will be re-initialized!

        This can result in redundant image preparation if `prepare_images=True`.

    ??? warning "NeuralNetwork re-initialization"
        The NeuralNetwork passed to the train() and predict() functions of the Bagging class will be re-initialized!

        Attention: Metrics are not passed to the processes due to pickling issues.

    ??? info "Technical Details"
        For the training and inference process, each model is handled in an individual process via the Python multiprocessing package.

        This is crucial because TensorFlow does not fully support VRAM garbage collection on GPUs,
        so redundant data would otherwise pile up as the number of folds increases.

        Separate processes make it possible to tear down the TensorFlow environment and rebuild it for the next fold model.

    ??? reference "Reference for Ensemble Learning Techniques"
        Dominik Müller, Iñaki Soto-Rey and Frank Kramer. (2022).
        An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks.
        arXiv e-print: [https://arxiv.org/abs/2201.11440](https://arxiv.org/abs/2201.11440)
    """
    def __init__(self, model, k_fold=3):
        """ Initialization function for creating a Bagging object.

        Args:
            model (NeuralNetwork):         Instance of an AUCMEDI neural network class.
            k_fold (int):                   Number of folds (k) for the Cross-Validation. Must be at least 2.
        """
        # Cache class variables
        self.model_template = model
        self.k_fold = k_fold
        self.cache_dir = None

        # Set multiprocessing method to spawn
        mp.set_start_method("spawn", force=True)

    def train(self, training_generator, epochs=20, iterations=None,
              callbacks=[], class_weights=None, transfer_learning=False):
        """ Training function for the Bagging models which performs a k-fold cross-validation model fitting.

        The training data will be sampled according to a k-fold cross-validation in which a validation
        [DataGenerator][aucmedi.data_processing.data_generator.DataGenerator] will be automatically created.

        It is also possible to pass custom Callback classes in order to obtain more information.

        For more information on the fitting process, check out [NeuralNetwork.train()][aucmedi.neural_network.model.NeuralNetwork.train].

        Args:
            training_generator (DataGenerator):     A data generator which will be used for training (will be split according to k-fold sampling).
            epochs (int):                           Number of epochs. A single epoch is defined as one iteration through
                                                    the complete data set.
            iterations (int):                       Number of iterations (batches) in a single epoch.
            callbacks (list of Callback classes):   A list of Callback classes for custom evaluation.
            class_weights (dictionary or list):     A list or dictionary of float values to handle class imbalance.
            transfer_learning (bool):               Option whether a transfer learning training should be performed.

        Returns:
            history (dict):                   A history dictionary from a Keras history object which contains several logs.
        """
        temp_dg = training_generator    # Template DataGenerator variable for faster access
        history_bagging = {}            # Final history dictionary

        # Create temporary model directory
        self.cache_dir = tempfile.TemporaryDirectory(prefix="aucmedi.tmp.",
                                                     suffix=".bagging")

        # Obtain training data
        x = training_generator.samples
        y = training_generator.labels
        m = training_generator.metadata

        # Apply cross-validation sampling
        cv_sampling = sampling_kfold(x, y, m, n_splits=self.k_fold,
                                     stratified=True, iterative=True)

        # Sequentially iterate over all folds
        for i, fold in enumerate(cv_sampling):
            # Pack data into a tuple
            if len(fold) == 4:
                (train_x, train_y, test_x, test_y) = fold
                data = (train_x, train_y, None, test_x, test_y, None)
            else : data = fold

            # Create model specific callback list
            callbacks_model = callbacks.copy()
            # Extend Callback list
            cb_mc = ModelCheckpoint(os.path.join(self.cache_dir.name,
                                                 "cv_" + str(i) + \
                                                 ".model.hdf5"),
                                    monitor="val_loss", verbose=1,
                                    save_best_only=True, mode="min")
            cb_cl = CSVLogger(os.path.join(self.cache_dir.name,
                                                 "cv_" + str(i) + \
                                                 ".logs.csv"),
                              separator=',', append=True)
            callbacks_model.extend([cb_mc, cb_cl])

            # Gather NeuralNetwork parameters
            model_paras = {
                "n_labels": self.model_template.n_labels,
                "channels": self.model_template.channels,
                "input_shape": self.model_template.input_shape,
                "architecture": self.model_template.architecture,
                "pretrained_weights": self.model_template.pretrained_weights,
                "loss": self.model_template.loss,
                "metrics": None,
                "activation_output": self.model_template.activation_output,
                "fcl_dropout": self.model_template.fcl_dropout,
                "meta_variables": self.model_template.meta_variables,
                "learning_rate": self.model_template.learning_rate,
                "batch_queue_size": self.model_template.batch_queue_size,
                "workers": self.model_template.workers,
                "multiprocessing": self.model_template.multiprocessing,
            }

            # Gather DataGenerator parameters
            datagen_paras = {"path_imagedir": temp_dg.path_imagedir,
                             "batch_size": temp_dg.batch_size,
                             "data_aug": temp_dg.data_aug,
                             "seed": temp_dg.seed,
                             "subfunctions": temp_dg.subfunctions,
                             "shuffle": temp_dg.shuffle,
                             "standardize_mode": temp_dg.standardize_mode,
                             "resize": temp_dg.resize,
                             "grayscale": temp_dg.grayscale,
                             "prepare_images": temp_dg.prepare_images,
                             "sample_weights": temp_dg.sample_weights,
                             "image_format": temp_dg.image_format,
                             "loader": temp_dg.sample_loader,
                             "workers": temp_dg.workers,
                             "kwargs": temp_dg.kwargs
            }

            # Gather training parameters
            parameters_training = {"epochs": epochs,
                                   "iterations": iterations,
                                   "callbacks": callbacks_model,
                                   "class_weights": class_weights,
                                   "transfer_learning": transfer_learning
            }

            # Start training process
            process_queue = mp.Queue()
            process_train = mp.Process(target=__training_process__,
                                       args=(process_queue,
                                             model_paras,
                                             data,
                                             datagen_paras,
                                             parameters_training))
            process_train.start()
            process_train.join()
            cv_history = process_queue.get()
            # Combine logged history objects
            hcv = {"cv_" + str(i) + "." + k: v for k, v in cv_history.items()}
            history_bagging = {**history_bagging, **hcv}

        # Return Bagging history object
        return history_bagging

    def predict(self, prediction_generator, aggregate="mean",
                return_ensemble=False):
        """ Prediction function for the Bagging models.

        The fitted models will predict classifications for the provided [DataGenerator][aucmedi.data_processing.data_generator.DataGenerator].

        The Aggregate function can be provided in multiple ways:

        - passing a self-initialized AUCMEDI Aggregate function,
        - passing a string key to select an AUCMEDI Aggregate function by name, or
        - implementing a custom Aggregate function by extending the [AUCMEDI base class for Aggregate functions][aucmedi.ensemble.aggregate.agg_base]

        !!! info
            Description and list of implemented Aggregate functions can be found here:
            [Aggregate][aucmedi.ensemble.aggregate]

        Args:
            prediction_generator (DataGenerator):   A data generator which will be used for inference.
            aggregate (str or aggregate Function):  Aggregate function class instance or a string for an AUCMEDI Aggregate function.
            return_ensemble (bool):                 Option, whether gathered ensemble of predictions should be returned.

        Returns:
            preds (numpy.ndarray):                  A NumPy array of predictions formatted with shape (n_samples, n_labels).
            ensemble (numpy.ndarray):               Optional ensemble of predictions: Will be only passed if `return_ensemble=True`.
                                                    Shape (n_models, n_samples, n_labels).
        """
        # Verify that there is a linked cache directory
        con_tmp = (isinstance(self.cache_dir, tempfile.TemporaryDirectory) and \
                   os.path.exists(self.cache_dir.name))
        con_var = (self.cache_dir is not None and \
                   not isinstance(self.cache_dir, tempfile.TemporaryDirectory) \
                   and os.path.exists(self.cache_dir))
        if not con_tmp and not con_var:
            raise FileNotFoundError("Bagging does not have a valid model cache directory!")

        # Initialize aggregate function if required
        if isinstance(aggregate, str) and aggregate in aggregate_dict:
            agg_fun = aggregate_dict[aggregate]()
        else : agg_fun = aggregate

        # Initialize some variables
        temp_dg = prediction_generator
        preds_ensemble = []
        preds_final = []

        # Gather DataGenerator parameters
        datagen_paras = {"samples": temp_dg.samples,
                         "metadata": temp_dg.metadata,
                         "path_imagedir": temp_dg.path_imagedir,
                         "batch_size": temp_dg.batch_size,
                         "data_aug": temp_dg.data_aug,
                         "seed": temp_dg.seed,
                         "subfunctions": temp_dg.subfunctions,
                         "shuffle": temp_dg.shuffle,
                         "standardize_mode": temp_dg.standardize_mode,
                         "resize": temp_dg.resize,
                         "grayscale": temp_dg.grayscale,
                         "prepare_images": temp_dg.prepare_images,
                         "sample_weights": temp_dg.sample_weights,
                         "image_format": temp_dg.image_format,
                         "loader": temp_dg.sample_loader,
                         "workers": temp_dg.workers,
                         "kwargs": temp_dg.kwargs
        }

        # Identify path to model directory
        if isinstance(self.cache_dir, tempfile.TemporaryDirectory):
            path_model_dir = self.cache_dir.name
        else : path_model_dir = self.cache_dir

        # Sequentially iterate over all fold models
        for i in range(self.k_fold):
            # Identify path to fitted model
            path_model = os.path.join(path_model_dir,
                                      "cv_" + str(i) + ".model.hdf5")

            # Gather NeuralNetwork parameters
            model_paras = {
                "n_labels": self.model_template.n_labels,
                "channels": self.model_template.channels,
                "input_shape": self.model_template.input_shape,
                "architecture": self.model_template.architecture,
                "pretrained_weights": self.model_template.pretrained_weights,
                "loss": self.model_template.loss,
                "metrics": None,
                "activation_output": self.model_template.activation_output,
                "fcl_dropout": self.model_template.fcl_dropout,
                "meta_variables": self.model_template.meta_variables,
                "learning_rate": self.model_template.learning_rate,
                "batch_queue_size": self.model_template.batch_queue_size,
                "workers": self.model_template.workers,
                "multiprocessing": self.model_template.multiprocessing,
            }

            # Start inference process for fold i
            process_queue = mp.Queue()
            process_pred = mp.Process(target=__prediction_process__,
                                      args=(process_queue,
                                            model_paras,
                                            path_model,
                                            datagen_paras))
            process_pred.start()
            process_pred.join()
            preds = process_queue.get()

            # Append to prediction ensemble
            preds_ensemble.append(preds)

        # Aggregate predictions
        preds_ensemble = np.array(preds_ensemble)
        for i in range(0, len(temp_dg.samples)):
            pred_sample = agg_fun.aggregate(preds_ensemble[:,i,:])
            preds_final.append(pred_sample)

        # Convert prediction list to NumPy
        preds_final = np.asarray(preds_final)

        # Return ensembled predictions
        if return_ensemble : return preds_final, preds_ensemble
        else : return preds_final

    # Dump model to file
    def dump(self, directory_path):
        """ Store temporary Bagging model directory permanently to disk at desired location.

        If the model directory is a user-provided path that is already persistent on disk,
        the directory is copied in order to keep the original data persistent.

        Args:
            directory_path (str):       Path to store the model directory on disk.
        """
        if self.cache_dir is None:
            raise FileNotFoundError("Bagging does not have a valid model cache directory!")
        elif isinstance(self.cache_dir, tempfile.TemporaryDirectory):
            shutil.copytree(self.cache_dir.name, directory_path,
                            dirs_exist_ok=True)
            self.cache_dir.cleanup()
            self.cache_dir = directory_path
        else:
            shutil.copytree(self.cache_dir, directory_path, dirs_exist_ok=True)
            self.cache_dir = directory_path

    # Load model from file
    def load(self, directory_path):
        """ Load a Bagging model directory which can be used for aggregated inference.

        Args:
            directory_path (str):       Input path, from which the Bagging models will be loaded.
        """
        # Check directory existence
        if not os.path.exists(directory_path):
            raise FileNotFoundError("Provided model directory path does not exist!",
                                    directory_path)
        # Check model existence
        for i in range(self.k_fold):
            path_model = os.path.join(directory_path,
                                      "cv_" + str(i) + ".model.hdf5")
            if not os.path.exists(path_model):
                raise FileNotFoundError("Bagging model for fold " + str(i) + \
                                        " does not exist!", path_model)
        # Update model directory
        self.cache_dir = directory_path

__init__(model, k_fold=3)

Initialization function for creating a Bagging object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | NeuralNetwork | Instance of an AUCMEDI neural network class. | required |
| k_fold | int | Number of folds (k) for the cross-validation. Must be at least 2. | 3 |

dump(directory_path)

Store the temporary Bagging model directory permanently on disk at the desired location.

If the model directory is a user-provided path that is already persistent on disk, the directory is copied in order to keep the original data persistent.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| directory_path | str | Path to store the model directory on disk. | required |
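
A brief usage sketch (the path is hypothetical; assumes the Bagging object el was fitted as in the class example above):

```python
# Persist the fitted fold models permanently to disk
el.dump("models/bagging.resnet50/")
# The directory now contains cv_0.model.hdf5 ... cv_{k-1}.model.hdf5
# plus the corresponding cv_i.logs.csv training logs
```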

load(directory_path)

Load a Bagging model directory which can be used for aggregated inference.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| directory_path | str | Input path from which the Bagging models will be loaded. | required |
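
A brief usage sketch (hypothetical path; assumes a Bagging object configured with the same model template and k_fold as the dumped ensemble):

```python
# Restore a previously dumped Bagging ensemble for inference
el_restored = Bagging(model, k_fold=3)
el_restored.load("models/bagging.resnet50/")   # expects cv_0 ... cv_2 model files
preds = el_restored.predict(test_gen, aggregate="mean")
```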

predict(prediction_generator, aggregate='mean', return_ensemble=False)

Prediction function for the Bagging models.

The fitted models will predict classifications for the provided DataGenerator.

The Aggregate function can be provided in multiple ways (see the sketch below the info box):

  • passing a self-initialized AUCMEDI Aggregate function,
  • passing a string key to select an AUCMEDI Aggregate function by name, or
  • implementing a custom Aggregate function by extending the AUCMEDI base class for Aggregate functions

Info

Description and list of implemented Aggregate functions can be found here: Aggregate
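
A minimal sketch of these options (the class name AveragingMean is assumed for illustration; check aucmedi.ensemble.aggregate for the actually exported Aggregate classes):

```python
# Option 1: string key, resolved internally against the Aggregate registry
preds = el.predict(test_gen, aggregate="mean")

# Option 2: a self-initialized Aggregate function instance
from aucmedi.ensemble.aggregate import AveragingMean   # assumed class name
preds = el.predict(test_gen, aggregate=AveragingMean())

# Optionally, also return the raw per-model predictions
preds, ensemble = el.predict(test_gen, aggregate="mean", return_ensemble=True)
# ensemble.shape == (n_models, n_samples, n_labels)
```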

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prediction_generator | DataGenerator | A data generator which will be used for inference. | required |
| aggregate | str or Aggregate function | Aggregate function class instance or a string key for an AUCMEDI Aggregate function. | 'mean' |
| return_ensemble | bool | Option whether the gathered ensemble of predictions should be returned. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| preds | numpy.ndarray | A NumPy array of predictions with shape (n_samples, n_labels). |
| ensemble | numpy.ndarray | Optional ensemble of predictions; only returned if return_ensemble=True. Shape (n_models, n_samples, n_labels). |


train(training_generator, epochs=20, iterations=None, callbacks=[], class_weights=None, transfer_learning=False)

Training function for the Bagging models which performs a k-fold cross-validation model fitting.

The training data will be sampled according to a k-fold cross-validation in which a validation DataGenerator will be automatically created.

It is also possible to pass custom Callback classes in order to obtain more information during training; see the sketch below.

For more information on the fitting process, check out NeuralNetwork.train().
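
A hedged sketch of passing a custom callback (EarlyStopping is a standard tf.keras callback; note that callbacks must be picklable, since each fold is fitted in a separate spawned process):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop a fold's training early once the validation loss stagnates
cb_es = EarlyStopping(monitor="val_loss", patience=8, mode="min")
history = el.train(datagen, epochs=100, callbacks=[cb_es])
# History keys are prefixed per fold, e.g. "cv_0.loss", "cv_0.val_loss"
```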

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| training_generator | DataGenerator | A data generator which will be used for training (will be split according to k-fold sampling). | required |
| epochs | int | Number of epochs. A single epoch is defined as one iteration through the complete data set. | 20 |
| iterations | int | Number of iterations (batches) in a single epoch. | None |
| callbacks | list of Callback classes | A list of Callback classes for custom evaluation. | [] |
| class_weights | dictionary or list | A list or dictionary of float values to handle class imbalance. | None |
| transfer_learning | bool | Option whether a transfer learning training should be performed. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| history | dict | A history dictionary from a Keras History object which contains several logs. Keys are prefixed per fold (e.g. cv_0.loss, cv_0.val_loss). |
