Dataset

evaluate_dataset(samples, labels, out_path, class_names=None, show=False, plot_barplot=False, plot_heatmap=False, suffix=None)

Function for dataset evaluation (descriptive statistics).

Example
# Import libraries
from aucmedi import *
from aucmedi.evaluation import *

# Peek data information via the first pillar of AUCMEDI
ds = input_interface(interface="csv",                       # Interface type
                     path_imagedir="dataset/images/",
                     path_data="dataset/annotations.csv",
                     ohe=False, col_sample="ID", col_class="diagnosis")
(samples, class_ohe, nclasses, class_names, image_format) = ds

# Pass information to the evaluation function
evaluate_dataset(samples, class_ohe, out_path="./", class_names=class_names)

Created files in directory of out_path (each only if the corresponding plot_barplot or plot_heatmap option is enabled):

  • "plot.dataset.barplot.png"
  • "plot.dataset.heatmap.png"
Preview for Bar Plot

[Figure: Evaluation_Dataset_Barplot]

Based on dataset: ISIC 2019 Challenge.

Preview for Heatmap

[Figure: Evaluation_Dataset_Heatmap]

Based on first 50 samples from dataset: ISIC 2019 Challenge.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| samples | list of str | List of sample identifiers (indices) encoded as strings. Provided by input_interface. | required |
| labels | numpy.ndarray | Classification list with one-hot encoding. Provided by input_interface. | required |
| out_path | str | Path to the directory in which plotted figures are stored. | required |
| class_names | list of str | List of names for the corresponding classes. Used for evaluation. Provided by input_interface. If None, class indices are used instead. | None |
| show | bool | Whether to also display the generated charts. | False |
| plot_barplot | bool | Whether to generate a bar plot of the class distribution. | False |
| plot_heatmap | bool | Whether to generate a heatmap of the class overview. Only recommended for subsets of ~50 samples. | False |
| suffix | str | Special suffix to add to the filename of each created figure. | None |
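
Since the heatmap is only recommended for subsets of roughly 50 samples, one option is to slice the samples and labels before calling the function. The following is a minimal sketch, assuming `samples` is a list and `labels` is a NumPy array aligned with it by position. The helper `take_subset`, the toy data, and the suffix value `"first50"` are hypothetical and not part of AUCMEDI.

```python
import numpy as np

def take_subset(samples, labels, n=50):
    """Hypothetical helper: keep the first n samples and their label rows."""
    n = min(n, len(samples))
    return samples[:n], labels[:n]

# Toy stand-in data: 120 sample IDs with one-hot labels over 3 classes
samples = [f"img_{i:03d}" for i in range(120)]
labels = np.eye(3, dtype=int)[np.arange(120) % 3]

sub_samples, sub_labels = take_subset(samples, labels, n=50)

# With AUCMEDI installed, the subset could then be passed on, for example:
# evaluate_dataset(sub_samples, sub_labels, out_path="./",
#                  plot_heatmap=True, suffix="first50")
```

Slicing both inputs with the same bound keeps each sample ID aligned with its label row, which the heatmap relies on.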

Returns:

| Name | Type | Description |
|---|---|---|
| df_cf | pandas.DataFrame | DataFrame containing the class distribution of the dataset. |
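
To illustrate what a class distribution table derived from one-hot labels can look like, here is a rough sketch using only NumPy and pandas. It is not AUCMEDI's implementation: the function `class_distribution` and the column names `class`, `count`, and `percentage` are assumptions made for this example, and the actual layout of `df_cf` may differ.

```python
import numpy as np
import pandas as pd

def class_distribution(labels, class_names=None):
    """Hypothetical helper: count samples per class from one-hot labels."""
    labels = np.asarray(labels)
    counts = labels.sum(axis=0)              # column sums = samples per class
    if class_names is None:
        # Fall back to class indices when no names are given
        class_names = list(range(labels.shape[1]))
    return pd.DataFrame({
        "class": class_names,
        "count": counts,
        "percentage": 100.0 * counts / labels.shape[0],
    })

# Toy one-hot labels: 4 samples, 3 classes (names are placeholders)
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
df = class_distribution(labels, class_names=["MEL", "NV", "BCC"])
print(df)
```

In this toy data, the first class holds two of the four samples, so its share is 50 percent.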

Source code in aucmedi/evaluation/dataset.py
def evaluate_dataset(samples,
                     labels,
                     out_path,
                     class_names=None,
                     show=False,
                     plot_barplot=False,
                     plot_heatmap=False,
                     suffix=None):
    """ Function for dataset evaluation (descriptive statistics).

    ???+ example
        ```python
        # Import libraries
        from aucmedi import *
        from aucmedi.evaluation import *

        # Peek data information via the first pillar of AUCMEDI
        ds = input_interface(interface="csv",                       # Interface type
                             path_imagedir="dataset/images/",
                             path_data="dataset/annotations.csv",
                             ohe=False, col_sample="ID", col_class="diagnosis")
        (samples, class_ohe, nclasses, class_names, image_format) = ds

        # Pass information to the evaluation function
        evaluate_dataset(samples, class_ohe, out_path="./", class_names=class_names)
        ```

    Created files in directory of `out_path` (each only if the corresponding `plot_barplot` or `plot_heatmap` option is enabled):

    - "plot.dataset.barplot.png"
    - "plot.dataset.heatmap.png"

    ???+ info "Preview for Bar Plot"
        ![Evaluation_Dataset_Barplot](../../images/evaluation.plot.dataset.barplot.png)

        Based on dataset: [ISIC 2019 Challenge](https://challenge.isic-archive.com/landing/2019/).

    ???+ info "Preview for Heatmap"
        ![Evaluation_Dataset_Heatmap](../../images/evaluation.plot.dataset.heatmap.png)

        Based on first 50 samples from dataset: [ISIC 2019 Challenge](https://challenge.isic-archive.com/landing/2019/).

    Args:
        samples (list of str):              List of sample identifiers (indices) encoded as strings. Provided by
                                            [input_interface][aucmedi.data_processing.io_data.input_interface].
        labels (numpy.ndarray):             Classification list with one-hot encoding. Provided by
                                            [input_interface][aucmedi.data_processing.io_data.input_interface].
        out_path (str):                     Path to the directory in which plotted figures are stored.
        class_names (list of str):          List of names for the corresponding classes. Used for evaluation. Provided by
                                            [input_interface][aucmedi.data_processing.io_data.input_interface].
                                            If `None`, class indices are used instead.
        show (bool):                        Whether to also display the generated charts.
        plot_barplot (bool):                Whether to generate a bar plot of the class distribution.
        plot_heatmap (bool):                Whether to generate a heatmap of the class overview. Only recommended for subsets of ~50 samples.
        suffix (str):                       Special suffix to add to the filename of each created figure.

    Returns:
        df_cf (pandas.DataFrame):           Dataframe containing the class distribution of the dataset.
    """

    # Generate barplot
    df_cf = evalby_barplot(labels, out_path, class_names, plot_barplot, show,
                           suffix)

    # Generate heatmap
    if plot_heatmap:
        evalby_heatmap(samples, labels, out_path, class_names, show, suffix)

    # Return table with class distribution
    return df_cf