# Utilities

## visualization

captum.attr._utils.visualization.visualize_image_attr(attr, original_image=None, method='heat_map', sign='absolute_value', plt_fig_axis=None, outlier_perc=2, cmap=None, alpha_overlay=0.5, show_colorbar=False, title=None, fig_size=(6, 6), use_pyplot=True)[source]

Visualizes attribution for a given image by normalizing attribution values of the desired sign (positive, negative, absolute value, or all) and displaying them using the desired mode in a matplotlib figure.

Parameters
• attr (numpy.array) – Numpy array corresponding to attributions to be visualized. Shape must be in the form (H, W, C), with channels as last dimension. Shape must also match that of the original image if provided.

• original_image (numpy.array, optional) – Numpy array corresponding to original image. Shape must be in the form (H, W, C), with channels as the last dimension. Image can be provided either with float values in range 0-1 or int values between 0-255. This is a necessary argument for any visualization method which utilizes the original image. Default: None

• method (string, optional) –

Chosen method for visualizing attribution. Supported options are:

1. heat_map - Display heat map of chosen attributions.

2. blended_heat_map - Overlay heat map over greyscale version of original image. Parameter alpha_overlay corresponds to alpha of heat map.

3. original_image - Only display original image.

4. masked_image - Mask image (pixel-wise multiply) by normalized attribution values.

5. alpha_scaling - Sets alpha channel of each pixel to be equal to normalized attribution value.

Default: heat_map

• sign (string, optional) –

Chosen sign of attributions to visualize. Supported options are:

1. positive - Displays only positive pixel attributions.

2. absolute_value - Displays absolute value of attributions.

3. negative - Displays only negative pixel attributions.

4. all - Displays both positive and negative attribution values. This is not supported for masked_image or alpha_scaling modes, since signed information cannot be represented in these modes.

Default: absolute_value

• plt_fig_axis (tuple, optional) – Tuple of matplotlib.pyplot.figure and axis on which to visualize. If None is provided, then a new figure and axis are created. Default: None

• outlier_perc (float, optional) – Top attribution values which correspond to a total of outlier_perc percentage of the total attribution are set to 1 and scaling is performed using the minimum of these values. For sign=all, outliers and scale value are computed using absolute value of attributions. Default: 2

• cmap (string, optional) – String corresponding to desired colormap for heatmap visualization. This defaults to “Reds” for negative sign, “Blues” for absolute value, “Greens” for positive sign, and a spectrum from red to green for all. Note that this argument is only used for visualizations displaying heatmaps. Default: None

• alpha_overlay (float, optional) – Alpha to set for heatmap when using blended_heat_map visualization mode, which overlays the heat map over the greyscaled original image. Default: 0.5

• show_colorbar (boolean, optional) – Displays colorbar for heatmap below the visualization. If given method does not use a heatmap, then a colormap axis is created and hidden. This is necessary for appropriate alignment when visualizing multiple plots, some with colorbars and some without. Default: False

• title (string, optional) – Title string for plot. If None, no title is set. Default: None

• fig_size (tuple, optional) – Size of figure created. Default: (6,6)

• use_pyplot (boolean, optional) – If True, uses pyplot to create the figure and displays it after creation. If False, uses the Matplotlib object-oriented API and simply returns a figure object without showing it. Default: True

Returns

• figure (matplotlib.pyplot.figure):

Figure object on which visualization is created. If plt_fig_axis argument is given, this is the same figure provided.

• axis (matplotlib.pyplot.axis):

Axis object on which visualization is created. If plt_fig_axis argument is given, this is the same axis provided.

Return type

2-element tuple of figure, axis

Examples:

>>> # ImageClassifier takes a single input tensor of images Nx3x32x32,
>>> # and returns an Nx10 tensor of class probabilities.
>>> net = ImageClassifier()
>>> ig = IntegratedGradients(net)
>>> # Computes integrated gradients for class 3 for a given image.
>>> attribution, delta = ig.attribute(orig_image, target=3)
>>> # Displays blended heat map visualization of computed attributions.
>>> _ = visualize_image_attr(attribution, orig_image, "blended_heat_map")
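The outlier_perc rule described above can be illustrated with a small sketch. This is not the library's implementation: the helper names `cumulative_sum_threshold` and `normalize_positive` are hypothetical, the sketch handles only the positive sign, and it works on a flat list of values in plain Python rather than on an (H, W, C) numpy array.

```python
def cumulative_sum_threshold(values, percentile):
    """Return the smallest value at which the running sum of the sorted
    values reaches `percentile` percent of the total."""
    sorted_vals = sorted(values)
    total = sum(sorted_vals)
    running = 0.0
    for v in sorted_vals:
        running += v
        if running >= total * percentile / 100.0:
            return v
    return sorted_vals[-1]


def normalize_positive(values, outlier_perc=2):
    """Scale positive attributions into [0, 1], saturating outliers at 1."""
    # Keep only the positive part of each attribution.
    clipped = [max(v, 0.0) for v in values]
    # Values at or above this threshold are treated as outliers.
    scale = cumulative_sum_threshold(clipped, 100 - outlier_perc)
    if scale == 0:
        return [0.0 for _ in clipped]
    return [min(v / scale, 1.0) for v in clipped]


# A large outlier_perc is used here only to make the effect visible.
attr = [1.0, 2.0, 3.0, 4.0, 10.0, 80.0]
normalized = normalize_positive(attr, outlier_perc=85)
print(normalized)
```

Here the two largest values (10 and 80) are both mapped to 1, and the remaining values are scaled by the smaller of the two, so a single extreme attribution does not wash out the rest of the heat map.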

captum.attr._utils.visualization.visualize_image_attr_multiple(attr, original_image, methods, signs, titles=None, fig_size=(8, 6), use_pyplot=True, **kwargs)[source]

Visualizes attribution using multiple visualization methods displayed in a 1 x k grid, where k is the number of desired visualizations.

Parameters
• attr (numpy.array) – Numpy array corresponding to attributions to be visualized. Shape must be in the form (H, W, C), with channels as last dimension. Shape must also match that of the original image if provided.

• original_image (numpy.array, optional) – Numpy array corresponding to original image. Shape must be in the form (H, W, C), with channels as the last dimension. Image can be provided either with values in range 0-1 or 0-255. This is a necessary argument for any visualization method which utilizes the original image.

• methods (list of strings) – List of strings of length k, defining method for each visualization. Each method must be a valid string argument for method to visualize_image_attr.

• signs (list of strings) – List of strings of length k, defining signs for each visualization. Each sign must be a valid string argument for sign to visualize_image_attr.

• titles (list of strings, optional) – List of strings of length k, providing a title string for each plot. If None is provided, no titles are added to subplots. Default: None

• fig_size (tuple, optional) – Size of figure created. Default: (8, 6)

• use_pyplot (boolean, optional) – If True, uses pyplot to create the figure and displays it after creation. If False, uses the Matplotlib object-oriented API and simply returns a figure object without showing it. Default: True

• **kwargs (Any, optional) – Any additional arguments which will be passed to every individual visualization. Such arguments include show_colorbar, alpha_overlay, cmap, etc.

Returns

• figure (matplotlib.pyplot.figure):

Figure object on which visualization is created. If plt_fig_axis argument is given, this is the same figure provided.

• axis (matplotlib.pyplot.axis):

Axis object on which visualization is created. If plt_fig_axis argument is given, this is the same axis provided.

Return type

2-element tuple of figure, axis

Examples:

>>> # ImageClassifier takes a single input tensor of images Nx3x32x32,
>>> # and returns an Nx10 tensor of class probabilities.
>>> net = ImageClassifier()
>>> ig = IntegratedGradients(net)
>>> # Computes integrated gradients for class 3 for a given image.
>>> attribution, delta = ig.attribute(orig_image, target=3)
>>> # Displays original image and heat map visualization of
>>> # computed attributions side by side (one sign per method).
>>> _ = visualize_image_attr_multiple(attribution, orig_image,
...                                   ["original_image", "heat_map"],
...                                   ["all", "positive"])

class captum.attr._utils.visualization.VisualizationDataRecord(word_attributions, pred_prob, pred_class, target_class, attr_class, attr_score, raw_input, convergence_score)[source]

A data record for storing attribution-relevant information.

## Interpretable Embedding Base

class captum.attr._models.base.InterpretableEmbeddingBase(embedding, full_name)[source]

Since some embedding vectors (e.g. word embeddings) are created and assigned in the embedding layers of PyTorch models, we need a way to access those layers, generate the embeddings, and subtract the baseline. To do so, we separate the embedding layers from the model, compute the embeddings separately, and perform all needed operations outside of the model. The original embedding layer is replaced by an InterpretableEmbeddingBase layer, which passes the precomputed embedding vectors to the layers below.

forward(input)[source]

The forward function of a wrapper embedding layer that takes and returns embedding tensors. It allows embeddings to be created outside of the model and passes them seamlessly to the subsequent layers of the model.

Parameters

input (tensor) – Input embedding tensor containing the embedding vectors of each word or token in the sequence.

Returns

Returns output tensor which is the same as input tensor. It passes embedding tensors to lower layers without any modifications.

Return type

tensor

indices_to_embeddings(input)[source]

Maps indices to their corresponding embedding vectors, e.g. word embeddings.

Parameters

input (tensor) – A tensor of input indices. A typical example of an input index is word or token index.

Returns

A tensor of word embeddings corresponding to the indices specified in the input

Return type

tensor

## Token Reference Base

class captum.attr._models.base.TokenReferenceBase(reference_token_idx=0)[source]

A base class for creating a reference (aka baseline) tensor for a sequence of tokens. A typical example of such a token is PAD. Users need to provide the index of the reference token in the vocabulary as an argument to the TokenReferenceBase class.

generate_reference(sequence_length, device)[source]

Generates a reference tensor of the given sequence_length using reference_token_idx.

Parameters
• sequence_length (int) – The length of the reference sequence

• device (torch.device) – The device on which the reference tensor will be created.

Returns

A sequence of reference tokens with shape [sequence_length].

Return type

tensor

captum.attr._models.base._get_deep_layer_name(obj, layer_names)[source]

Traverses through the layer names that are separated by dot in order to access the embedding layer.

captum.attr._models.base._set_deep_layer_value(obj, layer_names, value)[source]

Traverses through the layer names that are separated by dot in order to access the embedding layer and update its value.
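The dotted-name traversal used by these two helpers can be sketched in a few lines. The function names `get_deep_attr` and `set_deep_attr` and the nested `SimpleNamespace` "model" are hypothetical stand-ins for illustration, not the library's own code.

```python
from functools import reduce
from types import SimpleNamespace


def get_deep_attr(obj, dotted_name):
    """Follow a dot-separated attribute path, e.g. 'encoder.embedding'."""
    return reduce(getattr, dotted_name.split("."), obj)


def set_deep_attr(obj, dotted_name, value):
    """Replace the attribute at the end of a dot-separated path."""
    *parents, leaf = dotted_name.split(".")
    # Walk to the parent object, then overwrite the final attribute.
    setattr(reduce(getattr, parents, obj), leaf, value)


# Stand-in for a model with a nested embedding layer (hypothetical names).
model = SimpleNamespace(encoder=SimpleNamespace(embedding="original layer"))

original = get_deep_attr(model, "encoder.embedding")
set_deep_attr(model, "encoder.embedding", "wrapper layer")
print(original, "->", model.encoder.embedding)
```

The same pattern works for a layer name without dots, since the parent path is then empty and the attribute is set directly on the top-level object.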

captum.attr._models.base.configure_interpretable_embedding_layer(model, embedding_layer_name='embedding')[source]

This method wraps model’s embedding layer with an interpretable embedding layer that allows us to access the embeddings through their indices.

Parameters
• model (torch.nn.Module) – An instance of a PyTorch model that contains embeddings.

• embedding_layer_name (str, optional) – The name of the embedding layer in the model that we would like to make interpretable.

Returns

• interpretable_emb (InterpretableEmbeddingBase):

An instance of the InterpretableEmbeddingBase embedding layer that wraps the model’s embedding layer accessed through embedding_layer_name.

Return type

InterpretableEmbeddingBase

Examples:

>>> # Let's assume that we have a DocumentClassifier model that
>>> # has a word embedding layer named 'embedding'.
>>> # To make that layer interpretable we need to execute the
>>> # following command:
>>> net = DocumentClassifier()
>>> interpretable_emb = configure_interpretable_embedding_layer(net,
...     'embedding')
>>> # Then we can use the interpretable embedding to convert our
>>> # word indices into embeddings.
>>> # Let's assume that we have the following word indices:
>>> input_indices = torch.tensor([1, 0, 2])
>>> # We can access word embeddings for those indices with the
>>> # command stated below.
>>> input_emb = interpretable_emb.indices_to_embeddings(input_indices)
>>> # Let's assume that we want to apply integrated gradients to
>>> # our model and that the target attribution class is 3.
>>> ig = IntegratedGradients(net)
>>> attribution = ig.attribute(input_emb, target=3)
>>> # After we finish the interpretation we need to remove the
>>> # interpretable embedding layer with the following command:
>>> remove_interpretable_embedding_layer(net, interpretable_emb)

captum.attr._models.base.remove_interpretable_embedding_layer(model, interpretable_emb)[source]

Removes interpretable embedding layer and sets back original embedding layer in the model.

Parameters
• model (torch.nn.Module) – An instance of PyTorch model that contains embeddings

• interpretable_emb (InterpretableEmbeddingBase) – An instance of InterpretableEmbeddingBase that was originally created by the configure_interpretable_embedding_layer function and has to be removed after interpretation is finished.

Examples:

>>> # Let's assume that we have a DocumentClassifier model that
>>> # has a word embedding layer named 'embedding'.
>>> # To make that layer interpretable we need to execute the
>>> # following command:
>>> net = DocumentClassifier()
>>> interpretable_emb = configure_interpretable_embedding_layer(net,
...     'embedding')
>>> # Then we can use the interpretable embedding to convert our
>>> # word indices into embeddings.
>>> # Let's assume that we have the following word indices:
>>> input_indices = torch.tensor([1, 0, 2])
>>> # We can access word embeddings for those indices with the
>>> # command stated below.
>>> input_emb = interpretable_emb.indices_to_embeddings(input_indices)
>>> # Let's assume that we want to apply integrated gradients to
>>> # our model and that the target attribution class is 3.
>>> ig = IntegratedGradients(net)
>>> attribution = ig.attribute(input_emb, target=3)
>>> # After we finish the interpretation we need to remove the
>>> # interpretable embedding layer with the following command:
>>> remove_interpretable_embedding_layer(net, interpretable_emb)


class captum.attr._utils.attribution.Attribution[source]

All attribution algorithms extend this class. It requires its child classes to override the core attribute method.

attribute(inputs, **kwargs)[source]

This method computes and returns the attribution values for each input tensor. Deriving classes are responsible for implementing its logic accordingly.

Parameters
• inputs (tensor or tuple of tensors) – Input for which attribution is computed. It can be provided as a single tensor or a tuple of multiple tensors. If multiple input tensors are provided, the batch sizes must be aligned across all tensors.

• **kwargs (Any, optional) – Arbitrary keyword arguments used by specific attribution algorithms that extend this class.

Returns

• attributions (tensor or tuple of tensors):

Attribution values for each input tensor. The attributions have the same shape and dimensionality as the inputs. If a single tensor is provided as inputs, a single tensor is returned. If a tuple is provided for inputs, a tuple of corresponding sized tensors is returned.

Return type

tensor or tuple of tensors of attributions

compute_convergence_delta(attributions, *args)[source]

Attribution algorithms that derive from the Attribution class and provide a convergence delta (aka approximation error) should implement this method. The convergence delta can be computed based on certain properties of the attribution algorithms.

Parameters
• attributions (tensor or tuple of tensors) – Attribution scores that are precomputed by an attribution algorithm. Attributions can be provided in form of a single tensor or a tuple of those. It is assumed that attribution tensor’s dimension 0 corresponds to the number of examples, and if multiple input tensors are provided, the examples must be aligned appropriately.

• *args (optional) – Additional arguments that are used by the sub-classes depending on the specific implementation of compute_convergence_delta.

Returns

• deltas (tensor):

Depending on the specific implementation of sub-classes, the convergence delta can be returned per sample in the form of a tensor, or it can be aggregated across multiple samples and returned in the form of a single floating point tensor.

Return type

tensor of deltas

has_convergence_delta()[source]

This method informs the user whether the attribution algorithm provides a convergence delta (aka an approximation error) or not. Convergence delta may serve as a proxy of correctness of attribution algorithm’s approximation. If deriving attribution class provides a compute_convergence_delta method, it should override both compute_convergence_delta and has_convergence_delta methods.

Returns

Returns whether the attribution algorithm provides a convergence delta (aka approximation error) or not.

Return type

bool
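The contract between attribute, has_convergence_delta and compute_convergence_delta can be sketched with a toy subclass. Everything here is hypothetical and deliberately simplified: `AttributionSketch` only mimics the shape of the base class described above, and `InputTimesWeight` uses plain lists and a zero baseline instead of tensors.

```python
class AttributionSketch:
    """Hypothetical stand-in for the Attribution base class."""

    def attribute(self, inputs, **kwargs):
        raise NotImplementedError("Subclasses must override attribute().")

    def has_convergence_delta(self):
        return False

    def compute_convergence_delta(self, attributions, *args):
        raise NotImplementedError("This algorithm provides no convergence delta.")


class InputTimesWeight(AttributionSketch):
    """Toy algorithm for a linear function F(x) = sum(w_i * x_i)."""

    def __init__(self, weights):
        self.weights = weights

    def attribute(self, inputs, **kwargs):
        # One attribution per input feature, same length as the input.
        return [w * x for w, x in zip(self.weights, inputs)]

    def has_convergence_delta(self):
        # Overridden together with compute_convergence_delta.
        return True

    def compute_convergence_delta(self, attributions, inputs):
        # Against a zero baseline: sum(attributions) - (F(inputs) - F(0)).
        output = sum(w * x for w, x in zip(self.weights, inputs))
        return sum(attributions) - output


algo = InputTimesWeight(weights=[2.0, -1.0, 0.5])
inputs = [1.0, 3.0, 4.0]
attributions = algo.attribute(inputs)
if algo.has_convergence_delta():
    delta = algo.compute_convergence_delta(attributions, inputs)
    print(attributions, delta)
```

Callers can check has_convergence_delta first and only request a delta from algorithms that advertise one.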

class captum.attr._utils.attribution.LayerAttribution(forward_func, layer, device_ids=None)[source]

Layer attribution provides attribution values for the given layer, quantifying the importance of each neuron within the given layer’s output. The output attribution of calling attribute on a LayerAttribution object always matches the size of the layer output.

Parameters
• forward_func (callable or torch.nn.Module) – This can either be an instance of pytorch model or any modification of model’s forward function.

• layer (torch.nn.Module) – Layer for which output attributions are computed. Output size of attribute matches that of layer output.

• device_ids (list(int)) – Device ID list, necessary only if forward_func applies a DataParallel model, which allows reconstruction of intermediate outputs from batched results across devices. If forward_func is given as the DataParallel model itself, then it is not necessary to provide this argument.

interpolate(interpolate_dims, interpolate_mode='nearest')[source]

Interpolates given 3D, 4D or 5D layer attribution to given dimensions. This is often utilized to upsample the attribution of a convolutional layer to the size of an input, which allows visualizing in the input space.

Parameters

• interpolate_dims (int or tuple) – Upsampled dimensions. The number of elements must be the number of dimensions of layer_attribution - 2, since the first dimension corresponds to number of examples and the second is assumed to correspond to the number of channels.

• interpolate_mode (str) – Method for interpolation, which must be a valid input interpolation mode for torch.nn.functional. These methods are “nearest”, “area”, “linear” (3D-only), “bilinear” (4D-only), “bicubic” (4D-only), “trilinear” (5D-only) based on the number of dimensions of the given layer attribution.

Returns

Upsampled layer attributions with the first 2 dimensions matching layer_attribution and the remaining dimensions given by interpolate_dims.

Return type

tensor
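To make the "nearest" mode concrete, here is a minimal plain-Python sketch of nearest-neighbour upsampling for the two spatial dimensions of a single (example, channel) slice. The helper `upsample_nearest_2d` is hypothetical; the method documented above operates on whole tensors and supports the other modes listed, which this sketch does not attempt to reproduce.

```python
def upsample_nearest_2d(grid, out_height, out_width):
    """Nearest-neighbour upsampling of a 2D grid given as a list of rows."""
    in_height, in_width = len(grid), len(grid[0])
    return [
        [
            # Map each output position back to the source cell it falls in.
            grid[(i * in_height) // out_height][(j * in_width) // out_width]
            for j in range(out_width)
        ]
        for i in range(out_height)
    ]


# A 2x2 layer attribution map upsampled to a 4x4 input resolution.
layer_attr = [[1.0, 2.0],
              [3.0, 4.0]]
upsampled = upsample_nearest_2d(layer_attr, 4, 4)
for row in upsampled:
    print(row)
```

Each source value is repeated over a 2x2 block, which is why coarse convolutional attributions appear blocky when overlaid on the input image.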

class captum.attr._utils.attribution.NeuronAttribution(forward_func, layer, device_ids=None)[source]

Neuron attribution provides input attribution for a given neuron, quantifying the importance of each input feature in the activation of a particular neuron. Calling attribute on a NeuronAttribution object also requires providing the index of the neuron in the output of the given layer for which attributions are required. The output attribution of calling attribute on a NeuronAttribution object always matches the size of the input.

Parameters
• forward_func (callable or torch.nn.Module) – This can either be an instance of pytorch model or any modification of model’s forward function.

• layer (torch.nn.Module) – Layer containing the neuron for which attributions are computed. The output of attribute matches the size of the input, as described above.

• device_ids (list(int)) – Device ID list, necessary only if forward_func applies a DataParallel model, which allows reconstruction of intermediate outputs from batched results across devices. If forward_func is given as the DataParallel model itself, then it is not necessary to provide this argument.

attribute(inputs, neuron_index, **kwargs)[source]

This method computes and returns the neuron attribution values for each input tensor. Deriving classes are responsible for implementing its logic accordingly.

Parameters
• inputs – A single high dimensional input tensor or a tuple of them.

• neuron_index (int or tuple) – Tuple providing index of neuron in output of given layer for which attribution is desired. Length of this tuple must be one less than the number of dimensions in the output of the given layer (since dimension 0 corresponds to number of examples).

Returns

• attributions (tensor or tuple of tensors):

Attribution values for each input vector. The attributions have the dimensionality of inputs.

Return type

tensor or tuple of tensors of attributions
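The neuron_index convention (one index per output dimension except the batch dimension) can be illustrated with nested lists. The helper `select_neuron` is a hypothetical sketch and not part of the library; it only shows how a tuple index picks the same neuron for every example.

```python
from functools import reduce
from operator import getitem


def select_neuron(layer_output, neuron_index):
    """Pick one neuron's activation for every example in the batch."""
    if isinstance(neuron_index, int):
        neuron_index = (neuron_index,)
    # Dimension 0 is the batch, so the index is applied within each example.
    return [reduce(getitem, neuron_index, example) for example in layer_output]


# Layer output with shape (2 examples, 2 channels, 3 positions).
layer_output = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.1, 1.2, 1.3], [1.4, 1.5, 1.6]],
]
# The output has 3 dimensions, so neuron_index needs 2 elements.
activations = select_neuron(layer_output, (1, 2))
print(activations)
```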

class captum.attr._utils.attribution.GradientAttribution(forward_func)[source]

All gradient based attribution algorithms extend this class. It requires a forward function, which most commonly is the forward function of the model that we want to interpret or the model itself.

Parameters

forward_func (callable or torch.nn.Module) – This can either be an instance of pytorch model or any modification of model’s forward function.

compute_convergence_delta(attributions, start_point, end_point, target=None, additional_forward_args=None)[source]

Here we provide a specific implementation of compute_convergence_delta that is based on a common property of gradient-based attribution algorithms, sometimes called the completeness axiom in the literature. The completeness axiom states that the sum of the attributions must equal the difference between the model’s output at its end point and at its start point. In other words, sum(attributions) - (F(end_point) - F(start_point)) is close to zero. The delta returned by this method is defined as that difference.

This implementation assumes that both the start_point and end_point have the same shape and dimensionality. It also assumes that the target must have the same number of examples as the start_point and the end_point in case it is provided in form of a list or a non-singleton tensor.

Parameters
• attributions (tensor or tuple of tensors) – Precomputed attribution scores. The user can compute those using any attribution algorithm. It is assumed that the shape and the dimensionality of attributions match the shape and the dimensionality of start_point and end_point. It is also assumed that the attribution tensor’s dimension 0 corresponds to the number of examples, and if multiple input tensors are provided, the examples must be aligned appropriately.

• start_point (tensor or tuple of tensors, optional) – start_point is passed as an input to model’s forward function. It is the starting point of attributions’ approximation. It is assumed that both start_point and end_point have the same shape and dimensionality.

• end_point (tensor or tuple of tensors) – end_point is passed as an input to model’s forward function. It is the end point of attributions’ approximation. It is assumed that both start_point and end_point have the same shape and dimensionality.

• target (int, tuple, tensor or list, optional) –

Output indices for which gradients are computed (for classification cases, this is usually the target class). If the network returns a scalar value per example, no target index is necessary. For general 2D outputs, targets can be either:

• a single integer or a tensor containing a single integer, which is applied to all input examples

• a list of integers or a 1D tensor, with length matching the number of examples in inputs (dim 0). Each integer is applied as the target for the corresponding example.

For outputs with > 2 dimensions, targets can be either:

• A single tuple, which contains #output_dims - 1 elements. This target index is applied to all examples.

• A list of tuples with length equal to the number of examples in inputs (dim 0), and each tuple containing #output_dims - 1 elements. Each tuple is applied as the target for the corresponding example.

Default: None

• additional_forward_args (tuple, optional) – If the forward function requires additional arguments other than the inputs for which attributions should not be computed, this argument can be provided. It must be either a single additional argument of a Tensor or arbitrary (non-tuple) type or a tuple containing multiple additional arguments including tensors or any arbitrary python types. These arguments are provided to forward_func in order following the arguments in inputs. For a tensor, the first dimension of the tensor must correspond to the number of examples. additional_forward_args is used both for start_point and end_point when computing the forward pass. Default: None

Returns

• deltas (tensor):

This implementation returns convergence delta per sample. Deriving sub-classes may do any type of aggregation of those values, if necessary.

Return type

tensor of deltas
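As a worked illustration of the completeness check with per-example targets, the sketch below uses a tiny linear model written with plain lists. The names `forward` and `convergence_delta` and the `weights` argument are assumptions made for this example; the real method takes tensors and calls the wrapped forward_func.

```python
def forward(x, weights):
    """Linear multi-class model: one score per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def convergence_delta(attributions, start_point, end_point, targets, weights):
    """Per-example delta: sum(attr) - (F(end)[target] - F(start)[target])."""
    if isinstance(targets, int):
        # A single integer target is applied to every example.
        targets = [targets] * len(end_point)
    deltas = []
    for attr, start, end, target in zip(attributions, start_point, end_point, targets):
        diff = forward(end, weights)[target] - forward(start, weights)[target]
        deltas.append(sum(attr) - diff)
    return deltas


weights = [[1.0, 2.0], [3.0, -1.0]]      # 2 classes, 2 features
start_point = [[0.0, 0.0], [0.0, 0.0]]   # baselines, one per example
end_point = [[1.0, 1.0], [2.0, 0.5]]     # inputs, one per example
targets = [0, 1]                         # per-example target class

# For a linear model, w * (x - baseline) satisfies completeness exactly.
attributions = [
    [w * (e - s) for w, e, s in zip(weights[t], end, start)]
    for t, end, start in zip(targets, end_point, start_point)
]
deltas = convergence_delta(attributions, start_point, end_point, targets, weights)
print(deltas)
```

For non-linear models the delta is generally non-zero, and its magnitude gives a rough indication of how well the attributions approximate the change in the model output.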