LRP

class captum.attr.LRP(model)[source]

Layer-wise relevance propagation (LRP) is based on a backward propagation mechanism applied sequentially to all layers of the model. Here, the model output score represents the initial relevance, which is decomposed into values for each neuron of the underlying layers. The decomposition is defined by rules that are chosen for each layer, involving its weights and activations. Details on the method can be found in the original paper [https://doi.org/10.1371/journal.pone.0130140]. The implementation is inspired by the tutorial of the same group [https://doi.org/10.1016/j.dsp.2017.10.011] and the publication by Ancona et al. [https://openreview.net/forum?id=Sy21R9JAW].

Parameters:

model (Module) – The forward function of the model or any modification of it. Custom rules for a given layer need to be defined as attribute module.rule and need to be of type PropagationRule. If no rule is specified for a layer, a pre-defined default rule for the module type is used.

attribute(inputs, target=None, additional_forward_args=None, return_convergence_delta=False, verbose=False)[source]
Parameters:
  • inputs (Tensor or tuple[Tensor, ...]) – Input for which relevance is propagated. If model takes a single tensor as input, a single input tensor should be provided. If model takes multiple tensors as input, a tuple of the input tensors should be provided. It is assumed that for all given input tensors, dimension 0 corresponds to the number of examples, and if multiple input tensors are provided, the examples must be aligned appropriately.

  • target (int, tuple, Tensor, or list, optional) –

    Output indices for which gradients are computed (for classification cases, this is usually the target class). If the network returns a scalar value per example, no target index is necessary. For general 2D outputs, targets can be either:

    • a single integer or a tensor containing a single integer, which is applied to all input examples

    • a list of integers or a 1D tensor, with length matching the number of examples in inputs (dim 0). Each integer is applied as the target for the corresponding example.

    For outputs with > 2 dimensions, targets can be either:

    • A single tuple, which contains #output_dims - 1 elements. This target index is applied to all examples.

    • A list of tuples with length equal to the number of examples in inputs (dim 0), and each tuple containing #output_dims - 1 elements. Each tuple is applied as the target for the corresponding example.

    Default: None

  • additional_forward_args (tuple, optional) – If the forward function requires additional arguments other than the inputs for which attributions should not be computed, this argument can be provided. It must be either a single additional argument of a Tensor or arbitrary (non-tuple) type or a tuple containing multiple additional arguments including tensors or any arbitrary python types. These arguments are provided to model in order, following the arguments in inputs. Note that attributions are not computed with respect to these arguments. Default: None

  • return_convergence_delta (bool, optional) – Indicates whether to return the convergence delta. If return_convergence_delta is set to True, the convergence delta is returned in a tuple following the attributions. Default: False

  • verbose (bool, optional) – Indicates whether information on application of rules is printed during propagation. Default: False
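The target forms described above each select one scalar per example from the model output. The following sketch illustrates that selection with plain tensor indexing. The helper select_target is hypothetical and written only for this illustration; it is not Captum's internal implementation:

```python
import torch

def select_target(output, target):
    """Pick one scalar per example from `output` (illustrative helper, not part of Captum)."""
    rows = torch.arange(output.shape[0])
    if isinstance(target, int):
        # Single integer: same index for every example (2D output).
        return output[:, target]
    if isinstance(target, tuple):
        # Single tuple: same multi-dimensional index for every example.
        return output[(slice(None),) + target]
    if isinstance(target, torch.Tensor):
        if target.numel() == 1:
            return output[:, int(target.item())]
        # 1D tensor: one index per example.
        return output[rows, target]
    if isinstance(target[0], int):
        # List of integers: one index per example.
        return output[rows, torch.tensor(target)]
    # List of tuples: one multi-dimensional index per example.
    return torch.stack([output[(i,) + t] for i, t in enumerate(target)])

out_2d = torch.arange(12.0).reshape(3, 4)     # 3 examples, 4 classes
out_3d = torch.arange(24.0).reshape(2, 3, 4)  # 2 examples, 3x4 outputs each

same_class = select_target(out_2d, 1)
per_example = select_target(out_2d, [0, 2, 3])
same_tuple = select_target(out_3d, (1, 2))
per_example_tuple = select_target(out_3d, [(0, 0), (2, 3)])
```

In every case the result has one element per example, which is the score from which relevance is propagated.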

Return type:

Union[TypeVar(TensorOrTupleOfTensorsGeneric, Tensor, Tuple[Tensor, ...]), Tuple[TypeVar(TensorOrTupleOfTensorsGeneric, Tensor, Tuple[Tensor, ...]), Tensor]]

Returns:

Tensor or tuple[Tensor, …] of attributions or 2-element tuple of attributions, delta:

  • attributions (Tensor or tuple[Tensor, …]):

    The propagated relevance values with respect to each input feature. The values are normalized by the output score value, so the attributions sum to one (sum(relevance)=1) rather than to the prediction score as in other implementations. To obtain values comparable to other methods or implementations, multiply these values by the output score. Attributions will always be the same size as the provided inputs, with each value providing the attribution of the corresponding input index. If a single tensor is provided as inputs, a single tensor is returned. If a tuple is provided for inputs, a tuple of correspondingly sized tensors is returned.

  • delta (Tensor, returned if return_convergence_delta=True):

    Delta is calculated per example, meaning that the number of elements in the returned delta tensor is equal to the number of examples in the inputs.

Examples:

>>> import torch
>>> from captum.attr import LRP
>>>
>>> # ImageClassifier (assumed to be defined elsewhere) takes a single
>>> # input tensor of images Nx3x32x32 and returns an Nx10 tensor of
>>> # class probabilities. It has one Conv2d and a ReLU layer.
>>> net = ImageClassifier()
>>> lrp = LRP(net)
>>> input = torch.randn(3, 3, 32, 32)
>>> # Attribution size matches input size: 3x3x32x32
>>> attribution = lrp.attribute(input, target=5)

compute_convergence_delta(attributions, output)[source]

Here, we use the completeness property of LRP: the relevance is conserved during the propagation through the model's layers. Therefore, the difference between the sum of attribution (relevance) values and the model output is taken as the convergence delta. It should be zero for functional attribution. However, when rules with an epsilon value are used for stability reasons, relevance is absorbed during propagation and the convergence delta is non-zero.
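The absorption effect can be illustrated with a hand-written relevance step through a single bias-free linear layer. This is a sketch of the generic epsilon-stabilized LRP formula under simplifying assumptions (one layer, no bias, hand-picked weights). The function lrp_linear is hypothetical and is not Captum's internal code:

```python
import torch

def lrp_linear(a, weight, relevance_out, epsilon):
    """One relevance step through a bias-free linear layer z = a @ weight.T."""
    z = a @ weight.T                          # pre-activations, shape (N, out)
    sign = torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))
    s = relevance_out / (z + epsilon * sign)  # epsilon-stabilized ratio
    return a * (s @ weight)                   # relevance per input, shape (N, in)

a = torch.tensor([[1.0, 2.0, 3.0]])
weight = torch.tensor([[0.5, -1.0, 2.0],
                       [1.0, 1.0, -0.5]])
relevance_out = a @ weight.T                  # start from the layer output itself

exact = lrp_linear(a, weight, relevance_out, epsilon=0.0)
absorbed = lrp_linear(a, weight, relevance_out, epsilon=0.1)

# Difference between the output relevance and the summed input relevance.
delta_exact = relevance_out.sum() - exact.sum()
delta_absorbed = relevance_out.sum() - absorbed.sum()
print(delta_exact.item(), delta_absorbed.item())
```

With epsilon set to zero the input relevance sums to the output relevance, so the delta vanishes. With a positive epsilon each ratio is shrunk slightly, part of the relevance is absorbed, and the delta becomes non-zero.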

Parameters:
  • attributions (Tensor or tuple[Tensor, ...]) – Attribution scores that are precomputed by an attribution algorithm. Attributions can be provided in form of a single tensor or a tuple of those. It is assumed that attribution tensor’s dimension 0 corresponds to the number of examples, and if multiple input tensors are provided, the examples must be aligned appropriately.

  • output (Tensor) – The output value with respect to which the attribution values are computed. This value corresponds to the target score of a classification model. The given tensor should only have a single element.

Returns:

  • delta (Tensor) – Difference of relevance in output layer and input layer.

Return type:

Tensor

has_convergence_delta()[source]

This method informs the user whether the attribution algorithm provides a convergence delta (also known as an approximation error). The convergence delta may serve as a proxy for the correctness of the attribution algorithm's approximation. If a deriving attribution class provides a compute_convergence_delta method, it should override both the compute_convergence_delta and has_convergence_delta methods.

Returns:

Returns whether the attribution algorithm provides a convergence delta (aka approximation error) or not.

Return type:

bool