LLM Attribution Classes

LLMAttribution

class captum.attr.LLMAttribution(attr_method, tokenizer, attr_target='log_prob')[source]

Attribution class for large language models. It wraps a perturbation-based attribution algorithm to produce the attribution results most commonly of interest for the text-generation use case. The wrapped instance calculates attribution in the same way as configured in the original attribution algorithm, but it provides a new “attribute” function which accepts text-based inputs and returns an LLMAttributionResult

Parameters:
  • attr_method (Attribution) – instance of a supported perturbation attribution class, created with an LLM model that follows the huggingface-style interface convention. Supported methods include FeatureAblation, ShapleyValueSampling, ShapleyValues, Lime, and KernelShap. Lime and KernelShap do not support per-token attribution and will only return attribution for the full target sequence.

  • tokenizer (Tokenizer) – tokenizer of the llm model used in the attr_method

  • attr_target (str) – whether to attribute with respect to log probability or probability. Available values: [“log_prob”, “prob”] Default: “log_prob”
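The two attr_target modes are related by exponentiation: the probability of a generated sequence is the product of its per-token probabilities, so its log probability is the sum of the per-token log probabilities. A minimal stdlib sketch (the token probabilities below are made-up illustrative values, not model outputs):

```python
import math

# Hypothetical per-token probabilities a model assigns to a
# generated target sequence (illustrative values only).
token_probs = [0.9, 0.5, 0.7]

# attr_target="prob": attribute towards the sequence probability,
# the product of the per-token probabilities.
seq_prob = math.prod(token_probs)

# attr_target="log_prob": attribute towards the sequence log
# probability, the sum of the per-token log probabilities.
seq_log_prob = sum(math.log(p) for p in token_probs)

# The two targets are consistent: exp(log_prob) == prob.
assert math.isclose(math.exp(seq_log_prob), seq_prob)
```

Working in log space avoids numerical underflow when target sequences are long, which is why “log_prob” is the default.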

attribute(inp, target=None, num_trials=1, gen_args=None, use_cached_outputs=True, _inspect_forward=None, forward_in_tokens=True, **kwargs)[source]
Parameters:
  • inp (InterpretableInput) – input prompt for which attributions are computed

  • target (str or Tensor, optional) – target response with respect to which attributions are computed. If None, it uses the model to generate the target based on the input and gen_args. Default: None

  • num_trials (int, optional) – number of trials to run. The return is the average of the attributions over all trials. Default: 1

  • gen_args (dict, optional) – arguments for generating the target. Only used if target is not given. When None, the default arguments are used: {“max_new_tokens”: 25, “do_sample”: False, “temperature”: None, “top_p”: None} Default: None

  • use_cached_outputs (bool, optional) – whether to use cached outputs when generating tokens in sequence. Only supported for huggingface GenerationMixin models, since this functionality depends on the actual APIs of the model. Default: True

  • forward_in_tokens (bool, optional) – whether to use token-by-token or sequence-level forward passes. When True, tokens are decoded one by one to authentically replicate the actual generation process. When False, the input and target tokens are concatenated and forwarded in one pass, which is more efficient but may produce slightly different logits due to internal mechanisms of modern LLMs such as caching. Default: True

  • **kwargs (Any) – any extra keyword arguments passed to the call of the underlying attribute function of the given attribution instance

Returns:

Attribution result. token_attr will be None if attr method is Lime or KernelShap.

Return type:

attr (LLMAttributionResult)

attribute_future()[source]

This method is not implemented for LLMAttribution.

Return type:

Callable[[], LLMAttributionResult]

LLMGradientAttribution

class captum.attr.LLMGradientAttribution(attr_method, tokenizer)[source]

Attribution class for large language models. It wraps a gradient-based attribution algorithm to produce the attribution results most commonly of interest for the text-generation use case. The wrapped instance calculates attribution in the same way as configured in the original attribution algorithm, with respect to the log probabilities of each generated token and of the whole sequence. It provides a new “attribute” function which accepts text-based inputs and returns an LLMAttributionResult

Parameters:
  • attr_method (Attribution) – instance of a supported gradient attribution class, created with an LLM model that follows the huggingface-style interface convention

  • tokenizer (Tokenizer) – tokenizer of the llm model used in the attr_method

attribute(inp, target=None, gen_args=None, **kwargs)[source]
Parameters:
  • inp (InterpretableInput) – input prompt for which attributions are computed

  • target (str or Tensor, optional) – target response with respect to which attributions are computed. If None, it uses the model to generate the target based on the input and gen_args. Default: None

  • gen_args (dict, optional) – arguments for generating the target. Only used if target is not given. When None, the default arguments are used, {“max_new_tokens”: 25, “do_sample”: False, “temperature”: None, “top_p”: None} Defaults: None

  • **kwargs (Any) – any extra keyword arguments passed to the call of the underlying attribute function of the given attribution instance

Returns:

Attribution result.

Return type:

attr (LLMAttributionResult)

attribute_future()[source]

This method is not implemented for LLMGradientAttribution.

Return type:

Callable[[], LLMAttributionResult]

LLMAttributionResult

class captum.attr.LLMAttributionResult(*, input_tokens, output_tokens, seq_attr, token_attr=None, output_probs=None, inp=None)[source]

Data class for the return result of LLMAttribution, which includes the necessary properties of the attribution. It also provides utilities to help present and plot the result in different forms.

plot_image_heatmap(show=False, target_token_pos=None, border_width=2, show_legends=True)[source]

Plot the image in the input with the overlay of salience based on the attribution. Only available for certain multi-modality input types.

Parameters:
  • show (bool) – whether to show the plot directly or return the figure and axis Default: False

  • target_token_pos (int or tuple[int, int] or None) – target token positions. Compute salience w.r.t. the target tokens at the specified positions. If None, use the sequence attribution. If int, use the attribution of the token at the given index. If tuple[int, int], like (m, n), use the summed token attribution of tokens from m to n (exclusive). Default: None

  • border_width (int) – Width of the border around each mask segment in pixels. Set to 0 to disable borders. Only used when input has mask_list. Default: 2

  • show_legends (bool) – If True, display the mask id for each segment at its centroid. Only used when input has mask_list. Default: True

Returns:

If show is True, displays the plot and returns None. If show is False, returns a tuple of (figure, axes) for further customization.

Return type:

None or tuple[Figure, Axes]

plot_seq_attr(show=False)[source]

Generate a matplotlib plot for visualising the attribution of the output sequence.

Parameters:

show (bool) – whether to show the plot directly or return the figure and axis Default: False

Returns:

If show is True, displays the plot and returns None. If show is False, returns a tuple of (figure, axes) for further customization.

Return type:

None or tuple[Figure, Axes]

plot_token_attr(show=False)[source]

Generate a matplotlib plot for visualising the attribution of the output tokens.

Parameters:

show (bool) – whether to show the plot directly or return the figure and axis Default: False

Returns:

If show is True, displays the plot and returns None. If show is False, returns a tuple of (figure, axes) for further customization.

Return type:

None or tuple[Figure, Axes]