LLM Attribution Classes¶
LLMAttribution¶
- class captum.attr.LLMAttribution(attr_method, tokenizer, attr_target='log_prob')[source]¶
Attribution class for large language models. It wraps a perturbation-based attribution algorithm to produce the attribution results most commonly of interest for the text-generation use case. The wrapped instance calculates attribution in the same way as configured in the original attribution algorithm, but it provides a new “attribute” function which accepts text-based inputs and returns an LLMAttributionResult
- Parameters:
attr_method (Attribution) – instance of a supported perturbation attribution class, created with an LLM that follows the huggingface-style interface convention. Supported methods include FeatureAblation, ShapleyValueSampling, ShapleyValues, Lime, and KernelShap. Lime and KernelShap do not support per-token attribution and will only return attribution for the full target sequence.
tokenizer (Tokenizer) – tokenizer of the llm model used in the attr_method
attr_target (str) – whether to attribute with respect to log probability or probability. Available values: [“log_prob”, “prob”] Default: “log_prob”
- attribute(inp, target=None, num_trials=1, gen_args=None, use_cached_outputs=True, _inspect_forward=None, forward_in_tokens=True, **kwargs)[source]¶
- Parameters:
inp (InterpretableInput) – input prompt for which attributions are computed
target (str or Tensor, optional) – target response with respect to which attributions are computed. If None, it uses the model to generate the target based on the input and gen_args. Default: None
num_trials (int, optional) – number of trials to run. The return is the average of the attributions over all trials. Default: 1
gen_args (dict, optional) – arguments for generating the target. Only used if target is not given. When None, the default arguments {“max_new_tokens”: 25, “do_sample”: False, “temperature”: None, “top_p”: None} are used. Default: None
use_cached_outputs (bool, optional) – whether to use cached outputs when generating tokens in sequence. Only supported for huggingface GenerationMixin, since this functionality depends on the actual APIs of the model. Default: True
forward_in_tokens (bool, optional) – whether to use token-by-token forward or sequence-level forward. When True, it decodes tokens one by one to replicate the actual generation process authentically. When False, it concatenates the input and target tokens and forwards them in one pass, which is more efficient but may produce slightly different logits due to modern LLMs’ internal mechanisms such as caching. Default: True
**kwargs (Any) – any extra keyword arguments passed to the call of the underlying attribute function of the given attribution instance
- Returns:
Attribution result. token_attr will be None if attr method is Lime or KernelShap.
- Return type:
attr (LLMAttributionResult)
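A minimal usage sketch of the method above, assuming captum and transformers are installed and a small causal LM checkpoint (here “gpt2”, chosen only for illustration) is available. FeatureAblation is one of the supported perturbation methods; the target string is supplied explicitly, so no gen_args are needed.

```python
# Hedged sketch — assumes captum, transformers, and the "gpt2" checkpoint
# are available; any huggingface-style causal LM would work the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer
from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Wrap a supported perturbation method; attr_target defaults to "log_prob".
fa = FeatureAblation(model)
llm_attr = LLMAttribution(fa, tokenizer)

# Text-based input; attribute with respect to an explicit target response.
inp = TextTokenInput("The capital of France is", tokenizer)
result = llm_attr.attribute(inp, target=" Paris")

print(result.seq_attr)    # one attribution score per input token
print(result.token_attr)  # per-target-token attributions (None for Lime/KernelShap)
```

If target were omitted, the model itself would generate the response using gen_args (or the documented defaults) before attribution is computed.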
LLMGradientAttribution¶
- class captum.attr.LLMGradientAttribution(attr_method, tokenizer)[source]¶
Attribution class for large language models. It wraps a gradient-based attribution algorithm to produce the attribution results most commonly of interest for the text-generation use case. The wrapped instance calculates attribution in the same way as configured in the original attribution algorithm, with respect to the log probabilities of each generated token and the whole sequence. It provides a new “attribute” function which accepts text-based inputs and returns an LLMAttributionResult
- Parameters:
attr_method (Attribution) – instance of a supported gradient attribution class, created with an LLM that follows the huggingface-style interface convention
tokenizer (Tokenizer) – tokenizer of the llm model used in the attr_method
- attribute(inp, target=None, gen_args=None, **kwargs)[source]¶
- Parameters:
inp (InterpretableInput) – input prompt for which attributions are computed
target (str or Tensor, optional) – target response with respect to which attributions are computed. If None, it uses the model to generate the target based on the input and gen_args. Default: None
gen_args (dict, optional) – arguments for generating the target. Only used if target is not given. When None, the default arguments {“max_new_tokens”: 25, “do_sample”: False, “temperature”: None, “top_p”: None} are used. Default: None
**kwargs (Any) – any extra keyword arguments passed to the call of the underlying attribute function of the given attribution instance
- Returns:
attribution result
- Return type:
attr (LLMAttributionResult)
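A parallel sketch for the gradient-based wrapper, under the same assumptions as above (captum, transformers, and a “gpt2” checkpoint chosen purely for illustration). Here LayerIntegratedGradients is taken as the wrapped gradient method, applied over the model's token-embedding layer; the exact embedding attribute name varies by architecture.

```python
# Hedged sketch — assumes captum and transformers are installed; the
# embedding-layer path (model.transformer.wte) is GPT-2 specific.
from transformers import AutoModelForCausalLM, AutoTokenizer
from captum.attr import LayerIntegratedGradients, LLMGradientAttribution, TextTokenInput

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Gradient attribution over the token-embedding layer.
lig = LayerIntegratedGradients(model, model.transformer.wte)
llm_attr = LLMGradientAttribution(lig, tokenizer)

inp = TextTokenInput("The capital of France is", tokenizer)
result = llm_attr.attribute(inp, target=" Paris")

print(result.seq_attr)  # attribution of each input token to the whole sequence
```

Note that gradient methods operate on token-level inputs, so TextTokenInput is the natural InterpretableInput here.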
LLMAttributionResult¶
- class captum.attr.LLMAttributionResult(*, input_tokens, output_tokens, seq_attr, token_attr=None, output_probs=None, inp=None)[source]¶
Data class for the return result of LLMAttribution, which includes the necessary properties of the attribution. It also provides utilities to help present and plot the result in different forms.
- plot_image_heatmap(show=False, target_token_pos=None, border_width=2, show_legends=True)[source]¶
Plot the image in the input with the overlay of salience based on the attribution. Only available for certain multi-modality input types.
- Parameters:
show (bool) – whether to show the plot directly or return the figure and axis Default: False
target_token_pos (int or tuple[int, int] or None) – target token positions. Compute salience w.r.t. the target tokens at the specified positions. If None, use the sequence attribution. If int, use the attribution of the token at the given index. If tuple[int, int], like (m, n), use the summed token attribution of tokens from m to n (exclusive) Default: None
border_width (int) – Width of the border around each mask segment in pixels. Set to 0 to disable borders. Only used when input has mask_list. Default: 2
show_legends (bool) – If True, display the mask id for each segment at its centroid. Only used when input has mask_list. Default: True
- Returns:
If show is True, displays the plot and returns None. If show is False, returns a tuple of (figure, axes) for further customization.
- Return type:
None or tuple[Figure, Axes]
- plot_seq_attr(show=False)[source]¶
Generate a matplotlib plot for visualising the attribution of the output sequence.
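A short presentation sketch, assuming a result object obtained from one of the attribute calls above, that matplotlib is installed, and that plot_seq_attr mirrors plot_image_heatmap in returning (figure, axes) when show=False.

```python
# Hedged sketch — `result` is assumed to be an LLMAttributionResult.
# With show=False the figure is returned for further customization.
fig, ax = result.plot_seq_attr(show=False)
ax.set_title("Per-token sequence attribution")
fig.savefig("seq_attr.png")
```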