LLM Attribution Classes¶
LLMAttribution¶
- class captum.attr.LLMAttribution(attr_method, tokenizer, attr_target='log_prob')[source]¶
Attribution class for large language models. It wraps a perturbation-based attribution algorithm to produce the attribution results commonly of interest for the text-generation use case. The wrapped instance calculates attribution in the same way as configured in the original attribution algorithm, but it provides a new "attribute" function which accepts text-based inputs and returns an LLMAttributionResult
- Parameters:
attr_method (Attribution) – instance of a supported perturbation attribution class created with the llm model that follows the huggingface-style interface convention. Supported methods include FeatureAblation, ShapleyValueSampling, ShapleyValues, Lime, and KernelShap. Lime and KernelShap do not support per-token attribution and will only return attribution for the full target sequence.
tokenizer (Tokenizer) – tokenizer of the llm model used in the attr_method
attr_target (str) – whether to attribute with respect to log probability or probability. Available values: ["log_prob", "prob"] Default: "log_prob"
- attribute(inp, target=None, skip_tokens=None, num_trials=1, gen_args=None, use_cached_outputs=True, _inspect_forward=None, **kwargs)[source]¶
- Parameters:
inp (InterpretableInput) – input prompt for which attributions are computed
target (str or Tensor, optional) – target response with respect to which attributions are computed. If None, it uses the model to generate the target based on the input and gen_args. Default: None
skip_tokens (List[int] or List[str], optional) – the tokens to skip in the output's interpretable representation. Use this argument to exclude tokens that are not of interest, commonly special tokens, e.g., sos and unk. It can be a list of token strings or a list of integer token ids. Default: None
num_trials (int, optional) – number of trials to run. The return is the average of the attributions over all trials. Default: 1
gen_args (dict, optional) – arguments for generating the target. Only used if target is not given. When None, the default arguments are used: {"max_new_tokens": 25, "do_sample": False, "temperature": None, "top_p": None} Default: None
**kwargs (Any) – any extra keyword arguments passed to the call of the underlying attribute function of the given attribution instance
- Returns:
- Attribution result. token_attr will be None if the attribution method is Lime or KernelShap.
- Return type:
attr (LLMAttributionResult)
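Example (a minimal sketch; the model name, prompt, and target below are illustrative placeholders, and any HuggingFace-style causal LM with its tokenizer should work similarly):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # Wrap a supported perturbation method with LLMAttribution.
    fa = FeatureAblation(model)
    llm_attr = LLMAttribution(fa, tokenizer)

    # Token-level interpretable input over the prompt.
    inp = TextTokenInput("The capital of France is", tokenizer)

    # If target is None, the model generates it using gen_args instead.
    attr_res = llm_attr.attribute(inp, target=" Paris", num_trials=1)
    print(attr_res.seq_attr)    # one attribution score per input token
    print(attr_res.token_attr)  # per-output-token scores; None for Lime/KernelShap

The FeatureAblation instance here could be replaced by any of the supported perturbation methods listed above.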
LLMGradientAttribution¶
- class captum.attr.LLMGradientAttribution(attr_method, tokenizer)[source]¶
Attribution class for large language models. It wraps a gradient-based attribution algorithm to produce the attribution results commonly of interest for the text-generation use case. The wrapped instance calculates attribution in the same way as configured in the original attribution algorithm, with respect to the log probabilities of each generated token and of the whole sequence. It provides a new "attribute" function which accepts text-based inputs and returns an LLMAttributionResult
- Parameters:
attr_method (Attribution) – instance of a supported gradient attribution class created with the llm model that follows the huggingface-style interface convention
tokenizer (Tokenizer) – tokenizer of the llm model used in the attr_method
- attribute(inp, target=None, skip_tokens=None, gen_args=None, **kwargs)[source]¶
- Parameters:
inp (InterpretableInput) – input prompt for which attributions are computed
target (str or Tensor, optional) – target response with respect to which attributions are computed. If None, it uses the model to generate the target based on the input and gen_args. Default: None
skip_tokens (List[int] or List[str], optional) – the tokens to skip in the output's interpretable representation. Use this argument to exclude tokens that are not of interest, commonly special tokens, e.g., sos and unk. It can be a list of token strings or a list of integer token ids. Default: None
gen_args (dict, optional) – arguments for generating the target. Only used if target is not given. When None, the default arguments are used: {"max_new_tokens": 25, "do_sample": False, "temperature": None, "top_p": None} Default: None
**kwargs (Any) – any extra keyword arguments passed to the call of the underlying attribute function of the given attribution instance
- Returns:
attribution result
- Return type:
attr (LLMAttributionResult)
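Example (a minimal sketch, using LayerIntegratedGradients as the wrapped gradient method; the model name and the embedding-layer path model.model.embed_tokens are model-specific assumptions):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from captum.attr import LayerIntegratedGradients, LLMGradientAttribution, TextTokenInput

    name = "meta-llama/Llama-2-7b-hf"   # placeholder; any HuggingFace-style causal LM
    model = AutoModelForCausalLM.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    # Attribute the log probabilities of the generated tokens to the token embeddings.
    lig = LayerIntegratedGradients(model, model.model.embed_tokens)
    llm_attr = LLMGradientAttribution(lig, tokenizer)

    inp = TextTokenInput("Paris is the capital of", tokenizer)
    # Without an explicit target, the model generates one using gen_args.
    attr_res = llm_attr.attribute(inp, gen_args={"max_new_tokens": 5})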
LLMAttributionResult¶
- class captum.attr.LLMAttributionResult(seq_attr, token_attr, input_tokens, output_tokens)[source]¶
Data class for the result returned by LLMAttribution and LLMGradientAttribution, which includes the necessary properties of the attribution. It also provides utilities to help present and plot the result in different forms.
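Example (a minimal sketch of inspecting a result; attr_res is assumed to be the value returned by one of the attribute calls above):

    # Raw attribution tensors and the token labels they are aligned with.
    seq_scores = attr_res.seq_attr       # one score per input token
    token_scores = attr_res.token_attr   # output-token x input-token matrix, or None
    print(attr_res.input_tokens)
    print(attr_res.output_tokens)

    # Built-in plotting helpers render the scores as matplotlib figures.
    attr_res.plot_seq_attr(show=True)
    attr_res.plot_token_attr(show=True)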