Harness#

There are two main abstractions in the prompttools library: Experiments and Harnesses. Occasionally, you may want to use a harness because it abstracts away more details than an experiment does.

A harness is built on top of an experiment, and manages abstractions over inputs. For example, the PromptTemplateExperimentationHarness freezes one set of model arguments and varies the prompt input based on prompt templates and user inputs. It then constructs a corresponding experiment, and keeps track of the templates and inputs used for each prompt.

class prompttools.harness.ExperimentationHarness#

Base class for experimentation harnesses. This class should not be used directly; please use one of its subclasses instead.

evaluate(metric_name, eval_fn, static_eval_fn_kwargs={}, **eval_fn_kwargs)#

Uses the given eval_fn to evaluate the results of the underlying experiment.

Parameters:
  • metric_name (str) – Name under which the evaluation results will be recorded.

  • eval_fn (Callable) – Function used to score each result of the underlying experiment.

  • static_eval_fn_kwargs (dict) – Keyword arguments passed unchanged to eval_fn on every call.

Return type:

None
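
As a minimal sketch, assume harness is any harness that has already been run (for example the PromptTemplateExperimentationHarness shown further down). The mentions_washington function is hypothetical, and the argument it receives for each result (assumed here to be a row-like object with a "response" field) depends on the underlying experiment.

# Hypothetical metric: 1.0 if the model's response mentions "Washington", else 0.0.
# The argument passed to eval_fn for each result is assumed to be a row-like object
# with a "response" field; the exact format depends on the underlying experiment.
def mentions_washington(row) -> float:
    return 1.0 if "Washington" in str(row["response"]) else 0.0

harness.evaluate("mentions_washington", mentions_washington)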

classmethod load_experiment(experiment_id)#

Loads a previously saved experiment into this harness.

Parameters:

experiment_id (str) – Experiment ID of the experiment that you wish to load.

classmethod load_revision(revision_id)#

Loads a specific revision of a previously saved experiment into this harness.

Parameters:

revision_id (str) – Revision ID of the experiment that you wish to load.

prepare()#

Prepares the underlying experiment.

Return type:

None

rank(metric_name, is_average=False)#

Scores and ranks the experiment inputs using the pivot columns, e.g. prompt templates or system prompts.

Parameters:
  • metric_name (str) – Name of the evaluation metric to rank by.

  • is_average (bool) – If True, the scores are averaged rather than summed when ranking. Defaults to False.

Return type:

dict[str, float]
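
Continuing the evaluate sketch above, rank aggregates that metric over the pivot columns; the printed output is illustrative.

# Rank prompt templates (or system prompts) by the metric registered via evaluate().
scores = harness.rank("mentions_washington", is_average=True)
print(scores)  # e.g. {"Answer the following question: {{input}}": 0.75, ...}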

run(clear_previous_results=False)#

Runs the underlying experiment.

Parameters:

clear_previous_results (bool) – If True, clears the results of any previous runs before running again. Defaults to False.

Return type:

None

save_experiment(name=None)#

Saves the underlying experiment and its results.

Parameters:

name (Optional[str]) – Name of the experiment. This is optional if you have previously loaded an experiment into this object.
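
A hedged sketch of saving and reloading results. These methods assume persistence is configured for your environment; the subclass shown is only an example (the base class should not be used directly), and the name and IDs are placeholders.

# Save the current experiment under a human-readable name.
harness.save_experiment("prompt-template-comparison")

# Later, reload a saved experiment, or a specific revision of it, by ID (placeholders shown).
loaded = PromptTemplateExperimentationHarness.load_experiment("<experiment-id>")
revision = PromptTemplateExperimentationHarness.load_revision("<revision-id>")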

visualize(pivot=False)#

Displays a visualization of the experiment results.

Parameters:

pivot (bool) – If True, displays the results as a pivot table. Defaults to False.

Return type:

None

class prompttools.harness.ChatHistoryExperimentationHarness(model_name, chat_histories, model_arguments=None)#

An experimentation harness used to compare multiple chat histories.

Parameters:
  • model_name (str) – The name of the model.

  • chat_histories (List[List[Dict[str, str]]]) – A list of chat histories that will be fed into the model.

  • model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to None.
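
A minimal sketch, assuming chat messages in the OpenAI role/content format; the model name and messages are illustrative.

from prompttools.harness import ChatHistoryExperimentationHarness

# Two alternative chat histories to compare against the same model.
chat_histories = [
    [{"role": "user", "content": "Tell me a joke about programmers."}],
    [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Tell me a joke about programmers."},
    ],
]

harness = ChatHistoryExperimentationHarness("gpt-3.5-turbo", chat_histories)
harness.run()
harness.visualize()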

class prompttools.harness.ChatModelComparisonHarness(model_names, chat_histories, runs=1, model_arguments=None)#

An experimentation harness used for comparing chat models. Multi-model version of ChatHistoryExperimentationHarness.

Parameters:
  • model_names (List[str]) – The names of the models that you would like to compare.

  • chat_histories (List[List[Dict[str, str]]]) – A list of chat histories that will be fed into the models.

  • runs (int) – Number of runs to execute. Defaults to 1.

  • model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to None.
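
A minimal sketch; the model names and messages below are illustrative.

from prompttools.harness import ChatModelComparisonHarness

chat_histories = [
    [{"role": "user", "content": "What is the capital of France?"}],
]

# Compare two chat models on the same histories, running each combination twice.
harness = ChatModelComparisonHarness(["gpt-3.5-turbo", "gpt-4"], chat_histories, runs=2)
harness.run()
harness.visualize()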

class prompttools.harness.MultiExperimentHarness(experiments)#

This harness is designed to run experiments across multiple model providers. The underlying APIs for different models (e.g. LlamaCpp and OpenAI) differ, and this harness provides a way to manage that complexity: it runs the experiments for each provider and combines the results into a single table.

The notebook “examples/notebooks/GPT4vsLlama2.ipynb” provides a good example of how this can be used to test prompts across different models.

Parameters:

experiments (list[Experiment]) – The list of experiments that you would like to execute (e.g. prompttools.experiment.OpenAICompletionExperiment)
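
A minimal sketch with a single provider, assuming OpenAIChatExperiment's constructor accepts lists of model names and message histories; a second, provider-specific experiment (e.g. a LlamaCpp experiment) would be constructed with its own arguments and added to the list.

from prompttools.experiment import OpenAIChatExperiment
from prompttools.harness import MultiExperimentHarness

messages = [[{"role": "user", "content": "Who was the first president?"}]]

# Construct one experiment per provider; the arguments below are illustrative.
openai_experiment = OpenAIChatExperiment(["gpt-3.5-turbo"], messages)

harness = MultiExperimentHarness([openai_experiment])
harness.prepare()
harness.run()
harness.visualize()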

class prompttools.harness.PromptTemplateExperimentationHarness(experiment, model_name, prompt_templates, user_inputs, model_arguments=None)#

An experimentation harness used to test various prompt templates. Jinja-style templates are used, e.g. “Answer the following question: {{input}}”.

Parameters:
  • experiment (Type[Experiment]) – The experiment constructor that you would like to execute within the harness (e.g. prompttools.experiment.OpenAICompletionExperiment)

  • model_name (str) – The name of the model.

  • prompt_templates (List[str]) – A list of Jinja-style prompt templates.

  • user_inputs (List[Dict[str, str]]) – A list of dictionaries representing user inputs.

  • model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to None.
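
A minimal sketch; the model name is illustrative, and each template is rendered with the user inputs before being sent to the model.

from prompttools.experiment import OpenAICompletionExperiment
from prompttools.harness import PromptTemplateExperimentationHarness

prompt_templates = [
    "Answer the following question: {{input}}",
    "Respond with only the answer: {{input}}",
]
user_inputs = [
    {"input": "Who was the first president of the USA?"},
    {"input": "What is the capital of France?"},
]

harness = PromptTemplateExperimentationHarness(
    OpenAICompletionExperiment,
    "gpt-3.5-turbo-instruct",
    prompt_templates,
    user_inputs,
)
harness.run()
harness.visualize()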

class prompttools.harness.SystemPromptExperimentationHarness(experiment, model_name, system_prompts, human_messages, model_arguments=None)#

An experimentation harness used to test various system prompts.

Parameters:
  • experiment (Type[Experiment]) – The experiment constructor that you would like to execute within the harness (e.g. prompttools.experiment.OpenAICompletionExperiment)

  • model_name (str) – The name of the model.

  • system_prompts (List[str]) – A list of system prompts for the model.

  • human_messages (List[str]) – A list of human (user) messages to pass into the model.

  • model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to None. Note that the values are not lists.
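
A minimal sketch, assuming a chat experiment class; the model name and messages are illustrative, and each system prompt is paired with the human messages.

from prompttools.experiment import OpenAIChatExperiment
from prompttools.harness import SystemPromptExperimentationHarness

system_prompts = [
    "You are a helpful assistant.",
    "You are a helpful assistant. Answer in a single sentence.",
]
human_messages = ["What is the capital of France?"]

harness = SystemPromptExperimentationHarness(
    OpenAIChatExperiment,
    "gpt-3.5-turbo",
    system_prompts,
    human_messages,
)
harness.run()
harness.visualize()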