prompttools.harness package#
Submodules#
prompttools.harness.chat_history_harness module#
- class prompttools.harness.chat_history_harness.ChatHistoryExperimentationHarness(model_name, chat_histories, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used for compare multiple chat histories.
- Parameters:
- prepare()#
Initializes and prepares the experiment.
- Return type:
None
- run()#
Runs the underlying experiment.
prompttools.harness.chat_model_comparison_harness module#
- class prompttools.harness.chat_model_comparison_harness.ChatModelComparisonHarness(model_names, chat_histories, runs=1, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used for comparing chat models. Multi-model version of
ChatHistoryExperimentationHarness.- Parameters:
model_names (List[str]) – The names of the models that you would like to compare
chat_histories (List[List[Dict[str, str]]]) – A list of chat histories that will be fed into the models.
runs (int) – Number of runs to execute. Defaults to
1.model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to
None.
- compare()#
- prepare()#
Initializes and prepares the experiment.
- Return type:
None
- run()#
Runs the underlying experiment.
prompttools.harness.document_retrieval_harness module#
prompttools.harness.function_call_harness module#
prompttools.harness.harness module#
- class prompttools.harness.harness.ExperimentationHarness#
Bases:
objectBase class for experimentation harnesses. This should not be used directly, please use the subclasses instead.
- evaluate(metric_name, eval_fn, static_eval_fn_kwargs={}, **eval_fn_kwargs)#
Uses the given eval_fn to evaluate the results of the underlying experiment.
- experiment: Experiment#
- property full_df#
- classmethod load_experiment(experiment_id)#
experiment_id (str): experiment ID of the experiment that you wish to load.
- Parameters:
experiment_id (str) –
- classmethod load_revision(revision_id)#
revision_id (str): revision ID of the experiment that you wish to load.
- Parameters:
revision_id (str) –
- property partial_df#
- prepare()#
Prepares the underlying experiment.
- Return type:
None
- rank(metric_name, is_average=False)#
Scores and ranks the experiment inputs using the pivot columns, e.g. prompt templates or system prompts.
- run(clear_previous_results=False)#
Runs the underlying experiment.
- Parameters:
clear_previous_results (bool) –
- Return type:
None
- save_experiment(name=None)#
- name (str, optional): Name of the experiment. This is optional if you have previously loaded an experiment
into this object.
- property score_df#
prompttools.harness.multi_experiment_harness module#
- class prompttools.harness.multi_experiment_harness.MultiExperimentHarness(experiments)#
Bases:
objectThis is designed to run experiments across multiple model providers. The underlying APIs for different models (e.g. LlamaCpp and OpenAI) are different, this provides a way to manage that complexity. This will run experiments for different providers, and combine the results into a single table.
The notebook “examples/notebooks/GPT4vsLlama2.ipynb” provides a good example how this can used to test prompts across different models.
- Parameters:
experiments (list[Experiment]) – The list of experiments that you would like to execute (e.g.
prompttools.experiment.OpenAICompletionExperiment)
- evaluate(metric_name, eval_fn)#
- gather_feedback()#
- Return type:
None
- prepare()#
- rank(metric_name, is_average=False)#
- run()#
prompttools.harness.prompt_template_harness module#
- class prompttools.harness.prompt_template_harness.PromptTemplateExperimentationHarness(experiment, model_name, prompt_templates, user_inputs, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used to test various prompt templates. We use jinja templates, e.g. “Answer the following question: {{input}}”.
- Parameters:
experiment (Type[Experiment]) – The experiment constructor that you would like to execute within the harness (e.g.
prompttools.experiment.OpenAICompletionExperiment)model_name (str) – The name of the model.
prompt_templates (List[str]) – A list of prompt
jinja-styled templates.user_inputs (List[Dict[str, str]]) – A list of dictionaries representing user inputs.
model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to
None.
- prepare()#
Creates prompts from templates to use for the experiment, and then initializes and prepares the experiment.
- Return type:
None
- run()#
Runs the underlying experiment.
prompttools.harness.system_prompt_harness module#
- class prompttools.harness.system_prompt_harness.SystemPromptExperimentationHarness(experiment, model_name, system_prompts, human_messages, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used to test various system prompts.
- Parameters:
experiment (Type[Experiment]) – The experiment that you would like to execute (e.g.
prompttools.experiment.OpenAICompletionExperiment)model_name (str) – The name of the model.
system_prompts (List[str]) – A list of system prompts for the model
human_messages (List[str]) – A list of human (user) messages to pass into the model
model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to
None. Note that the values are not lists.
- prepare()#
Creates messages to use for the experiment, and then initializes and prepares the experiment.
- Return type:
None
Module contents#
- class prompttools.harness.ChatHistoryExperimentationHarness(model_name, chat_histories, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used for compare multiple chat histories.
- Parameters:
- prepare()#
Initializes and prepares the experiment.
- Return type:
None
- run()#
Runs the underlying experiment.
- class prompttools.harness.ChatModelComparisonHarness(model_names, chat_histories, runs=1, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used for comparing chat models. Multi-model version of
ChatHistoryExperimentationHarness.- Parameters:
model_names (List[str]) – The names of the models that you would like to compare
chat_histories (List[List[Dict[str, str]]]) – A list of chat histories that will be fed into the models.
runs (int) – Number of runs to execute. Defaults to
1.model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to
None.
- compare()#
- experiment: Experiment#
- prepare()#
Initializes and prepares the experiment.
- Return type:
None
- run()#
Runs the underlying experiment.
- class prompttools.harness.ChatPromptTemplateExperimentationHarness(experiment, model_name, message_templates, user_inputs, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used to test various prompt templates for chat models. We use jinja templates, e.g. “Answer the following question: {{input}}”.
- Parameters:
experiment (Type[Experiment]) – The experiment constructor that you would like to execute within the harness (e.g.
prompttools.experiment.OpenAICompletionExperiment)model_name (str) – The name of the model.
message_templates (List[str]) – A list of prompt
jinja-styled templates. Each template should have two messages inside (first system prompt and second a user message).user_inputs (List[Dict[str, str]]) – A list of dictionaries representing user inputs.
model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to
None. Note that the values are not lists.
- prepare()#
Creates prompts from templates to use for the experiment, and then initializes and prepares the experiment.
- Return type:
None
- class prompttools.harness.ExperimentationHarness#
Bases:
objectBase class for experimentation harnesses. This should not be used directly, please use the subclasses instead.
- evaluate(metric_name, eval_fn, static_eval_fn_kwargs={}, **eval_fn_kwargs)#
Uses the given eval_fn to evaluate the results of the underlying experiment.
- experiment: Experiment#
- property full_df#
- classmethod load_experiment(experiment_id)#
experiment_id (str): experiment ID of the experiment that you wish to load.
- Parameters:
experiment_id (str) –
- classmethod load_revision(revision_id)#
revision_id (str): revision ID of the experiment that you wish to load.
- Parameters:
revision_id (str) –
- property partial_df#
- prepare()#
Prepares the underlying experiment.
- Return type:
None
- rank(metric_name, is_average=False)#
Scores and ranks the experiment inputs using the pivot columns, e.g. prompt templates or system prompts.
- run(clear_previous_results=False)#
Runs the underlying experiment.
- Parameters:
clear_previous_results (bool) –
- Return type:
None
- save_experiment(name=None)#
- name (str, optional): Name of the experiment. This is optional if you have previously loaded an experiment
into this object.
- property score_df#
- class prompttools.harness.ModelComparisonHarness(model_names, system_prompts, user_messages, model_arguments=[], runs=1)#
Bases:
ExperimentationHarnessAn experimentation harness used for comparing models.
- Parameters:
model_names (List[str]) – The names of the models that you would like to compare
system_prompts (List[str]) – A list of system messages, one for each model.
model_arguments (List[Optional[Dict]]) – A list of model arguments, one for each model.
user_messages (List[str]) –
runs (int) – Number of runs to execute. Defaults to
1.
- evaluate(metric_name, eval_fn, static_eval_fn_kwargs={}, **eval_fn_kwargs)#
Uses the given eval_fn to evaluate the results of the underlying experiment.
- property full_df#
- property partial_df#
- prepare()#
Initializes and prepares the experiment.
- Return type:
None
- run(clear_previous_results=False)#
Runs the underlying experiment.
- Parameters:
clear_previous_results (bool) –
- property score_df#
- class prompttools.harness.MultiExperimentHarness(experiments)#
Bases:
objectThis is designed to run experiments across multiple model providers. The underlying APIs for different models (e.g. LlamaCpp and OpenAI) are different, this provides a way to manage that complexity. This will run experiments for different providers, and combine the results into a single table.
The notebook “examples/notebooks/GPT4vsLlama2.ipynb” provides a good example how this can used to test prompts across different models.
- Parameters:
experiments (list[Experiment]) – The list of experiments that you would like to execute (e.g.
prompttools.experiment.OpenAICompletionExperiment)
- evaluate(metric_name, eval_fn)#
- gather_feedback()#
- Return type:
None
- prepare()#
- rank(metric_name, is_average=False)#
- run()#
- class prompttools.harness.PromptTemplateExperimentationHarness(experiment, model_name, prompt_templates, user_inputs, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used to test various prompt templates. We use jinja templates, e.g. “Answer the following question: {{input}}”.
- Parameters:
experiment (Type[Experiment]) – The experiment constructor that you would like to execute within the harness (e.g.
prompttools.experiment.OpenAICompletionExperiment)model_name (str) – The name of the model.
prompt_templates (List[str]) – A list of prompt
jinja-styled templates.user_inputs (List[Dict[str, str]]) – A list of dictionaries representing user inputs.
model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to
None.
- experiment: Experiment#
- prepare()#
Creates prompts from templates to use for the experiment, and then initializes and prepares the experiment.
- Return type:
None
- run()#
Runs the underlying experiment.
- class prompttools.harness.RetrievalAugmentedGenerationExperimentationHarness(vector_db_experiment, llm_experiment_cls, llm_arguments, extract_document_fn, extract_query_metadata_fn, prompt_template='Given these documents:{{documents}}\n\n{{prompt}}\n')#
Bases:
ExperimentationHarnessAn experimentation harness used to test the Retrieval-Augmented Generation process, which involves a vector DB and a LLM at the same time.
- Parameters:
vector_db_experiment (Experiment) – An initialized vector DB experiment.
llm_experiment_cls (Type[Experiment]) – The experiment constructor that you would like to execute within the harness (e.g.
prompttools.experiment.OpenAICompletionExperiment)llm_arguments (dict[str, list]) – Dictionary of arguments for the LLM.
extract_document_fn (Callable) – A function, when given a row of results from the vector DB experiment, extract the relevant documents (
list[str]) that will be inserted into the template.extract_query_metadata_fn (Callable) – A function, when given a row of results from the vector DB experiment, extract the relevant metadata and return a
strthat will be shown for visualization in the final result tableprompt_template (str) – A
jinja-styled templates, where documents and prompt will be inserted.
- run()#
Runs the underlying experiment.
- Return type:
None
- visualize()#
Displays a visualization of the experiment results.
- Return type:
None
- class prompttools.harness.SystemPromptExperimentationHarness(experiment, model_name, system_prompts, human_messages, model_arguments=None)#
Bases:
ExperimentationHarnessAn experimentation harness used to test various system prompts.
- Parameters:
experiment (Type[Experiment]) – The experiment that you would like to execute (e.g.
prompttools.experiment.OpenAICompletionExperiment)model_name (str) – The name of the model.
system_prompts (List[str]) – A list of system prompts for the model
human_messages (List[str]) – A list of human (user) messages to pass into the model
model_arguments (Optional[Dict[str, object]], optional) – Additional arguments for the model. Defaults to
None. Note that the values are not lists.
- experiment: Experiment#
- prepare()#
Creates messages to use for the experiment, and then initializes and prepares the experiment.
- Return type:
None