prompttools.experiment.experiments package#

Submodules#

prompttools.experiment.experiments.anthropic_claude_experiment module#

prompttools.experiment.experiments.chromadb_experiment module#

class prompttools.experiment.experiments.chromadb_experiment.ChromaDBExperiment(chroma_client, collection_name, use_existing_collection, query_collection_params, embedding_fns=[None], embedding_fn_names=['default'], add_to_collection_params=None)#

Bases: Experiment

Perform an experiment with ChromaDB to test different embedding functions or retrieval arguments. You can query from an existing collection, or create a new one (and insert documents into it) during the experiment. If you choose to create a new collection, it will be automatically cleaned up as the experiment ends.

Parameters:

chroma_client (chromadb.Client) – ChromaDB client to interact with your database
collection_name (str) – the collection that you will get or create
use_existing_collection (bool) – determines whether to create a new collection or use an existing one
query_collection_params (dict[str, list]) – parameters used to query the collection. Each value is expected to be a list, so that all possible combinations can be created.
embedding_fns (list[Callable]) – embedding functions to test in the experiment. By default, only ChromaDB’s default embedding function is used.
embedding_fn_names (list[str]) – names of the embedding functions
add_to_collection_params (Optional[dict]) – documents or embeddings that will be added to the newly created collection

PARAMETER_NAMES = ['chroma_client']#

all_args: Dict#

argument_combos: list[dict]#

chroma_client: chromadb.Client#

chromadb_completion_fn(collection, **query_params)#

ChromaDB helper function to make request

Parameters:

collection (chromadb.api.Collection) –
query_params (Dict[str, Any]) –

completion_fn: Callable#

classmethod initialize(test_parameters, frozen_parameters)#

An alternate way to initialize an experiment by specifying which parameters should be tested and which ones should be frozen. If a parameter is not specified, the default value (if exists) for the parameter will be used.

This allows you to easily initialize an experiment without wrapping every parameter in a list.

Note

For a given experiment, some parameters must be specified (e.g. the model parameter for OpenAI Chat Experiment). See the experiment’s __init__ method.
Each of test_parameters’s values should be a list, but not for frozen_parameters.

Parameters:

test_parameters (dict[str, list]) – parameters that are being tested. Each value should be a list of the values to test (e.g. {"model": ["gpt-3.5-turbo", "gpt-4"], "temperature": [0.0, 1.0]})
frozen_parameters (dict) – parameters that are intended to be frozen across different configurations. There is no need to wrap the value in a list (e.g. {"top_p": 1.0, "presence_penalty": 0.0})
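
The relationship between the two dictionaries can be sketched in plain Python. This is only an illustration of the idea that frozen values behave like one-element lists. `merge_parameters` is a hypothetical helper, not the library's implementation.

```python
def merge_parameters(test_parameters, frozen_parameters):
    """Wrap each frozen value in a one-element list, then add the tested lists."""
    merged = {name: [value] for name, value in frozen_parameters.items()}
    merged.update(test_parameters)  # tested parameters are already lists
    return merged


test_parameters = {"model": ["gpt-3.5-turbo", "gpt-4"], "temperature": [0.0, 1.0]}
frozen_parameters = {"top_p": 1.0, "presence_penalty": 0.0}

merged = merge_parameters(test_parameters, frozen_parameters)
for name, values in merged.items():
    print(name, values)
```

After merging, every value is a list, which is the shape the experiment constructors expect.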

Example

>>> from prompttools.experiment import OpenAIChatExperiment
>>> test_parameters = {"model": ["gpt-3.5-turbo", "gpt-4"]}
>>> messages = [{"role": "user", "content": "Who was the first president?"}]
>>> frozen_parameters = {"top_p": 1.0, "messages": messages}
>>> experiment = OpenAIChatExperiment.initialize(test_parameters, frozen_parameters)

prepare()#

Creates argument combinations by taking the cartesian product of all inputs.

Return type: None
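
A minimal sketch of the cartesian-product step, using only the standard library. The query parameters are made-up values and `build_argument_combos` is a hypothetical helper. The library's own code may differ.

```python
import itertools


def build_argument_combos(all_args):
    """Expand a dict of lists into one dict per combination of values."""
    keys = list(all_args)
    return [dict(zip(keys, values)) for values in itertools.product(*all_args.values())]


# Hypothetical inputs: two query texts crossed with two result counts.
all_args = {
    "query_texts": ["apple", "banana"],
    "n_results": [1, 3],
}

argument_combos = build_argument_combos(all_args)
for combo in argument_combos:
    print(combo)
```

Two lists of two values each yield four combinations, and the last key varies fastest.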

run(runs=1)#

Create tuples of input and output for every possible combination of arguments.

Note

If you override this method in a subclass, make sure your method calls _construct_result_dfs in order to save the results from your run as DataFrames. Then, they can later be used for evaluation, aggregation, and persistence.

Parameters:

runs (int) – number of times to execute each possible combination of arguments, defaults to 1.
clear_previous_results (bool) – clear previous results before running
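
The effect of the runs argument can be illustrated with a stand-in completion function. This is a sketch under the assumption that each combination is simply called runs times and timed. `fake_completion_fn` and `run_all` are hypothetical names.

```python
import time


def fake_completion_fn(**kwargs):
    """Stand-in for a real query or API call; echoes its arguments."""
    return f"response for {sorted(kwargs.items())}"


def run_all(argument_combos, completion_fn, runs=1):
    """Call completion_fn `runs` times per combination and record latency."""
    results = []
    for combo in argument_combos:
        for _ in range(runs):
            start = time.perf_counter()
            response = completion_fn(**combo)
            latency = time.perf_counter() - start
            results.append({"input": combo, "response": response, "latency": latency})
    return results


combos = [{"n_results": 1}, {"n_results": 3}]
results = run_all(combos, fake_completion_fn, runs=2)
print(len(results))
```

With two combinations and runs=2, four input/output records are collected.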

prompttools.experiment.experiments.error module#

exception prompttools.experiment.experiments.error.PromptExperimentException#

Bases: Exception

An exception to throw when something goes wrong with the prompt test setup

prompttools.experiment.experiments.experiment module#

class prompttools.experiment.experiments.experiment.Experiment#

Bases: object

Base class for experiments. This should not be used directly; please use the subclasses instead.

aggregate(metric_name, column_name, is_average=False)#

Aggregates a metric for a given column and displays to the user.

Parameters:

metric_name (str) – metric to aggregate
column_name (str) – column to base the aggregation on
is_average (bool) – if True, compute the average for the metric, else compute the total
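
The total-versus-average behaviour can be sketched on plain dictionaries. This version only computes the numbers and does not display anything. The rows and the `aggregate` function below are illustrative, not the library's code.

```python
from collections import defaultdict


def aggregate(rows, metric_name, column_name, is_average=False):
    """Group rows by column_name and total (or average) metric_name per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column_name]].append(row[metric_name])
    if is_average:
        return {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return {key: sum(vals) for key, vals in groups.items()}


# Hypothetical result rows.
rows = [
    {"model": "a", "latency": 1.0},
    {"model": "a", "latency": 3.0},
    {"model": "b", "latency": 2.0},
]

totals = aggregate(rows, "latency", "model")
averages = aggregate(rows, "latency", "model", is_average=True)
print(totals, averages)
```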

all_args: Dict#

argument_combos: list[dict]#

completion_fn: Callable#

cv2_image_to_base64(image)#

display_image_html(base64_string)#

evaluate(metric_name, eval_fn, static_eval_fn_kwargs={}, image_experiment=False, **eval_fn_kwargs)#

Using the given evaluation function that accepts a row of data, compute a new column with the evaluation result. Each row of data generally contains inputs, model response, and other previously computed metrics.

Parameters:

metric_name (str) – name of the metric being computed
eval_fn (Callable) – an evaluation function that takes in a row from pd.DataFrame and optional keyword arguments
static_eval_fn_kwargs (dict) – keyword args for eval_fn that are consistent for all rows
eval_fn_kwargs (Optional[list]) – keyword args for eval_fn that may be different for each row. Each value entered here should be a list, and the length of the list should be the same as the number of responses in the experiment’s result. The i-th element of the list will be passed to the evaluation function to evaluate the i-th row.
image_experiment (bool) –

Return type:

None
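
The difference between static and per-row keyword arguments can be sketched as follows. Rows are plain dictionaries here rather than DataFrame rows, and `evaluate` and `matches_expected` are hypothetical stand-ins written for this illustration.

```python
def evaluate(rows, metric_name, eval_fn, static_eval_fn_kwargs=None, **eval_fn_kwargs):
    """Store eval_fn's result for each row under metric_name."""
    static_eval_fn_kwargs = static_eval_fn_kwargs or {}
    for values in eval_fn_kwargs.values():
        if len(values) != len(rows):
            raise ValueError("each per-row keyword list must match the number of rows")
    for i, row in enumerate(rows):
        # The i-th element of every per-row list goes to the i-th row.
        per_row = {name: values[i] for name, values in eval_fn_kwargs.items()}
        row[metric_name] = eval_fn(row, **static_eval_fn_kwargs, **per_row)


def matches_expected(row, expected, response_column_name="response"):
    """Return 1.0 if the stripped response equals the expected answer, else 0.0."""
    return float(row[response_column_name].strip() == expected)


rows = [{"response": "4"}, {"response": "Paris "}, {"response": "blue"}]
evaluate(
    rows,
    "is_correct",
    matches_expected,
    static_eval_fn_kwargs={"response_column_name": "response"},
    expected=["4", "Paris", "red"],
)
print([row["is_correct"] for row in rows])
```

The static argument is the same for all three rows, while `expected` supplies a different value to each.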

Example

>>> from prompttools.utils import validate_json_response
>>> experiment.evaluate("is_json", validate_json_response,
>>>                     static_eval_fn_kwargs={"response_column_name": "response"})

get_table(get_all_cols=False)#

Get the DataFrame in one of two versions:

1. get_all_cols = False - good for visualization. This contains dynamic (non-frozen) input arguments, the text response, and scores (e.g. latency and metrics generated from evaluation).
2. get_all_cols = True - good for full result. This contains full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Parameters: get_all_cols (bool) – defaults to False. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
Return type: DataFrame
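
One way to picture the compact view is to keep only the arguments that actually vary. The sketch below assumes that "frozen" means an argument has a single candidate value. The column names and both helper functions are hypothetical.

```python
def dynamic_columns(all_args):
    """Return the names of arguments that take more than one value."""
    return [name for name, values in all_args.items() if len(values) > 1]


def compact_row(full_row, all_args, extra_columns=("response", "latency")):
    """Keep dynamic inputs plus the response and score columns."""
    keep = dynamic_columns(all_args) + list(extra_columns)
    return {name: full_row[name] for name in keep if name in full_row}


all_args = {"model": ["a", "b"], "top_p": [1.0]}
full_row = {
    "model": "a",
    "top_p": 1.0,
    "response": "hi",
    "latency": 0.2,
    "raw_response": {"id": "example"},
}

compact = compact_row(full_row, all_args)
print(compact)
```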

classmethod initialize(test_parameters, frozen_parameters)#

This allows you to easily initialize an experiment without wrapping every parameter in a list.

Note

For a given experiment, some parameters must be specified (e.g. the model parameter for OpenAI Chat Experiment). See the experiment’s __init__ method.
Each of test_parameters’s values should be a list, but not for frozen_parameters.

Parameters:

test_parameters (dict[str, list]) – parameters that are being tested. Each value should be a list of the values to test (e.g. {"model": ["gpt-3.5-turbo", "gpt-4"], "temperature": [0.0, 1.0]})
frozen_parameters (dict) – parameters that are intended to be frozen across different configurations. There is no need to wrap the value in a list (e.g. {"top_p": 1.0, "presence_penalty": 0.0})

Example

>>> from prompttools.experiment import OpenAIChatExperiment
>>> test_parameters = {"model": ["gpt-3.5-turbo", "gpt-4"]}
>>> messages = [{"role": "user", "content": "Who was the first president?"}]
>>> frozen_parameters = {"top_p": 1.0, "messages": messages}
>>> experiment = OpenAIChatExperiment.initialize(test_parameters, frozen_parameters)

pivot_table(pivot_columns, response_value_name=None, get_all_cols=False)#

Returns a pivoted DataFrame.

Parameters:

pivot_columns (List[str]) – two column names (first for pivot row, second for pivot column) that serve as indices of the pivot table
response_value_name (Optional[str]) – name of the column to aggregate.
get_all_cols (bool) – defaults to False. If True, it will visualize the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Return type:

DataFrame
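
The pivot layout can be sketched with nested dictionaries instead of a DataFrame. The column names and rows are made up, and `pivot` is a hypothetical helper that keeps the last value seen for each cell.

```python
def pivot(rows, pivot_columns, response_value_name):
    """Build {row_key: {column_key: value}} from flat result rows."""
    row_col, col_col = pivot_columns
    table = {}
    for row in rows:
        # Later rows with the same pair of keys overwrite earlier ones.
        table.setdefault(row[row_col], {})[row[col_col]] = row[response_value_name]
    return table


rows = [
    {"prompt_template": "A", "user_input": "x", "response": "r1"},
    {"prompt_template": "A", "user_input": "y", "response": "r2"},
    {"prompt_template": "B", "user_input": "x", "response": "r3"},
    {"prompt_template": "B", "user_input": "y", "response": "r4"},
]

table = pivot(rows, ["prompt_template", "user_input"], "response")
print(table)
```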

prepare()#

Creates argument combinations by taking the cartesian product of all inputs.

Return type: None

rank(metric_name, is_average, agg_column, get_all_cols=False)#

Using pivot data, groups the data by the first pivot column to get scores, and sorts descending. For example, using pivot data of (prompt_template, user_input), a metric of latency, and is_average=True, we rank prompt templates by their average latency in the test set.

Parameters:

metric_name (str) – metric to aggregate over
is_average (bool) – if True, compute the average for the metric, else compute the total
agg_column (str) – column to aggregate over
get_all_cols (bool) – defaults to False. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Return type:

Dict[str, int]
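
The ranking described above (group, score, sort descending) can be sketched on plain dictionaries. The latency numbers are invented and `rank` here is an illustrative stand-in, not the library's method.

```python
from collections import defaultdict


def rank(rows, metric_name, is_average, agg_column):
    """Score each group in agg_column by metric_name and sort high to low."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[agg_column]].append(row[metric_name])
    scores = {
        key: (sum(vals) / len(vals) if is_average else sum(vals))
        for key, vals in groups.items()
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


rows = [
    {"prompt_template": "terse", "latency": 0.5},
    {"prompt_template": "terse", "latency": 1.5},
    {"prompt_template": "verbose", "latency": 2.0},
    {"prompt_template": "verbose", "latency": 4.0},
]

ranking = rank(rows, "latency", is_average=True, agg_column="prompt_template")
print(ranking)
```

Because the sort is descending, the template with the highest average latency comes first.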

run(runs=1, clear_previous_results=False)#

Create tuples of input and output for every possible combination of arguments.

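Note

If you override this method in a subclass, make sure your method calls _construct_result_dfs in order to save the results from your run as DataFrames. Then, they can later be used for evaluation, aggregation, and persistence.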

Parameters:

runs (int) – number of times to execute each possible combination of arguments, defaults to 1.
clear_previous_results (bool) – clear previous results before running

Return type:

None

to_csv(path, get_all_cols=True, **kwargs)#

Export the results to a CSV file. If the experiment has not been executed, it will run.

Parameters:

path (str) – path/buffer to write the CSV output
get_all_cols (bool) – defaults to True. If True, it will export the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
**kwargs – optional arguments passed to pd.DataFrame.to_csv()

to_json(path=None, get_all_cols=True, **kwargs)#

Export the results to a JSON file. If the experiment has not been executed, it will run.

Parameters:

path (Optional[str]) – path/buffer to write the JSON output, defaults to None which returns the JSON as a dict
get_all_cols (bool) – defaults to True. If True, it will export the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
**kwargs – optional arguments passed to pd.DataFrame.to_json()

to_lora_json(instruction_extract, input_extract, output_extract, path=None, **kwargs)#

Export the results to a LoRA-format JSON file for fine-tuning. If the experiment has not been executed, it will run.

Parameters:

instruction_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to "instruction" entry in the JSON file
input_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to "input" entry in the JSON file
output_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to "output" entry in the JSON file
path (Optional[str]) – path/buffer to write the JSON output, defaults to None which returns the JSON as a dict
**kwargs – optional arguments passed to pd.DataFrame.to_json()
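
The three extractors can each be a column name or a function of the row. The sketch below assumes the output is a list of objects with "instruction", "input", and "output" keys. The exact file layout the library writes is not shown in this reference, and the helper names are hypothetical.

```python
import json


def extract(row, extractor):
    """Apply a callable extractor, or treat the extractor as a column name."""
    return extractor(row) if callable(extractor) else row[extractor]


def to_lora_records(rows, instruction_extract, input_extract, output_extract):
    """Build one instruction/input/output record per result row."""
    return [
        {
            "instruction": extract(row, instruction_extract),
            "input": extract(row, input_extract),
            "output": extract(row, output_extract),
        }
        for row in rows
    ]


# Hypothetical result row.
rows = [{"prompt": "Translate to French", "user_input": "cat", "response": " chat "}]

records = to_lora_records(rows, "prompt", "user_input", lambda row: row["response"].strip())
lora_json = json.dumps(records, indent=2)
print(lora_json)
```

Here the first two extractors are column names and the third is a function that cleans up the response before export.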

to_markdown()#

to_mongo_db(mongo_uri, database_name, collection_name)#

Insert the results of the experiment into MongoDB for persistence.

Note

You need to install the pymongo package to use this method.
You need to run a local or remote instance of MongoDB in order to store the data.

Parameters:

mongo_uri (str) – a connection string to the target MongoDB
database_name (str) – name of the MongoDB database
collection_name (str) – name of the MongoDB collection

Return type:

None

to_pandas_df(get_all_cols=True, from_streamlit=False)#

Return the results as a pandas.DataFrame. If the experiment has not been executed, it will run.

Parameters:

get_all_cols (bool) – defaults to True. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
from_streamlit (bool) –

visualize(get_all_cols=False, pivot=False, pivot_columns=[])#

Visualize the DataFrame in one of two versions:

1. get_all_cols = False - good for visualization. This contains dynamic (non-frozen) input arguments, the text response, and scores (e.g. latency and metrics generated from evaluation).
2. get_all_cols = True - good for full result. This contains full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Parameters:

get_all_cols (bool) – defaults to False. If True, it will visualize the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
pivot (bool) –
pivot_columns (list) –

Return type:

None

prompttools.experiment.experiments.google_palm_experiment module#

class prompttools.experiment.experiments.google_palm_experiment.GooglePaLMCompletionExperiment(model, prompt, temperature=[None], candidate_count=[None], max_output_tokens=[None], top_p=[None], top_k=[None], safety_settings=[None], stop_sequences=[None])#

Bases: Experiment

This class defines an experiment for Google PaLM’s generate text API. It accepts lists for each argument passed into PaLM’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
You should set os.environ["GOOGLE_PALM_API_KEY"] = YOUR_KEY in order to connect with PaLM’s API.

Parameters:

model (list[str]) – Which model to call, as a string or a types.Model (e.g. 'models/text-bison-001').
prompt (list[str]) – Free-form input text given to the model. Given a prompt, the model will generate text that completes the input text.
temperature (list[float]) – Controls the randomness of the output. Must be positive. Typical values are in the range: [0.0, 1.0]. Higher values produce a more random and varied response. A temperature of zero will be deterministic.
candidate_count (list[int]) – The maximum number of generated response messages to return. This value must be between [1, 8], inclusive. If unset, this will default to 1.
max_output_tokens (list[int]) – Maximum number of tokens to include in a candidate. Must be greater than zero. If unset, will default to 64.
top_k (list[float]) – The API uses combined nucleus and top-k sampling. top_k sets the maximum number of tokens to sample from on each step.
top_p (list[float]) – The API uses combined nucleus and top-k sampling. top_p configures the nucleus sampling. It sets the maximum cumulative probability of tokens to sample from.
safety_settings (list[Iterable[palm.types.SafetySettingDict]]) – A list of unique types.SafetySetting instances for blocking unsafe content.
stop_sequences (list[Union[str, Iterable[str]]]) – A set of up to 5 character sequences that will stop output generation. If specified, the API will stop at the first appearance of a stop sequence.
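
Because every argument is a list and all combinations are tried, the number of API calls grows multiplicatively. A small sketch for estimating that number before running, with made-up prompts and a hypothetical `count_calls` helper:

```python
import math


def count_calls(all_args, runs=1):
    """Number of calls: product of the list lengths, times the number of runs."""
    return math.prod(len(values) for values in all_args.values()) * runs


all_args = {
    "model": ["models/text-bison-001"],
    "prompt": ["Tell me a joke.", "Summarize the plot of Hamlet."],
    "temperature": [0.0, 0.5, 1.0],
}

print(count_calls(all_args))
print(count_calls(all_args, runs=3))
```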

palm_completion_fn(**input_args)#

prompttools.experiment.experiments.huggingface_endpoint_experiment module#

prompttools.experiment.experiments.huggingface_hub_experiment module#

class prompttools.experiment.experiments.huggingface_hub_experiment.HuggingFaceHubExperiment(repo_id, prompt, task=['text-generation'], **kwargs)#

Bases: Experiment

Experiment for Hugging Face Hub’s API. It accepts lists for each argument passed into Hugging Face Hub’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments. For example, kwargs should have string keys, with lists being the values.

Parameters:

repo_id (List[str]) – IDs of repositories (e.g. ["user/bert-base-uncased"]).
prompt (List[str] | List[PromptSelector]) – list of prompts to test
task (List[str]) – List of tasks in strings. Determines whether to force a task instead of using task specified in the repository.
**kwargs (Dict[str, list[object]]) – Keyword parameters used in the call to InferenceApi. The values should be lists.

CALL_PARAMETERS = ['prompt']#

MODEL_PARAMETERS = ['repo_id', 'task']#

all_args: Dict#

argument_combos: list[dict]#

completion_fn: Callable#

hf_completion_fn(**params)#

Local model helper function to make request

Parameters: params (Dict[str, Any]) –

prepare()#

Creates argument combinations by taking the cartesian product of all inputs.

Return type: None

run(runs=1)#

Create tuples of input and output for every possible combination of arguments. Each combination is executed runs times (default 1).

Parameters: runs (int) – number of times to execute each possible combination of arguments, defaults to 1.
Return type: None

prompttools.experiment.experiments.llama_cpp_experiment module#

class prompttools.experiment.experiments.llama_cpp_experiment.LlamaCppExperiment(model_path, prompt, model_params={}, call_params={})#

Bases: Experiment

Used to experiment across parameters for a local model, supported by LlamaCpp and GGML.

Note

All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments. For example, model_params should have string keys, with lists being the values.

Parameters:

model_path (List[str]) – list of paths to the models that you would like to run
prompt (List[str] | List[PromptSelector]) – list of prompts to test
model_params (Dict[str, list[object]]) – Parameters for initializing the model. The values should be lists.
call_params (Dict[str, list[object]]) – Parameters for calling the model completion function. The values should be lists.

CALL_PARAMETERS = ('prompt', 'suffix', 'max_tokens', 'temperature', 'top_p', 'logprobs', 'echo', 'stop', 'repeat_penalty', 'top_k')#

DEFAULT = {'echo': [False], 'f16_kv': [True], 'last_n_tokens_size': [64], 'logits_all': [False], 'logprobs': [None], 'lora_base': [None], 'lora_path': [None], 'max_tokens': [128], 'n_batch': [512], 'n_ctx': [512], 'n_parts': [-1], 'n_threads': [None], 'repeat_penalty': [1.1], 'seed': [1337], 'stop': [None], 'suffix': [None], 'temperature': [0.8], 'top_k': [40], 'top_p': [0.95], 'use_mlock': [False], 'use_mmap': [True], 'verbose': [True], 'vocab_only': [False]}#

MODEL_PARAMETERS = ('model_path', 'lora_path', 'lora_base', 'n_ctx', 'n_parts', 'seed', 'f16_kv', 'logits_all', 'vocab_only', 'use_mlock', 'n_threads', 'n_batch', 'use_mmap', 'last_n_tokens_size', 'verbose')#
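
The two tuples suggest that one argument combination is divided into model-initialization and call-time arguments. A sketch of such a split, assuming it is done by simple name membership. How the library itself routes them is not shown here, and the model path is a placeholder.

```python
CALL_PARAMETERS = (
    "prompt", "suffix", "max_tokens", "temperature", "top_p",
    "logprobs", "echo", "stop", "repeat_penalty", "top_k",
)
MODEL_PARAMETERS = (
    "model_path", "lora_path", "lora_base", "n_ctx", "n_parts", "seed",
    "f16_kv", "logits_all", "vocab_only", "use_mlock", "n_threads",
    "n_batch", "use_mmap", "last_n_tokens_size", "verbose",
)


def split_params(combo):
    """Separate one argument combination into model and call keyword arguments."""
    model_kwargs = {k: v for k, v in combo.items() if k in MODEL_PARAMETERS}
    call_kwargs = {k: v for k, v in combo.items() if k in CALL_PARAMETERS}
    return model_kwargs, call_kwargs


# Placeholder values for illustration only.
combo = {
    "model_path": "models/example.bin",
    "n_ctx": 512,
    "prompt": "Hello",
    "temperature": 0.8,
    "max_tokens": 128,
}

model_kwargs, call_kwargs = split_params(combo)
print(model_kwargs)
print(call_kwargs)
```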

all_args: Dict#

argument_combos: list[dict]#

completion_fn: Callable#

classmethod initialize(test_parameters, frozen_parameters)#

This allows you to easily initialize an experiment without wrapping every parameter in a list.

Note

For a given experiment, some parameters must be specified (e.g. the model parameter for OpenAI Chat Experiment). See the experiment’s __init__ method.
Each of test_parameters’s values should be a list, but not for frozen_parameters.

Parameters:

test_parameters (dict[str, list]) – parameters that are being tested. Each value should be a list of the values to test (e.g. {"model": ["gpt-3.5-turbo", "gpt-4"], "temperature": [0.0, 1.0]})
frozen_parameters (dict) – parameters that are intended to be frozen across different configurations. There is no need to wrap the value in a list (e.g. {"top_p": 1.0, "presence_penalty": 0.0})

Example

>>> from prompttools.experiment import OpenAIChatExperiment
>>> test_parameters = {"model": ["gpt-3.5-turbo", "gpt-4"]}
>>> messages = [{"role": "user", "content": "Who was the first president?"}]
>>> frozen_parameters = {"top_p": 1.0, "messages": messages}
>>> experiment = OpenAIChatExperiment.initialize(test_parameters, frozen_parameters)

llama_completion_fn(**params)#

Local model helper function to make request

Parameters: params (Dict[str, Any]) –

prepare()#

Creates argument combinations by taking the cartesian product of all inputs.

Return type: None

run(runs=1)#

Create tuples of input and output for every possible combination of arguments. For each combination, it will execute runs times, default to 1. For local models we need to run this in a single thread.

Parameters: runs (int) – number of times to execute each possible combination of arguments, defaults to 1.
Return type: None

prompttools.experiment.experiments.openai_chat_experiment module#

class prompttools.experiment.experiments.openai_chat_experiment.OpenAIChatExperiment(model=['gpt-3.5-turbo'], messages=[], temperature=[1.0], top_p=[1.0], n=[1], stream=[False], stop=[None], max_tokens=[inf], presence_penalty=[0.0], frequency_penalty=[0.0], logit_bias=[None], response_format=[None], seed=[None], functions=[None], function_call=[None], azure_openai_service_configs=None)#

Bases: Experiment

This class defines an experiment for OpenAI’s chat completion API. It accepts lists for each argument passed into OpenAI’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
For a detailed description of the input arguments, please refer to OpenAI’s chat completion API documentation.

Parameters:

model (list[str]) – list of ID(s) of the model(s) to use, e.g. ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:org_id"]. If you are using Azure OpenAI service, put the models’ deployment names here.
messages (list[dict]) – A list of messages comprising the conversation so far. Each message is represented as a dictionary with the following keys: role: str, content: str.
temperature (list[float]) – Defaults to [1.0]. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p (list[float]) – Defaults to [1.0]. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
n (list[int]) – Defaults to [1]. How many chat completion choices to generate for each input message.
stream (list[bool]) – Defaults to [False]. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stop (list[list[str]]) – Defaults to [None]. Up to 4 sequences where the API will stop generating further tokens.
max_tokens (list[int]) – Defaults to [inf]. The maximum number of tokens to generate in the chat completion.
presence_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
frequency_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
logit_bias (list[dict]) – Defaults to [None]. Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
functions (list[dict]) – Defaults to [None]. A list of dictionaries, each of which contains the definition of a function the model may generate JSON inputs for.
function_call (list[dict]) – Defaults to [None]. A dictionary containing the name and arguments of a function that should be called, as generated by the model.
response_format (list[Optional[dict]]) – Setting to {"type": "json_object"} enables JSON mode, which guarantees the message the model generates is valid JSON.
seed (list[Optional[int]]) – This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
azure_openai_service_configs (Optional[dict]) – Defaults to None. If it is set, the experiment will use Azure OpenAI Service. The input dict should contain these 2 keys (but with values based on your use case and configuration): {"AZURE_OPENAI_ENDPOINT": "https://YOUR_RESOURCE_NAME.openai.azure.com/", "API_VERSION": "2023-05-15"}
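
Since every argument is a list of candidates, the messages argument can be read as a list of conversations, each of which is itself a list of role/content dictionaries. That reading is an assumption based on the note above. The sketch below builds such a list from made-up system prompts and user inputs.

```python
import itertools

# Hypothetical prompts to compare.
system_prompts = ["You are a concise assistant.", "You are a detailed assistant."]
user_inputs = ["Who was the first president?", "Name a prime number."]

# One conversation (a list of message dicts) per pairing of prompt and input.
messages = [
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    for system_prompt, user_input in itertools.product(system_prompts, user_inputs)
]

print(len(messages))
print(messages[0])
```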

get_table(get_all_cols=False)#

Get the DataFrame in one of two versions:

1. get_all_cols = False - good for visualization. This contains dynamic (non-frozen) input arguments, the text response, and scores (e.g. latency and metrics generated from evaluation).
2. get_all_cols = True - good for full result. This contains full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Parameters: get_all_cols (bool) – defaults to False. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
Return type: DataFrame

classmethod load_experiment(experiment_id)#

Parameters: experiment_id (str) – experiment ID of the experiment that you wish to load.

classmethod load_revision(revision_id)#

Parameters: revision_id (str) – revision ID of the experiment that you wish to load.

run_one(model, messages, temperature=1.0, top_p=1.0, n=1, stream=False, stop=None, max_tokens=inf, presence_penalty=0.0, frequency_penalty=0.0, logit_bias=None, response_format=None, seed=None, functions=None, function_call=None)#

Execute one particular configuration of the experiment and add that to the result DataFrame.

Unlike run_partial, this doesn’t change the argument combination of the experiment.

Parameters:

model (str) –
messages (Union[List[Dict[str, str]], PromptSelector]) –
temperature (Optional[float]) –
top_p (Optional[float]) –
n (Optional[int]) –
stream (Optional[bool]) –
stop (Optional[List[str]]) –
max_tokens (Optional[int]) –
presence_penalty (Optional[float]) –
frequency_penalty (Optional[float]) –
logit_bias (Optional[Dict]) –
response_format (Optional[dict]) –
seed (Optional[int]) –
functions (Optional[Dict]) –
function_call (Optional[Dict[str, str]]) –

run_partial(**kwargs)#

Run the experiment against one parameter, which can be existing or new. The new result will be appended to any existing DataFrames.

If the argument value did not exist before, it will be added to the list of argument combinations that will be executed in the next run.

e.g. experiment.run_partial(model="gpt-4")
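
The bookkeeping described above, where a previously unseen value joins the list of values for later runs, can be sketched as follows. `add_partial_args` is a hypothetical helper that covers only that bookkeeping, not the execution of the experiment.

```python
def add_partial_args(all_args, **kwargs):
    """Append each new value to its argument list unless it is already present."""
    for name, value in kwargs.items():
        values = all_args.setdefault(name, [])
        if value not in values:
            values.append(value)
    return all_args


all_args = {"model": ["gpt-3.5-turbo"], "temperature": [1.0]}

add_partial_args(all_args, model="gpt-4")
add_partial_args(all_args, model="gpt-4")  # already present, so nothing changes
print(all_args)
```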

save_experiment(name=None)#

Parameters: name (Optional[str]) – Name of the experiment. This is optional if you have previously loaded an experiment into this object.

prompttools.experiment.experiments.openai_completion_experiment module#

class prompttools.experiment.experiments.openai_completion_experiment.OpenAICompletionExperiment(model, prompt, suffix=[None], max_tokens=[inf], temperature=[1.0], top_p=[1.0], n=[1], stream=[False], logprobs=[None], echo=[False], stop=[None], presence_penalty=[0], frequency_penalty=[0], best_of=[1], logit_bias=[None], azure_openai_service_configs=None)#

Bases: Experiment

This class defines an experiment for OpenAI’s completion API. It accepts lists for each argument passed into OpenAI’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
For a detailed description of the input arguments, please refer to OpenAI’s completion API documentation.

Parameters:

model (list[str]) – list of ID(s) of the model(s) to use, e.g. ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:org_id"]. If you are using Azure OpenAI service, put the models’ deployment names here.
prompt (list[str]) – the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.
suffix (Optional[List[str]]) – Defaults to [None]. The suffix(es) that come after a completion of inserted text.
max_tokens (list[int]) – Defaults to [inf]. The maximum number of tokens to generate in the chat completion.
temperature (list[float]) – Defaults to [1.0]. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p (list[float]) – Defaults to [1.0]. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
n (list[int]) – Defaults to [1]. How many chat completion choices to generate for each input message.
stream (list[bool]) – Defaults to [False]. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
logprobs (list[int]) – Defaults to [None]. Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens.
echo (list[bool]) – Echo back the prompt in addition to the completion.
stop (list[list[str]]) – Defaults to [None]. Up to 4 sequences where the API will stop generating further tokens.
presence_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
frequency_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
best_of (list[int]) – Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed.
logit_bias (list[dict]) – Defaults to [None]. Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
azure_openai_service_configs (Optional[dict]) – Defaults to None. If it is set, the experiment will use Azure OpenAI Service. The input dict should contain these 3 keys (but with values based on your use case and configuration): {"AZURE_OPENAI_ENDPOINT": "https://YOUR_RESOURCE_NAME.openai.azure.com/", "API_TYPE": "azure", "API_VERSION": "2023-05-15"}

prompttools.experiment.experiments.openai_function_experiment module#

prompttools.experiment.experiments.vector_database_experiment module#

Module contents#