Experiment#

There are two main abstractions used in the prompttools library: Experiments and Harnesses. Occasionally, you may want to use a harness, because it abstracts away more details.

An experiment is a low-level abstraction that takes the Cartesian product of possible inputs to an LLM API. For example, the OpenAIChatExperiment accepts lists of inputs for each parameter of the OpenAI Chat Completion API. It then constructs and asynchronously executes requests using those potential inputs. An example of using an experiment is here.

There are two ways to initialize an experiment:

  1. Wrap your parameters in lists and pass them into the __init__ method. See each class’s method signature in the “Integrated Experiment APIs” section for details.

  2. Define which parameters should be tested and which ones should be frozen in two dictionaries. Pass the dictionaries to the initialize method. See the classmethod initialize below for details.
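
For example, the first approach might look like the following sketch (the parameter values are illustrative, and the messages format assumes one conversation, itself a list of message dicts, per list entry):

>>> from prompttools.experiment import OpenAIChatExperiment
>>> messages = [[{"role": "user", "content": "Who was the first president?"}]]
>>> experiment = OpenAIChatExperiment(
>>>     model=["gpt-3.5-turbo", "gpt-4"],
>>>     messages=messages,
>>>     temperature=[0.0, 1.0])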

The Experiment superclass’s shared API is below.

class prompttools.experiment.Experiment#

Base class for experiments. This class should not be used directly; please use its subclasses instead.

aggregate(metric_name, column_name, is_average=False)#

Aggregates a metric for a given column and displays to the user.

Parameters:
  • metric_name (str) – metric to aggregate

  • column_name (str) – column to base the aggregation on

  • is_average (bool) – if True, compute the average for the metric, else compute the total
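
A minimal usage sketch (the metric and column names below are illustrative and assume a "latency" score and a "model" column exist in the result table):

>>> experiment.aggregate("latency", "model", is_average=True)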

evaluate(metric_name, eval_fn, static_eval_fn_kwargs={}, image_experiment=False, **eval_fn_kwargs)#

Using the given evaluation function that accepts a row of data, compute a new column with the evaluation result. Each row of data generally contains inputs, the model response, and other previously computed metrics.

Parameters:
  • metric_name (str) – name of the metric being computed

  • eval_fn (Callable) – an evaluation function that takes in a row from pd.DataFrame and optional keyword arguments

  • static_eval_fn_kwargs (dict) – keyword args for eval_fn that are consistent for all rows

  • eval_fn_kwargs (Optional[list]) – keyword args for eval_fn that may be different for each row. Each value entered here should be a list, and the length of the list should be the same as the number of responses in the experiment’s result. The i-th element of the list will be passed to the evaluation function to evaluate the i-th row.

  • image_experiment (bool) –

Return type:

None

Example

>>> from prompttools.utils import validate_json_response
>>> experiment.evaluate("is_json", validate_json_response,
>>>                     static_eval_fn_kwargs={"response_column_name": "response"})
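
You can also pass a custom evaluation function that operates on a row of the result table. A minimal sketch (the function and the "response" column name are illustrative):

>>> def response_length(row):
>>>     return len(row["response"])
>>> experiment.evaluate("response_length", response_length)
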
get_table(get_all_cols=False)#

Get the DataFrame in one of two versions:

  1. get_all_cols = False – good for visualization. This contains dynamic (non-frozen) input arguments, the text response, and scores (e.g. latency and metrics generated from evaluation).

  2. get_all_cols = True – good for the full result. This contains the full data with all input arguments (including frozen ones), the full model response (not just the text response), and scores.

Parameters:

get_all_cols (bool) – defaults to False. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Return type:

DataFrame

classmethod initialize(test_parameters, frozen_parameters)#

An alternate way to initialize an experiment by specifying which parameters should be tested and which ones should be frozen. If a parameter is not specified, the default value (if one exists) for the parameter will be used.

This allows you to easily initialize an experiment without wrapping every parameter in a list.

Note

  • For a given experiment, some parameters must be specified (e.g. the model parameter for OpenAI Chat Experiment). See the experiment’s __init__ method.

  • Each value in test_parameters should be a list; values in frozen_parameters should not be wrapped in lists.

Parameters:
  • test_parameters (dict[str, list]) – parameters that are being tested. Each value should be a list of test values (e.g. {"model": ["gpt-3.5-turbo", "gpt-4"], "temperature": [0.0, 1.0]})

  • frozen_parameters (dict) – parameters that are intended to be frozen across different configurations. There is no need to wrap the values in lists. (e.g. {"top_p": 1.0, "presence_penalty": 0.0})

Example

>>> from prompttools.experiment import OpenAIChatExperiment
>>> test_parameters = {"model": ["gpt-3.5-turbo", "gpt-4"]}
>>> messages = [{"role": "user", "content": "Who was the first president?"}]
>>> frozen_parameters = {"top_p": 1.0, "messages": messages}
>>> experiment = OpenAIChatExperiment.initialize(test_parameters, frozen_parameters)
pivot_table(pivot_columns, response_value_name=None, get_all_cols=False)#

Returns a pivoted DataFrame.

Parameters:
  • pivot_columns (List[str]) – two column names (first for the pivot row, second for the pivot column) that serve as the indices of the pivot table

  • response_value_name (Optional[str]) – name of the column to aggregate.

  • get_all_cols (bool) – defaults to False. If True, it will visualize the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Return type:

DataFrame
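
A usage sketch (the column names are illustrative and depend on your experiment's result table):

>>> df = experiment.pivot_table(["model", "messages"], response_value_name="response")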

prepare()#

Creates argument combinations by taking the cartesian product of all inputs.

Return type:

None

rank(metric_name, is_average, agg_column, get_all_cols=False)#

Using pivot data, groups the data by the first pivot column to get scores, and sorts descending. For example, using pivot data of (prompt_template, user_input), a metric of latency, and is_average=True, we rank prompt templates by their average latency in the test set.

Parameters:
  • metric_name (str) – metric to aggregate over

  • is_average (bool) – if True, compute the average for the metric, else compute the total

  • agg_column (str) – column to aggregate over

  • get_all_cols (bool) – defaults to False. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

Return type:

Dict[str, int]
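
A usage sketch (the metric and column names are illustrative; "latency" is one of the scores recorded by default, and a "model" column is assumed to exist in the result table):

>>> scores = experiment.rank("latency", is_average=True, agg_column="model")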

run(runs=1, clear_previous_results=False)#

Create tuples of input and output for every possible combination of arguments.

Note

If you overwrite this method in a subclass, make sure your method calls _construct_result_dfs in order to save the results from your run as DataFrames. Then, they can later be used for evaluation, aggregation, and persistence.

Parameters:
  • runs (int) – number of times to execute each possible combination of arguments, defaults to 1.

  • clear_previous_results (bool) – clear previous results before running

Return type:

None
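
A typical flow runs the experiment and then inspects the results (a minimal sketch):

>>> experiment.run(runs=2, clear_previous_results=True)
>>> experiment.visualize()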

to_csv(path, get_all_cols=True, **kwargs)#

Export the results to a CSV file. If the experiment has not been executed, it will run.

Parameters:
  • path (str) – path/buffer to write the CSV output

  • get_all_cols (bool) – defaults to True. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

  • **kwargs – optional arguments passed to pd.DataFrame.to_csv()

to_json(path=None, get_all_cols=True, **kwargs)#

Export the results to a JSON file. If the experiment has not been executed, it will run.

Parameters:
  • path (Optional[str]) – path/buffer to write the JSON output, defaults to None which returns the JSON as a dict

  • get_all_cols (bool) – defaults to True. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

  • **kwargs – optional arguments passed to pd.DataFrame.to_json()
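
A sketch of exporting results (the file names are illustrative; extra keyword arguments are forwarded to the corresponding pandas method):

>>> experiment.to_csv("results.csv", index=False)
>>> experiment.to_json("results.json")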

to_lora_json(instruction_extract, input_extract, output_extract, path=None, **kwargs)#

Export the results to a LoRA-format JSON file for fine-tuning. If the experiment has not been executed, it will run.

Parameters:
  • instruction_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to the "instruction" entry in the JSON file

  • input_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to the "input" entry in the JSON file

  • output_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to the "output" entry in the JSON file

  • path (Optional[str]) – path/buffer to write the JSON output, defaults to None which returns the JSON as a dict

  • **kwargs – optional arguments passed to pd.DataFrame.to_json()
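
A usage sketch (the column names passed to the extractors are illustrative and depend on your result table; the lambda shows an extractor function that accepts a row):

>>> experiment.to_lora_json(
>>>     instruction_extract="prompt",
>>>     input_extract=lambda row: "",
>>>     output_extract="response",
>>>     path="lora_data.json")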

to_mongo_db(mongo_uri, database_name, collection_name)#

Insert the results of the experiment into MongoDB for persistence.

Note

  • You need to install the pymongo package to use this method.

  • You need to run a local or remote instance of MongoDB in order to store the data.

Parameters:
  • mongo_uri (str) – a connection string to the target MongoDB

  • database_name (str) – name of the MongoDB database

  • collection_name (str) – name of the MongoDB collection

Return type:

None
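
A usage sketch (the connection string, database name, and collection name are placeholders):

>>> experiment.to_mongo_db(
>>>     mongo_uri="mongodb://localhost:27017/",
>>>     database_name="prompttools_results",
>>>     collection_name="openai_chat_experiment")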

to_pandas_df(get_all_cols=True, from_streamlit=False)#

Return the results as a pandas.DataFrame. If the experiment has not been executed, it will run.

Parameters:
  • get_all_cols (bool) – defaults to True. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

  • from_streamlit (bool) –

visualize(get_all_cols=False, pivot=False, pivot_columns=[])#

Visualize the DataFrame in one of two versions:

  1. get_all_cols = False – good for visualization. This contains dynamic (non-frozen) input arguments, the text response, and scores (e.g. latency and metrics generated from evaluation).

  2. get_all_cols = True – good for the full result. This contains the full data with all input arguments (including frozen ones), the full model response (not just the text response), and scores.

Parameters:
  • get_all_cols (bool) – defaults to False. If True, it will visualize the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.

  • pivot (bool) –

  • pivot_columns (list) –

Return type:

None

Integrated Experiment APIs#

LLMs#

class prompttools.experiment.OpenAIChatExperiment(model=['gpt-3.5-turbo'], messages=[], temperature=[1.0], top_p=[1.0], n=[1], stream=[False], stop=[None], max_tokens=[inf], presence_penalty=[0.0], frequency_penalty=[0.0], logit_bias=[None], response_format=[None], seed=[None], functions=[None], function_call=[None], azure_openai_service_configs=None)#

This class defines an experiment for OpenAI’s chat completion API. It accepts lists for each argument passed into OpenAI’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

  • All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.

  • For a detailed description of the input arguments, please refer to OpenAI’s chat completion API.

Parameters:
  • model (list[str]) – list of ID(s) of the model(s) to use, e.g. ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:org_id"]. If you are using the Azure OpenAI Service, put the models’ deployment names here.

  • messages (list[dict]) – A list of messages comprising the conversation so far. Each message is represented as a dictionary with the following keys: role: str, content: str.

  • temperature (list[float]) – Defaults to [1.0]. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

  • top_p (list[float]) – Defaults to [1.0]. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

  • n (list[int]) – Defaults to [1]. How many chat completion choices to generate for each input message.

  • stream (list[bool]) – Defaults to [False]. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.

  • stop (list[list[str]]) – Defaults to [None]. Up to 4 sequences where the API will stop generating further tokens.

  • max_tokens (list[int]) – Defaults to [inf]. The maximum number of tokens to generate in the chat completion.

  • presence_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

  • frequency_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

  • logit_bias (list[dict]) – Defaults to [None]. Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.

  • functions (list[dict]) – Defaults to [None]. A list of dictionaries, each of which contains the definition of a function the model may generate JSON inputs for.

  • function_call (list[dict]) – Defaults to [None]. A dictionary containing the name and arguments of a function that should be called, as generated by the model.

  • response_format (list[Optional[dict]]) – Setting this to {"type": "json_object"} enables JSON mode, which guarantees the message the model generates is valid JSON.

  • seed (list[Optional[int]]) – This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

  • azure_openai_service_configs (Optional[dict]) – Defaults to None. If it is set, the experiment will use Azure OpenAI Service. The input dict should contain these 2 keys (but with values based on your use case and configuration): {"AZURE_OPENAI_ENDPOINT": "https://YOUR_RESOURCE_NAME.openai.azure.com/", "API_VERSION": "2023-05-15"}
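
A sketch of running this experiment against Azure OpenAI Service (the deployment name and endpoint are placeholders; the config keys follow the description above):

>>> from prompttools.experiment import OpenAIChatExperiment
>>> azure_configs = {
>>>     "AZURE_OPENAI_ENDPOINT": "https://YOUR_RESOURCE_NAME.openai.azure.com/",
>>>     "API_VERSION": "2023-05-15"}
>>> experiment = OpenAIChatExperiment(
>>>     model=["my-gpt-35-turbo-deployment"],
>>>     messages=[[{"role": "user", "content": "Who was the first president?"}]],
>>>     azure_openai_service_configs=azure_configs)
>>> experiment.run()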

class prompttools.experiment.OpenAICompletionExperiment(model, prompt, suffix=[None], max_tokens=[inf], temperature=[1.0], top_p=[1.0], n=[1], stream=[False], logprobs=[None], echo=[False], stop=[None], presence_penalty=[0], frequency_penalty=[0], best_of=[1], logit_bias=[None], azure_openai_service_configs=None)#

This class defines an experiment for OpenAI’s completion API. It accepts lists for each argument passed into OpenAI’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

  • All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.

  • For a detailed description of the input arguments, please refer to OpenAI’s completion API.

Parameters:
  • model (list[str]) – list of ID(s) of the model(s) to use, e.g. ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:org_id"]. If you are using the Azure OpenAI Service, put the models’ deployment names here.

  • prompt (list[str]) – the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.

  • suffix (Optional[list[str]]) – Defaults to [None]. The suffix(es) that come after a completion of inserted text.

  • max_tokens (list[int]) – Defaults to [inf]. The maximum number of tokens to generate in the completion.

  • temperature (list[float]) – Defaults to [1.0]. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

  • top_p (list[float]) – Defaults to [1.0]. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

  • n (list[int]) – Defaults to [1]. How many completions to generate for each prompt.

  • stream (list[bool]) – Defaults to [False]. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.

  • logprobs (list[int]) – Defaults to [None]. Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens.

  • echo (list[bool]) – Echo back the prompt in addition to the completion.

  • stop (list[list[str]]) – Defaults to [None]. Up to 4 sequences where the API will stop generating further tokens.

  • presence_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

  • frequency_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

  • best_of (list[int]) – Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed.

  • logit_bias (list[dict]) – Defaults to [None]. Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.

  • azure_openai_service_configs (Optional[dict]) – Defaults to None. If it is set, the experiment will use Azure OpenAI Service. The input dict should contain these 3 keys (but with values based on your use case and configuration): {"AZURE_OPENAI_ENDPOINT": "https://YOUR_RESOURCE_NAME.openai.azure.com/", "API_TYPE": "azure", "API_VERSION": "2023-05-15"}

class prompttools.experiment.AnthropicCompletionExperiment(model, prompt, metadata=[None], max_tokens_to_sample=[1000], stop_sequences=[None], stream=[False], temperature=[None], top_k=[None], top_p=[None], timeout=[600.0])#

This class defines an experiment for Anthropic’s completion API. It accepts lists for each argument passed into Anthropic’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

  • All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.

  • You should set os.environ["ANTHROPIC_API_KEY"] = YOUR_KEY in order to connect with Anthropic’s API.

Parameters:
  • max_tokens_to_sample (list[int]) – A list of integers representing the maximum number of tokens to generate before stopping.

  • model (list[str]) – the model(s) that will complete your prompt (e.g. “claude-2”, “claude-instant-1”)

  • prompt (list[str]) – Input prompt. For proper response generation you will need to format your prompt as follows: f"{HUMAN_PROMPT} USER_QUESTION {AI_PROMPT}". You can get the built-in strings by importing HUMAN_PROMPT and AI_PROMPT from anthropic.

  • metadata (list) – list of object(s) describing metadata about the request.

  • stop_sequences (list[list[str]], optional) – Sequences that will cause the model to stop generating completion text

  • stream (list[bool], optional) – Whether to incrementally stream the response using server-sent events.

  • temperature (list[float], optional) – The amount of randomness injected into the response

  • top_k (list[int], optional) – Only sample from the top K options for each subsequent token.

  • top_p (list[float], optional) – use nucleus sampling.

  • timeout (list[float], optional) – Override the client-level default timeout for this request, in seconds. Defaults to [600.0].
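
A usage sketch (the prompt content is illustrative; the prompt format and environment variable follow the notes above):

>>> import os
>>> from anthropic import HUMAN_PROMPT, AI_PROMPT
>>> from prompttools.experiment import AnthropicCompletionExperiment
>>> os.environ["ANTHROPIC_API_KEY"] = "YOUR_KEY"
>>> prompts = [f"{HUMAN_PROMPT} What is the capital of France? {AI_PROMPT}"]
>>> experiment = AnthropicCompletionExperiment(
>>>     model=["claude-2", "claude-instant-1"],
>>>     prompt=prompts,
>>>     max_tokens_to_sample=[256])
>>> experiment.run()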

class prompttools.experiment.HuggingFaceHubExperiment(repo_id, prompt, task=['text-generation'], **kwargs)#

Experiment for Hugging Face Hub’s API. It accepts lists for each argument passed into Hugging Face Hub’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

  • All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments. For example, kwargs should have string keys, with lists as the values.

Parameters:
  • repo_id (List[str]) – IDs of the repositories to test (e.g. ["user/bert-base-uncased"]).

  • prompt (List[str] | List[PromptSelector]) – list of prompts to test

  • task (List[str]) – List of tasks in strings. Determines whether to force a task instead of using task specified in the repository.

  • **kwargs (Dict[str, list[object]]) – Keyword parameters used in the call to InferenceApi. The values should be lists.
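
A usage sketch (the repository IDs and keyword arguments are illustrative; every value, including those passed through **kwargs, is wrapped in a list):

>>> from prompttools.experiment import HuggingFaceHubExperiment
>>> experiment = HuggingFaceHubExperiment(
>>>     repo_id=["gpt2", "distilgpt2"],
>>>     prompt=["Tell me a joke about programming."],
>>>     task=["text-generation"],
>>>     temperature=[0.3, 0.9])
>>> experiment.run()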

class prompttools.experiment.GooglePaLMCompletionExperiment(model, prompt, temperature=[None], candidate_count=[None], max_output_tokens=[None], top_p=[None], top_k=[None], safety_settings=[None], stop_sequences=[None])#

This class defines an experiment for Google PaLM’s generate text API. It accepts lists for each argument passed into PaLM’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

  • All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.

  • You should set os.environ["GOOGLE_PALM_API_KEY"] = YOUR_KEY in order to connect with PaLM’s API.

Parameters:
  • model (list[str]) – Which model to call, as a string or a types.Model (e.g. 'models/text-bison-001').

  • prompt (list[str]) – Free-form input text given to the model. Given a prompt, the model will generate text that completes the input text.

  • temperature (list[float]) – Controls the randomness of the output. Must be positive. Typical values are in the range: [0.0, 1.0]. Higher values produce a more random and varied response. A temperature of zero will be deterministic.

  • candidate_count (list[int]) – The maximum number of generated response messages to return. This value must be between [1, 8], inclusive. If unset, this will default to 1.

  • max_output_tokens (list[int]) – Maximum number of tokens to include in a candidate. Must be greater than zero. If unset, will default to 64.

  • top_k (list[float]) – The API uses combined nucleus and top-k sampling. top_k sets the maximum number of tokens to sample from on each step.

  • top_p (list[float]) – The API uses combined nucleus and top-k sampling. top_p configures the nucleus sampling. It sets the maximum cumulative probability of tokens to sample from.

  • safety_settings (list[Iterable[palm.types.SafetySettingDict]]) – A list of unique types.SafetySetting instances for blocking unsafe content.

  • stop_sequences (list[Union[str, Iterable[str]]]) – A set of up to 5 character sequences that will stop output generation. If specified, the API will stop at the first appearance of a stop sequence.

class prompttools.experiment.GoogleVertexChatCompletionExperiment(model, message, context=[None], examples=[None], temperature=[None], max_output_tokens=[None], top_p=[None], top_k=[None], stop_sequences=[None])#

This class defines an experiment for Google Vertex AI’s chat API. It accepts lists for each argument passed into Vertex AI’s API, then creates a cartesian product of those arguments, and gets results for each.

Note

  • All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.

  • You need to set up your Google Vertex AI credentials properly before executing this experiment. One option is to execute on Google Cloud’s Colab.

Parameters:
  • model (list[str]) – Which model to call, as a string or a types.Model (e.g. 'models/text-bison-001').

  • message (list[str]) – Message for the chat model to respond to.

  • context (list[str]) – Context shapes how the model responds throughout the conversation. For example, you can use context to specify words the model can or cannot use, topics to focus on or avoid, or the response format or style.

  • examples (list[list['InputOutputTextPair']]) – Examples for the model to learn how to respond to the conversation.

  • temperature (list[float]) – Controls the randomness of the output. Must be positive. Typical values are in the range: [0.0, 1.0]. Higher values produce a more random and varied response. A temperature of zero will be deterministic.

  • max_output_tokens (list[int]) – Maximum number of tokens to include in a candidate. Must be greater than zero. If unset, will default to 64.

  • top_k (list[float]) – The API uses combined nucleus and top-k sampling. top_k sets the maximum number of tokens to sample from on each step.

  • top_p (list[float]) – The API uses combined nucleus and top-k sampling. top_p configures the nucleus sampling. It sets the maximum cumulative probability of tokens to sample from.

  • stop_sequences (list[Union[str, Iterable[str]]]) – A set of up to 5 character sequences that will stop output generation. If specified, the API will stop at the first appearance of a stop sequence.

class prompttools.experiment.LlamaCppExperiment(model_path, prompt, model_params={}, call_params={})#

Used to experiment across parameters for a local model, supported by LlamaCpp and GGML.

Note

  • All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments. For example, model_params should have string keys, with lists as the values.

Parameters:
  • model_path (List[str]) – list of paths to the models that you would like to run

  • prompt (List[str] | List[PromptSelector]) – list of prompts to test

  • model_params (Dict[str, list[object]]) – Parameters for initializing the model. The values should be lists.

  • call_params (Dict[str, list[object]]) – Parameters for calling the model completion function. The values should be lists.
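
A usage sketch (the model path and parameter names are illustrative; model_params map to the model constructor and call_params to the completion call):

>>> from prompttools.experiment import LlamaCppExperiment
>>> experiment = LlamaCppExperiment(
>>>     model_path=["/path/to/llama-2-7b.gguf"],
>>>     prompt=["Who was the first president?"],
>>>     model_params={"n_ctx": [512]},
>>>     call_params={"temperature": [0.0, 1.0], "max_tokens": [64]})
>>> experiment.run()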

class prompttools.experiment.ReplicateExperiment(models, input_kwargs, model_specific_kwargs={}, use_image_model=False)#

Perform an experiment with the Replicate API for both image models and LLMs.

Note

Set your API token to os.environ["REPLICATE_API_TOKEN"]. If you are using an image model, set use_image_model=True as input argument.

Parameters:
  • models (list[str]) – list of model identifiers to run (e.g. "stability-ai/stable-diffusion:27b93a2413e")

  • input_kwargs (dict[str, list]) – keyword arguments that can be used across all models

  • model_specific_kwargs (dict[str, dict[str, list]]) – model-specific keyword arguments that will only be used by a specific model (e.g. stability-ai/stable-diffusion:27b93a2413)

  • use_image_model (bool) – Defaults to False, must set to True to render output from image models.

Frameworks#

class prompttools.experiment.SequentialChainExperiment(llm, prompt_template, prompt, **kwargs)#

Experiment for testing LangChain’s sequential chains.

Parameters:
  • llm (list) – list of LLMs

  • prompt_template (list[list]) – list of prompt templates

  • prompt (list[str]) – list of prompts

  • kwargs (dict) – keyword arguments to call the model with

class prompttools.experiment.RouterChainExperiment(llm, prompt_infos, prompt, **kwargs)#

Experiment for testing LangChain’s router chains.

Parameters:
  • llm (list) – list of LLMs

  • prompt_infos (list[list[dict]]) – list of lists of dicts describing key features of the prompt chains

  • prompt (list[str]) – list of prompts

  • kwargs (dict) – keyword arguments to call the model with

class prompttools.experiment.MindsDBExperiment(db_connector, **kwargs)#

An experiment class for MindsDB. This accepts combinations of MindsDB inputs to form SQL queries, returning a list of responses.

Parameters:
  • db_connector (CMySQLConnection) – Connector for MindsDB

  • kwargs (dict) – keyword arguments for the model

Vector DBs#

class prompttools.experiment.ChromaDBExperiment(chroma_client, collection_name, use_existing_collection, query_collection_params, embedding_fns=[None], embedding_fn_names=['default'], add_to_collection_params=None)#

Perform an experiment with ChromaDB to test different embedding functions or retrieval arguments. You can query from an existing collection, or create a new one (and insert documents into it) during the experiment. If you choose to create a new collection, it will be automatically cleaned up as the experiment ends.

Parameters:
  • chroma_client (chromadb.Client) – ChromaDB client to interact with your database

  • collection_name (str) – the collection that you will get or create

  • use_existing_collection (bool) – determines whether to create a new collection or use an existing one

  • query_collection_params (dict[str, list]) – parameters used to query the collection. Each value is expected to be a list in order to create all possible combinations.

  • embedding_fns (list[Callable]) – embedding functions to test in the experiment; by default, only ChromaDB’s default embedding function is used

  • embedding_fn_names (list[str]) – names of the embedding functions

  • add_to_collection_params (Optional[dict]) – documents or embeddings that will be added to the newly created collection
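
A usage sketch (the collection name, documents, and query parameters are illustrative; each value in query_collection_params is a list of options to combine):

>>> import chromadb
>>> from prompttools.experiment import ChromaDBExperiment
>>> chroma_client = chromadb.Client()
>>> experiment = ChromaDBExperiment(
>>>     chroma_client,
>>>     collection_name="demo_collection",
>>>     use_existing_collection=False,
>>>     query_collection_params={"query_texts": [["first president"]], "n_results": [2, 4]},
>>>     add_to_collection_params={"documents": ["George Washington was the first president.",
>>>                                             "The capital of France is Paris."],
>>>                               "ids": ["doc1", "doc2"]})
>>> experiment.run()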

class prompttools.experiment.WeaviateExperiment(client, class_name, use_existing_data, property_names, text_queries, query_builders={'default': <function default_query_builder>}, vectorizers_and_moduleConfigs=None, property_definitions=None, data_objects=None, distance_metrics=None, vectorIndexConfigs=None)#

Perform an experiment with Weaviate to test different vectorizers or querying functions. You can query from an existing class, or create a new one (and insert data objects into it) during the experiment. If you choose to create a new class, it will be automatically cleaned up as the experiment ends.

Parameters:
  • client (weaviate.Client) – The Weaviate client instance to interact with the Weaviate server.

  • class_name (str) – The name of the Weaviate class (equivalent to a collection in ChromaDB).

  • use_existing_data (bool) – If True, indicates that existing data will be used for the experiment. If False, new data objects will be inserted into Weaviate during the experiment.

  • property_names (list[str]) – List of property names in the Weaviate class to be used in the experiment.

  • text_queries (list[str]) – List of text queries to be used for retrieval in the experiment.

  • query_builders (dict[str, Callable], optional) – A dictionary containing different query builders. The key should be the name of the function for visualization purposes. The value should be a Callable function that constructs and returns a Weaviate query object. Defaults to a built-in query function.

  • vectorizers_and_moduleConfigs (Optional[list[tuple[str, dict]]], optional) – List of tuples, where each tuple contains the name of the vectorizer and its corresponding moduleConfig as a dictionary. This is used during data insertion (if necessary).

  • property_definitions (Optional[list[dict]], optional) – List of property definitions for the Weaviate class. Each property definition is a dictionary containing the property name and data type. This is used during data insertion (if necessary).

  • data_objects (Optional[list], optional) – List of data objects to be inserted into Weaviate during the experiment. Each data object is a dictionary representing the property-value pairs.

  • distance_metrics (Optional[list[str]], optional) – List of distance metrics to be used in the experiment. These metrics will be used for generating vectorIndexConfig. This is used to define the class object. If necessary, either use distance_metrics or vectorIndexConfigs, not both.

  • vectorIndexConfigs (Optional[list[dict]], optional) – List of vectorIndexConfig to be used in the experiment to define the class object.

Note

  • If use_existing_data is False, the experiment will create a new Weaviate class and insert data_objects into it. The class and data_objects will be automatically cleaned up at the end of the experiment.

  • Either use existing data or specify data_objects and vectorizers for insertion.

  • Either distance_metrics or vectorIndexConfigs should be provided if necessary, not both.

  • If you pass in a custom query_builder function, it should accept the same parameters as the default one as seen here.

class prompttools.experiment.LanceDBExperiment(embedding_fns, query_args, uri='lancedb', table_name='table', use_existing_table=False, data=None, text_col_name='text', clean_up=False)#

Perform an experiment with LanceDB to test different embedding functions or retrieval arguments. You can query from an existing table, or create a new one (and insert documents into it) during the experiment.

Parameters:
  • uri (str) – LanceDB uri to interact with your database. Default is “lancedb”

  • table_name (str) – the table that you will get or create. Default is “table”

  • use_existing_table (bool) – determines whether to create a new table or use an existing one

  • embedding_fns (list[Callable]) – embedding functions to test in the experiment; by default, only LanceDB’s default embedding function is used

  • query_args (dict[str, list]) – parameters used to query the table. Each value is expected to be a list in order to create all possible combinations.

  • data (Optional[list[dict]]) – documents or embeddings that will be added to the newly created table

  • text_col_name (str) – name of the text column in the table. Default is “text”

  • clean_up (bool) – determines whether to drop the table after the experiment ends

class prompttools.experiment.QdrantExperiment(client, collection_name, embedding_fn, vector_size, documents, queries, collection_params=None, query_params=None)#

class prompttools.experiment.PineconeExperiment(index_name, use_existing_index, query_index_params, create_index_params=None, data=None)#

Perform an experiment with Pinecone to test different embedding functions or retrieval arguments. You can query from an existing collection, or create a new one (and insert documents into it) during the experiment. If you choose to create a new collection, it will be automatically cleaned up as the experiment ends.

Parameters:
  • index_name (str) – the index that you will use or create

  • use_existing_index (bool) – determines whether to create a new collection or use an existing one

  • query_index_params (dict[str, list]) – parameters used to query the index. Each value is expected to be a list in order to create all possible combinations.

  • create_index_params (Optional[dict]) – configuration of the new index (e.g. number of dimensions, distance function)

  • data (Optional[list]) – documents or embeddings that will be added to the newly created collection

Computer Vision#

class prompttools.experiment.StableDiffusionExperiment(hf_model_path, prompt, compare_images_folder, use_auth_token=False, **kwargs)#

Experiment for testing the Stable Diffusion model.

Parameters:
  • hf_model_path (str) – path to the model on Hugging Face

  • use_auth_token (bool) – boolean to determine if Hugging Face login is necessary [needed without GPU]

  • kwargs (dict) – keyword arguments to call the model with

  • prompt (List[str]) –

  • compare_images_folder (str) –

class prompttools.experiment.ReplicateExperiment(models, input_kwargs, model_specific_kwargs={}, use_image_model=False)#

Perform an experiment with the Replicate API for both image models and LLMs.

Note

Set your API token to os.environ["REPLICATE_API_TOKEN"]. If you are using an image model, set use_image_model=True as input argument.

Parameters:
  • models (list[str]) – list of model identifiers to run (e.g. "stability-ai/stable-diffusion:27b93a2413e")

  • input_kwargs (dict[str, list]) – keyword arguments that can be used across all models

  • model_specific_kwargs (dict[str, dict[str, list]]) – model-specific keyword arguments that will only be used by a specific model (e.g. stability-ai/stable-diffusion:27b93a2413)

  • use_image_model (bool) – Defaults to False, must set to True to render output from image models.