Experiment#
There are two main abstractions used in the prompttools
library: Experiments and Harnesses.
Occasionally, you may want to use a harness, because it abstracts away more details.
An experiment is a low-level abstraction that takes the Cartesian product of possible inputs to
an LLM API. For example, the OpenAIChatExperiment accepts lists of inputs for each parameter
of the OpenAI Chat Completion API. It then constructs and asynchronously executes a request
for each combination of those inputs. An example of using an experiment is here.
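The Cartesian product described above can be illustrated without the library. This is a minimal sketch using only the standard library; the parameter names and the argument_combinations helper are illustrative assumptions, not part of prompttools.

```python
from itertools import product

# Hypothetical parameter lists, shaped like the arguments an experiment accepts.
parameters = {
    "model": ["gpt-3.5-turbo", "gpt-4"],
    "temperature": [0.0, 1.0],
    "top_p": [1.0],
}


def argument_combinations(params):
    """Return one keyword-argument dict per combination of parameter values."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]


combos = argument_combinations(parameters)
print(len(combos))  # 2 models x 2 temperatures x 1 top_p = 4
```

Each dict in combos corresponds to one request; the number of requests grows multiplicatively with every list that holds more than one value.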
There are two ways to initialize an experiment:
1. Wrap your parameters in lists and pass them into the __init__ method. See each class’s method signature in the “Integrated Experiment APIs” section for details.
2. Define which parameters should be tested and which ones should be frozen in two dictionaries. Pass the dictionaries to the initialize method. See the classmethod initialize below for details.
The Experiment superclass’s shared API is below.
- class prompttools.experiment.Experiment#
Base class for experiments. Do not use this class directly; use one of its subclasses instead.
- aggregate(metric_name, column_name, is_average=False)#
Aggregates a metric for a given column and displays the result to the user.
- Parameters:
metric_name (str) – metric to aggregate
column_name (str) – column to base the aggregation on
is_average (bool) – if True, compute the average for the metric, else compute the total
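The aggregation described for aggregate can be sketched in plain Python. The rows and the aggregate_metric helper below are hypothetical stand-ins for the experiment's result table, not the library's implementation.

```python
from collections import defaultdict

# Hypothetical result rows: one dict per request.
rows = [
    {"model": "gpt-3.5-turbo", "latency": 1.0},
    {"model": "gpt-3.5-turbo", "latency": 3.0},
    {"model": "gpt-4", "latency": 4.0},
]


def aggregate_metric(rows, metric_name, column_name, is_average=False):
    """Total (or average) metric_name for each distinct value of column_name."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column_name]].append(row[metric_name])
    if is_average:
        return {key: sum(values) / len(values) for key, values in groups.items()}
    return {key: sum(values) for key, values in groups.items()}


totals = aggregate_metric(rows, "latency", "model")
averages = aggregate_metric(rows, "latency", "model", is_average=True)
```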
- evaluate(metric_name, eval_fn, static_eval_fn_kwargs={}, image_experiment=False, **eval_fn_kwargs)#
Using the given evaluation function that accepts a row of data, compute a new column with the evaluation result. Each row of data generally contains inputs, the model response, and other previously computed metrics.
- Parameters:
metric_name (str) – name of the metric being computed
eval_fn (Callable) – an evaluation function that takes in a row from pd.DataFrame and optional keyword arguments
static_eval_fn_kwargs (dict) – keyword args for eval_fn that are consistent for all rows
eval_fn_kwargs (Optional[list]) – keyword args for eval_fn that may be different for each row. Each value entered here should be a list, and the length of the list should be the same as the number of responses in the experiment’s result. The i-th element of the list will be passed to the evaluation function to evaluate the i-th row.
image_experiment (bool) –
- Return type:
None
Example
>>> from prompttools.utils import validate_json_response
>>> experiment.evaluate("is_json", validate_json_response,
...                     static_eval_fn_kwargs={"response_column_name": "response"})
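The split between static and per-row keyword arguments in evaluate can be sketched as follows. This is an illustration under stated assumptions: rows are plain dicts rather than DataFrame rows, and evaluate_rows and contains_expected are hypothetical helpers, not library functions.

```python
def evaluate_rows(rows, metric_name, eval_fn, static_eval_fn_kwargs=None, **eval_fn_kwargs):
    """Store eval_fn's result for every row under the key metric_name."""
    static_kwargs = static_eval_fn_kwargs or {}
    # Every per-row keyword argument must supply exactly one value per row.
    for name, values in eval_fn_kwargs.items():
        if len(values) != len(rows):
            raise ValueError(f"{name} needs one value per row")
    for i, row in enumerate(rows):
        # The i-th element of each list is passed when evaluating the i-th row.
        per_row = {name: values[i] for name, values in eval_fn_kwargs.items()}
        row[metric_name] = eval_fn(row, **static_kwargs, **per_row)


def contains_expected(row, response_column_name, expected):
    """Hypothetical evaluation function: is the expected text in the response?"""
    return expected.lower() in row[response_column_name].lower()


rows = [
    {"prompt": "Who was the first US president?", "response": "George Washington"},
    {"prompt": "What is 2 + 2?", "response": "5"},
]
evaluate_rows(
    rows,
    "is_correct",
    contains_expected,
    static_eval_fn_kwargs={"response_column_name": "response"},
    expected=["Washington", "4"],
)
```

Here response_column_name is the same for every row, while expected carries one value per row.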
- get_table(get_all_cols=False)#
Get the DataFrame in one of two versions:
1. get_all_cols = False - good for visualization. This contains dynamic (non-frozen) input arguments, the text response, and scores (e.g. latency and metrics generated from evaluation).
2. get_all_cols = True - good for full result. This contains full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
- Parameters:
get_all_cols (bool) – defaults to False. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
- Return type:
DataFrame
- classmethod initialize(test_parameters, frozen_parameters)#
An alternate way to initialize an experiment by specifying which parameters should be tested and which ones should be frozen. If a parameter is not specified, its default value (if one exists) will be used.
This allows you to easily initialize an experiment without wrapping every parameter in a list.
Note
For a given experiment, some parameters must be specified (e.g. the model parameter for OpenAI Chat Experiment). See the experiment’s __init__ method.
Each value in test_parameters should be a list, but the values in frozen_parameters should not.
- Parameters:
test_parameters (dict[str, list]) – parameters that are being tested. Each value should be a list of test values (e.g. {"model": ["gpt-3.5-turbo", "gpt-4"], "temperature": [0.0, 1.0]})
frozen_parameters (dict) – parameters that are intended to be frozen across different configurations. There is no need to wrap the values in lists (e.g. {"top_p": 1.0, "presence_penalty": 0.0})
Example
>>> from prompttools.experiment import OpenAIChatExperiment
>>> test_parameters = {"model": ["gpt-3.5-turbo", "gpt-4"]}
>>> messages = [{"role": "user", "content": "Who was the first president?"}]
>>> frozen_parameters = {"top_p": 1.0, "messages": messages}
>>> experiment = OpenAIChatExperiment.initialize(test_parameters, frozen_parameters)
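One way to picture what initialize saves you from doing by hand: every frozen value gets wrapped in a single-element list so that all arguments end up as lists. The to_list_arguments helper below is a hypothetical sketch of that idea, not the library's code, and it does not fill in default values.

```python
def to_list_arguments(test_parameters, frozen_parameters):
    """Merge tested and frozen parameters into a single dict of lists."""
    overlap = set(test_parameters) & set(frozen_parameters)
    if overlap:
        raise ValueError(f"parameters both tested and frozen: {sorted(overlap)}")
    arguments = {name: list(values) for name, values in test_parameters.items()}
    # A frozen parameter contributes exactly one value to every combination.
    arguments.update({name: [value] for name, value in frozen_parameters.items()})
    return arguments


messages = [{"role": "user", "content": "Who was the first president?"}]
arguments = to_list_arguments(
    {"model": ["gpt-3.5-turbo", "gpt-4"]},
    {"top_p": 1.0, "messages": messages},
)
```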
- pivot_table(pivot_columns, response_value_name=None, get_all_cols=False)#
Returns a pivoted DataFrame.
- Parameters:
pivot_columns (List[str]) – two column names (first for pivot row, second for pivot column) that serve as indices of the pivot table
response_value_name (Optional[str]) – name of the column to aggregate.
get_all_cols (bool) – defaults to False. If True, it will visualize the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
- Return type:
DataFrame
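The shape of a pivoted result can be sketched with nested dicts. This is a simplified illustration with hypothetical rows and a hypothetical pivot helper; unlike a real pivot table, it keeps the last value when two rows share the same pair of keys instead of aggregating them.

```python
def pivot(rows, pivot_columns, response_value_name):
    """Arrange values as table[row_key_value][column_key_value]."""
    row_key, column_key = pivot_columns
    table = {}
    for row in rows:
        table.setdefault(row[row_key], {})[row[column_key]] = row[response_value_name]
    return table


# Hypothetical result rows: one dict per request.
rows = [
    {"model": "gpt-3.5-turbo", "temperature": 0.0, "latency": 1.2},
    {"model": "gpt-3.5-turbo", "temperature": 1.0, "latency": 1.5},
    {"model": "gpt-4", "temperature": 0.0, "latency": 2.8},
    {"model": "gpt-4", "temperature": 1.0, "latency": 3.1},
]
table = pivot(rows, ["model", "temperature"], "latency")
```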
- prepare()#
Creates argument combinations by taking the cartesian product of all inputs.
- Return type:
None
- rank(metric_name, is_average, agg_column, get_all_cols=False)#
Using pivot data, groups the data by the first pivot column to get scores, and sorts descending. For example, using pivot data of (prompt_template, user_input), a metric of latency, and is_average=True, we rank prompt templates by their average latency in the test set.
- Parameters:
metric_name (str) – metric to aggregate over
is_average (bool) – if True, compute the average for the metric, else compute the total
agg_column (str) – column to aggregate over
get_all_cols (bool) – defaults to False. If True, it will return the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
- Return type:
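The ranking described for rank (group by a column, total or average a metric, sort descending) can be sketched as below. The rows and the rank_by_metric helper are hypothetical and only illustrate the grouping and ordering.

```python
from collections import defaultdict


def rank_by_metric(rows, metric_name, is_average, agg_column):
    """Score each distinct agg_column value and order the scores descending."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[agg_column]].append(row[metric_name])
    scores = {
        key: (sum(values) / len(values) if is_average else sum(values))
        for key, values in groups.items()
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rows: two prompt templates, each tried on two user inputs.
rows = [
    {"prompt_template": "A", "user_input": "x", "latency": 1.0},
    {"prompt_template": "A", "user_input": "y", "latency": 2.0},
    {"prompt_template": "B", "user_input": "x", "latency": 3.0},
    {"prompt_template": "B", "user_input": "y", "latency": 5.0},
]
ranking = rank_by_metric(rows, "latency", is_average=True, agg_column="prompt_template")
```

Because the sort is descending, a metric where lower is better (such as latency) puts the slowest template first.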
- run(runs=1, clear_previous_results=False)#
Create tuples of input and output for every possible combination of arguments.
Note
If you override this method in a subclass, make sure your method calls _construct_result_dfs in order to save the results from your run as DataFrames. They can then be used for evaluation, aggregation, and persistence.
- to_csv(path, get_all_cols=True, **kwargs)#
Export the results to a CSV file. If the experiment has not been executed, it will run.
- Parameters:
path (str) – path/buffer to write the CSV output
get_all_cols (bool) – defaults to True. If True, it will export the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
**kwargs – optional arguments passed to pd.DataFrame.to_csv()
- to_json(path=None, get_all_cols=True, **kwargs)#
Export the results to a JSON file. If the experiment has not been executed, it will run.
- Parameters:
path (Optional[str]) – path/buffer to write the JSON output, defaults to None which returns the JSON as a dict
get_all_cols (bool) – defaults to True. If True, it will export the full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
**kwargs – optional arguments passed to pd.DataFrame.to_json()
- to_lora_json(instruction_extract, input_extract, output_extract, path=None, **kwargs)#
Export the results to a LoRA-format JSON file for fine-tuning. If the experiment has not been executed, it will run.
- Parameters:
instruction_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to the "instruction" entry in the JSON file
input_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to the "input" entry in the JSON file
output_extract (Union[str, Callable]) – column name, or an extractor function that will accept a row of the result table and return a value assigned to the "output" entry in the JSON file
path (Optional[str]) – path/buffer to write the JSON output, defaults to None which returns the JSON as a dict
**kwargs – optional arguments passed to pd.DataFrame.to_json()
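The extractor arguments of to_lora_json accept either a column name or a function of the row. The sketch below shows that dual behaviour on plain dict rows; extract and to_lora_records are hypothetical helpers, and the exact file layout the library writes is not reproduced here.

```python
import json


def extract(row, extractor):
    """Apply a callable extractor to the row, or look up a column by name."""
    return extractor(row) if callable(extractor) else row[extractor]


def to_lora_records(rows, instruction_extract, input_extract, output_extract):
    """Build one instruction/input/output record per result row."""
    return [
        {
            "instruction": extract(row, instruction_extract),
            "input": extract(row, input_extract),
            "output": extract(row, output_extract),
        }
        for row in rows
    ]


# Hypothetical result row with a response that needs trimming.
rows = [{"system": "Answer briefly.", "prompt": "Capital of France?", "response": " Paris "}]
records = to_lora_records(rows, "system", "prompt", lambda row: row["response"].strip())
lora_json = json.dumps(records, indent=2)
```

A column name is enough when the value can be used as-is; a function is useful when the value needs cleaning or has to be assembled from several columns.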
- to_mongo_db(mongo_uri, database_name, collection_name)#
Insert the results of the experiment into MongoDB for persistence.
Note
You need to install the pymongo package to use this method.
You need to run a local or remote instance of MongoDB in order to store the data.
- to_pandas_df(get_all_cols=True, from_streamlit=False)#
Return the results as a pandas.DataFrame. If the experiment has not been executed, it will run.
- visualize(get_all_cols=False, pivot=False, pivot_columns=[])#
Visualize the DataFrame in one of two versions:
1. get_all_cols = False - good for visualization. This contains dynamic (non-frozen) input arguments, the text response, and scores (e.g. latency and metrics generated from evaluation).
2. get_all_cols = True - good for full result. This contains full data with all input arguments (including frozen ones), full model response (not just the text response), and scores.
Integrated Experiment APIs#
LLMs#
- class prompttools.experiment.OpenAIChatExperiment(model=['gpt-3.5-turbo'], messages=[], temperature=[1.0], top_p=[1.0], n=[1], stream=[False], stop=[None], max_tokens=[inf], presence_penalty=[0.0], frequency_penalty=[0.0], logit_bias=[None], response_format=[None], seed=[None], functions=[None], function_call=[None], azure_openai_service_configs=None)#
This class defines an experiment for OpenAI’s chat completion API. It accepts lists for each argument passed into OpenAI’s API, then creates a cartesian product of those arguments, and gets results for each.
Note
All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
For a detailed description of the input arguments, please refer to OpenAI’s chat completion API.
- Parameters:
model (list[str]) – list of ID(s) of the model(s) to use, e.g. ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:org_id"]. If you are using Azure OpenAI service, put the models’ deployment names here
messages (list[dict]) – A list of messages comprising the conversation so far. Each message is represented as a dictionary with the following keys: role: str, content: str.
temperature (list[float]) – Defaults to [1.0]. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p (list[float]) – Defaults to [1.0]. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
n (list[int]) – Defaults to [1]. How many chat completion choices to generate for each input message.
stream (list[bool]) – Defaults to [False]. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stop (list[list[str]]) – Defaults to [None]. Up to 4 sequences where the API will stop generating further tokens.
max_tokens (list[int]) – Defaults to [inf]. The maximum number of tokens to generate in the chat completion.
presence_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
frequency_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
logit_bias (list[dict]) – Defaults to [None]. Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
functions (list[dict]) – Defaults to [None]. A list of dictionaries, each of which contains the definition of a function the model may generate JSON inputs for.
function_call (list[dict]) – Defaults to [None]. A dictionary containing the name and arguments of a function that should be called, as generated by the model.
response_format (list[Optional[dict]]) – Setting to {"type": "json_object"} enables JSON mode, which guarantees the message the model generates is valid JSON.
seed (list[Optional[int]]) – This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
azure_openai_service_configs (Optional[dict]) – Defaults to None. If it is set, the experiment will use Azure OpenAI Service. The input dict should contain these 2 keys (but with values based on your use case and configuration): {"AZURE_OPENAI_ENDPOINT": "https://YOUR_RESOURCE_NAME.openai.azure.com/", "API_VERSION": "2023-05-15"}
- class prompttools.experiment.OpenAICompletionExperiment(model, prompt, suffix=[None], max_tokens=[inf], temperature=[1.0], top_p=[1.0], n=[1], stream=[False], logprobs=[None], echo=[False], stop=[None], presence_penalty=[0], frequency_penalty=[0], best_of=[1], logit_bias=[None], azure_openai_service_configs=None)#
This class defines an experiment for OpenAI’s completion API. It accepts lists for each argument passed into OpenAI’s API, then creates a cartesian product of those arguments, and gets results for each.
Note
All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
For a detailed description of the input arguments, please refer to OpenAI’s completion API.
- Parameters:
model (list[str]) – list of ID(s) of the model(s) to use, e.g. ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:org_id"]. If you are using Azure OpenAI service, put the models’ deployment names here
prompt (list[str]) – the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.
suffix (Optional[List[str]]) – Defaults to [None]. The suffix(es) that come after a completion of inserted text.
max_tokens (list[int]) – Defaults to [inf]. The maximum number of tokens to generate in the completion.
temperature (list[float]) – Defaults to [1.0]. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p (list[float]) – Defaults to [1.0]. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
n (list[int]) – Defaults to [1]. How many completions to generate for each prompt.
stream (list[bool]) – Defaults to [False]. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
logprobs (list[int]) – Defaults to [None]. Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens.
echo (list[bool]) – Echo back the prompt in addition to the completion.
stop (list[list[str]]) – Defaults to [None]. Up to 4 sequences where the API will stop generating further tokens.
presence_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
frequency_penalty (list[float]) – Defaults to [0.0]. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
best_of (list[int]) – Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed.
logit_bias (list[dict]) – Defaults to [None]. Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
azure_openai_service_configs (Optional[dict]) – Defaults to None. If it is set, the experiment will use Azure OpenAI Service. The input dict should contain these 3 keys (but with values based on your use case and configuration): {"AZURE_OPENAI_ENDPOINT": "https://YOUR_RESOURCE_NAME.openai.azure.com/", "API_TYPE": "azure", "API_VERSION": "2023-05-15"}
- class prompttools.experiment.AnthropicCompletionExperiment(model, prompt, metadata=[None], max_tokens_to_sample=[1000], stop_sequences=[None], stream=[False], temperature=[None], top_k=[None], top_p=[None], timeout=[600.0])#
This class defines an experiment for Anthropic’s completion API. It accepts lists for each argument passed into Anthropic’s API, then creates a cartesian product of those arguments, and gets results for each.
Note
All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
You should set os.environ["ANTHROPIC_API_KEY"] = YOUR_KEY in order to connect with Anthropic’s API.
- Parameters:
max_tokens_to_sample (list[int]) – A list of integers representing the maximum number of tokens to generate before stopping.
model (list[str]) – the model(s) that will complete your prompt (e.g. “claude-2”, “claude-instant-1”)
prompt (list[str]) – Input prompt. For proper response generation you will need to format your prompt as follows: f"{HUMAN_PROMPT} USER_QUESTION {AI_PROMPT}". You can get the built-in strings with: from anthropic import HUMAN_PROMPT, AI_PROMPT
metadata (list) – list of object(s) describing metadata about the request.
stop_sequences (list[list[str]], optional) – Sequences that will cause the model to stop generating completion text
stream (list[bool], optional) – Whether to incrementally stream the response using server-sent events.
temperature (list[float], optional) – The amount of randomness injected into the response
top_k (list[int], optional) – Only sample from the top K options for each subsequent token.
timeout (list[float], optional) – Override the client-level default timeout for this request, in seconds. Defaults to [600.0].
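The prompt format required by the prompt parameter can be sketched as follows. To keep the snippet runnable without the anthropic package, the two constants are defined locally; their values are an assumption meant to mirror the package's HUMAN_PROMPT and AI_PROMPT, and format_prompt is a hypothetical helper.

```python
# Assumption: these mirror the HUMAN_PROMPT and AI_PROMPT constants exported by
# the anthropic package; they are redefined here so the sketch has no dependency.
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"


def format_prompt(user_question):
    """Wrap a question in the human/assistant markers described above."""
    return f"{HUMAN_PROMPT} {user_question} {AI_PROMPT}"


questions = ["Who was the first president?", "What is the capital of France?"]
# A list like this is the shape expected for the prompt argument.
prompts = [format_prompt(question) for question in questions]
```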
- class prompttools.experiment.HuggingFaceHubExperiment(repo_id, prompt, task=['text-generation'], **kwargs)#
Experiment for Hugging Face Hub’s API. It accepts lists for each argument passed into Hugging Face Hub’s API, then creates a cartesian product of those arguments, and gets results for each.
Note
All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments. For example, kwargs should have string keys, with lists being the values.
- Parameters:
repo_id (List[str]) – IDs of repository (e.g. [user/bert-base-uncased]).
prompt (List[str] | List[PromptSelector]) – list of prompts to test
task (List[str]) – List of tasks in strings. Determines whether to force a task instead of using task specified in the repository.
**kwargs (Dict[str, list[object]]) – Keyword parameters used in the call to InferenceApi. The values should be lists.
- class prompttools.experiment.GooglePaLMCompletionExperiment(model, prompt, temperature=[None], candidate_count=[None], max_output_tokens=[None], top_p=[None], top_k=[None], safety_settings=[None], stop_sequences=[None])#
This class defines an experiment for Google PaLM’s generate text API. It accepts lists for each argument passed into PaLM’s API, then creates a cartesian product of those arguments, and gets results for each.
Note
All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
You should set os.environ["GOOGLE_PALM_API_KEY"] = YOUR_KEY in order to connect with PaLM’s API.
- Parameters:
model (list[str]) – Which model to call, as a string or a types.Model (e.g. 'models/text-bison-001').
prompt (list[str]) – Free-form input text given to the model. Given a prompt, the model will generate text that completes the input text.
temperature (list[float]) – Controls the randomness of the output. Must be positive. Typical values are in the range: [0.0, 1.0]. Higher values produce a more random and varied response. A temperature of zero will be deterministic.
candidate_count (list[int]) – The maximum number of generated response messages to return. This value must be between [1, 8], inclusive. If unset, this will default to 1.
max_output_tokens (list[int]) – Maximum number of tokens to include in a candidate. Must be greater than zero. If unset, will default to 64.
top_k (list[float]) – The API uses combined nucleus and top-k sampling. top_k sets the maximum number of tokens to sample from on each step.
top_p (list[float]) – The API uses combined nucleus and top-k sampling. top_p configures the nucleus sampling. It sets the maximum cumulative probability of tokens to sample from.
safety_settings (list[Iterable[palm.types.SafetySettingDict]]) – A list of unique types.SafetySetting instances for blocking unsafe content.
stop_sequences (list[Union[str, Iterable[str]]]) – A set of up to 5 character sequences that will stop output generation. If specified, the API will stop at the first appearance of a stop sequence.
- class prompttools.experiment.GoogleVertexChatCompletionExperiment(model, message, context=[None], examples=[None], temperature=[None], max_output_tokens=[None], top_p=[None], top_k=[None], stop_sequences=[None])#
This class defines an experiment for Google Vertex AI’s chat API. It accepts lists for each argument passed into Vertex AI’s API, then creates a cartesian product of those arguments, and gets results for each.
Note
All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments.
You need to set up your Google Vertex AI credentials properly before executing this experiment. One option is to execute on Google Cloud’s Colab.
- Parameters:
model (list[str]) – Which model to call, as a string or a types.Model (e.g. 'models/text-bison-001').
message (list[str]) – Message for the chat model to respond to.
context (list[str]) – Context shapes how the model responds throughout the conversation. For example, you can use context to specify words the model can or cannot use, topics to focus on or avoid, or the response format or style.
examples (list[list['InputOutputTextPair']]) – Examples for the model to learn how to respond to the conversation.
temperature (list[float]) – Controls the randomness of the output. Must be positive. Typical values are in the range: [0.0, 1.0]. Higher values produce a more random and varied response. A temperature of zero will be deterministic.
max_output_tokens (list[int]) – Maximum number of tokens to include in a candidate. Must be greater than zero. If unset, will default to 64.
top_k (list[float]) – The API uses combined nucleus and top-k sampling. top_k sets the maximum number of tokens to sample from on each step.
top_p (list[float]) – The API uses combined nucleus and top-k sampling. top_p configures the nucleus sampling. It sets the maximum cumulative probability of tokens to sample from.
stop_sequences (list[Union[str, Iterable[str]]]) – A set of up to 5 character sequences that will stop output generation. If specified, the API will stop at the first appearance of a stop sequence.
- class prompttools.experiment.LlamaCppExperiment(model_path, prompt, model_params={}, call_params={})#
Used to experiment across parameters for a local model, supported by LlamaCpp and GGML.
Note
All arguments here should be a list, even if you want to keep the argument frozen (e.g. temperature=[1.0]), because the experiment will try all possible combinations of the input arguments. For example, model_params should have string keys, with lists being the values.
- Parameters:
model_path (List[str]) – list of paths to the models that you would like to run
prompt (List[str] | List[PromptSelector]) – list of prompts to test
model_params (Dict[str, list[object]]) – Parameters for initializing the model. The values should be lists.
call_params (Dict[str, list[object]]) – Parameters for calling the model completion function. The values should be lists.
- class prompttools.experiment.ReplicateExperiment(models, input_kwargs, model_specific_kwargs={}, use_image_model=False)#
Perform an experiment with the Replicate API for both image models and LLMs.
Note
Store your API token in os.environ["REPLICATE_API_TOKEN"]. If you are using an image model, set use_image_model=True as an input argument.
- Parameters:
models (list[str]) – list of model identifiers (e.g. “stability-ai/stable-diffusion:27b93a2413e”)
input_kwargs (dict[str, list]) – keyword arguments that can be used across all models
model_specific_kwargs (dict[str, dict[str, list]]) – model-specific keyword arguments that will only be used by a specific model (e.g. stability-ai/stable-diffusion:27b93a2413)
use_image_model (bool) – Defaults to False. Must be set to True to render output from image models.
Frameworks#
- class prompttools.experiment.SequentialChainExperiment(llm, prompt_template, prompt, **kwargs)#
Experiment for testing LangChain’s sequential chains.
- class prompttools.experiment.RouterChainExperiment(llm, prompt_infos, prompt, **kwargs)#
Experiment for testing LangChain’s router chains.
- class prompttools.experiment.MindsDBExperiment(db_connector, **kwargs)#
An experiment class for MindsDB. This accepts combinations of MindsDB inputs to form SQL queries, returning a list of responses.
- Parameters:
db_connector (CMySQLConnection) – connector to MindsDB
kwargs (dict) – keyword arguments for the model
Vector DBs#
- class prompttools.experiment.ChromaDBExperiment(chroma_client, collection_name, use_existing_collection, query_collection_params, embedding_fns=[None], embedding_fn_names=['default'], add_to_collection_params=None)#
Perform an experiment with ChromaDB to test different embedding functions or retrieval arguments. You can query from an existing collection, or create a new one (and insert documents into it) during the experiment. If you choose to create a new collection, it will be automatically cleaned up as the experiment ends.
- Parameters:
chroma_client (chromadb.Client) – ChromaDB client to interact with your database
collection_name (str) – the collection that you will get or create
use_existing_collection (bool) – determines whether to create a new collection or use an existing one
query_collection_params (dict[str, list]) – parameters used to query the collection. Each value is expected to be a list to create all possible combinations
embedding_fns (list[Callable]) – embedding functions to test in the experiment. By default, only the default one in ChromaDB is used
embedding_fn_names (list[str]) – names of the embedding functions
add_to_collection_params (Optional[dict]) – documents or embeddings that will be added to the newly created collection
- class prompttools.experiment.WeaviateExperiment(client, class_name, use_existing_data, property_names, text_queries, query_builders={'default': <function default_query_builder>}, vectorizers_and_moduleConfigs=None, property_definitions=None, data_objects=None, distance_metrics=None, vectorIndexConfigs=None)#
Perform an experiment with Weaviate to test different vectorizers or querying functions. You can query from an existing class, or create a new one (and insert data objects into it) during the experiment. If you choose to create a new class, it will be automatically cleaned up as the experiment ends.
- Parameters:
client (weaviate.Client) – The Weaviate client instance to interact with the Weaviate server.
class_name (str) – The name of the Weaviate class (equivalent to a collection in ChromaDB).
use_existing_data (bool) – If True, indicates that existing data will be used for the experiment. If False, new data objects will be inserted into Weaviate during the experiment.
property_names (list[str]) – List of property names in the Weaviate class to be used in the experiment.
text_queries (list[str]) – List of text queries to be used for retrieval in the experiment.
query_builders (dict[str, Callable], optional) – A dictionary containing different query builders. The key should be the name of the function for visualization purposes. The value should be a Callable function that constructs and returns a Weaviate query object. Defaults to a built-in query function.
vectorizers_and_moduleConfigs (Optional[list[tuple[str, dict]]], optional) – List of tuples, where each tuple contains the name of the vectorizer and its corresponding moduleConfig as a dictionary. This is used during data insertion (if necessary).
property_definitions (Optional[list[dict]], optional) – List of property definitions for the Weaviate class. Each property definition is a dictionary containing the property name and data type. This is used during data insertion (if necessary).
data_objects (Optional[list], optional) – List of data objects to be inserted into Weaviate during the experiment. Each data object is a dictionary representing the property-value pairs.
distance_metrics (Optional[list[str]], optional) – List of distance metrics to be used in the experiment. These metrics will be used for generating vectorIndexConfig. This is used to define the class object. If necessary, either use distance_metrics or vectorIndexConfigs, not both.
vectorIndexConfigs (Optional[list[dict]], optional) – List of vectorIndexConfig to be used in the experiment to define the class object.
Note
If use_existing_data is False, the experiment will create a new Weaviate class and insert data_objects into it. The class and data_objects will be automatically cleaned up at the end of the experiment.
Either use existing data or specify data_objects and vectorizers_and_moduleConfigs for insertion.
Either distance_metrics or vectorIndexConfigs should be provided if necessary, not both.
If you pass in a custom query_builder function, it should accept the same parameters as the default one as seen here.
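The "either distance_metrics or vectorIndexConfigs, not both" rule can be sketched as a small helper. This is an assumption about how the two options might be reconciled, not the library's code: build_vector_index_configs is hypothetical, and it assumes each metric maps to a config of the form {"distance": metric}.

```python
def build_vector_index_configs(distance_metrics=None, vector_index_configs=None):
    """Return one vectorIndexConfig dict per configuration to test."""
    if distance_metrics and vector_index_configs:
        raise ValueError("Provide distance_metrics or vectorIndexConfigs, not both")
    if distance_metrics:
        # Assumption: a metric name becomes a config with a single "distance" key.
        return [{"distance": metric} for metric in distance_metrics]
    return list(vector_index_configs or [])


configs = build_vector_index_configs(distance_metrics=["cosine", "dot"])
```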
- class prompttools.experiment.LanceDBExperiment(embedding_fns, query_args, uri='lancedb', table_name='table', use_existing_table=False, data=None, text_col_name='text', clean_up=False)#
Perform an experiment with LanceDB to test different embedding functions or retrieval arguments. You can query from an existing table, or create a new one (and insert documents into it) during the experiment.
- Parameters:
uri (str) – LanceDB uri to interact with your database. Default is “lancedb”
table_name (str) – the table that you will get or create. Default is “table”
use_existing_table (bool) – determines whether to create a new table or use an existing one
embedding_fns (list[Callable]) – embedding functions to test in the experiment
query_args (dict[str, list]) – parameters used to query the table. Each value is expected to be a list to create all possible combinations
data (Optional[list[dict]]) – documents or embeddings that will be added to the newly created table
text_col_name (str) – name of the text column in the table. Default is “text”
clean_up (bool) – determines whether to drop the table after the experiment ends
- class prompttools.experiment.QdrantExperiment(client, collection_name, embedding_fn, vector_size, documents, queries, collection_params=None, query_params=None)#
- class prompttools.experiment.PineconeExperiment(index_name, use_existing_index, query_index_params, create_index_params=None, data=None)#
Perform an experiment with Pinecone to test different embedding functions or retrieval arguments. You can query from an existing collection, or create a new one (and insert documents into it) during the experiment. If you choose to create a new collection, it will be automatically cleaned up as the experiment ends.
- Parameters:
index_name (str) – the index that you will use or create
use_existing_index (bool) – determines whether to create a new index or use an existing one
query_index_params (dict[str, list]) – parameters used to query the index. Each value is expected to be a list to create all possible combinations
create_index_params (Optional[dict]) – configuration of the new index (e.g. number of dimensions, distance function)
data (Optional[list]) – documents or embeddings that will be added to the newly created collection
Computer Vision#
- class prompttools.experiment.StableDiffusionExperiment(hf_model_path, prompt, compare_images_folder, use_auth_token=False, **kwargs)#
Experiment for the Stable Diffusion model.
- class prompttools.experiment.ReplicateExperiment(models, input_kwargs, model_specific_kwargs={}, use_image_model=False)#
Perform an experiment with the Replicate API for both image models and LLMs.
Note
Store your API token in os.environ["REPLICATE_API_TOKEN"]. If you are using an image model, set use_image_model=True as an input argument.
- Parameters:
models (list[str]) – list of model identifiers (e.g. “stability-ai/stable-diffusion:27b93a2413e”)
input_kwargs (dict[str, list]) – keyword arguments that can be used across all models
model_specific_kwargs (dict[str, dict[str, list]]) – model-specific keyword arguments that will only be used by a specific model (e.g. stability-ai/stable-diffusion:27b93a2413)
use_image_model (bool) – Defaults to False. Must be set to True to render output from image models.