prompttools.utils package
Submodules
prompttools.utils.autoeval module
- prompttools.utils.autoeval.autoeval_binary_scoring(row, prompt_column_name, response_column_name='response')
Uses auto-evaluation to score the model response with “gpt-4” as the judge, returning 0.0 or 1.0.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
prompt_column_name (str) – name of the column that contains the input prompt
response_column_name (str) – name of the column that contains the model’s response, defaults to "response"
- Return type:
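The row-based contract described above can be illustrated without calling any model. The following is a minimal sketch, not the library's implementation: `stub_judge` is a hypothetical stand-in for the GPT-4 judge, `binary_score_row` is a hypothetical name, and a plain dict stands in for the pandas Series row.

```python
from typing import Callable, Mapping


def stub_judge(prompt: str, response: str) -> bool:
    """Hypothetical judge: accepts any non-empty response.

    A real judge would send the prompt/response pair to a chat model.
    """
    return bool(response.strip())


def binary_score_row(
    row: Mapping[str, str],
    prompt_column_name: str,
    response_column_name: str = "response",
    judge: Callable[[str, str], bool] = stub_judge,
) -> float:
    """Look up the prompt and response in the row and return 1.0 or 0.0."""
    prompt = row[prompt_column_name]
    response = row[response_column_name]
    return 1.0 if judge(prompt, response) else 0.0


# A dict supports the same row["column"] lookup as a pandas Series.
row = {"prompt": "Name a primary color.", "response": "Red"}
score = binary_score_row(row, prompt_column_name="prompt")
```

Passing the judge as a parameter keeps the scoring logic testable offline. Swapping in a real model call would not change the row handling.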
- prompttools.utils.autoeval.compute(prompt, response, model='gpt-4')
Uses a high-quality chat model, such as GPT-4, to automatically evaluate a given prompt/response pair. The output is either 0 or 1.
- prompttools.utils.autoeval.evaluate(prompt, response, _metadata)
Uses auto-evaluation to score the model response with “gpt-4” as the judge, returning 0.0 or 1.0.
prompttools.utils.error module
prompttools.utils.expected module
- prompttools.utils.expected.compute(prompt, model='gpt-4')
Computes the expected result of a given prompt by using a high-quality LLM, such as GPT-4.
- prompttools.utils.expected.compute_similarity_against_model(row, prompt_column_name, model='gpt-4', response_column_name='response')
Computes the similarity of a given response to the expected result generated from a high-quality LLM (by default GPT-4) using the same prompt.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
prompt_column_name (str) – name of the column that contains the input prompt
model (str) – name of the model that will serve as the judge
response_column_name (str) – name of the column that contains the model’s response, defaults to "response"
- Return type:
- prompttools.utils.expected.evaluate(prompt, response, model='gpt-4')
Computes the similarity of a given response to the expected result generated from a high-quality LLM (by default GPT-4) using the same prompt.
prompttools.utils.json module
prompttools.utils.python module
prompttools.utils.similarity module
Use a list to optionally hold a reference to the embedding model and client, allowing for lazy initialization.
- prompttools.utils.similarity.compute(doc1, doc2, use_chroma=False)
Computes the semantic similarity between two documents, using either ChromaDB or HuggingFace sentence_transformers.
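The description above does not say which metric is applied to the document embeddings. As an illustration only, assuming cosine similarity (a common choice for sentence embeddings), the comparison step might look like the sketch below. The vectors are made-up values standing in for real embeddings, and `cosine_similarity` is a hypothetical helper, not part of prompttools.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up embeddings: the second points in the same direction as the first,
# the third is orthogonal to it.
doc1_embedding = [0.2, 0.7, 0.1]
doc2_embedding = [0.4, 1.4, 0.2]
unrelated_embedding = [0.7, -0.2, 0.0]

same_direction = cosine_similarity(doc1_embedding, doc2_embedding)
orthogonal = cosine_similarity(doc1_embedding, unrelated_embedding)
```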
- prompttools.utils.similarity.evaluate(prompt, response, metadata, expected)
A simple test that checks semantic similarity between the expected response (provided by the user) and the model’s text responses.
- prompttools.utils.similarity.semantic_similarity(row, expected, response_column_name='response')
A simple test that checks semantic similarity between the expected response (provided by the user) and the model’s text responses.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
expected (str) – the expected responses for each row in the column
response_column_name (str) – name of the column that contains the model’s response, defaults to "response"
- Return type:
- prompttools.utils.similarity.structural_similarity(row, expected, response_column_name='response')
Compute the structural similarity index measure (SSIM) between two images.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
expected (str) – the column name of the expected image responses in each row
response_column_name (str) – the column name that contains the model’s response, defaults to "response"
- Return type:
Module contents
- prompttools.utils.autoeval_binary_scoring(row, prompt_column_name, response_column_name='response')
Uses auto-evaluation to score the model response with “gpt-4” as the judge, returning 0.0 or 1.0.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
prompt_column_name (str) – name of the column that contains the input prompt
response_column_name (str) – name of the column that contains the model’s response, defaults to "response"
- Return type:
- prompttools.utils.autoeval_from_expected_response(row, expected, prompt_column_name, response_column_name='response')
- prompttools.utils.autoeval_scoring(row, expected, response_column_name='response')
Uses auto-evaluation to score the model response.
- Parameters:
- Return type:
- prompttools.utils.autoeval_with_documents(row, documents, response_column_name='response')
Given a list of documents, score whether the model response is accurate with “gpt-4” as the judge, returning an integer score from 0 to 10.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
documents (list[str]) – documents to provide relevant context for the model to judge
response_column_name (str) – name of the column that contains the model’s response, defaults to "response"
- Return type:
- prompttools.utils.chunk_text(text, max_chunk_length)
Given a long string of text and a maximum chunk length, returns chunks of text, each shorter than the maximum length, without splitting individual words (words are separated by spaces).
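Word-preserving chunking of this kind can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the library's code: the name `chunk_words_sketch` is hypothetical, a chunk may be exactly as long as the limit, and a single word longer than the limit becomes its own oversized chunk.

```python
from typing import List


def chunk_words_sketch(text: str, max_chunk_length: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    Assumption for this sketch: a chunk may be at most max_chunk_length
    characters, except when a single word is longer than the limit.
    """
    if max_chunk_length <= 0:
        raise ValueError("max_chunk_length must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= max_chunk_length or not current:
            # Either the word fits, or it starts a new (possibly oversized) chunk.
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_words_sketch("the quick brown fox jumps", 10)
```

Note that `str.split()` collapses runs of whitespace, so the chunks cannot be joined back into the exact original string if it contained double spaces or newlines.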
- prompttools.utils.compute_similarity_against_model(row, prompt_column_name, model='gpt-4', response_column_name='response')
Computes the similarity of a given response to the expected result generated from a high-quality LLM (by default GPT-4) using the same prompt.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
prompt_column_name (str) – name of the column that contains the input prompt
model (str) – name of the model that will serve as the judge
response_column_name (str) – name of the column that contains the model’s response, defaults to "response"
- Return type:
- prompttools.utils.ranking_correlation(row, expected_ranking, ranking_column_name='top doc ids')
A simple test that compares the expected ranking for a given query with the actual ranking produced by the embedding function being tested.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
expected_ranking (list) – the expected ranking, as a list, to compare against
ranking_column_name (str) – the column name of the actual ranking produced by the model, defaults to "top doc ids"
- Return type:
Example
>>> EXPECTED_RANKING_LIST = [
...     ["id1", "id3", "id2"],
...     ["id2", "id3", "id1"],
...     ["id1", "id3", "id2"],
...     ["id2", "id3", "id1"],
... ]
>>> experiment.evaluate("ranking_correlation", ranking_correlation, expected_ranking=EXPECTED_RANKING_LIST)
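The description above does not name the correlation statistic. As a sketch, assuming a Spearman-style rank correlation over rankings without ties, the comparison of one expected ranking with one actual ranking could be computed as follows. The function name `spearman_rank_correlation` is hypothetical and the code is not taken from prompttools.

```python
from typing import Hashable, Sequence


def spearman_rank_correlation(
    expected: Sequence[Hashable], actual: Sequence[Hashable]
) -> float:
    """Spearman's rho for two orderings of the same distinct items.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
    difference between an item's position in the two rankings.
    """
    n = len(expected)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    if len(set(expected)) != n or len(actual) != n or set(expected) != set(actual):
        raise ValueError("rankings must contain the same distinct items")
    actual_position = {item: index for index, item in enumerate(actual)}
    d_squared = sum(
        (index - actual_position[item]) ** 2 for index, item in enumerate(expected)
    )
    return 1.0 - (6.0 * d_squared) / (n * (n * n - 1))


identical = spearman_rank_correlation(["id1", "id3", "id2"], ["id1", "id3", "id2"])
reversed_order = spearman_rank_correlation(["id1", "id3", "id2"], ["id2", "id3", "id1"])
```

A value of 1.0 means the two rankings agree exactly, and -1.0 means one is the reverse of the other.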
- prompttools.utils.semantic_similarity(row, expected, response_column_name='response')
A simple test that checks semantic similarity between the expected response (provided by the user) and the model’s text responses.
- Parameters:
row (pandas.core.series.Series) – A row of data from the full DataFrame (including input, model response, other metrics, etc).
expected (str) – the expected responses for each row in the column
response_column_name (str) – name of the column that contains the model’s response, defaults to "response"
- Return type:
- prompttools.utils.validate_json_response(row, response_column_name='response')
Validate whether the response string is in a valid JSON format.
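A check of this kind can be sketched with the standard library's `json` module. The sketch below is an assumption about the general approach, not the library's implementation: the name `is_valid_json_row` is hypothetical, the 1.0/0.0 return values are assumed for consistency with the other scorers on this page, and a plain dict stands in for the pandas Series row.

```python
import json
from typing import Any, Mapping


def is_valid_json_row(
    row: Mapping[str, Any], response_column_name: str = "response"
) -> float:
    """Return 1.0 if the response parses as JSON, otherwise 0.0."""
    try:
        json.loads(row[response_column_name])
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: malformed text. TypeError: value is not str/bytes.
        return 0.0
    return 1.0


valid_score = is_valid_json_row({"response": '{"name": "Ada", "age": 36}'})
invalid_score = is_valid_json_row({"response": "{name: Ada}"})
```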
- prompttools.utils.validate_python_response(row, response_column_name='response')
Validate whether the response string follows Python’s syntax.
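One lightweight way to test for valid Python syntax is to parse the text with the standard library's `ast` module, which never executes the code. This is a swapped-in technique for illustration; the documentation above does not say how prompttools performs the check. The name `is_valid_python_row` and the 1.0/0.0 return values are assumptions, and a plain dict stands in for the pandas Series row.

```python
import ast
from typing import Any, Mapping


def is_valid_python_row(
    row: Mapping[str, Any], response_column_name: str = "response"
) -> float:
    """Return 1.0 if the response parses as Python source, otherwise 0.0."""
    try:
        # ast.parse only builds a syntax tree; the code is never run.
        ast.parse(row[response_column_name])
    except (SyntaxError, ValueError, TypeError):
        return 0.0
    return 1.0


good_score = is_valid_python_row({"response": "def f(x):\n    return x + 1\n"})
bad_score = is_valid_python_row({"response": "def f(x) return x"})
```

A syntax-only check accepts code that would still fail at run time, for example code that references undefined names.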