mirror of
https://github.com/aimingmed/aimingmed-ai.git
synced 2026-02-05 22:53:23 +08:00
78 lines
5.2 KiB
Python
system_router = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to any cancer/tumor disease. The question may be
asked in a variety of languages, and may be phrased in a variety of ways.
Use the vectorstore for questions on these topics. Otherwise, use web-search.
"""

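# A minimal sketch of how a router reply could be mapped to a datasource.
# The label names "vectorstore" and "web_search" and the helper parse_route
# are assumptions for illustration; the prompt above does not fix an output
# format, and the repository may use structured output instead.

```python
from typing import Literal

# Hypothetical labels; not defined anywhere in the original module.
Datasource = Literal["vectorstore", "web_search"]


def parse_route(reply: str) -> Datasource:
    """Map a free-text router reply onto one of two datasource labels."""
    # Normalise case and separators so "Vector Store" and "vector-store" match.
    text = reply.strip().lower().replace("-", "_").replace(" ", "_")
    if "vectorstore" in text or "vector_store" in text:
        return "vectorstore"
    # Everything else falls back to web search, mirroring the prompt's "Otherwise".
    return "web_search"
```
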
system_retriever_grader = """You are a grader assessing the relevance of a retrieved document to a user question. \n
If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question."""

system_hallucination_grader = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""

system_answer_grader = """You are a grader assessing whether an answer addresses / resolves a question. \n
Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""

system_question_rewriter = """You are a question re-writer that converts an input question to a better version that is optimized \n
for vectorstore retrieval. Look at the input and try to reason about the underlying semantic intent / meaning."""


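# The grader prompts above each ask for a binary 'yes' or 'no'. A minimal
# sketch of turning such a reply into a boolean follows. The helper name
# parse_binary_score is hypothetical, and raising on unexpected replies is a
# design assumption (a caller might prefer to default to 'no' instead).

```python
import re


def parse_binary_score(reply: str) -> bool:
    """Return True for 'yes' and False for 'no'; raise ValueError otherwise."""
    # Accept surrounding whitespace, quotes, or punctuation around the word.
    match = re.search(r"\b(yes|no)\b", reply.strip().lower())
    if match is None:
        raise ValueError(f"expected 'yes' or 'no', got: {reply!r}")
    return match.group(1) == "yes"
```
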
# Evaluation
CORRECTNESS_PROMPT = """You are an impartial judge. Evaluate Student Answer against Ground Truth for conceptual similarity and correctness.
You may also be given additional information that was used by the model to generate the output.

Your task is to determine a numerical score called correctness based on the Student Answer and Ground Truth.
A definition of correctness and a grading rubric are provided below.
You must use the grading rubric to determine your score.

Metric definition:
Correctness assesses the degree to which a provided Student Answer aligns with factual accuracy, completeness, logical
consistency, and precise terminology of the Ground Truth. It evaluates the intrinsic validity of the Student Answer, independent of any
external context. A higher score indicates a higher adherence to factual accuracy, completeness, logical consistency,
and precise terminology of the Ground Truth.

Grading rubric:
Correctness: Below are the details for different scores:
- 1: Major factual errors, highly incomplete, illogical, and uses incorrect terminology.
- 2: Significant factual errors, incomplete, noticeable logical flaws, and frequent terminology errors.
- 3: Minor factual errors, somewhat incomplete, minor logical inconsistencies, and occasional terminology errors.
- 4: Few to no factual errors, mostly complete, strong logical consistency, and accurate terminology.
- 5: Accurate, complete, logically consistent, and uses precise terminology.

Reminder:
- Carefully read the Student Answer and Ground Truth
- Check for factual accuracy and completeness of the Student Answer compared to the Ground Truth
- Focus on correctness of information rather than style or verbosity
- The goal is to evaluate factual correctness and completeness of the Student Answer.
- Please provide only the numerical score, a number between 1 and 5. No 'score:' prefix or other text is allowed.

"""

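# CORRECTNESS_PROMPT above asks the judge to reply with only a number
# between 1 and 5. A minimal sketch of validating such a reply follows.
# The helper name parse_rubric_score and the strict "bare integer only"
# policy are assumptions for illustration, not part of the original module.

```python
import re


def parse_rubric_score(reply: str, low: int = 1, high: int = 5) -> int:
    """Parse a judge reply that should contain only an integer in [low, high]."""
    text = reply.strip()
    # Reject anything other than digits, e.g. "score: 3" or "4/5".
    if not re.fullmatch(r"\d+", text):
        raise ValueError(f"expected a bare integer, got: {reply!r}")
    score = int(text)
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside {low}-{high}")
    return score
```
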
FAITHFULNESS_PROMPT = """You are an impartial judge. Evaluate output against context for faithfulness.
You may also be given additional information that was used by the model to generate the output.

Your task is to determine a numerical score called faithfulness based on the output and context.
A definition of faithfulness and a grading rubric are provided below.
You must use the grading rubric to determine your score.

Metric definition:
Faithfulness is only evaluated with the provided output and context. Faithfulness assesses how much of the
provided output is factually consistent with the provided context. A higher score indicates that a higher proportion of
claims present in the output can be derived from the provided context. Faithfulness does not consider how much extra
information from the context is not present in the output.

Grading rubric:
Faithfulness: Below are the details for different scores:
- Score 1: None of the claims in the output can be inferred from the provided context.
- Score 2: Some of the claims in the output can be inferred from the provided context, but the majority of the output is missing from, inconsistent with, or contradictory to the provided context.
- Score 3: Half or more of the claims in the output can be inferred from the provided context.
- Score 4: Most of the claims in the output can be inferred from the provided context, with very little information that is not directly supported by the provided context.
- Score 5: All of the claims in the output are directly supported by the provided context, demonstrating high faithfulness to the provided context.

Reminder:
- Carefully read the output and context
- Focus on the information instead of the writing style or verbosity.
- Please provide only the numerical score, a number between 1 and 5, according to the grading rubric above. No 'score:' prefix or other text is allowed.
"""