![Image of a laptop with AI legal tech software interface in an office](https://cdn.prod.website-files.com/69f81c02e87a58f0517f69cb/69fad09fcfbd17e066cdf219_LLM%20Response%20Validation%20with%20LLMs%20(LLM-as-a-Judge)%20blogpost.png)
In short, LLM response validation is the use of an LLM to evaluate AI-generated texts based on user-specified criteria stated in the evaluation prompt.
Since AI-generated texts may not always align with facts or the user-provided context, issues of trustworthiness, traceability, quality, and even safety can arise. There is a need to detect false or hallucinated responses before delivering them to users.
Validation is not limited to factual alignment; the same approach can be used to assess other aspects of a response as well.
To validate a response, the AI-generated text is sent to the LLM together with an evaluation prompt that states the criteria. The LLM then returns comments or a score indicating the evaluation result. For example, a grounding check can be phrased as follows:
Given the user query/question and context information: {{QUERY + CONTEXT}} and the response: {{RESPONSE}}, examine whether the response information is solely based on the context. Ensure that all facts described in the response can be found in the context. Return your result as a label: ‘Yes’ or ‘No,’ followed by a concise reason to support your decision.
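To make the flow concrete, here is a minimal sketch in Python, assuming an OpenAI-compatible client. The model name and the `judge()` helper are illustrative assumptions, not part of any specific product or the text above.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Evaluation prompt with the grounding criterion stated explicitly.
JUDGE_PROMPT = """Given the user query/question and context information:
{query_and_context}
and the response:
{response}
examine whether the response information is solely based on the context.
Ensure that all facts described in the response can be found in the context.
Return your result as a label: 'Yes' or 'No,' followed by a concise reason
to support your decision."""


def judge(query_and_context: str, response: str,
          model: str = "gpt-4o-mini") -> tuple[bool, str]:
    """Ask the judge LLM whether `response` is grounded in the provided context."""
    completion = client.chat.completions.create(
        model=model,
        temperature=0,  # keep the judgment as deterministic as possible
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                query_and_context=query_and_context,
                response=response,
            ),
        }],
    )
    verdict = completion.choices[0].message.content.strip()
    grounded = verdict.lower().startswith("yes")
    return grounded, verdict


# Example: block delivery of responses the judge flags as ungrounded.
# grounded, reason = judge(query_and_context, candidate_response)
# if not grounded:
#     ...  # regenerate, fall back, or escalate to a human reviewer
```

The same pattern extends to other criteria: swap in a different evaluation prompt (and, if needed, a numeric score instead of a Yes/No label) without changing the surrounding validation logic.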
You may wonder why an LLM can be used to assess the responses of another LLM, or even its own. The reason is that evaluating a response is easier than generating one: the judge only checks the output against the stated criteria, whereas generation must answer the user's query from scratch. Content critique is easier than content creation, which is also why the same LLM can act as its own judge; evaluation simply exercises a different capability of the model.
