Arthur Unveils Open Source Tool to Optimize LLM Selection

Mal McCallion
Aug 29, 2023

Updated: Dec 11, 2023

Arthur, the trailblazing machine learning monitoring start-up, is turning heads again.

This time, it's introducing Arthur Bench, an innovative open-source tool designed to help users pinpoint the best Large Language Model (LLM) for their unique data sets.

Arthur's CEO and co-founder, Adam Wenchel, noted a surge in interest in generative AI and LLMs. In response, the company has been focusing on developing products to streamline the process of working with these models. Wenchel observed that companies currently lack a systematic way to measure the effectiveness of one tool against another. Enter Arthur Bench, the solution to this critical issue.

Arthur Bench is more than just a performance testing tool. It lets users test and measure how the types of prompts used in their specific applications perform against different LLMs. Imagine being able to run 100 different prompts and then compare the performance of LLMs from two different providers, such as Anthropic and OpenAI. The tool allows you to do just that, at scale, so you can make a more informed decision about the best model for your specific use case.
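To make the idea concrete, here is a minimal Python sketch of that kind of side-by-side comparison. It does not use Arthur Bench's actual API. The names `model_a`, `model_b`, `exact_match` and `compare_models` are hypothetical, and the two "models" are simple stand-in functions where a real test would call each provider's client. The scoring rule (exact match against a reference answer) is also an assumption chosen for simplicity.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for two LLMs. In practice each function would
# wrap a call to a provider's client library.
def model_a(prompt: str) -> str:
    return prompt.upper()

def model_b(prompt: str) -> str:
    return prompt.title()

def exact_match(candidate: str, reference: str) -> float:
    """Score 1.0 if the output matches the reference exactly, else 0.0."""
    return 1.0 if candidate.strip() == reference.strip() else 0.0

def compare_models(
    test_cases: List[Tuple[str, str]],
    models: Dict[str, Callable[[str], str]],
    scorer: Callable[[str, str], float] = exact_match,
) -> Dict[str, float]:
    """Run every prompt through every model and return the mean score per model."""
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    results: Dict[str, float] = {}
    for name, generate in models.items():
        scores = [scorer(generate(prompt), reference)
                  for prompt, reference in test_cases]
        results[name] = sum(scores) / len(scores)
    return results

# Each test case pairs a prompt with the answer we would accept as correct.
test_cases = [
    ("hello world", "HELLO WORLD"),
    ("arthur bench", "Arthur Bench"),
    ("llm", "LLM"),
]

scores = compare_models(test_cases, {"model_a": model_a, "model_b": model_b})
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

With real models, the same loop would run over a larger prompt set drawn from your own application, and the scorer would likely be something more forgiving than exact match.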

While Arthur Bench is being released as an open-source tool, a SaaS version is also in the pipeline for customers who prefer to avoid the complexities of managing the open-source version or have larger testing requirements.

This new tool follows the release of Arthur Shield in May, an LLM firewall designed to detect model hallucinations and to protect against toxic content and private data leaks. Together, the two tools are aimed at helping businesses choose and deploy LLMs with more confidence.

Made with TRUST_AI - see the Charter: https://www.modelprop.co.uk/trust-ai
