Sunday, February 25, 2024
Show HN: Continuous-eval – Granular evaluation of GenAI pipelines https://ift.tt/wZMoYhV
Hi HN, we are the creators of "continuous-eval", an open-source tool to test and evaluate generative AI apps.
"Continuous-eval" came from our efforts to measure, validate, and improve the reliability of a finance AI copilot we were developing for banks. End-to-end evaluation was not enough for us: we wanted granular evaluations that help pinpoint bottlenecks and identify what to improve and how.
We've since developed more metrics and made the framework more flexible, so it can evaluate components such as agent tool use, code changes, and retrieval steps.
Let us know what you think of our approach to GenAI app evaluation. https://ift.tt/68l5tSX
February 26, 2024 at 12:11AM