feat(core): Automatically extract evaluation metrics (no-changelog) #14051
base: ai-769-add-metrics-node_rebased
Conversation
Looks good already! A couple of small comments that would be good to check out.
const lastNodeExecuted = execution.data.resultData.lastNodeExecuted;
assert(lastNodeExecuted, 'Could not find the last node executed in evaluation workflow');
const metricsNodes = evaluationWorkflow.nodes.filter(
	(node) => node.type === 'n8n-nodes-base.evaluationMetrics',
);
This node type is repeated a couple of times (twice here and in the tests) -- does it make sense to extract it into a constant, so that if it ever gets updated it's easy to change in one place? A sketch of that is below.
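For illustration, a minimal sketch of that extraction, assuming a shared constants module; the file path and constant name are hypothetical, not the repo's actual layout:

// constants.ts (hypothetical module and name)
export const EVALUATION_METRICS_NODE_TYPE = 'n8n-nodes-base.evaluationMetrics';

// test-runner.service.ts
import { EVALUATION_METRICS_NODE_TYPE } from './constants';

const metricsNodes = evaluationWorkflow.nodes.filter(
	(node) => node.type === EVALUATION_METRICS_NODE_TYPE,
);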
Should we add a test (or tests) for multiple evaluation metrics nodes? E.g. that it picks up the different names across the nodes, and/or takes the last executed one if multiple nodes specify the same metric? A rough sketch of what that could look like follows.
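A hedged sketch of such tests, assuming a Jest-style suite; the MetricsNode type and the collectMetricNames helper are illustrative stand-ins that mirror the extraction logic under review, not the service's real API:

// Illustrative stand-ins -- not the actual TestRunnerService members.
interface MetricsNode {
	name: string;
	type: string;
	metricName: string;
}

const EVALUATION_METRICS_NODE_TYPE = 'n8n-nodes-base.evaluationMetrics';

// Collect distinct metric names across all Evaluation Metrics nodes.
function collectMetricNames(nodes: MetricsNode[]): string[] {
	const names = new Set<string>();
	for (const node of nodes) {
		if (node.type === EVALUATION_METRICS_NODE_TYPE) names.add(node.metricName);
	}
	return [...names];
}

describe('metrics extraction with multiple Evaluation Metrics nodes', () => {
	test('picks up distinct metric names across nodes', () => {
		const nodes: MetricsNode[] = [
			{ name: 'Metrics A', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'accuracy' },
			{ name: 'Metrics B', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'latency' },
		];
		expect(collectMetricNames(nodes).sort()).toEqual(['accuracy', 'latency']);
	});

	test('deduplicates when two nodes specify the same metric', () => {
		const nodes: MetricsNode[] = [
			{ name: 'Metrics A', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'accuracy' },
			{ name: 'Metrics B', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'accuracy' },
		];
		expect(collectMetricNames(nodes)).toEqual(['accuracy']);
	});
});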
@@ -359,6 +421,9 @@ export class TestRunnerService {
	pastExecutions.map((e) => e.id),
);

// Sync the metrics of the test definition with the evaluation workflow
await this.syncMetrics(test.id, evaluationWorkflow);

// Get the metrics to collect from the evaluation workflow
const testMetricNames = await this.getTestMetricNames(test.id);
This is now doing the same query as in syncMetrics -- perhaps it would be better to return the metric names from syncMetrics, to avoid an extra DB query? Something like the sketch below.
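A hedged sketch of that refactor; the surrounding class, the repository shape, and the way metric names derive from the workflow's nodes are all hypothetical stand-ins for the service's real members:

// Sketch: syncMetrics both persists and returns the metric names, so the
// caller no longer needs a separate getTestMetricNames query.
class TestRunnerServiceSketch {
	constructor(
		// Hypothetical repository stand-in.
		private readonly metricsRepository: { replace(testId: string, names: string[]): Promise<void> },
	) {}

	// Sync stored metrics with the evaluation workflow and return the names.
	async syncMetrics(
		testId: string,
		evaluationWorkflow: { nodes: Array<{ name: string; type: string }> },
	): Promise<string[]> {
		// Deriving metric names from node names is a guess for illustration.
		const names = evaluationWorkflow.nodes
			.filter((node) => node.type === 'n8n-nodes-base.evaluationMetrics')
			.map((node) => node.name);
		await this.metricsRepository.replace(testId, names);
		return names;
	}
}

// Call site: reuse the returned names instead of querying again.
// const testMetricNames = await this.syncMetrics(test.id, evaluationWorkflow);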
Summary
This PR implements automatic extraction of metrics from the evaluation workflow's result data. Because it uses the new Evaluation Metrics node, #14050 needs to be merged first.
Key Changes
How It Works Now
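A rough, non-authoritative sketch of the flow, pieced together from the diff excerpts above; the types and the readNodeOutput helper are approximations, not n8n's actual internals:

interface SketchNode { name: string; type: string }
interface SketchExecution { data: { resultData: { lastNodeExecuted?: string } } }

// Hypothetical helper: read a node's numeric outputs from the execution's
// result data (the real lookup path in n8n's run data is not shown here).
function readNodeOutput(execution: SketchExecution, nodeName: string): Record<string, number> {
	return {}; // stub for illustration
}

function extractMetricsSketch(
	execution: SketchExecution,
	evaluationWorkflow: { nodes: SketchNode[] },
): Record<string, number> {
	// The evaluation run must have finished on some node.
	const lastNodeExecuted = execution.data.resultData.lastNodeExecuted;
	if (!lastNodeExecuted) {
		throw new Error('Could not find the last node executed in evaluation workflow');
	}

	// Every Evaluation Metrics node in the evaluation workflow contributes.
	const metricsNodes = evaluationWorkflow.nodes.filter(
		(node) => node.type === 'n8n-nodes-base.evaluationMetrics',
	);

	// Merge each node's numeric outputs into one metrics record; later
	// nodes overwrite earlier ones when metric names collide.
	const metrics: Record<string, number> = {};
	for (const node of metricsNodes) {
		Object.assign(metrics, readNodeOutput(execution, node.name));
	}
	return metrics;
}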
Related Linear tickets, GitHub issues, and Community forum posts
Review / Merge checklist
release/backport (if the PR is an urgent fix that needs to be backported)