feat(core): Automatically extract evaluation metrics (no-changelog) #14051
base: ai-769-add-metrics-node_rebased
Conversation
Looks good already! A couple of small comments that would be good to check out.
const lastNodeExecuted = execution.data.resultData.lastNodeExecuted;
assert(lastNodeExecuted, 'Could not find the last node executed in evaluation workflow');
const metricsNodes = evaluationWorkflow.nodes.filter(
	(node) => node.type === 'n8n-nodes-base.evaluationMetrics',
);
This node type is repeated a couple of times (twice here and in the tests) -- does it make sense to extract it into a constant, so that if it ever gets updated it's easy to change in one place? A sketch of that is below.
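For illustration, a minimal sketch of that extraction, assuming a shared constants module; the file path and constant name are hypothetical, not the repo's actual layout:

// constants.ts (hypothetical module and name)
export const EVALUATION_METRICS_NODE_TYPE = 'n8n-nodes-base.evaluationMetrics';

// test-runner.service.ts
import { EVALUATION_METRICS_NODE_TYPE } from './constants';

const metricsNodes = evaluationWorkflow.nodes.filter(
	(node) => node.type === EVALUATION_METRICS_NODE_TYPE,
);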
Should we add a test (or tests) for multiple evaluation metrics nodes? E.g. that it picks up the different names across the nodes, and/or takes the last executed one if multiple nodes specify the same metric? A rough sketch of what that could look like follows.
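A hedged sketch of such tests, assuming a Jest-style suite; the MetricsNode type and the collectMetricNames helper are illustrative stand-ins that mirror the extraction logic under review, not the service's real API:

// Illustrative stand-ins -- not the actual TestRunnerService members.
interface MetricsNode {
	name: string;
	type: string;
	metricName: string;
}

const EVALUATION_METRICS_NODE_TYPE = 'n8n-nodes-base.evaluationMetrics';

// Collect distinct metric names across all Evaluation Metrics nodes.
function collectMetricNames(nodes: MetricsNode[]): string[] {
	const names = new Set<string>();
	for (const node of nodes) {
		if (node.type === EVALUATION_METRICS_NODE_TYPE) names.add(node.metricName);
	}
	return [...names];
}

describe('metrics extraction with multiple Evaluation Metrics nodes', () => {
	test('picks up distinct metric names across nodes', () => {
		const nodes: MetricsNode[] = [
			{ name: 'Metrics A', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'accuracy' },
			{ name: 'Metrics B', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'latency' },
		];
		expect(collectMetricNames(nodes).sort()).toEqual(['accuracy', 'latency']);
	});

	test('deduplicates when two nodes specify the same metric', () => {
		const nodes: MetricsNode[] = [
			{ name: 'Metrics A', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'accuracy' },
			{ name: 'Metrics B', type: EVALUATION_METRICS_NODE_TYPE, metricName: 'accuracy' },
		];
		expect(collectMetricNames(nodes)).toEqual(['accuracy']);
	});
});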
@@ -359,6 +421,9 @@ export class TestRunnerService {
	pastExecutions.map((e) => e.id),
);

// Sync the metrics of the test definition with the evaluation workflow
await this.syncMetrics(test.id, evaluationWorkflow);

// Get the metrics to collect from the evaluation workflow
const testMetricNames = await this.getTestMetricNames(test.id);
This is now doing the same query as in syncMetrics -- perhaps it would be better to return the metric names from syncMetrics, to avoid an extra DB query? Something like the sketch below.
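A hedged sketch of that refactor; the surrounding class, the repository shape, and the way metric names derive from the workflow's nodes are all hypothetical stand-ins for the service's real members:

// Sketch: syncMetrics both persists and returns the metric names, so the
// caller no longer needs a separate getTestMetricNames query.
class TestRunnerServiceSketch {
	constructor(
		// Hypothetical repository stand-in.
		private readonly metricsRepository: { replace(testId: string, names: string[]): Promise<void> },
	) {}

	// Sync stored metrics with the evaluation workflow and return the names.
	async syncMetrics(
		testId: string,
		evaluationWorkflow: { nodes: Array<{ name: string; type: string }> },
	): Promise<string[]> {
		// Deriving metric names from node names is a guess for illustration.
		const names = evaluationWorkflow.nodes
			.filter((node) => node.type === 'n8n-nodes-base.evaluationMetrics')
			.map((node) => node.name);
		await this.metricsRepository.replace(testId, names);
		return names;
	}
}

// Call site: reuse the returned names instead of querying again.
// const testMetricNames = await this.syncMetrics(test.id, evaluationWorkflow);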
Summary
This PR implements automatic extraction of metrics from the evaluation workflow's result data. Because it uses the new Evaluation Metrics node, #14050 needs to be merged first.
Key Changes
How It Works Now
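A rough, non-authoritative sketch of the flow, pieced together from the diff excerpts above; the types and the readNodeOutput helper are approximations, not n8n's actual internals:

interface SketchNode { name: string; type: string }
interface SketchExecution { data: { resultData: { lastNodeExecuted?: string } } }

// Hypothetical helper: read a node's numeric outputs from the execution's
// result data (the real lookup path in n8n's run data is not shown here).
function readNodeOutput(execution: SketchExecution, nodeName: string): Record<string, number> {
	return {}; // stub for illustration
}

function extractMetricsSketch(
	execution: SketchExecution,
	evaluationWorkflow: { nodes: SketchNode[] },
): Record<string, number> {
	// The evaluation run must have finished on some node.
	const lastNodeExecuted = execution.data.resultData.lastNodeExecuted;
	if (!lastNodeExecuted) {
		throw new Error('Could not find the last node executed in evaluation workflow');
	}

	// Every Evaluation Metrics node in the evaluation workflow contributes.
	const metricsNodes = evaluationWorkflow.nodes.filter(
		(node) => node.type === 'n8n-nodes-base.evaluationMetrics',
	);

	// Merge each node's numeric outputs into one metrics record; later
	// nodes overwrite earlier ones when metric names collide.
	const metrics: Record<string, number> = {};
	for (const node of metricsNodes) {
		Object.assign(metrics, readNodeOutput(execution, node.name));
	}
	return metrics;
}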
Related Linear tickets, GitHub issues, and Community forum posts
Review / Merge checklist
release/backport (if the PR is an urgent fix that needs to be backported)