
feat(core): Automatically extract evaluation metrics (no-changelog) #14051

Open · wants to merge 6 commits into base branch ai-769-add-metrics-node_rebased

Conversation

OlegIvaniv (Contributor)

Summary

This PR implements automatic metrics extraction from evaluation workflow result data. Because it uses the new Evaluation Metrics node, #14050 needs to be merged first.

Key Changes

  1. Removed Separate Controller
    • The metrics controller (metrics.controller.ts) has been removed
    • Metrics are now automatically extracted from the evaluation workflow rather than being managed separately
  2. Simplified Metrics Extraction
    • Instead of using the last node output, metrics are now collected from all EvaluationMetrics nodes in the workflow
    • The system merges metrics from multiple nodes into a single result object
  3. Removed UI for Manual Metrics Management
    • The frontend UI for managing metrics has been removed (MetricsInput.vue)
    • Metrics sections have been removed from the test definition form
  4. Removed Error Cases
    • The UNKNOWN_METRICS error code has been removed as metrics are now automatically synced

How It Works Now

  1. Defining Metrics
    • Users add one or more EvaluationMetrics nodes to their evaluation workflow
    • Each node defines metrics using name/value pairs, with values restricted to numbers
  2. Running Tests
    • When a test is run, the system first syncs metrics by scanning the evaluation workflow
    • It finds all EvaluationMetrics nodes and extracts metric names
    • It updates the database by adding new metrics and removing unused ones
  3. Collecting Results
    • When each test case is executed, results are collected from all EvaluationMetrics nodes
    • Results from multiple nodes are merged into a single metrics object (see the sketch after this list)
    • This object is then validated and stored with the test case execution
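
A minimal sketch of this flow in TypeScript. The node-type string is the one that appears in the PR diff; the node shape, the per-node result shape, and the last-node-wins merge order are simplified assumptions for illustration, not the service's actual types:

```ts
// Node type string as it appears in the PR diff.
const EVALUATION_METRICS_NODE_TYPE = 'n8n-nodes-base.evaluationMetrics';

// Simplified stand-in for n8n's workflow node type.
interface WorkflowNode {
	name: string;
	type: string;
}

// Syncing: find every EvaluationMetrics node so their metric names can be
// written to the database (new names added, unused ones removed).
function getMetricsNodes(nodes: WorkflowNode[]): WorkflowNode[] {
	return nodes.filter((node) => node.type === EVALUATION_METRICS_NODE_TYPE);
}

// Collecting: merge the numeric outputs of every metrics node into a single
// object. Assumption: a later node overwrites an earlier one when both
// report the same metric name.
function mergeMetrics(
	perNodeResults: Array<Record<string, number>>,
): Record<string, number> {
	return perNodeResults.reduce<Record<string, number>>(
		(merged, nodeMetrics) => ({ ...merged, ...nodeMetrics }),
		{},
	);
}

// Example: only metrics nodes are picked up when scanning the workflow.
const metricsNodes = getMetricsNodes([
	{ name: 'Calculate accuracy', type: EVALUATION_METRICS_NODE_TYPE },
	{ name: 'Set output', type: 'n8n-nodes-base.set' },
]); // => only 'Calculate accuracy'

// Example: two metrics nodes with one overlapping name.
const merged = mergeMetrics([
	{ accuracy: 0.91, latency: 120 },
	{ latency: 115, cost: 0.002 },
]);
// => { accuracy: 0.91, latency: 115, cost: 0.002 }
```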

Related Linear tickets, GitHub issues, and Community forum posts

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

@OlegIvaniv changed the base branch from master to ai-769-add-metrics-node_rebased on March 19, 2025 at 13:00
@n8n-assistant (bot) added the core (Enhancement outside /nodes-base and /editor-ui) and n8n team (Authored by the n8n team) labels on Mar 19, 2025


codecov bot commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 95.12195% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...rc/views/TestDefinition/TestDefinitionEditView.vue | 0.00% | 2 Missing ⚠️ |



@jeanpaul (Contributor) left a comment


Looks good already! A couple of small comments that would be good to check out.

```ts
const lastNodeExecuted = execution.data.resultData.lastNodeExecuted;
assert(lastNodeExecuted, 'Could not find the last node executed in evaluation workflow');

const metricsNodes = evaluationWorkflow.nodes.filter(
	(node) => node.type === 'n8n-nodes-base.evaluationMetrics',
);
```

This node type is repeated a couple of times (twice here and in tests) -- does it make sense to make this into a constant, so that if it ever gets updated it's easy to change in one place?
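
For illustration, the suggestion could look something like this; the constant name and module location are placeholders, not from the PR:

```ts
// e.g. in a shared constants module (name and location are illustrative)
export const EVALUATION_METRICS_NODE_TYPE = 'n8n-nodes-base.evaluationMetrics';

// usage at the call site shown above
const metricsNodes = evaluationWorkflow.nodes.filter(
	(node) => node.type === EVALUATION_METRICS_NODE_TYPE,
);
```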


Should we add a test (or tests) for multiple evaluation metrics nodes? That it picks up the different names across the nodes, and/or takes the last executed one if multiple nodes specify the same metric?
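
A hypothetical shape for such tests, written jest-style against a standalone merge helper rather than the real test-runner internals (the helper and the last-node-wins behavior are assumptions):

```ts
import { describe, expect, it } from '@jest/globals';

// Standalone stand-in for the service's merge logic.
const mergeMetrics = (results: Array<Record<string, number>>) =>
	results.reduce<Record<string, number>>((acc, r) => ({ ...acc, ...r }), {});

describe('metrics from multiple EvaluationMetrics nodes', () => {
	it('collects distinct metric names across nodes', () => {
		expect(mergeMetrics([{ accuracy: 0.9 }, { latency: 120 }])).toEqual({
			accuracy: 0.9,
			latency: 120,
		});
	});

	it('lets the last executed node win on duplicate names', () => {
		expect(mergeMetrics([{ accuracy: 0.9 }, { accuracy: 0.95 }])).toEqual({
			accuracy: 0.95,
		});
	});
});
```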

```diff
@@ -359,6 +421,9 @@ export class TestRunnerService {
 		pastExecutions.map((e) => e.id),
 	);

+	// Sync the metrics of the test definition with the evaluation workflow
+	await this.syncMetrics(test.id, evaluationWorkflow);
+
 	// Get the metrics to collect from the evaluation workflow
 	const testMetricNames = await this.getTestMetricNames(test.id);
```
This is now doing the same query as in syncMetrics -- perhaps it would be better to return the metric names from syncMetrics, to avoid an extra DB query?
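
One way the suggestion could look; the helper names and signatures below are illustrative, not the service's actual API:

```ts
// syncMetrics returns the synced names so the caller can skip the
// follow-up database query. extractMetricNames() and updateStoredMetrics()
// are placeholder helpers.
private async syncMetrics(
	testId: string,
	evaluationWorkflow: IWorkflowBase,
): Promise<Set<string>> {
	const metricNames = this.extractMetricNames(evaluationWorkflow); // scan EvaluationMetrics nodes
	await this.updateStoredMetrics(testId, metricNames); // add new names, remove unused ones
	return metricNames;
}

// At the call site, no second query is needed:
const testMetricNames = await this.syncMetrics(test.id, evaluationWorkflow);
```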
