feat(evals): add status enum for evaluation scores #2169

Open
wants to merge 10 commits into
base: main

Contributor

@ssbushi ssbushi commented Feb 25, 2025

Fixes #2252

@ssbushi ssbushi changed the title from "feat: breaking, add status enum for evaluations" to "feat: add status enum for evaluations" Mar 14, 2025
@ssbushi ssbushi marked this pull request as ready for review March 14, 2025 20:35
@ssbushi ssbushi changed the title from "feat: add status enum for evaluations" to "feat(evals): add status enum for evaluation scores" Mar 14, 2025
@ssbushi ssbushi requested review from pavelgj and shrutip90 March 14, 2025 20:35

export interface BaseGenkitMetricConfig {
  type: GenkitMetric;
  statusOverrideFn?: (score: Score) => EvalStatusEnum;
Contributor Author

@pavelgj I made this override function take the score and return a new status. Should we make it even more generic and pass in the full datapoint instead?

This is part of the plugin API, so it should be easy to fix... I think?
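
For illustration, here is a minimal sketch of how a score-based override with the signature from the diff might be written and applied. The `Score` and `EvalStatusEnum` definitions are hypothetical stand-ins (the real Genkit types may differ), and `strictThreshold`, `applyOverride`, and the 0.8 cutoff are invented for this example.

```typescript
// Hypothetical stand-ins for the Genkit types; the real definitions may differ.
enum EvalStatusEnum {
  UNKNOWN = 'UNKNOWN',
  PASS = 'PASS',
  FAIL = 'FAIL',
}

interface Score {
  score?: number | string | boolean;
  status?: EvalStatusEnum;
}

// Signature as it appears in the diff: the override sees only the score.
type StatusOverrideFn = (score: Score) => EvalStatusEnum;

// Illustrative override: pass only when the numeric score reaches 0.8.
const strictThreshold: StatusOverrideFn = (score) =>
  typeof score.score === 'number' && score.score >= 0.8
    ? EvalStatusEnum.PASS
    : EvalStatusEnum.FAIL;

// Hypothetical helper: applies the override if one is configured,
// otherwise returns the score unchanged.
function applyOverride(score: Score, fn?: StatusOverrideFn): Score {
  return fn ? { ...score, status: fn(score) } : score;
}
```

With this shape the override can only react to what the score carries, which is the limitation the question about passing the full datapoint is getting at.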

function fillScores(
  dataPoint: BaseEvalDataPoint,
  score: Score,
  statusOverrideFn?: (s: Score) => EvalStatusEnum
Collaborator

Does this fn need to know the metric, and maybe the original status as well? Then one could do something like:

statusOverrideFn = ({score, originalStatus, metric}) => {
  if (metric === 'faithfulness') {
    return ...;
  }
  return originalStatus;
}

Contributor Author

  • originalStatus is available in the score object
  • Since this is configured per metric, I think the metric field is redundant.

I like the idea of using an object for future expansion, so I added that.
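
A rough sketch of what the object-based form could look like, assuming the argument wraps the score in a single object. The `StatusOverrideArgs` name, the type definitions, and the 0.5 cutoff are all hypothetical; the actual shape merged in the PR may differ.

```typescript
// Hypothetical stand-ins for the Genkit types; the real definitions may differ.
enum EvalStatusEnum {
  UNKNOWN = 'UNKNOWN',
  PASS = 'PASS',
  FAIL = 'FAIL',
}

interface Score {
  score?: number;
  status?: EvalStatusEnum;
}

// Hypothetical argument object: new fields can be added later
// without changing the function signature for existing callers.
interface StatusOverrideArgs {
  score: Score;
}

type StatusOverrideFn = (args: StatusOverrideArgs) => EvalStatusEnum;

// Illustrative override configured for a single metric, so no metric field is needed.
const lenientOverride: StatusOverrideFn = ({ score }) => {
  // The original status is read from the score object itself.
  const originalStatus = score.status ?? EvalStatusEnum.UNKNOWN;
  if (typeof score.score === 'number' && score.score >= 0.5) {
    return EvalStatusEnum.PASS;
  }
  return originalStatus;
};
```

Because the override is attached to one metric's config, the function body does not need to branch on the metric name, and the fallback to the original status comes from the score it already receives.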

@ssbushi ssbushi requested a review from pavelgj March 24, 2025 18:21
Development

Successfully merging this pull request may close these issues.

[Evals][Tooling] Add support for status enum in evaluation results
2 participants