[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator #1886

jnanliu · 2025-02-23T03:29:19Z

Motivation

Support repeating a dataset so that each sample is evaluated over multiple runs.
Support general G-Pass computation in every evaluator.

Modification

BaseDataset: modify __init__ to repeat the dataset after the load method returns.
BaseEvaluator: add an evaluate method that computes the average score and G-Pass (accuracy-like evaluators only).
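
The two changes above can be illustrated with a minimal sketch. This is not the code from this PR: `repeat_dataset`, `g_pass_at_k`, `summarize`, and the `example_idx` / `run_idx` fields are hypothetical names. It assumes G-Pass@k at threshold tau is the hypergeometric probability that at least max(ceil(tau * k), 1) of k runs, drawn without replacement from n runs of which c are correct, are correct.

```python
import copy
from math import ceil, comb
from typing import Dict, List, Sequence


def repeat_dataset(rows: List[dict], n: int) -> List[dict]:
    """Repeat every row n times (hypothetical helper, not the OpenCompass API)."""
    if n < 1:
        raise ValueError('n must be >= 1')
    repeated = []
    for idx, row in enumerate(rows):
        for run in range(n):
            item = copy.deepcopy(row)
            item['example_idx'] = idx  # which original row this copy came from
            item['run_idx'] = run      # which of the n runs this copy belongs to
            repeated.append(item)
    return repeated


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Assumed definition: P(at least max(ceil(tau * k), 1) of k sampled runs are correct)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError('require 0 <= c <= n and 1 <= k <= n')
    need = max(ceil(tau * k), 1)
    # Hypergeometric tail: choose j correct runs and k - j incorrect runs.
    hits = sum(
        comb(c, j) * comb(n - c, k - j) for j in range(need, min(c, k) + 1))
    return hits / comb(n, k)


def summarize(correct: Sequence[Sequence[bool]], k: int,
              thresholds: Sequence[float]) -> Dict[str, float]:
    """correct[i][r] says whether run r of example i was judged correct."""
    n = len(correct[0])
    flat = [flag for runs in correct for flag in runs]
    result = {'accuracy': 100 * sum(flat) / len(flat)}  # average over all runs
    for tau in thresholds:
        scores = [g_pass_at_k(n, sum(runs), k, tau) for runs in correct]
        result[f'G-Pass@{k}_{tau}'] = 100 * sum(scores) / len(scores)
    return result


if __name__ == '__main__':
    rows = repeat_dataset([{'question': 'q1'}, {'question': 'q2'}], n=4)
    print(len(rows))
    print(summarize([[True, True, False, False], [True] * 4], k=2,
                    thresholds=[0.5, 1.0]))
```

With tau = 1.0 every sampled run must be correct, while small thresholds approach the usual Pass@k. Computing the metric from per-run correctness flags is what lets it apply to any accuracy-like evaluator.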

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

MaiziXiao · 2025-02-26T09:18:51Z

opencompass/configs/datasets/livemathbench/livemathbench_gen_9befbf.py

-            k=[4, 8, 16],
-            replication=3,
-            thresholds=[0.0, 0.25, 0.5, 0.75, 1.0]
+            url=[]


What's the url for?

It defines the judge model URLs.
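
For context, the removed lines take the repeat settings off the evaluator. The fragment below is a hedged sketch of how a dataset entry might carry them instead. Only the parameter name `n` is grounded in this PR (commit "change repeat to n"). The dataset-level `k` and the `abbr`, `type`, and `path` values are assumptions made for illustration.

```python
# Illustrative config fragment only; every value below is a placeholder.
example_datasets = [
    dict(
        abbr='example_math',          # hypothetical dataset name
        type='ExampleMathDataset',    # placeholder; real configs pass a dataset class
        path='path/to/example_math',  # placeholder path
        n=16,                         # evaluate every sample 16 times
        k=[4, 8, 16],                 # assumed: G-Pass@k values to report, each k <= n
    )
]
```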

MaiziXiao

LGTM

jnanliu added 3 commits February 23, 2025 03:05

support dataset repeat and g-pass compute for each evaluator

8def693

fix pre-commit errors

762b66d

delete print

6d5a996

mm-assistant bot assigned tonysy Feb 23, 2025

jnanliu temporarily deployed to prod February 23, 2025 08:21 — with GitHub Actions Inactive

MaiziXiao reviewed Feb 24, 2025

opencompass/utils/build.py (resolved)

delete gpassk_evaluator and fix potential errors

2349fcf

jnanliu changed the title ~~[Feature ] Support Dataset Repeat and G-Pass Compute for Each Evaluator~~ [Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator Feb 24, 2025

jnanliu added 2 commits February 24, 2025 08:11

change repeat to n

b0330ef

fix repeat to n in openicl_eval

4e63ebb

jnanliu temporarily deployed to prod February 25, 2025 07:10 — with GitHub Actions Inactive

jnanliu added 7 commits February 25, 2025 08:21

update doc for multi-run and g-pass

4e07fcb

update latex equation in doc

8ebb8a5

update eng doc for multi-run and g-pass

c1fe59d

update datasets.md

2915d77

update datasets.md

91111ce

fix multi-line equation

fed2df4

fix multi-line equation

516313d

jnanliu had a problem deploying to prod February 25, 2025 09:30 — with GitHub Actions Error

fix multi-line equation

a7d15f8

jnanliu had a problem deploying to prod February 25, 2025 09:32 — with GitHub Actions Error

fix multi-line equation

6a6ac3c

jnanliu had a problem deploying to prod February 25, 2025 09:36 — with GitHub Actions Error

fix multi-line equation

fea7411

jnanliu had a problem deploying to prod February 25, 2025 09:40 — with GitHub Actions Error

fix multi-line equation

7fc189d

jnanliu had a problem deploying to prod February 25, 2025 09:41 — with GitHub Actions Error

fix multi-line equation in zh_cn user_guides

76381c9

jnanliu had a problem deploying to prod February 25, 2025 09:42 — with GitHub Actions Error

mmodify pre-commit-zh-cn

830142e

jnanliu temporarily deployed to prod February 25, 2025 09:48 — with GitHub Actions Inactive

jnanliu added 2 commits February 26, 2025 03:53

recover pre-commit and edit math expr in doc

46cd631

Merge branch 'main' into general-gpass

bb4d53e

jnanliu had a problem deploying to prod February 26, 2025 03:57 — with GitHub Actions Error

jnanliu added 2 commits February 26, 2025 04:01

del [TIP]

9759467

Merge branch 'general-gpass' of https://github.com/jnanliu/opencompass into general-gpass

12f4604

jnanliu temporarily deployed to prod February 26, 2025 04:02 — with GitHub Actions Inactive

del cite tag in doc

66b1c6c

jnanliu temporarily deployed to prod February 26, 2025 04:23 — with GitHub Actions Inactive

MaiziXiao reviewed Feb 26, 2025

opencompass/configs/datasets/livemathbench/livemathbench_gen_9befbf.py (outdated, resolved)

del extract_model param in livemathbench config

32a8d81

jnanliu temporarily deployed to prod February 26, 2025 06:40 — with GitHub Actions Inactive

MaiziXiao reviewed Feb 26, 2025

MaiziXiao approved these changes Feb 26, 2025

MaiziXiao merged commit 73c8095 into open-compass:main Feb 26, 2025
8 checks passed
