[Feature] Update SC #126
Conversation
```

```{note}
Note that OpenCompass samples the next token with argmax by default. If no sampling parameters are specified, the model's inference results will therefore be identical on every run, and multi-round evaluation will be ineffective.
English?
```
Where `SAMPLE_SIZE` is the number of reasoning paths in Self-Consistency; a higher value usually yields higher performance. The following figure from the paper demonstrates the relationship between the number of reasoning paths and performance on several reasoning tasks:
From which paper? We need to add a citation.
Also need to point out that the sampling `generation_kwargs` only works for HuggingFace models.
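For context, a minimal sketch of what such sampling parameters might look like. The key names follow HuggingFace's `generate` API; the exact OpenCompass config keys may differ, so treat this as an illustration rather than a verified snippet:

```python
# Hypothetical sketch: enable sampling so Self-Consistency draws distinct
# reasoning paths instead of deterministic argmax (greedy) output.
generation_kwargs = dict(
    do_sample=True,   # switch from greedy/argmax decoding to sampling
    temperature=0.7,  # soften the next-token distribution
    top_k=40,         # restrict sampling to the 40 most likely tokens
)
```

Without `do_sample=True`, every sampled path would be identical and majority voting would be pointless.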
 | ||
From the figure, it can be seen that across different reasoning tasks, performance tends to improve as the number of reasoning paths increases. For some tasks, however, the gains saturate: beyond a certain number of paths, adding more brings no significant improvement. It is therefore necessary to experiment on the specific task to find the optimal number of reasoning paths.
A blank line between the paragraph and image makes layout better
## 3. Self-Consistency
The SC (Self-Consistency) method was proposed in [this paper](https://arxiv.org/abs/2203.11171). It samples multiple reasoning paths for each question and applies majority voting over the answers the LLM generates. SC achieves remarkable accuracy on reasoning tasks, but may consume more time and resources at inference time because of the majority-voting strategy. In OpenCompass, you can simply enable the SC method in the dataset config like:
We should explicitly tell readers they have to replace `GenInferencer` with `SCInferencer`.
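A minimal sketch of what that replacement might look like in a dataset config. The field names here are assumptions based on the review discussion, not verified against the merged code:

```python
SAMPLE_SIZE = 20  # number of reasoning paths sampled per question (illustrative)

# Hypothetical dataset config fragment: GenInferencer is swapped for
# SCInferencer, and the number of sampled reasoning paths is set.
gsm8k_infer_cfg = dict(
    inferencer=dict(
        type='SCInferencer',  # was: type='GenInferencer'
        sc_size=SAMPLE_SIZE,  # draw SAMPLE_SIZE reasoning paths
        generation_kwargs=dict(do_sample=True, temperature=0.7),
    ),
)
```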
)
)
gsm8k_eval_cfg = dict(sc_size=SAMPLE_SIZE)
``` |
Need a link to the new gsm8k config for interested readers to follow.
sc_results.append(results)
sc_prediction = list(map(list, zip(*sc_results)))
generated = sc_prediction
print(generated)
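For context, the `zip(*sc_results)` idiom above transposes per-path answer lists into per-question answer lists. A self-contained illustration with made-up values:

```python
# sc_results holds one list of answers per sampled reasoning path:
# sc_results[path][question] -> answer string.
sc_results = [
    ["18", "42", "7"],   # path 0's answers to questions 0..2
    ["18", "40", "7"],   # path 1
    ["20", "42", "7"],   # path 2
]

# Transpose so each inner list gathers every path's answer to one question:
# sc_prediction[question][path] -> answer string.
sc_prediction = list(map(list, zip(*sc_results)))
print(sc_prediction)  # [['18', '18', '20'], ['42', '40', '42'], ['7', '7', '7']]
```

Majority voting then runs over each per-question list.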
del
save_every: Optional[int] = None,
fix_id_list: Optional[List[int]] = None,
sc_size: Optional[int] = 1,
infer_type: Optional[str] = '',
`infer_type` is not even used here. Its implementation seems pretty close to `GenInferencer`'s. Consider employing inheritance to cut down on code redundancy and ease future maintenance.
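The inheritance suggestion could be sketched roughly like this. The class bodies are placeholders, not the real OpenCompass code:

```python
class GenInferencer:
    """Stand-in for the existing single-pass generation inferencer."""

    def generate(self, prompt):
        # The real implementation queries the model once per prompt.
        return f"answer({prompt})"


class SCInferencer(GenInferencer):
    """Hypothetical refactor: inherit the shared machinery and override
    only what Self-Consistency changes (sampling sc_size paths)."""

    def __init__(self, sc_size=1):
        self.sc_size = sc_size

    def generate(self, prompt):
        # Reuse the parent's generation once per sampled reasoning path.
        return [GenInferencer.generate(self, prompt) for _ in range(self.sc_size)]
```

This way, fixes to the shared inference logic land in one place instead of two near-identical copies.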
@@ -164,6 +186,14 @@ def _extract_role_pred(self, s: str, begin_str: Optional[str],

        return s[start:end]

    def _get_vote_out(
A short docstring is required here.
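For context, a self-contained sketch of the majority-vote step such a method performs. This is a hypothetical stand-in, not the actual `_get_vote_out` implementation:

```python
from collections import Counter

def majority_vote(answers):
    """Return the answer produced by the most reasoning paths.

    Ties are broken by first occurrence, following the ordering
    guarantees of Counter.most_common.
    """
    return Counter(answers).most_common(1)[0][0]
```

With the per-question lists from the transpose step, `majority_vote(["18", "18", "20"])` selects `"18"`.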
Don't forget to update the ToC of the documentation.
* add self-consistency
* add CoT method Self-Consistency
* fix typo error and update openicl_eval
* add tydiQA-GoldP task
* fix sc
* rename gsm8k_sc
* fix sc
* add self-consistency doc
* refine sc

---------

Authored-by: liushz <[email protected]>
The same as #57