LLM_CHUNK implementation #18471
Please add result verification in tests.
How about we add a "trivial" chunking method that will simply chunk the whole doc as 1 chunk?
A new chunk strategy |
Result verification has been added.
What type of PR is this?
Which issue(s) this PR fixes:
issue #18664
What this PR does / why we need it:
As part of our document LLM support, we are introducing the `LLM_CHUNK` function. It chunks the content referenced by a datalink using one of four chunk strategies: `fixed_width`, `sentence`, `paragraph`, or `document`.
Usage:
select llm_chunk("<input datalink>", "fixed_width; <width number>");
or
select llm_chunk("<input datalink>", "<sentence or paragraph or document>");
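To make the shape of the second argument concrete, here is a minimal Python sketch of how the strategy string could be split into a strategy name and an optional width. It is not the code in this PR: the helper name `parse_strategy`, the case-insensitive matching, and the error handling are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Strategies that take no extra parameter (names taken from the usage above).
SIMPLE_STRATEGIES = {"sentence", "paragraph", "document"}


def parse_strategy(arg: str) -> Tuple[str, Optional[int]]:
    """Split a strategy argument such as "fixed_width; 11" into (name, width).

    Hypothetical helper: the real implementation may validate differently.
    """
    name, _, rest = arg.partition(";")
    name = name.strip().lower()
    rest = rest.strip()

    if name == "fixed_width":
        # int() raises ValueError when the width is missing or not a number.
        width = int(rest)
        if width <= 0:
            raise ValueError("fixed_width needs a positive width")
        return name, width

    if name in SIMPLE_STRATEGIES:
        if rest:
            raise ValueError(f"strategy {name!r} takes no parameter")
        return name, None

    raise ValueError(f"unknown chunk strategy: {name!r}")
```

With this sketch, `parse_strategy("fixed_width; 11")` yields `("fixed_width", 11)` and `parse_strategy("sentence")` yields `("sentence", None)`.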
Return Value: a JSON-like string representation of an array of chunks with offset and size:
[[offset0, size0, "chunk"], [offset1, size1, "chunk"],...]
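The return format can be illustrated with a short Python sketch covering the two simplest strategies, `fixed_width` and `document`. This is only an illustration under stated assumptions: the function names are hypothetical, offsets and sizes are counted in characters here (the PR may count bytes), and an empty input is assumed to produce an empty array.

```python
import json
from typing import List, Union

Chunk = List[Union[int, str]]  # [offset, size, "chunk"]


def chunk_fixed_width(text: str, width: int) -> List[Chunk]:
    """Cut text into consecutive pieces of at most `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    chunks: List[Chunk] = []
    for offset in range(0, len(text), width):
        piece = text[offset:offset + width]
        chunks.append([offset, len(piece), piece])
    return chunks


def chunk_document(text: str) -> List[Chunk]:
    """Return the whole document as a single chunk (assumed empty for empty input)."""
    return [[0, len(text), text]] if text else []


def to_result_string(chunks: List[Chunk]) -> str:
    """Serialize chunks as a JSON array of [offset, size, "chunk"] triples."""
    return json.dumps(chunks)
```

For example, `chunk_fixed_width("abcdefghij", 4)` gives `[[0, 4, "abcd"], [4, 4, "efgh"], [8, 2, "ij"]]`; the last chunk is shorter because only two characters remain.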
Example SQL for fixed width:
Example return:
Example SQL for sentence:
Example return: