Add Redis-compatible cursors for SCAN commands #1489
dont check GenerateCursorFromKeyName key_name empty
Hi @jihuayu, thanks for your great efforts. Can you explain a bit about the cursor number design, and what happens if we use the wrong cursor for the hash/set/zset scan (should we store and compare the prefix as well when getting it by the cursor)? Also, would it be good to use a random number or a timestamp to avoid conflicts when restarting the server?
Thanks for your contribution! Several points after a short glance:
@PragmaTwice It cannot be per connection since we need to allow using the cursor in different connections. That is to say, a cursor obtained by connection A should be usable by connection B from the client side.
I will redesign the cursor.
Is this property used by some client libraries? Actually, I cannot find in the official documentation how and when these cursors become invalid (e.g. will different connections lead to an invalid cursor? Will adding and deleting elements lead to an invalid cursor?); it seems they just do not give any guarantee about this. It is sad. But I found such a statement.
It may make the implementation of the Redis cursor harder, if we need to follow it.
Most libraries should use the same connection to scan, but the cursor still needs to be valid if the connection is re-established. And as the scan command is a long-running operation, it'd be better to allow reconnecting. Yes, agreed that it's impossible to follow Redis exactly since the underlying storage is different. What we can do is stay as compatible as possible.
@PragmaTwice @git-hulk If the user uses We can make a promise to how long or how many recent cursors are guaranteed to be valid. For example, cursors created within the last hour or the last 100 cursors. In redis doc:
I think this is necessary, while the others are optional. Our current implementation can ensure these two points.
I agree as well. We just need to follow Redis in common behavior.
Thanks, @jihuayu. The solution of keeping the recent N cursors looks good to me since those cursors won't occupy too much memory. Let's see if @PragmaTwice has any comments.
@git-hulk We may need to know how the N value is determined in actual project groups. If N=1024 and the average key name length is 40, the memory usage would be 1024*(40+40) bytes = 80KB.
It's hard to determine which value is suitable, but I think we can give a relatively large one like 10240 or even more. It's fine since the memory usage is a fixed number.
Do we need to provide a configuration option for this value?
LGTM
Thanks for the review. Does anyone have any other suggestions?
fix cli_test not working, fix cursor_ error
LGTM, thanks for your contribution and patience.
@torwig @PragmaTwice Can you take a look again when you're free?
Thanks all, merging...
Very neat design!
Hello everyone, I have completed this PR and below is my explanation of the changes made.
We have added the following steps to the processing of the SCAN, ZSCAN, SSCAN, and HSCAN commands:
1. Convert the numeric cursor sent by the client back to the keyname cursor.
2. Convert the keyname cursor back to a numeric cursor and return it to the client, and store the mapping in the conversion dictionary Server#cursor_dict_.
Since these steps are non-intrusive to the internal implementation of the SCAN command, we can ensure that the behavior of commands such as ZSCAN, SSCAN, and HSCAN is consistent with that of the SCAN command.
In the following, I will only use the SCAN command as an example to describe the new design.
Cursor design
We call the new cursor the numeric cursor and the old one the keyname cursor.
The numeric cursor is composed of 3 parts:
- counter is a 16-bit unsigned integer that is incremented by 1 each time a new cursor is generated. When the counter overflows, it wraps around to 0 because it is an unsigned number. Since cursor_dict_size is a power of 2, the counter stays continuous modulo cursor_dict_size across the wrap.
- timestamp is a 16-bit timestamp in seconds, which can store up to 9 hours.
- hash is a 32-bit hash value of the keyname.
Cursor dictionary (Server#cursor_dict_)
The cursor dictionary is an array with a length of 16384 (1024*16), which is determined at compile time, and occupies about 640KB of memory. Including the length of the referenced keyname strings, its size is about 1-2MB.
When converting the keyname cursor back to a numeric cursor, a new cursor is generated based on the above rules, and the index at which it is stored in the dictionary is determined by the counter value (index = counter % DICT_SIZE).
When converting the numeric cursor back to a keyname cursor, we get the counter from the cursor and calculate the index of the cursor in cursor_dict_ from the counter. We only need to compare the cursor value of the item at that index with the input cursor value to determine whether they are the same.
Other information about the cursor
This design guarantees the validity of the latest 16384(1024*16) cursors, while cursors that are older or not generated in our system are considered invalid cursors. For invalid cursors, we treat them as a 0 cursor, which means we will start iterating over the collection from the beginning.
Our cursor is globally visible, and we store index information in the cursor. As long as the cursor remains valid, using the same cursor in different connections will produce the same results.
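The validity rule can be illustrated with a small ring-buffer model of the dictionary. This is only a sketch under assumptions: the class and method names are hypothetical, CRC32 stands in for the unspecified hash, the bit order of the cursor is assumed, and an empty string is used to represent "start from the beginning".

```python
import zlib


class CursorDict:
    """Toy model of a fixed-size cursor dictionary (names are hypothetical)."""

    def __init__(self, size=16384):
        self.size = size
        self.slots = [None] * size  # each slot holds (numeric_cursor, keyname) or None
        self.counter = 0

    def to_numeric(self, keyname, now):
        """Generate a numeric cursor for keyname and remember the mapping."""
        counter = self.counter
        self.counter = (self.counter + 1) & 0xFFFF  # 16-bit wrap-around
        key_hash = zlib.crc32(keyname.encode())     # stand-in 32-bit hash (assumption)
        cursor = (key_hash << 32) | ((now & 0xFFFF) << 16) | counter
        # Overwrites whatever older cursor previously used this slot.
        self.slots[counter % self.size] = (cursor, keyname)
        return cursor

    def to_keyname(self, cursor):
        """Return the stored keyname, or "" if the cursor is invalid."""
        slot = self.slots[(cursor & 0xFFFF) % self.size]
        if slot is not None and slot[0] == cursor:
            return slot[1]
        return ""  # invalid cursor: treated like cursor 0, iteration restarts
```

With a dictionary of size 4, a cursor stays resolvable until four newer cursors have been generated; after that its slot is overwritten and it resolves to the empty keyname, just as a cursor that was never issued does. Because the lookup depends only on the cursor value and the shared dictionary, it gives the same answer regardless of which connection asks.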
We prevent other users from guessing the data traversed by adjacent cursors by adding the hash value of the keyname to the cursor. If a user tries to obtain adjacent cursor information by traversing the hash, the cursor will become invalid before the traversal is complete, because the 32-bit hash space is much larger than the length of cursor_dict_.
We add a timestamp to the cursor to ensure that the same cursor does not appear within a short period of time before and after a restart.
Other behaviors are consistent with the original SCAN implementation.
Configuration file
Added the redis-cursor-compatible configuration item.
If enabled, the cursor will be an unsigned 64-bit integer.
If disabled, the cursor will be a string.
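As an illustration, enabling the option in the configuration file would presumably look like the fragment below. This assumes the usual key-value line syntax of kvrocks.conf; the option name and the yes/no values come from this PR.

```
# Return numeric (unsigned 64-bit) cursors from SCAN-family commands
redis-cursor-compatible yes
```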
Test file
Added tests for the redis-cli --bigkeys and redis-cli --memkeys commands. We only need to ensure that these commands run correctly, because their correctness is guaranteed by redis-cli on the premise that we ensure the correctness of the scan command.
Modified the scan command test to cover the cases where redis-cursor-compatible is set to yes or no.
Other changes:
Fixed a bug where the cursor did not return 0 when SCAN commands return fewer elements than the requested number.
fixed #1402
fixed #877
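The cursor fix listed under "Other changes" matters because clients loop until the returned cursor is 0. The toy model below sketches that contract with a keyname-style cursor over a sorted key list; the `scan` and `scan_all` functions are hypothetical and do not reflect the actual storage-level implementation.

```python
import bisect


def scan(keys, cursor, count):
    """Return (next_cursor, batch); next_cursor is "0" once iteration is complete."""
    sorted_keys = sorted(keys)
    start = 0
    if cursor != "0":
        # Resume right after the key named by the keyname cursor.
        start = bisect.bisect_right(sorted_keys, cursor)
    batch = sorted_keys[start:start + count]
    if len(batch) < count:
        # Fewer elements than requested means the end was reached: report cursor 0.
        return "0", batch
    return batch[-1], batch


def scan_all(keys, count):
    """Client-side loop: keep scanning until the server returns cursor 0."""
    cursor, seen = "0", []
    while True:
        cursor, batch = scan(keys, cursor, count)
        seen.extend(batch)
        if cursor == "0":
            return seen
```

If the short final batch came back with a non-zero cursor instead, a client loop like `scan_all` would need an extra round trip, or could keep iterating, before seeing the end of the collection.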