Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PartialOrd/Ord implementation possibility #35

Open
3Hren opened this issue Jun 24, 2024 · 8 comments
Open

PartialOrd/Ord implementation possibility #35

3Hren opened this issue Jun 24, 2024 · 8 comments

Comments

@3Hren
Copy link

3Hren commented Jun 24, 2024

Hi again!

There is the final step to take full advantage of your great library.

Sometimes it is necessary to deduplicate bitsets or to build a mapping between bitsets and some values. There are two standard approaches: using HashMap or BTreeMap.

The first one requires to calculate bitset's hash, which, in case of large bitsets, can be painful. However, for BTreeMap it seems that a multilevel approach to comparing two bitsets will result in a very fast comparison, also lazy bitsets seems to perfectly fit in this case.
Since the PartialEq is already implemented it seems that it will no harm to also implement PartialOrd/Ord. Or is there something that prevents this from being implemented?

@tower120
Copy link
Owner

If you want to "deduplicate bitsets", why not substract? S1 - S2 will give you S1 without S2 elements.

If you want to map indices to values, you need https://github.com/tower120/hi_sparse_array . It is somewhat like hi_sparse_bitset, but last level points to the actual data. It is still in research phase.

And what is the meaning of sets order? When S1>S2?

@3Hren
Copy link
Author

3Hren commented Jun 24, 2024

By deduplicating/mapping I meant that suppose we have the following mapping:

{
  0b10101 -> 0,
  0b10101 -> 1,
  0b10001 -> 2,
  ...
  0b10001 -> 100500,
}

I would like to transform it in a such way:

{
  0b10101 -> [0, 1],
  0b10001 -> [2, 100500],
  ...
}

Using BTreeMap this is trivial.

By order I mean that in fact each bitset can be represented as a very large unsigned integer. So why not allow to comparing them?

@tower120
Copy link
Owner

tower120 commented Jun 24, 2024

Looks like you need hi_sparse_array's multi_merge. There is a version of it now in repository, but I rewriting it to new more space efficient data structure.

Ord is probably possible. But looks like what you want to do will be inefficient (since you'll need to process each bitset one-by-one). Again, looks like you need merge. And I don't clearly see how to implement Ord... Especially for non TRUSTED_HIERARCHY.
It's not trivial algorithm (maybe I'm wrong).

@tower120
Copy link
Owner

Well, if you "just" need Ord, and don't care about speed - you probably can compare block_iter() between two sets. (Don't forget to take block starts into count as well).

@3Hren
Copy link
Author

3Hren commented Jun 24, 2024

Isn't it enough to just to compare levels one-by-one? First try level0 indices, if they are equal then level1, and finally, if again equal - block-by-block?
I'm not familiar with TRUSTED_HIERARCHY internals, will take a look.

Well, if you "just" need Ord, and don't care about speed - you probably can compare block_iter() between two sets. (Don't forget to take block starts into count as well).

Of course I care, that's why I am here ;)

Anyway, thanks for the answer!

@tower120
Copy link
Owner

For TRUSTED_HIERARCHY maybe that is enough (I can't see that far without practical implementation).

But if you work, for example, with lazy(!) result of intersection - raised bits in level bitmasks may lead to EMPTY data blocks. That case is called non-trusted-hierarchy. You can't trust hierarchy and just compare level bitmasks. From Eq implementation experience thou, there is a "special case" when you compare TRUSTED_HIERARCHY to !TRUSTED_HIERARCHY - then some optimization possible. Otherwise - (as far as I remember) it ends up with plain data block comparison one-by-one.

@tower120
Copy link
Owner

It is just with multi_merge/fold_merge you basically OR bitmasks of all maps/arrays at once, and then just apply resolve function to intersected data items in the very last level. At least on paper that should be super-linearly faster then what you trying to do.

@3Hren
Copy link
Author

3Hren commented Jun 24, 2024

Ah, I see it's more complicated than I thought, interesting nonetheless. Thanks for the clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants