Use better ANN library #18

Open
flying-sheep opened this issue May 27, 2019 · 5 comments

@flying-sheep
Collaborator

flying-sheep commented May 27, 2019

Instead of using my built-in cover-tree approximate nearest neighbor lib, we could switch to one of the new options that have popped up:

| | NMSLIB | HNSWLIB | Annoy | FALCONN |
|---|---|---|---|---|
| sparse matrix support | yes | no | no* | yes |
| R bindings | through Python | good | good* | no |
| distances | numerous | Euclidean, Squared L2, Inner product, Cosine** | Angular, Euclidean, Manhattan, Hamming, Inner product + custom | Cosine |

*Annoy is super flexible, but neither the Python nor the R bindings support sparse matrices. I think sparse matrix support could be added as a custom extension though.
** I think you can extend HNSWLIB with more distances, but it’s not as easy as doing the same with Annoy.

@AmberLJC

Does FALCONN support sparse data in a sparse matrix format? Or does it just optimize for sparse data stored in a dense format?

@flying-sheep
Collaborator Author

flying-sheep commented Jun 26, 2019

According to “How to use FALCONN”, it can be used with dense or sparse vectors. However, the sparse ones have to be std::vector<std::pair<int32_t/int64_t, float/double>>, so you can’t just give it column/row-compressed sparse matrix data.

Thinking about it, that makes it little better than the others: The user has to copy their whole data to match the format FALCONN understands.
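
To illustrate the copy described above, here is a minimal sketch of turning row-compressed data into one index/value pair list per point. `SparsePoint` and `csr_to_pairs` are hypothetical names made up for this example, not part of FALCONN, and the sketch assumes 32-bit indices, `float` values, and one row per data point.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for the pair-based sparse vector layout mentioned above
// (assumption: int32_t indices and float coordinates).
using SparsePoint = std::vector<std::pair<std::int32_t, float>>;

// Hypothetical helper: copies compressed sparse row (CSR) data into one
// SparsePoint per row. row_ptr holds n_rows + 1 offsets; col_idx and values
// hold one entry per stored nonzero.
std::vector<SparsePoint> csr_to_pairs(const std::vector<std::int32_t>& row_ptr,
                                      const std::vector<std::int32_t>& col_idx,
                                      const std::vector<float>& values) {
    std::vector<SparsePoint> points;
    if (row_ptr.empty()) {
        return points;
    }
    points.reserve(row_ptr.size() - 1);
    for (std::size_t r = 0; r + 1 < row_ptr.size(); ++r) {
        SparsePoint point;
        point.reserve(static_cast<std::size_t>(row_ptr[r + 1] - row_ptr[r]));
        // Every stored nonzero of row r becomes one (column, value) pair.
        for (std::int32_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            point.emplace_back(col_idx[static_cast<std::size_t>(k)],
                               values[static_cast<std::size_t>(k)]);
        }
        points.push_back(std::move(point));
    }
    return points;
}
```

Every nonzero gets copied once, so memory use roughly doubles while both representations are alive. For column-compressed input the same loop would yield one pair list per column, so the matrix would need to be transposed first if points are stored as rows.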

@flying-sheep
Collaborator Author

Huh, I missed that Aaron Lun created BiocNeighbors! I have to investigate this :D

@Yunuuuu

Yunuuuu commented Dec 26, 2023

Hope to integrate BiocNeighbors

@gdagstn

gdagstn commented Jan 25, 2024

Hi @flying-sheep,
findKmknn() and findVptree() from BiocNeighbors provide exact KNN with Euclidean and cosine metrics, and work out of the box with sparse matrices. I have successfully (as far as vignette and test builds go) implemented them in my fork and can open a PR if that works for you.
