Add hierarchical structure to the graph index #402
…where the node is present
vels to candidates pool in layer 0
… no dupes are submitted
# Conflicts:
#   jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/CommonHeader.java
#   jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndex.java
#   jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndexWriter.java
#   jvector-base/src/main/java/io/github/jbellis/jvector/quantization/BQVectors.java
#   jvector-examples/src/main/java/io/github/jbellis/jvector/example/Bench.java
#   jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java
#   jvector-examples/src/main/java/io/github/jbellis/jvector/example/IPCService.java
#   jvector-tests/src/test/java/io/github/jbellis/jvector/graph/Test2DThreshold.java
#   jvector-tests/src/test/java/io/github/jbellis/jvector/graph/disk/TestOnDiskGraphIndex.java
…om the same level. If the entrypoint has been deleted, the new one can be in level 0 if that's the new top level
…e live nodes in the bottom layer.
…t is smaller than layers.size() - 1
One point worth noting: we recently updated the graph file format to V4 in main (#400). Since this PR should be merged fairly soon after that one, and V4 has not yet been included in a release, I opted to update V4 in place rather than bump the file version to V5. This avoids an unnecessary proliferation of versions that would complicate backwards compatibility in the future.
In general, this looks quite good to me. This PR is easy to understand if you're already well-versed in JVector internals, and I agree with the implementation choices. The vast majority of my comments are polish/style-related. Nice work! Let me know if any of my feedback is unclear.
EDIT: Can you also add some coverage of this new index functionality/API changes to README/UPGRADING?
jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphIndex.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphIndexBuilder.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java
@@ -105,6 +111,10 @@ public static SearchResult search(VectorFloat<?> queryVector, int topK, RandomAc
        }
    }

    public void setView(GraphIndex.View view) {
Let's add a Javadoc here with an example of when this would be/is used.
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/CommonHeader.java
Co-authored-by: Joel Knighton <[email protected]>
# Conflicts: # jvector-tests/src/test/java/io/github/jbellis/jvector/graph/Test2DThreshold.java
This PR introduces a hybrid HNSW-DiskANN graph. From HNSW, we take the idea of using multiple layers, which adds robustness on particularly hard datasets. Each layer is a Vamana graph. The upper layers reside in memory, while the base layer resides on disk (DiskANN style). The index can also be built with a single layer, in which case it is a plain Vamana graph.
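To illustrate the layered idea in the description, here is a minimal, self-contained sketch of how a search over such a structure might proceed: a greedy descent through the sparse upper layers picks an entry point, and a beam search then runs on the base layer. This is not JVector's implementation or API. The class `LayeredGraphSketch`, its fields, and the `demo` helper are hypothetical names made up for this example, everything is held in memory (no on-disk base layer), and the toy graphs are hand-built rather than constructed with Vamana.

```java
import java.util.*;

/** Hypothetical toy model: layer 0 holds every node, higher layers hold sparser subsets. */
public class LayeredGraphSketch {
    private final List<Map<Integer, int[]>> layers; // per layer: node id -> neighbor ids
    private final float[][] vectors;                // node id -> vector
    private final int entryNode;                    // assumed to exist in the top layer

    public LayeredGraphSketch(List<Map<Integer, int[]>> layers, float[][] vectors, int entryNode) {
        this.layers = layers;
        this.vectors = vectors;
        this.entryNode = entryNode;
    }

    /** Squared Euclidean distance between a node's vector and the query. */
    private float dist(int node, float[] q) {
        float sum = 0;
        for (int i = 0; i < q.length; i++) {
            float d = vectors[node][i] - q[i];
            sum += d * d;
        }
        return sum;
    }

    /** Greedy walk within one layer: keep moving to any neighbor that is closer to the query. */
    private int greedy(int level, int start, float[] q) {
        int current = start;
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int nb : layers.get(level).get(current)) {
                if (dist(nb, q) < dist(current, q)) {
                    current = nb;
                    improved = true;
                }
            }
        }
        return current;
    }

    /** Descend the upper layers greedily, then beam-search the base layer. */
    public List<Integer> search(float[] q, int topK, int beamWidth) {
        int ep = entryNode;
        // With a single layer this loop does not run, and the search is a plain one-graph search.
        for (int level = layers.size() - 1; level > 0; level--) {
            ep = greedy(level, ep, q);
        }
        Comparator<Integer> byDist = Comparator.comparingDouble(n -> dist(n, q));
        PriorityQueue<Integer> candidates = new PriorityQueue<>(byDist);  // closest first
        PriorityQueue<Integer> best = new PriorityQueue<>(byDist.reversed()); // farthest first
        Set<Integer> visited = new HashSet<>();
        candidates.add(ep);
        best.add(ep);
        visited.add(ep);
        while (!candidates.isEmpty()) {
            int c = candidates.poll();
            // Stop once the closest unexpanded candidate is worse than the worst kept result.
            if (best.size() >= beamWidth && dist(c, q) > dist(best.peek(), q)) break;
            for (int nb : layers.get(0).get(c)) {
                if (!visited.add(nb)) continue; // already seen
                candidates.add(nb);
                best.add(nb);
                if (best.size() > beamWidth) best.poll(); // drop the farthest
            }
        }
        List<Integer> result = new ArrayList<>(best);
        result.sort(byDist);
        return result.subList(0, Math.min(topK, result.size()));
    }

    /** Six points on a line; the optional upper layer links only nodes 0, 3 and 5. */
    public static LayeredGraphSketch demo(boolean hierarchical) {
        float[][] vectors = {{0}, {1}, {2}, {3}, {4}, {5}};
        Map<Integer, int[]> base = Map.of(
            0, new int[]{1}, 1, new int[]{0, 2}, 2, new int[]{1, 3},
            3, new int[]{2, 4}, 4, new int[]{3, 5}, 5, new int[]{4});
        Map<Integer, int[]> upper = Map.of(
            0, new int[]{3}, 3, new int[]{0, 5}, 5, new int[]{3});
        List<Map<Integer, int[]>> layers = hierarchical ? List.of(base, upper) : List.of(base);
        return new LayeredGraphSketch(layers, vectors, 0);
    }

    public static void main(String[] args) {
        System.out.println(demo(true).search(new float[]{4.2f}, 2, 3));  // prints [4, 5]
        System.out.println(demo(false).search(new float[]{4.2f}, 2, 3)); // prints [4, 5]
    }
}
```

In this toy example the two-layer variant reaches the neighborhood of the query in two hops through the upper layer (0 → 3 → 5), whereas the single-layer variant walks the whole chain from node 0. Both return the same result here; the upper layers only change where the base-layer search starts.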