Add support for jinaai/jina-embeddings-v2-base-de #270

Merged
joein merged 2 commits into qdrant:main on Jun 14, 2024

Conversation

deichrenner
Contributor

Issue

This PR resolves #266.

This PR adds a new Jina embeddings model:

    {
        "model": "jinaai/jina-embeddings-v2-base-de",
        "dim": 768,
        "description": "German embedding model supporting 8192 sequence length",
        "size_in_GB": 0.16,
        "sources": {"hf": "jinaai/jina-embeddings-v2-base-de"},
        "model_file": "onnx/model_fp16.onnx",
    },

The quantized ONNX export of the model is hosted directly by jinaai.
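
Once the model is registered, it should be usable through fastembed's `TextEmbedding` class like any other supported model. The following is a minimal sketch, not part of this PR: the helper name `embed_texts` and the dimension check are illustrative assumptions, and the first call downloads the model files.

```python
MODEL_NAME = "jinaai/jina-embeddings-v2-base-de"
EXPECTED_DIM = 768  # "dim" from the model definition above


def embed_texts(texts):
    """Embed a list of strings with fastembed (hypothetical helper)."""
    # Imported lazily so this module loads even if fastembed is not installed.
    from fastembed import TextEmbedding

    model = TextEmbedding(model_name=MODEL_NAME)
    # embed() yields one numpy vector per input text.
    vectors = list(model.embed(texts))
    for vector in vectors:
        if len(vector) != EXPECTED_DIM:
            raise ValueError(
                f"expected {EXPECTED_DIM} dimensions, got {len(vector)}"
            )
    return vectors
```

Calling `embed_texts(["Hallo Welt", "Wie ist das Wetter heute?"])` should return two vectors of length 768, assuming a fastembed build that includes this model.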

Changes

The following files were changed to add this model:

  • fastembed/text/jina_onnx_embedding.py: The model definition was added.

  • tests/test_text_onnx_embeddings.py: A test was added. The expected values were generated with the supplied Colab notebook:

    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
    input_texts = [
        "hello world", "flag embedding"
    ]
    embeddings = model.encode(input_texts, normalize_embeddings=True)
    print(embeddings[0][:5])
    
    [-0.00857827  0.04176599  0.03420503  0.0309742  -0.01496792]
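
The reference values above can be compared against the ONNX model's output with an absolute tolerance, since an fp16 export will not reproduce the original weights bit for bit. A minimal sketch of such a check follows; the helper name `prefix_matches` and the tolerance of 1e-3 are assumptions for illustration and are not taken from the test file.

```python
import math

# First five components for "hello world", copied from the notebook output above.
CANONICAL_PREFIX = [-0.00857827, 0.04176599, 0.03420503, 0.0309742, -0.01496792]


def prefix_matches(embedding, expected=CANONICAL_PREFIX, atol=1e-3):
    """Return True if the leading components of `embedding` are within `atol` of `expected`."""
    if len(embedding) < len(expected):
        return False
    # zip() stops at the shorter sequence, so only the prefix is compared.
    return all(
        math.isclose(got, want, rel_tol=0.0, abs_tol=atol)
        for got, want in zip(embedding, expected)
    )
```

Comparing only a short prefix keeps the stored reference data small while still catching a wrong model file or broken preprocessing.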
    

Tests

All tests passed after the changes.

deichrenner marked this pull request as ready for review June 11, 2024 11:49
Member

joein commented Jun 14, 2024

Hey @deichrenner,
thank you for the contribution!

I'll approve it as soon as the CI is green.
It'll be available in the next release (in the meantime, you can use it from the main branch).

joein self-requested a review June 14, 2024 11:25
joein merged commit fd0b26f into qdrant:main Jun 14, 2024
15 checks passed
Anush008 pushed a commit that referenced this pull request Jun 17, 2024
* feat: add support for SOTA german embedding model with long context length jinaai/jina-embeddings-v2-base-de

* Fix jina de model weight

---------

Co-authored-by: George <[email protected]>
Development

Successfully merging this pull request may close these issues.

Request for model jinaai/jina-embeddings-v2-base-de
3 participants