
Commit f3aaaeb

Authored Dec 17, 2024
[Reorg] Remove redundant file in retrievers/redis (#1016)
Signed-off-by: letonghan <[email protected]>
1 parent ce1faf6 commit f3aaaeb

21 files changed, +78 -806 lines changed

‎.github/workflows/docker/compose/retrievers-compose.yaml

-4
@@ -15,10 +15,6 @@ services:
     build:
       dockerfile: comps/retrievers/vdms/langchain/Dockerfile
     image: ${REGISTRY:-opea}/retriever-vdms:${TAG:-latest}
-  retriever-multimodal-redis:
-    build:
-      dockerfile: comps/retrievers/multimodal/redis/langchain/Dockerfile
-    image: ${REGISTRY:-opea}/retriever-multimodal-redis:${TAG:-latest}
   retriever-pgvector:
     build:
       dockerfile: comps/retrievers/pgvector/langchain/Dockerfile

‎comps/retrievers/README.md

-4
@@ -29,7 +29,3 @@ For details, please refer to this [readme](qdrant/haystack/README.md)
 ## Retriever Microservice with VDMS

 For details, please refer to this [readme](vdms/langchain/README.md)
-
-## Retriever Microservice with Multimodal
-
-For details, please refer to this [readme](multimodal/redis/langchain/README.md)

‎comps/retrievers/milvus/langchain/ingest.py

-99
This file was deleted.

‎comps/retrievers/multimodal/redis/langchain/Dockerfile

-28
This file was deleted.

‎comps/retrievers/multimodal/redis/langchain/README.md

-123
This file was deleted.

‎comps/retrievers/multimodal/redis/langchain/__init__.py

-2
This file was deleted.

‎comps/retrievers/multimodal/redis/langchain/docker_compose_retriever.yaml

-23
This file was deleted.

‎comps/retrievers/multimodal/redis/langchain/multimodal_config.py

-83
This file was deleted.

‎comps/retrievers/multimodal/redis/langchain/requirements.txt

-11
This file was deleted.

‎comps/retrievers/multimodal/redis/langchain/retriever_redis.py

-93
This file was deleted.

‎comps/retrievers/qdrant/haystack/ingest.py

-110
This file was deleted.
-2.29 MB
Binary file not shown.

‎comps/retrievers/redis/langchain/README.md

+26-5
@@ -57,12 +57,24 @@ python retriever_redis.py

 ### 2.1 Setup Environment Variables

+Two versions of retriever are supported for redis: text retriever and multimodal retriever.
+Users need to setup different environment variables for each type of retriever as below.
+
 ```bash
+# for text retriever
+export your_ip=$(hostname -I | awk '{print $1}')
 export RETRIEVE_MODEL_ID="BAAI/bge-base-en-v1.5"
 export REDIS_URL="redis://${your_ip}:6379"
 export INDEX_NAME=${your_index_name}
 export TEI_EMBEDDING_ENDPOINT="http://${your_ip}:6060"
 export HUGGINGFACEHUB_API_TOKEN=${your_hf_token}
+
+# for multimodal retriever
+export your_ip=$(hostname -I | awk '{print $1}')
+export RETRIEVE_MODEL_ID="BAAI/bge-base-en-v1.5"
+export REDIS_URL="redis://${your_ip}:6379"
+export INDEX_NAME=${your_index_name}
+export BRIDGE_TOWER_EMBEDDING=true
 ```

 ### 2.2 Build Docker Image
@@ -82,7 +94,10 @@ You can choose one as needed.
 ### 2.3 Run Docker with CLI (Option A)

 ```bash
+# Start a text retriever server
 docker run -d --name="retriever-redis-server" -p 7000:7000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e REDIS_URL=$REDIS_URL -e INDEX_NAME=$INDEX_NAME -e TEI_EMBEDDING_ENDPOINT=$TEI_EMBEDDING_ENDPOINT -e HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN opea/retriever-redis:latest
+# start a multimodal retriever server
+docker run -d --name="retriever-multimodal-redis-server" -p 7000:7000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e REDIS_URL=$REDIS_URL -e INDEX_NAME=$INDEX_NAME -e BRIDGE_TOWER_EMBEDDING=${BRIDGE_TOWER_EMBEDDING} opea/retriever-redis:latest
 ```

 ### 2.4 Run Docker with Docker Compose (Option B)
@@ -103,10 +118,20 @@ curl http://localhost:7000/v1/health_check \

 ### 3.2 Consume Embedding Service

-To consume the Retriever Microservice, you can generate a mock embedding vector of length 768 with Python.
+To consume the Retriever Microservice, you can generate a mock embedding vector with Python.
+
+Same here, users need to validate text/multimodal embedding service with different lengths of vectors. Then use the `curl` command to validate.

 ```bash
+# for text retriever
 export your_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
+# for multimodal retriever
+export your_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)")
+```
+
+Default validation.
+
+```bash
 curl http://${your_ip}:7000/v1/retrieval \
   -X POST \
   -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding}}" \
@@ -116,31 +141,27 @@ curl http://${your_ip}:7000/v1/retrieval \
 You can set the parameters for the retriever.

 ```bash
-export your_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
 curl http://localhost:7000/v1/retrieval \
   -X POST \
   -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding},\"search_type\":\"similarity\", \"k\":4}" \
   -H 'Content-Type: application/json'
 ```

 ```bash
-export your_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
 curl http://localhost:7000/v1/retrieval \
   -X POST \
   -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding},\"search_type\":\"similarity_distance_threshold\", \"k\":4, \"distance_threshold\":1.0}" \
   -H 'Content-Type: application/json'
 ```

 ```bash
-export your_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
 curl http://localhost:7000/v1/retrieval \
   -X POST \
   -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding},\"search_type\":\"similarity_score_threshold\", \"k\":4, \"score_threshold\":0.2}" \
   -H 'Content-Type: application/json'
 ```

 ```bash
-export your_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
 curl http://localhost:7000/v1/retrieval \
   -X POST \
   -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding},\"search_type\":\"mmr\", \"k\":4, \"fetch_k\":20, \"lambda_mult\":0.5}" \

‎comps/retrievers/redis/langchain/docker_compose_retriever.yaml

+1
@@ -27,6 +27,7 @@ services:
       INDEX_NAME: ${INDEX_NAME}
       TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      BRIDGE_TOWER_EMBEDDING: ${BRIDGE_TOWER_EMBEDDING}
     restart: unless-stopped

 networks:

‎comps/retrievers/redis/langchain/ingest.py

-121
This file was deleted.

‎comps/retrievers/redis/langchain/redis_config.py

+3
@@ -73,3 +73,6 @@ def format_redis_conn_from_env():

 current_file_path = os.path.abspath(__file__)
 parent_dir = os.path.dirname(current_file_path)
+REDIS_SCHEMA = os.getenv("REDIS_SCHEMA", "redis_schema_multi.yml")
+schema_path = os.path.join(parent_dir, REDIS_SCHEMA)
+INDEX_SCHEMA = schema_path
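
Note: the three added lines resolve the schema file relative to the directory containing `redis_config.py`, with the file name overridable through `REDIS_SCHEMA`. The lookup can be sketched in isolation as below. `resolve_schema_path` is a hypothetical helper that is not part of this commit, and the environment is passed in as a plain mapping so the behaviour can be checked without changing real environment variables.

```python
import os
from typing import Mapping


def resolve_schema_path(module_file: str, env: Mapping[str, str]) -> str:
    """Mirror the schema lookup added to redis_config.py (hypothetical helper)."""
    # Directory that holds the config module, computed as in the diff.
    parent_dir = os.path.dirname(os.path.abspath(module_file))
    # REDIS_SCHEMA overrides the file name; the default is taken from the diff.
    schema_name = env.get("REDIS_SCHEMA", "redis_schema_multi.yml")
    return os.path.join(parent_dir, schema_name)
```

Since the value is joined onto the module directory, `REDIS_SCHEMA` is presumably meant to be a file name next to `redis_config.py`; an absolute path would replace the directory part entirely, because that is how `os.path.join` treats absolute components.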

‎comps/retrievers/redis/langchain/requirements.txt

+1
@@ -11,4 +11,5 @@ pymupdf
 redis
 sentence_transformers
 shortuuid
+transformers
 uvicorn

‎comps/retrievers/redis/langchain/retriever_redis.py

+20-10
@@ -8,12 +8,14 @@
 from langchain_community.embeddings import HuggingFaceBgeEmbeddings
 from langchain_community.vectorstores import Redis
 from langchain_huggingface import HuggingFaceEndpointEmbeddings
-from redis_config import EMBED_MODEL, INDEX_NAME, REDIS_URL
+from redis_config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL

 from comps import (
     CustomLogger,
     EmbedDoc,
+    EmbedMultimodalDoc,
     SearchedDoc,
+    SearchedMultimodalDoc,
     ServiceType,
     TextDoc,
     opea_microservices,
@@ -28,11 +30,13 @@
     RetrievalResponse,
     RetrievalResponseData,
 )
+from comps.embeddings.multimodal.bridgetower import BridgeTowerEmbedding

 logger = CustomLogger("retriever_redis")
 logflag = os.getenv("LOGFLAG", False)

 tei_embedding_endpoint = os.getenv("TEI_EMBEDDING_ENDPOINT")
+bridge_tower_embedding = os.getenv("BRIDGE_TOWER_EMBEDDING")


 @register_microservice(
@register_microservice(
@@ -44,30 +48,25 @@
 )
 @register_statistics(names=["opea_service@retriever_redis"])
 async def retrieve(
-    input: Union[EmbedDoc, RetrievalRequest, ChatCompletionRequest]
-) -> Union[SearchedDoc, RetrievalResponse, ChatCompletionRequest]:
+    input: Union[EmbedDoc, EmbedMultimodalDoc, RetrievalRequest, ChatCompletionRequest]
+) -> Union[SearchedDoc, SearchedMultimodalDoc, RetrievalResponse, ChatCompletionRequest]:
     if logflag:
         logger.info(input)
     start = time.time()
     # check if the Redis index has data
     if vector_db.client.keys() == []:
         search_res = []
     else:
-        if isinstance(input, EmbedDoc):
-            query = input.text
+        if isinstance(input, EmbedDoc) or isinstance(input, EmbedMultimodalDoc):
             embedding_data_input = input.embedding
         else:
             # for RetrievalRequest, ChatCompletionRequest
-            query = input.input
             if isinstance(input.embedding, EmbeddingResponse):
                 embeddings = input.embedding.data
                 embedding_data_input = []
                 for emb in embeddings:
                     # each emb is EmbeddingResponseData
-                    # print("Embedding data: ", emb.embedding)
-                    # print("Embedding data length: ",len(emb.embedding))
                     embedding_data_input.append(emb.embedding)
-                # print("All Embedding data length: ",len(embedding_data_input))
             else:
                 embedding_data_input = input.embedding

@@ -98,6 +97,12 @@ async def retrieve(
         for r in search_res:
             retrieved_docs.append(TextDoc(text=r.page_content))
         result = SearchedDoc(retrieved_docs=retrieved_docs, initial_query=input.text)
+    elif isinstance(input, EmbedMultimodalDoc):
+        metadata_list = []
+        for r in search_res:
+            metadata_list.append(r.metadata)
+            retrieved_docs.append(TextDoc(text=r.page_content))
+        result = SearchedMultimodalDoc(retrieved_docs=retrieved_docs, initial_query=input.text, metadata=metadata_list)
     else:
         for r in search_res:
             retrieved_docs.append(RetrievalResponseData(text=r.page_content, metadata=r.metadata))
@@ -119,9 +124,14 @@ async def retrieve(
     if tei_embedding_endpoint:
         # create embeddings using TEI endpoint service
         embeddings = HuggingFaceEndpointEmbeddings(model=tei_embedding_endpoint)
+        vector_db = Redis(embedding=embeddings, index_name=INDEX_NAME, redis_url=REDIS_URL)
+    elif bridge_tower_embedding:
+        # create embeddings using BridgeTower service
+        embeddings = BridgeTowerEmbedding()
+        vector_db = Redis(embedding=embeddings, index_name=INDEX_NAME, index_schema=INDEX_SCHEMA, redis_url=REDIS_URL)
     else:
         # create embeddings using local embedding model
         embeddings = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL)
+        vector_db = Redis(embedding=embeddings, index_name=INDEX_NAME, redis_url=REDIS_URL)

-    vector_db = Redis(embedding=embeddings, index_name=INDEX_NAME, redis_url=REDIS_URL)
     opea_microservices["opea_service@retriever_redis"].start()
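
Note: the startup branch above picks the embedding backend from two environment variables, and the order of the checks matters. A minimal sketch of that precedence follows. `select_embedding_backend` and its string return values are hypothetical names used only for illustration; the real code constructs the embedding objects and the `Redis` vector store directly.

```python
from typing import Mapping


def select_embedding_backend(env: Mapping[str, str]) -> str:
    """Return which startup branch would run (hypothetical helper)."""
    # The TEI endpoint is checked first, so it wins when both variables are set.
    if env.get("TEI_EMBEDDING_ENDPOINT"):
        return "tei"
    # The value is a string, so any non-empty value is truthy,
    # including "false" or "0".
    if env.get("BRIDGE_TOWER_EMBEDDING"):
        return "bridgetower"
    # Neither variable is set: fall back to the local embedding model.
    return "local"
```

Two consequences follow from the diff as written: setting `TEI_EMBEDDING_ENDPOINT` alongside `BRIDGE_TOWER_EMBEDDING=true` still selects the TEI branch, and `BRIDGE_TOWER_EMBEDDING=false` still selects BridgeTower when no TEI endpoint is set. Leaving the variable unset or empty is what disables the multimodal path.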

‎tests/retrievers/test_retrievers_multimodal_redis_langchain.sh

-84
This file was deleted.

‎tests/retrievers/test_retrievers_redis_langchain.sh

+27-6
@@ -42,14 +42,29 @@ function start_service() {
     sleep 3m
 }

+function start_multimodal_service() {
+    # redis
+    docker run -d --name test-comps-retriever-redis-vector-db -p 5689:6379 -p 5011:8001 -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy redis/redis-stack:7.2.0-v9
+    sleep 10s
+
+    # redis retriever
+    export REDIS_URL="redis://${ip_address}:5689"
+    export INDEX_NAME="rag-redis"
+    retriever_port=5435
+    unset http_proxy
+    docker run -d --name="test-comps-retriever-redis-server" -p ${retriever_port}:7000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e REDIS_URL=$REDIS_URL -e INDEX_NAME=$INDEX_NAME -e BRIDGE_TOWER_EMBEDDING=true opea/retriever-redis:comps
+
+    sleep 2m
+}
+
 function validate_microservice() {
+    local test_embedding="$1"
+
     retriever_port=5435
     export PATH="${HOME}/miniforge3/bin:$PATH"
     source activate
     URL="http://${ip_address}:$retriever_port/v1/retrieval"

-    test_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
-
     HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "{\"text\":\"test\",\"embedding\":${test_embedding}}" -H 'Content-Type: application/json' "$URL")
     if [ "$HTTP_STATUS" -eq 200 ]; then
         echo "[ retriever ] HTTP status is 200. Checking content..."
@@ -60,13 +75,11 @@ function validate_microservice() {
         else
             echo "[ retriever ] Content does not match the expected result: $CONTENT"
             docker logs test-comps-retriever-redis-server >> ${LOG_PATH}/retriever.log
-            docker logs test-comps-retriever-redis-tei-endpoint >> ${LOG_PATH}/tei.log
             exit 1
         fi
     else
         echo "[ retriever ] HTTP status is not 200. Received status was $HTTP_STATUS"
         docker logs test-comps-retriever-redis-server >> ${LOG_PATH}/retriever.log
-        docker logs test-comps-retriever-redis-tei-endpoint >> ${LOG_PATH}/tei.log
         exit 1
     fi
 }
@@ -81,12 +94,20 @@ function stop_docker() {
8194
function main() {
8295

8396
stop_docker
84-
8597
build_docker_images
98+
99+
# test text retriever
86100
start_service
101+
test_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
102+
validate_microservice "$test_embedding"
103+
stop_docker
87104

88-
validate_microservice
105+
# test multimodal retriever
106+
start_multimodal_service
107+
test_embedding_multi=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)")
108+
validate_microservice "$test_embedding_multi"
89109

110+
# clean env
90111
stop_docker
91112
echo y | docker system prune
92113
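
Note: the test now runs the same validation twice, once with a 768-element mock embedding for the text retriever and once with a 512-element one for the multimodal retriever. The inline `python -c` one-liners can be sketched as a small helper that also builds the JSON body sent to `/v1/retrieval`. `build_retrieval_payload` and `EMBEDDING_DIMS` are hypothetical names, not part of this commit. The two dimensions are the ones used in the diff, and the optional `seed` is an addition for reproducibility.

```python
import json
import random
from typing import Optional

# Vector lengths used by the test script and README in this commit.
EMBEDDING_DIMS = {"text": 768, "multimodal": 512}


def build_retrieval_payload(kind: str, text: str = "test", seed: Optional[int] = None) -> str:
    """Build a JSON request body with a random mock embedding (hypothetical helper)."""
    # A dedicated generator keeps the global random state untouched.
    rng = random.Random(seed)
    # Raises KeyError for any kind other than "text" or "multimodal".
    dim = EMBEDDING_DIMS[kind]
    embedding = [rng.uniform(-1, 1) for _ in range(dim)]
    return json.dumps({"text": text, "embedding": embedding})
```

The returned string has the same shape as the `-d` argument in the `curl` call of `validate_microservice`, so it could be posted with any HTTP client.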
