In this project, we upload COVID-19-related academic publications to Weaviate, a vector database, and then query for publications related to specific topics. The goal is to create a tool that helps people search for scientifically accurate articles related to COVID-19. This tutorial covers:
- How the Weaviate platform works
- How to create a Weaviate cluster using Weaviate Cloud Services (WCS)
- How to create a schema
- How to upload data to the Weaviate cluster in batches
- How to create a Streamlit app that queries the database for relevant COVID-19 publications
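To make the schema item above concrete, here is a minimal sketch of what a Weaviate class definition can look like when built in Python and serialized to JSON. The class name `Publication` and the properties `title` and `abstract` are assumptions chosen for illustration; the actual definitions used by this tutorial live in schema.json and may differ.

```python
import json

# Hypothetical class and property names for illustration only;
# the real schema for this tutorial is defined in schema.json.
schema = {
    "classes": [
        {
            "class": "Publication",
            "description": "A COVID-19 related academic publication",
            "properties": [
                {
                    "name": "title",
                    "dataType": ["text"],
                    "description": "Title of the publication",
                },
                {
                    "name": "abstract",
                    "dataType": ["text"],
                    "description": "Abstract of the publication",
                },
            ],
        }
    ]
}


def property_names(schema, class_name):
    """Return the property names declared for one class in the schema."""
    for cls in schema["classes"]:
        if cls["class"] == class_name:
            return [prop["name"] for prop in cls["properties"]]
    raise KeyError(class_name)


# Serialize the schema in the same form a schema.json file would hold.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Keeping the schema as plain data makes it easy to inspect (for example with `property_names`) before it is sent to a cluster.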
To get a local copy up and running, follow these steps.
- Create a free Weaviate Cloud Services account at console.semi.technology/
- Clone the repo:
    git clone https://github.com/zainhas/weaviatedemo.git
- Create a new conda virtual environment with python=3.8.13, then install the requirements:
    conda create -n mynewenv python=3.8.13
    conda activate mynewenv
    python -m pip install -r requirements.txt
- Download the covid_articles.csv data file from here and place it in the data folder.
- Run the import.py file in a terminal. It asks for your WCS credentials and for a name for the Weaviate cluster to be created, then creates the cluster and uploads the data to it:
    python import.py
  You should see the script's output in your terminal.
- Run covidQueryApp.py to start the Streamlit app:
    streamlit run covidQueryApp.py
  You should see Streamlit's startup output in your terminal.
- Navigate to the Local URL where the demo app is launched. You will need to provide the same Weaviate cluster name to connect to it.
- Enter topics to query for COVID-related publications. You can even expand the articles to read their abstracts!
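As a rough illustration of what a topic search can translate to, the sketch below builds a Weaviate GraphQL `Get` query with a `nearText` filter from a list of topics. It is not taken from covidQueryApp.py: the function name `build_near_text_query`, the class name `Publication`, and the fields `title` and `abstract` are assumptions, and `nearText` only works when the cluster has a text vectorizer module enabled.

```python
import json


def build_near_text_query(concepts, class_name="Publication",
                          fields=("title", "abstract"), limit=5):
    """Build a GraphQL Get query that searches by semantic similarity.

    class_name and fields are hypothetical; use the names from your schema.
    """
    # json.dumps produces a double-quoted, escaped list literal,
    # which is also valid as a GraphQL list of strings.
    concepts_literal = json.dumps(list(concepts))
    field_block = " ".join(fields)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concepts_literal}}}, limit: {limit}) "
        f"{{ {field_block} }} "
        "} }"
    )


query = build_near_text_query(["vaccine side effects"])
print(query)
```

The resulting string can be sent to the cluster's GraphQL endpoint. With the v3 Python client this would likely go through something like `client.query.raw(query)`, but check the client version pinned in requirements.txt before relying on that.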
Below is a description of all the files included in this tutorial.
- import.py: a Python script that starts a Weaviate cluster and uploads data to it
- covidQueryApp.py: a Streamlit app that lets you query and search your Weaviate database for relevant articles
- batchHelper.py: contains functions to help with uploading data to the Weaviate cluster
- data/covid_articles.csv: a .csv file containing COVID-19-related publication data
- requirements.txt: contains all the packages you need to run this tutorial
- schema.json: contains the schema for our database
- helper.py: contains helper functions to clean and format our data
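The cleaning and batching roles described above can be sketched with the standard library alone. This is an illustrative outline, not the contents of helper.py or batchHelper.py: the function names, the column names `title` and `abstract`, and the sample rows are all invented for the example, and `send_batch` stands in for whatever call actually uploads a batch to Weaviate.

```python
import csv
import io


def clean_row(row):
    """Trim each value and collapse runs of whitespace into single spaces."""
    return {key: " ".join((value or "").split()) for key, value in row.items()}


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows, preserving order."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def upload_in_batches(rows, batch_size, send_batch):
    """Pass each batch to send_batch (a placeholder for the real upload call)."""
    total = 0
    for batch in batched(rows, batch_size):
        send_batch(batch)
        total += len(batch)
    return total


# Invented sample data standing in for data/covid_articles.csv.
sample_csv = io.StringIO(
    "title,abstract\n"
    "  Paper A ,First   abstract\n"
    "Paper B,Second abstract\n"
    "Paper C,Third abstract\n"
)
rows = (clean_row(r) for r in csv.DictReader(sample_csv))

sent = []  # collects batches instead of uploading them
uploaded = upload_in_batches(rows, batch_size=2, send_batch=sent.append)
print(uploaded, [len(b) for b in sent])
```

Sending objects in groups rather than one at a time reduces the number of requests to the cluster; the batch size of 2 here is only to keep the example small.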