A podcast search engine powered by Elasticsearch and implemented in Python, indexing the Spotify Podcast Dataset.

DamianValle/PodcastSearch

Spotify Podcast Search Engine

System architecture

Podcast data:

  • Available at: Spotify Podcast Dataset
  • Structure of the data:
    • JSON files, each divided into pieces (transcripts) with the following structure
      • Transcript: all the words of the piece as a single text string
      • Confidence: float number between 0 and 1
      • Words: each word individually, with its start and end time
    • Metadata file:
      • Contains podcast name, URI, description, publisher, language, episode name and duration.
  • There is a smaller (1.2 GB) test sample with the same structure as the other files: spotify-podcasts-2020-summarization-testset
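The structure described above can be read with a few lines of Python. This is a minimal sketch, not the repository's own loader: it assumes each JSON file has a top-level `results` list whose entries hold `alternatives` with `transcript`, `confidence` and `words` fields, and that word times are strings such as `"1.500s"`. These key names and the helper `iter_pieces` are assumptions for illustration; check them against your copy of the dataset.

```python
from typing import Any, Dict, Iterator


def _seconds(value: str) -> float:
    """Convert a time string such as '1.500s' to seconds (assumed format)."""
    return float(value.rstrip("s"))


def iter_pieces(transcript_json: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per transcript piece found in a parsed JSON file.

    The key names used here ('results', 'alternatives', 'startTime', ...)
    are assumptions about the file layout, not confirmed by this README.
    """
    for result in transcript_json.get("results", []):
        for alt in result.get("alternatives", []):
            if "transcript" not in alt:
                continue  # skip entries that carry no transcript text
            words = [
                {
                    "word": w["word"],
                    "start": _seconds(w["startTime"]),
                    "end": _seconds(w["endTime"]),
                }
                for w in alt.get("words", [])
            ]
            yield {
                "transcript": alt["transcript"],
                "confidence": alt.get("confidence"),
                "words": words,
            }


# Tiny hand-written sample in the assumed layout (not real dataset content).
SAMPLE = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "hello world",
                    "confidence": 0.9,
                    "words": [
                        {"word": "hello", "startTime": "0s", "endTime": "0.500s"},
                        {"word": "world", "startTime": "0.500s", "endTime": "1.200s"},
                    ],
                }
            ]
        }
    ]
}

pieces = list(iter_pieces(SAMPLE))
```

Keeping the per-word start and end times alongside the text makes it possible to jump to the matching position in an episode once a piece is returned by a search.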

The dataset should be extracted into the /podcasts-no-audio13GB folder.
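Once the dataset is extracted, the transcript files can be walked and turned into documents for indexing. The sketch below is one possible approach under stated assumptions: the index name `podcasts`, the document fields, the helper names, and the `results`/`alternatives` key layout are all hypothetical and not taken from this repository. The dicts it yields follow the action format accepted by `elasticsearch.helpers.bulk`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def iter_transcript_files(root: Union[str, Path]) -> Iterator[Path]:
    """Yield every .json file below the extracted dataset folder, sorted."""
    yield from sorted(Path(root).rglob("*.json"))


def bulk_actions(
    root: Union[str, Path], index_name: str = "podcasts"
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk-index action per transcript piece.

    The index name and the document fields are illustrative assumptions.
    """
    for path in iter_transcript_files(root):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        for i, result in enumerate(data.get("results", [])):
            # Use only the first alternative of each result, if present.
            for alt in result.get("alternatives", [])[:1]:
                if not alt.get("transcript"):
                    continue
                yield {
                    "_index": index_name,
                    "_id": f"{path.stem}-{i}",
                    "_source": {
                        "episode": path.stem,
                        "transcript": alt["transcript"],
                        "confidence": alt.get("confidence"),
                    },
                }
```

With the `elasticsearch` Python package installed and a cluster running, the generator could be passed to `elasticsearch.helpers.bulk(client, bulk_actions("podcasts-no-audio13GB"))`; adjust the path to wherever the dataset was extracted.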

Requirements for the GUI and the Spotify Web API:

pip install -r requirements.txt
sudo apt-get install python3-tk
export SPOTIPY_CLIENT_ID='your-client-id'
export SPOTIPY_CLIENT_SECRET='your-client-secret'
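The two exported variables are the ones the `spotipy` library reads when its client-credentials flow is created without explicit arguments. As a hedged illustration of how they might be consumed (the helper names `missing_credentials` and `make_spotify_client` are hypothetical and not part of this repository):

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def make_spotify_client():
    """Build a Spotify Web API client from the exported credentials.

    spotipy is imported lazily so the credential check above can be used
    even when the package is not installed.
    """
    missing = missing_credentials()
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # With no arguments, SpotifyClientCredentials reads SPOTIPY_CLIENT_ID
    # and SPOTIPY_CLIENT_SECRET from the environment.
    return spotipy.Spotify(auth_manager=SpotifyClientCredentials())
```

Checking the variables up front gives a clearer error than a failed authentication request later on.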

Elasticsearch setup:

Kibana setup:
