A FastAPI-based PDF Retrieval-Augmented Generation (RAG) system with an intuitive web interface for document processing and intelligent querying.
The system pairs vision models for accurate text extraction with language models for answering natural-language questions about document content.
- Backend Framework: FastAPI
- Vector Storage: ChromaDB
- LLM Integration: Ollama
- PDF Processing: PyMuPDF
- Frontend Styling: TailwindCSS (via CDN)
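For orientation, the sketch below shows how these pieces might be wired together in a single FastAPI app. It is illustrative only: the endpoint, collection name, and module layout are assumptions, not the project's actual code.

```python
# Minimal wiring sketch -- endpoint, paths, and names are assumptions.
import chromadb
import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# Serve the web interface from the `static` directory created during setup.
app.mount("/static", StaticFiles(directory="static"), name="static")

# Persistent vector store for document embeddings.
chroma = chromadb.PersistentClient(path="chroma_db")
collection = chroma.get_or_create_collection(name="pdf_chunks")

@app.get("/health")
def health():
    # Reports how many chunks are indexed; Ollama is assumed on port 11434.
    return {"status": "ok", "chunks_indexed": collection.count()}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8005)
```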
- Upload and process multiple PDF files simultaneously
- Extract text using vision models for superior accuracy (see the pipeline sketch after this list)
- Store and index document embeddings for fast retrieval
- Natural language querying of document content
- Download document transcriptions as JSON
- View answer sources and references
- Clean web interface with real-time status updates
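The extraction and indexing features boil down to a short pipeline: render each PDF page to an image with PyMuPDF, have the vision model transcribe it, then embed and store the text in ChromaDB. A minimal sketch, assuming per-page chunking; the real prompts, chunking, and names may differ:

```python
# Extract-embed-store pipeline sketch; the prompt, chunking strategy, and
# names are illustrative assumptions, not the project's actual implementation.
import chromadb
import fitz  # PyMuPDF
import ollama

chroma = chromadb.PersistentClient(path="chroma_db")
collection = chroma.get_or_create_collection(name="pdf_chunks")

def index_pdf(path: str) -> None:
    doc = fitz.open(path)
    for page_num, page in enumerate(doc):
        # Render the page to PNG bytes for the vision model.
        png_bytes = page.get_pixmap(dpi=200).tobytes("png")

        # llama3.2-vision transcribes the page image to plain text.
        reply = ollama.chat(
            model="llama3.2-vision",
            messages=[{
                "role": "user",
                "content": "Transcribe all text on this page.",
                "images": [png_bytes],
            }],
        )
        text = reply["message"]["content"]

        # mxbai-embed-large turns the transcription into a vector.
        emb = ollama.embeddings(model="mxbai-embed-large", prompt=text)["embedding"]

        collection.add(
            ids=[f"{path}-page-{page_num}"],
            embeddings=[emb],
            documents=[text],
            metadatas=[{"source": path, "page": page_num}],
        )
```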
Install the Python dependencies:

```bash
pip install fastapi uvicorn chromadb ollama PyMuPDF Pillow python-multipart
```
- Ollama must be running locally on port 11434
- Pull these models using Ollama:
  ```bash
  ollama pull llama3.2
  ollama pull mxbai-embed-large
  ollama pull llama3.2-vision
  ```
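Before starting the server, it can help to confirm the pulled models respond. A quick sanity-check sketch using the ollama Python client (not part of the application itself):

```python
# Sanity check that the pulled models respond; not part of the app.
import ollama

# The chat model should return a short reply.
reply = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": "ping"}])
print(reply["message"]["content"])

# The embedding model should return a vector (1024 dimensions for mxbai-embed-large).
emb = ollama.embeddings(model="mxbai-embed-large", prompt="hello")["embedding"]
print(len(emb))
```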
To set up and run the app:

- Clone the repository
- Create a `static` directory in the project root:

  ```bash
  mkdir static
  ```

- Start the server:

  ```bash
  python app.py
  ```

- Access the web interface at `http://localhost:8005`
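The API can also be exercised without the web interface, for example with curl. The endpoint path and form field name below are assumptions for illustration; check the routes defined in `app.py`:

```bash
# Hypothetical upload request -- /upload and the "files" field are assumptions.
curl -F "files=@document.pdf" http://localhost:8005/upload
```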
The interface provides:
- PDF upload section with multi-file support
- Transcription download functionality
- Question input for document querying
- Response display with source references
- Real-time status updates for all operations
- Upload: Submit one or more PDF files
- Processing:
  - The vision model extracts text from the documents
  - Extracted text is embedded and stored in ChromaDB
- Querying (see the sketch after this list):
  - Enter natural language questions
  - The system retrieves relevant context from the vector store
  - The LLM generates precise answers with source references
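The retrieval-and-answer step can be sketched as follows, assuming the same collection and embedding model as above; the function name and prompt template are illustrative:

```python
# Query-flow sketch: embed the question, retrieve nearby chunks, answer
# from that context. Names and the prompt template are assumptions.
import chromadb
import ollama

chroma = chromadb.PersistentClient(path="chroma_db")
collection = chroma.get_or_create_collection(name="pdf_chunks")

def answer(question: str) -> str:
    # Embed the question with the same model used at indexing time.
    q_emb = ollama.embeddings(model="mxbai-embed-large", prompt=question)["embedding"]

    # Pull the closest chunks from ChromaDB.
    hits = collection.query(query_embeddings=[q_emb], n_results=3)
    context = "\n\n".join(hits["documents"][0])
    sources = {m["source"] for m in hits["metadatas"][0]}

    # Ask the chat model to answer strictly from the retrieved context.
    reply = ollama.chat(
        model="llama3.2",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return reply["message"]["content"] + f"\nSources: {sorted(sources)}"
```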
This application is configured for local use. Implement appropriate security measures before deploying in a production environment.
Contributions are welcome! Please feel free to submit pull requests.