A Streamlit-based web application that leverages Firecrawl to scrape multiple websites using a user-defined schema. Input your API key, specify the data fields you want to extract (e.g., strings, numbers, booleans), list the URLs, and retrieve structured JSON results—all through an intuitive interface.
- Dynamic Schema Creation: Define custom extraction fields on the fly.
- Multi-URL Scraping: Scrape multiple websites with a single schema in one operation.
- Interactive UI: Real-time schema preview and easy input via Streamlit.
- Error Handling: Per-URL error reporting, so one failing site doesn't abort the whole batch (see the sketch after this list).
- JSON Output: Structured results for easy data processing.
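Under the hood, the flow is: build a Pydantic model from the user's field definitions, hand its JSON Schema to Firecrawl, and collect a result (or an error) per URL. The sketch below illustrates that loop, assuming firecrawl-py's `FirecrawlApp` client and its `scrape_url` method; the extraction parameter layout varies between SDK versions, so treat it as an assumption rather than the app's exact code:

```python
from firecrawl import FirecrawlApp
from pydantic import BaseModel


class PageData(BaseModel):
    """Stand-in schema; the app builds one dynamically from user input."""
    title: str
    description: str


def scrape_all(api_key: str, urls: list[str]) -> dict:
    client = FirecrawlApp(api_key=api_key)  # one client reused for every URL
    results = {}
    for url in urls:
        try:
            # Ask Firecrawl to extract fields matching the JSON Schema
            # (parameter layout assumed from the v1 SDK).
            page = client.scrape_url(
                url,
                params={
                    "formats": ["extract"],
                    "extract": {"schema": PageData.model_json_schema()},
                },
            )
            results[url] = page.get("extract", page)
        except Exception as exc:
            # Per-URL error handling: record the failure and keep going.
            results[url] = {"error": str(exc)}
    return results
```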
- Clone the repository:
git clone https://github.com/yourusername/firecrawl-website-scraper.git
cd firecrawl-website-scraper
- Install dependencies:
pip install streamlit firecrawl-py pydantic
- Ensure you have a valid Firecrawl API key (sign up at Firecrawl if needed).
- Run the app:
streamlit run app.py
- Open your browser to http://localhost:8501.
- Enter your Firecrawl API key.
- Define your schema by adding field names and types (e.g., "title" as String, "price" as Number); a sketch of how these map to a Pydantic model follows this list.
- Click "Update Schema" to save your schema.
- Enter URLs (one per line) in the text area.
- Click "Scrape URLs" to extract data.
- View the JSON results, with each URL’s data or errors displayed.
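The "Update Schema" step has to turn plain field names and type labels into a real Pydantic model at runtime. Pydantic's `create_model` supports exactly this; the sketch below assumes the UI offers String, Number, and Boolean labels, and the `build_model` helper is illustrative rather than the app's actual code:

```python
from pydantic import BaseModel, create_model

# Map the UI's type labels to Python types (labels assumed from the usage steps).
TYPE_MAP = {"String": str, "Number": float, "Boolean": bool}


def build_model(fields: dict[str, str]) -> type[BaseModel]:
    """Build a Pydantic model class from {field_name: type_label} pairs."""
    return create_model(
        "ExtractSchema",
        **{name: (TYPE_MAP[label], ...) for name, label in fields.items()},
    )


# Example: the schema from the sample below.
Schema = build_model({"title": "String", "description": "String"})
print(Schema.model_json_schema())  # the JSON Schema handed to Firecrawl
```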
Input URLs:
https://example.com
https://anothersite.com
Schema:
- title: String
- description: String
Output:
{
  "https://example.com": {
    "title": "Example Site",
    "description": "This is an example"
  },
  "https://anothersite.com": {
    "error": "404 Not Found"
  }
}
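If you want to reuse the results outside the app, the per-URL mapping flattens naturally into a table. A hypothetical post-processing snippet (the app currently only displays results in the browser, so the results.json file name here is illustrative):

```python
import json

import pandas as pd

# Results saved from the app (hypothetical file).
with open("results.json") as f:
    results = json.load(f)

# One row per URL; error-only entries produce an "error" column.
df = pd.DataFrame.from_dict(results, orient="index")
df.to_csv("results.csv", index_label="url")
```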
Feel free to fork this repository, submit pull requests, or open issues for bugs and feature requests. Contributions to enhance functionality (e.g., more field types, export options) are welcome!