Streamlit app for scraping multiple websites with Firecrawl. Define custom schemas dynamically, input URLs, and get JSON output. Simple and robust.

petermartens98/FireCrawl-LLM-Website-Info-Finder-Python-Streamlit-App-

Latest commit: f2ad745 · Mar 6, 2025 (5 commits)
FireCrawl LLM Website Info Finder Python Streamlit App

This is a Streamlit web application that uses Firecrawl to scrape multiple websites against a user-defined schema. Enter your API key, specify the data fields you want to extract (e.g., strings, numbers, booleans), list the URLs, and retrieve structured JSON results, all from a single interface.

Features

  • Dynamic Schema Creation: Define custom extraction fields on the fly.
  • Multi-URL Scraping: Scrape multiple websites with a single schema in one operation.
  • Interactive UI: Real-time schema preview and easy input via Streamlit.
  • Error Handling: Errors are reported per URL, so a failure on one site is recorded in that site's result and the remaining URLs are still scraped.
  • JSON Output: Structured results for easy data processing.
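
As a rough illustration of dynamic schema creation, the sketch below turns a list of field names and type labels into a JSON Schema dictionary. It uses only the standard library rather than pydantic, and the names `TYPE_MAP` and `build_schema`, along with the exact type labels, are assumptions made for this example; the app's own code may be organized differently.

```python
import json

# Hypothetical mapping from UI type labels to JSON Schema type names.
TYPE_MAP = {"String": "string", "Number": "number", "Boolean": "boolean"}


def build_schema(fields):
    """Turn [(name, type_label), ...] into a JSON Schema object description."""
    properties = {}
    for name, label in fields:
        name = name.strip()
        if not name:
            continue  # skip rows the user left blank
        if label not in TYPE_MAP:
            raise ValueError(f"unsupported field type: {label!r}")
        properties[name] = {"type": TYPE_MAP[label]}
    # Assumption: every defined field is treated as required.
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


schema = build_schema([("title", "String"), ("price", "Number")])
print(json.dumps(schema, indent=2))
```

Keeping the schema as plain data makes it easy to show a live preview in the UI and to reuse the same schema for every URL.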

Requirements

  • Python 3
  • The streamlit, firecrawl-py, and pydantic packages
  • A Firecrawl API key

Installation

  1. Clone the repository:
    git clone https://github.com/petermartens98/FireCrawl-LLM-Website-Info-Finder-Python-Streamlit-App-.git
    cd FireCrawl-LLM-Website-Info-Finder-Python-Streamlit-App-
  2. Install dependencies:
    pip install streamlit firecrawl-py pydantic
  3. Ensure you have a valid Firecrawl API key (sign up at Firecrawl if needed).

Usage

  1. Run the app:
    streamlit run app.py
  2. Open your browser to http://localhost:8501.
  3. Enter your Firecrawl API key.
  4. Define your schema by adding field names and types (e.g., "title" as String, "price" as Number).
  5. Click "Update Schema" to save your schema.
  6. Enter URLs (one per line) in the text area.
  7. Click "Scrape URLs" to extract data.
  8. View the JSON results, with each URL’s data or errors displayed.
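
The URL-handling steps above can be sketched roughly as follows: split the text-area input into URLs, then process each one so that a failure is stored under that URL and does not abort the run. The names `parse_urls`, `scrape_all`, and `fake_extract` are hypothetical, de-duplication is an assumption, and `fake_extract` is a stand-in for the real Firecrawl call, which is not shown here.

```python
def parse_urls(text):
    """Split text-area input into non-empty URLs, one per line."""
    seen, urls = set(), []
    for line in text.splitlines():
        url = line.strip()
        # Assumption: blank lines and repeated URLs are dropped.
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def scrape_all(urls, extract):
    """Call extract(url) for each URL, recording errors per URL."""
    results = {}
    for url in urls:
        try:
            results[url] = extract(url)
        except Exception as exc:  # keep going after a single failure
            results[url] = {"error": str(exc)}
    return results


def fake_extract(url):
    """Stand-in for the real scraping call (hypothetical)."""
    if "anothersite" in url:
        raise RuntimeError("404 Not Found")
    return {"title": "Example Site", "description": "This is an example"}


text = "https://example.com\n\nhttps://anothersite.com\n"
results = scrape_all(parse_urls(text), fake_extract)
print(results)
```

Passing the extraction function in as an argument keeps the loop testable without an API key or network access.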

Example

Input URLs:

https://example.com
https://anothersite.com

Schema:

  • title: String
  • description: String

Output:

{
    "https://example.com": {
        "title": "Example Site",
        "description": "This is an example"
    },
    "https://anothersite.com": {
        "error": "404 Not Found"
    }
}
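
Once you have output in this shape, one possible way to post-process it is to separate the URLs that returned data from those that returned an error. This is only a sketch: it assumes that a failed URL is always represented by an object with an "error" key, as in the example above, and that no schema field is itself named "error".

```python
import json

# The example output from above, as a JSON string.
raw = """
{
    "https://example.com": {
        "title": "Example Site",
        "description": "This is an example"
    },
    "https://anothersite.com": {
        "error": "404 Not Found"
    }
}
"""

results = json.loads(raw)

# Assumption: an "error" key marks a failed URL.
succeeded = {url: data for url, data in results.items() if "error" not in data}
failed = {url: data["error"] for url, data in results.items() if "error" in data}

print("succeeded:", list(succeeded))
print("failed:", failed)
```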

Contributing

Feel free to fork this repository, submit pull requests, or open issues for bugs and feature requests. Contributions to enhance functionality (e.g., more field types, export options) are welcome!

Screenshot

[Image: Screenshot 2025-03-06 090354]
