This project automates the extraction, processing, and organization of scholarship data from websites using Selenium for web scraping and Google's Gemini AI for data structuring.
- Web Scraping: Extracts scholarship data from target websites
- AI Processing: Uses Gemini AI to structure raw data
- Excel Export: Saves processed data in organized Excel format
- Automatic Class Extraction: Identifies relevant HTML classes dynamically
- Error Handling: Robust retry mechanisms for API and web operations
- Python 3.8+
- Chrome browser
- Google Gemini API key
- Clone the repository:

  ```bash
  git clone https://github.com/qed42/scholarship-data-extractor
  cd scholarship-scraper
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your Gemini API key:
  - Create a `.env` file in the project root
  - Add your API key:

    ```
    GEMINI_API_KEY=your_api_key_here
    ```
Run the main script:

```bash
python scholarship_scraper.py
```

The script will:
- Launch the Chrome browser
- Extract scholarship data
- Process it with Gemini AI
- Save the results to `scholarship_data.xlsx`
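The final export step can be sketched with `openpyxl`. The column names and the `save_to_excel` helper below are illustrative, not the script's actual API:

```python
from openpyxl import Workbook


def save_to_excel(rows, path="scholarship_data.xlsx"):
    """Write a list of scholarship dicts to an Excel workbook."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Scholarships"
    # Hypothetical columns; the real script may use different fields.
    headers = ["name", "amount", "deadline", "eligibility"]
    ws.append([h.title() for h in headers])
    for row in rows:
        ws.append([row.get(h, "") for h in headers])
    wb.save(path)
```

Missing fields are written as empty cells rather than raising, which keeps the export robust when the AI output is incomplete.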
Edit `scholarship_scraper.py` to customize:
- `WEBSITE_URL`: Target scholarship website
- `KEYWORDS`: Class name keywords for filtering
- `OUTPUT_EXCEL`: Output file path
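For example, the configuration block might look like this (the values are placeholders, not the script's defaults):

```python
# Placeholder configuration values; adjust for your target site.
WEBSITE_URL = "https://example.com/scholarships"
KEYWORDS = ["scholarship", "award", "grant"]
OUTPUT_EXCEL = "scholarship_data.xlsx"
```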
The `requirements.txt` file includes:

```
selenium>=4.0
google-generativeai>=0.3.0
openpyxl>=3.0
webdriver-manager>=3.0
python-dotenv>=0.19
```
ChromeDriver Issues:
- Ensure Chrome is updated
- Run `python -m webdriver_manager update`
API Rate Limits:
- Script includes exponential backoff
- Consider upgrading API quota if needed
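The exponential-backoff idea can be sketched as a small wrapper; `with_backoff` is an illustrative helper, not the script's actual retry code:

```python
import time


def with_backoff(call, retries=5, base_delay=1.0):
    """Retry `call`, doubling the delay after each failure."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Doubling the delay spreads retries out so a rate-limited API gets time to recover instead of being hammered at a fixed interval.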
Website Changes:
- Update `KEYWORDS` if class names change
- Adjust wait times if the website is slow
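The keyword matching behind `KEYWORDS` can be sketched as a simple case-insensitive substring check against an element's `class` attribute; `matches_keywords` is illustrative, not the script's actual function:

```python
def matches_keywords(class_attr: str, keywords) -> bool:
    """True if any keyword appears in the element's class attribute."""
    classes = class_attr.lower()
    return any(kw.lower() in classes for kw in keywords)
```

Because it matches substrings rather than exact class names, a rename from `scholarship-card` to `scholarship-tile` would still be caught, which is what makes the class extraction resilient to minor site changes.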
- Fork the repository
- Create your feature branch: `git checkout -b feature/AmazingFeature`
- Commit your changes: `git commit -m 'Add some AmazingFeature'`
- Push to the branch: `git push origin feature/AmazingFeature`
- Open a pull request