This project analyzes direct phone marketing campaigns by a Portuguese bank (2008-2010) to predict term deposit subscriptions. It combines exploratory data analysis, predictive modeling, and strategic recommendations to optimize customer targeting. The workflow includes:
- Data Analysis: Exploratory insights into customer demographics, economic indicators, and campaign performance.
- Predictive Modeling: Development of machine learning models (LightGBM, XGBoost, RandomForest) to identify high-potential customers.
- Actionable Recommendations: Data-driven strategies to enhance campaign effectiveness.
Domain: Banking/Finance
Source: UCI Machine Learning Repository
Enriched Data: Includes macroeconomic indicators from Banco de Portugal.
Size: 41,188 records × 21 columns (20 input features plus the target `y`).
- Demographics: Age, job, marital status, education.
- Financial: Credit default, housing/personal loans.
- Campaign Metrics: Contact type, duration, month/day of contact.
- Economic Indicators: Euribor rate, employment rate, consumer confidence index.
- Target: `y` (binary: "yes"/"no").
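Before modeling, it helps to check how imbalanced the target is. The sketch below reads a semicolon-delimited file with a `y` column (the UCI distribution of this dataset uses `;` as its delimiter) and reports the share of "yes" labels together with the negative-to-positive ratio. That ratio is a common starting value for minority-class weighting, for example LightGBM's `scale_pos_weight`. The function `target_balance` and the sample rows are hypothetical illustrations, not code from the notebook.

```python
import csv
import io
from typing import TextIO


def target_balance(handle: TextIO) -> dict:
    """Count "yes"/"no" labels in a semicolon-delimited file with a `y` column."""
    reader = csv.DictReader(handle, delimiter=";")
    labels = [row["y"].strip() for row in reader]
    positives = sum(1 for label in labels if label == "yes")
    negatives = sum(1 for label in labels if label == "no")
    if positives == 0:
        raise ValueError("no positive ('yes') labels found")
    return {
        "n": len(labels),
        "positive_rate": positives / len(labels),
        # Weight that would make the positive class count as much as the negative class
        "neg_pos_ratio": negatives / positives,
    }


# Tiny in-memory sample standing in for the real file (hypothetical rows)
sample = io.StringIO(
    '"age";"job";"y"\n'
    '56;"housemaid";"no"\n'
    '57;"services";"no"\n'
    '37;"services";"no"\n'
    '40;"admin.";"yes"\n'
)
stats = target_balance(sample)
print(stats)  # {'n': 4, 'positive_rate': 0.25, 'neg_pos_ratio': 3.0}
```

On the real data the function would receive an open file handle, e.g. `open("data/bank-additional-full.csv", newline="")`; that path is an assumption about where the raw file sits under `data/`.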
Dataset Link: Download Here
├── data/ # Raw and processed datasets
├── docs/ # Project briefs and dataset documentation
├── notebook/ # Jupyter notebook for analysis and modeling
├── report/ # Final report and strategic suggestions
├── results/ # Visualizations, model outputs, and analysis reports
├── scripts/ # Utility scripts and helper functions
└── requirements.txt # Python dependencies
- Clone the Repository:
git clone https://github.com/dhaneshbb/ProtugeseBank.git && cd ProtugeseBank
- Install Dependencies:
pip install -r requirements.txt
- Custom Library: Install `insightfulpy` for streamlined analytics:
pip install insightfulpy
- Open the Jupyter notebook:
notebook/PRCP-1000-PortugeseBank.ipynb
- Execute cells sequentially to:
  - Perform data cleaning and EDA.
  - Train and evaluate predictive models.
  - Generate visualizations and reports.
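Because "yes" is the minority class, evaluation folds should keep the yes/no ratio roughly constant so that each fold's precision and recall are comparable. The sketch below implements stratified k-fold splitting in plain Python for illustration; the notebook may instead rely on a library routine such as scikit-learn's `StratifiedKFold` (an assumption), and `stratified_folds` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_folds(
    labels: Sequence[int], k: int = 5, seed: int = 42
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs that preserve the class ratio in each fold."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    fold_members: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets ~1/k of each class
        for position, idx in enumerate(indices):
            fold_members[position % k].append(idx)

    folds = []
    for i in range(k):
        test_idx = sorted(fold_members[i])
        test_set = set(test_idx)
        train_idx = [j for j in range(len(labels)) if j not in test_set]
        folds.append((train_idx, test_idx))
    return folds


labels = [1] * 10 + [0] * 40  # 20% positives, an imbalanced toy target
for train_idx, test_idx in stratified_folds(labels, k=5):
    print(len(test_idx), sum(labels[j] for j in test_idx))  # 10 2 on every fold
```

Dealing indices round-robin per class is the simplest way to guarantee near-equal class counts per fold; when a class size is not divisible by `k`, the earlier folds receive one extra sample of that class.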
- Final Model: `results/model/final_lgbm_model.pkl` (LightGBM with 88.8% accuracy).
- Reports:
  - `report/Final Report.md`: Comprehensive analysis and recommendations.
  - `results/365csv pre-anlysis/`: Preprocessing reports, visualizations, and statistical summaries.
Model | Accuracy | Precision | Recall | F1-Score | ROC AUC |
---|---|---|---|---|---|
LightGBM | 88.8% | 47.1% | 57.3% | 51.5% | 81.2% |
XGBoost | 87.7% | 46.4% | 58.7% | 51.9% | 80.9% |
RandomForest | 86.8% | 43.8% | 60.2% | 50.7% | 79.7% |
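The table's columns follow the standard binary-classification definitions with "yes" as the positive class. The stdlib sketch below shows how they can be computed from true labels and predicted scores. `binary_metrics` is a hypothetical helper, and the 0.5 decision threshold is an assumption, not necessarily the threshold behind the reported figures. ROC AUC uses the rank (Mann-Whitney) formulation: the fraction of (positive, negative) pairs in which the positive gets the higher score, with ties counted as half.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 at a threshold, plus threshold-free ROC AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": auc,
    }


y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3, 0.5]
print(binary_metrics(y_true, scores))
```

The pairwise AUC loop is quadratic in the number of samples, so it only suits small illustrations; on all 41,188 rows a sort-based computation or a library implementation is the practical choice.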
- Data Imbalance: Addressed using class weighting and LightGBM’s `is_unbalance` parameter.
- Multicollinearity: Mitigated via VIF analysis and tree-based models, reducing complexity.
- Overfitting: Controlled through cross-validation and regularization.
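VIF analysis flags features that are largely explained by the others: `VIF_j = 1 / (1 - R_j^2)`, where `R_j^2` comes from regressing feature `j` on all remaining features. Equivalently, the VIFs are the diagonal of the inverse correlation matrix, which the NumPy sketch below uses. The function name `vif` and the toy data are illustrative assumptions; the notebook may compute VIF with a different library.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of a 2-D feature matrix.

    Uses the identity VIF_j = [R^{-1}]_jj, where R is the correlation matrix
    of the columns; this equals 1 / (1 - R_j^2) from regressing column j
    on the remaining columns (with an intercept).
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
# x3 is almost an exact linear combination of x1 and x2
x3 = x1 + x2 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
X = np.column_stack([x1, x2, x3])
print(vif(X).round(1))  # all three values are far above 10: each column is nearly determined by the others
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a feature is redundant and a candidate for removal; uncorrelated columns give a VIF of exactly 1.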
Moro, S., Cortez, P., & Rita, P. (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, In press. DOI: 10.1016/j.dss.2014.03.001.
License: This project is for educational/research purposes. Cite the dataset authors when referencing this work.