Bayesian data analysis and causal inference using Python & PyMC

Statistical Rethinking: A Bayesian Course with Examples in Python and Stan (Second Edition)

Here is yet another attempt to replicate (nearly) all models in Richard McElreath’s Statistical Rethinking (2nd ed.) book using Python, Stan (CmdStanPy), BridgeStan, and ArviZ. This is a work in progress.

Rendered Quarto notebooks can be viewed here:

  1. The Garden of Forking Data
  2. Sampling The Imaginary
  3. (a) Geocentric Model
  4. (b) Geocentric Model
  5. (c) Geocentric Model
  6. (a) The Many Variables & The Spurious Waffles
  7. (b) The Many Variables & The Spurious Waffles

All data and code can be downloaded from GitHub: https://github.com/abdullahau/bayesian-analysis


StanQuap

Overview

In the first part of his book, Richard McElreath uses the custom quap function from his rethinking package. To replicate its behavior with Stan and BridgeStan, I created a custom class called StanQuap. This class approximates the posterior distribution with a Gaussian centered at the posterior mode, with a covariance derived from the curvature (Hessian) of the log density at that mode.

How StanQuap Works

  • The provided Stan model is optimized using CmdStanPy's optimize() API to compute the Maximum Likelihood Estimate (MLE) or Maximum A Posteriori Estimate (MAP).
  • The class constructor passes the model, data, and unconstrained parameters from the MLE/MAP into BridgeStan's log_density_hessian, which returns the Hessian of the log density with respect to the unconstrained parameters.
  • The negative of the unconstrained Hessian is inverted to obtain the variance-covariance matrix in the unconstrained space, which is then transformed into the constrained space analytically using the Jacobian matrix of the constraining transform.
  • The posterior distribution is approximated as a normal/multivariate normal distribution, where the mean is the mode found by the optimizer (the MLE/MAP estimate) and the covariance is the constrained variance-covariance matrix.
  • The methods laplace_sample, extract_samples, link, and sim utilize CmdStanPy's laplace_sample API for speed and robustness, ensuring proper parameter transformations.
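
The steps above can be sketched in plain Python for the one-parameter Bernoulli model used in the usage example. This is only an illustration of the quadratic approximation, not the StanQuap implementation: it does not call CmdStanPy or BridgeStan, it works directly on the constrained scale, and the helper names (log_post, first_derivative, second_derivative, find_mode) are hypothetical.

```python
import math
import random

# Data from the usage example: 3 ones out of 10 Bernoulli trials.
y = [0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
k, n = sum(y), len(y)


def log_post(theta):
    """Unnormalized log posterior: flat beta(1, 1) prior plus Bernoulli likelihood."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def first_derivative(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def find_mode(f, x0=0.5, tol=1e-8, max_iter=50):
    """Newton's method on the log density; stands in for an optimizer."""
    x = x0
    for _ in range(max_iter):
        step = first_derivative(f, x) / second_derivative(f, x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Step 1: find the posterior mode.
mode = find_mode(log_post)

# Steps 2-3: curvature at the mode; variance is the inverse of the negative Hessian.
variance = -1.0 / second_derivative(log_post, mode)
sd = math.sqrt(variance)

# Step 4: treat the posterior as Normal(mode, sd) and draw from it.
rng = random.Random(0)
draws = [rng.gauss(mode, sd) for _ in range(1000)]

print(f"mode={mode:.4f}, sd={sd:.4f}")
```

For this data the mode lands at about 0.3 and the standard deviation at about 0.145, which matches the closed-form value sqrt(0.3 × 0.7 / 10).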

Features

  • Mode Finding: Computes the mode of the posterior distribution.
  • Variance-Covariance Approximation: Estimates uncertainty using the Hessian.
  • Posterior Sampling: Draws from the posterior distribution via the Laplace approximation.
  • Link Function: Transforms posterior samples using a user-defined function.
  • Posterior Prediction: Simulates posterior observations for predictive checks.
  • Jacobian Transformation: Converts unconstrained variance-covariance matrices into constrained space.
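
To make the Jacobian transformation concrete, here is a minimal sketch for a single parameter constrained to (0, 1) through the inverse-logit transform. It assumes the Hessian is taken in the unconstrained space without a Jacobian adjustment to the log density; whether StanQuap uses exactly this setting is an assumption, and all function names are hypothetical.

```python
import math


def inv_logit(u):
    """Map an unconstrained value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def logit(theta):
    """Map a value in (0, 1) to the unconstrained real line."""
    return math.log(theta / (1.0 - theta))


k, n = 3, 10  # 3 ones out of 10 trials, as in the usage example


def log_density_unc(u):
    """Log density as a function of unconstrained u (no Jacobian adjustment)."""
    theta = inv_logit(u)
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


theta_hat = k / n          # mode on the constrained scale
u_hat = logit(theta_hat)   # same point on the unconstrained scale

# Variance on the unconstrained scale: inverse of the negative Hessian.
var_u = -1.0 / second_derivative(log_density_unc, u_hat)

# Jacobian of the constraining transform: d(theta)/d(u) for inverse logit.
jac = theta_hat * (1.0 - theta_hat)

# Delta method: Sigma_theta = J * Sigma_u * J^T (scalars in one dimension).
var_theta = jac * var_u * jac

print(f"var_u={var_u:.4f}, var_theta={var_theta:.4f}")
```

In this one-dimensional case the transformed variance comes out at about 0.021, the same value obtained by taking the curvature directly on the constrained scale. With several parameters, jac becomes a matrix and the product is J Σ Jᵀ.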

Usage

Example

import utils

bernoulli = """
data {
  int<lower=1> N;
  array[N] int<lower=0,upper=1> y;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
"""

data = {"N":10,"y":[0,1,0,1,0,0,0,0,0,1]}

# Define StanQuap Model
quap = utils.StanQuap(
    stan_file="bernoulli_model",
    stan_code=bernoulli,
    data=data,
)

# Extract Samples
samples = quap.extract_samples(n=1000)

# Compute Posterior Summary
summary = quap.precis()
print(summary)
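
Because the beta(1, 1) prior is conjugate to the Bernoulli likelihood, this example has an exact posterior that can serve as a sanity check on any approximation. The sketch below is independent of StanQuap and uses only the standard library; the variable names are illustrative.

```python
import math

y = [0, 1, 0, 1, 0, 0, 0, 0, 0, 1]

# Conjugate update: Beta(1 + successes, 1 + failures).
a = 1 + sum(y)
b = 1 + len(y) - sum(y)

exact_mean = a / (a + b)
exact_mode = (a - 1) / (a + b - 2)
exact_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(f"Beta({a}, {b}): mean={exact_mean:.3f}, mode={exact_mode:.3f}, sd={exact_sd:.3f}")
```

The exact posterior is Beta(4, 8), with mode 0.3 and mean 1/3. A quadratic approximation is centered on the mode, so some gap between its reported mean and the exact posterior mean is expected with only ten observations.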

Dependencies

Installation

Ensure you have CmdStan and BridgeStan installed:

pip install -r requirements.txt

Follow the CmdStanPy installation guide and BridgeStan installation guide for additional setup.
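
After installing, a quick way to confirm that the Python packages are visible to the interpreter is to query their installed versions. This is a small sketch using only the standard library; the package list is an assumption based on the tools named in this README, and the check covers the Python packages only, not the CmdStan toolchain itself.

```python
from importlib import metadata


def installed_versions(packages=("cmdstanpy", "bridgestan", "arviz")):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions()
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```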

License

This project is licensed under the MIT License.