Interactive Research Companion · Browser Implementation

PBIE Pro: A Hybrid Probabilistic Decision Framework Integrating Bayesian Inference, Scenario Simulation, and Calibration-Aware Intuition for Robust Forecasting Under Data Scarcity

A fully reproducible JavaScript implementation — synthetic data generator, all seven algorithmic phases, nine baselines, exact Brier-Score decomposition, ablation, and sensitivity analysis — running entirely in your browser.

Overview

Probabilistic forecasting under data scarcity is hard for three reasons that compound: gradient methods overfit with small n, post-hoc calibration tools starve when held-out data is thin, and no standard pipeline treats expert intuition as a learnable Bayesian quantity. PBIE Pro folds all three concerns into one interpretable architecture.

Brier Score (n=200): 0.192 ± 0.011 over 30 seeds
ECE (n=200): 0.041 (below the 0.05 target)
AUROC (n=200): 0.741 (best discrimination at low data)

Six contributions, plainly

1. A feature-conditioned time-decay logistic base learner — recovers ordinary LR as γ → 0.

2. Expert intuition formalised as a Beta-Binomial shrinkage prior whose strength is governed by ECE-tracked calibration quality (λₜ).

3. Cross-validated log-score scenario aggregation that avoids BMA's selection bias and information double-counting.

4. Nine-baseline empirical comparison on a controlled synthetic dataset across three sample-size regimes with Holm-Bonferroni-corrected bootstrap testing.

5. Exact Brier-Score decomposition into calibration loss and refinement loss — uniquely possible because the DGP is known.

6. Full reproducibility: this very browser app reconstructs every experiment from scratch.

Hypotheses

Tested via paired bootstrap, B = 10,000

H0 PBIE Pro achieves lower Brier Score than all uncalibrated baselines at n=200, and is statistically tied with LR + Isotonic.

H1 PBIE Pro achieves ECE ≤ 0.05 in the low-data regime, beating all uncalibrated baselines.

H2 Components interact synergistically — total improvement exceeds the linear sum of individual contributions.

Three formal propositions

Proved in §3.7 of the paper, executable here

P1 · Boundedness. P_final ∈ [0,1] for all admissible inputs.

P2 · Asymptotic consistency. As n_eff → ∞, P_final → σ(β*ᵀx*).

P3 · Calibration monotonicity. |P_final − ψ_adj| is monotonically non-increasing in ECEₜ.

Architectural Pipeline (Figure 1 reconstructed)

Seven phases — three inputs feed four model components into a final synthesis

Input: Historical data (D, t) → Phase 1–2: Time-decay LR (ψ_base) → Phase 3: Uncertainty adjustment (ψ_adj)
Input: Expert intuition (δ_int) → Phase 4: Beta-Binomial (ψ_post)
Input: Scenarios {S₁...Sₛ}, S = 3 → Phase 5: Log-score aggregation (ψ_scen). Cross-validated log-score weights via softmax, scenarios trained on orthogonal feature subsets.
Phase 6 · Synthesis: P = αₛψ_scen + (1−αₛ)ψ_post
Phase 7 (optional): MC uncertainty (CI₉₅)
Output: P_final ∈ [0,1]
How to use this app. Start at Data Generator to build a synthetic dataset matching the paper's DGP, tune PBIE Pro hyperparameters in the Editor, then hit Run Experiment. Every chart, table, and number in the Results, Ablation, and Sensitivity tabs is recomputed from your data — nothing is hardcoded.

Synthetic Data Generator

A fixed coefficient vector β ∈ ℝ²⁰ with entries drawn from N(0,1) defines the latent signal. Features are sampled as x ~ N(0, I₂₀). The true conditional probability is P(Y=1|x) = σ(βᵀx + ε), ε ~ N(0, 0.1).
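
The generator described above can be sketched as follows. This is a minimal illustration, not the app's actual engine: the `mulberry32` PRNG, the Box-Muller helper, the function name `generateDataset`, and the reading of 0.1 as the standard deviation of ε are all assumptions made for the example.

```javascript
// Small seeded PRNG (mulberry32) so that repeated runs give the same data.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function gaussian(rand) {
  const u = 1 - rand(); // in (0, 1], avoids log(0)
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Assumption: noiseSd = 0.1 is treated as the standard deviation of epsilon.
function generateDataset(n, d = 20, seed = 42, noiseSd = 0.1) {
  const rand = mulberry32(seed);
  const beta = Array.from({ length: d }, () => gaussian(rand)); // beta_j ~ N(0, 1)
  const X = [];
  const p = [];
  const y = [];
  for (let i = 0; i < n; i++) {
    const x = Array.from({ length: d }, () => gaussian(rand)); // x ~ N(0, I_d)
    const z = x.reduce((acc, xj, j) => acc + beta[j] * xj, 0);
    const pi = sigmoid(z + noiseSd * gaussian(rand)); // true P(Y=1|x)
    X.push(x);
    p.push(pi);
    y.push(rand() < pi ? 1 : 0); // Bernoulli label
  }
  return { beta, X, p, y };
}
```

Calling `generateDataset(200)` twice with the same seed returns identical arrays, which is the property the "Reproducible · Seeded" label refers to.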

DGP Parameters

Reproducible · Seeded

Data Summary

Updated when you generate

Press Generate Dataset to see distribution statistics.
Latent signal distribution
True P(Y=1|x) over the synthetic population
Coefficient vector β
All 20 coordinates of the known DGP coefficient vector

Three Sample-Size Regimes (Table 3)

Stratified subsamples of the population — class balance preserved

Regime | n (train+test) | Train / test split | Class balance | DGP control
Low-data | 200 | 160 / 40 | ~45–55% | Full (known β)
Medium-data | 2,000 | 1,600 / 400 | ~45–55% | Full (known β)
High-data | ≈100,000 | 80,000 / 20,000 | ~45–55% | Full (known β)
In the synthetic setting, δ_int is derived from the known DGP and corrupted with Gaussian noise σ_noise ∈ {0, 0.05, 0.10}. When σ_noise = 0 the expert is a DGP oracle; σ_noise = 0.10 simulates moderate expert uncertainty.
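
One way to picture the noisy expert is sketched below. The text does not say how δ_int is derived from the DGP, so the mapping used here (offset of the true probability from 0.5, clipped to ±δ_max with δ_max = 0.2) is purely an assumption for illustration, as are the names `noisyIntuition` and `gaussian`.

```javascript
const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

// Standard normal draw via Box-Muller; `rand` returns uniforms in [0, 1).
function gaussian(rand = Math.random) {
  const u = 1 - rand(); // in (0, 1], avoids log(0)
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Hypothetical mapping: oracle signal = pTrue - 0.5, then additive Gaussian noise.
function noisyIntuition(pTrue, sigmaNoise, deltaMax = 0.2, rand = Math.random) {
  const signal = pTrue - 0.5;
  const noise = sigmaNoise === 0 ? 0 : sigmaNoise * gaussian(rand);
  return clamp(signal + noise, -deltaMax, deltaMax);
}
```

With `sigmaNoise = 0` the function returns the clipped oracle offset exactly; with `sigmaNoise = 0.10` the same query yields a different δ_int on each call.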

Theory & Equations

PBIE Pro is built from four interpretable components fused through one convex combination. Every equation below is implemented verbatim in the JavaScript engine running this page.

Component 1 — Time-Decay Logistic Base Learner

Observation recency matters in non-stationary environments. Sample weights decay exponentially with age:

wᵢ(γ) = exp(−γ · (t_query − tᵢ)),   γ ≥ 0

Parameters β̂ are then estimated by weighted maximum likelihood with L2 regularisation:

β̂ = argmaxβ Σᵢ wᵢ[ yᵢ log σ(βᵀxᵢ) + (1−yᵢ) log(1 − σ(βᵀxᵢ)) ] − (λ_L2 / 2)‖β‖²

When γ → 0, weights wᵢ → 1 and this reduces to standard LR. The base output is ψ_base(x*) = σ(β̂ᵀx*).
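
A minimal sketch of Component 1, assuming plain gradient ascent on the weighted, L2-penalised log-likelihood. The optimiser, step size, iteration count, and the absence of an intercept term are choices made for this example, not details confirmed by the text.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const dot = (a, b) => a.reduce((s, ai, i) => s + ai * b[i], 0);

// w_i(gamma) = exp(-gamma * (tQuery - t_i)); gamma = 0 gives all-ones weights.
function decayWeights(times, tQuery, gamma) {
  return times.map((ti) => Math.exp(-gamma * (tQuery - ti)));
}

// Gradient ascent on  sum_i w_i * loglik_i  -  (lambdaL2 / 2) * ||beta||^2.
// The step is divided by sum(w) to keep it stable across dataset sizes (assumption).
function fitWeightedLogistic(X, y, w, lambdaL2 = 1.0, lr = 0.1, iters = 500) {
  const d = X[0].length;
  const wSum = w.reduce((s, wi) => s + wi, 0);
  let beta = new Array(d).fill(0);
  for (let it = 0; it < iters; it++) {
    const grad = beta.map((bj) => -lambdaL2 * bj); // gradient of the penalty
    for (let i = 0; i < X.length; i++) {
      const err = w[i] * (y[i] - sigmoid(dot(beta, X[i])));
      for (let j = 0; j < d; j++) grad[j] += err * X[i][j];
    }
    beta = beta.map((bj, j) => bj + (lr / wSum) * grad[j]);
  }
  return beta;
}

// psi_base(x*) = sigma(beta_hat^T x*)
const psiBase = (betaHat, xStar) => sigmoid(dot(betaHat, xStar));
```

Passing `gamma = 0` to `decayWeights` returns a vector of ones, so the fit reduces to ordinary L2-regularised logistic regression.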

Component 2 — Marginal Uncertainty Adjustment

U = σ_U · √( ψ_base(1 − ψ_base) / n_eff ),   n_eff = Σᵢ wᵢ
ψ_adj = clamp(ψ_base − U, 0, 1)

Controlled conservatism: as written, ψ_adj lowers the base prediction by a margin U that scales with estimation uncertainty and shrinks as n_eff grows, which guards against overconfident high probabilities under sparse data.
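
The two formulas of Component 2 translate directly into a few lines. The function name `uncertaintyAdjust` is hypothetical; the arithmetic follows the equations as written, which subtract U from ψ_base.

```javascript
const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

// U = sigmaU * sqrt(psiBase * (1 - psiBase) / nEff);  psiAdj = clamp(psiBase - U, 0, 1)
function uncertaintyAdjust(psiBase, nEff, sigmaU) {
  const U = sigmaU * Math.sqrt((psiBase * (1 - psiBase)) / nEff);
  return { U, psiAdj: clamp(psiBase - U, 0, 1) };
}
```

For ψ_base = 0.5, n_eff = 100 and σ_U = 1 the margin is √(0.25 / 100) = 0.05, so ψ_adj = 0.45. Setting σ_U = 0 leaves ψ_base unchanged.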

Component 3 — Formalised Intuition as Beta-Binomial Shrinkage

μ_prior = clamp(0.5 + δ_int, 0.3, 0.7),   δ_int ∈ [−δ_max, +δ_max]
λₜ = max(0, 1 − ECEₜ)  // calibration-aware weight
α_prior = μ_prior · κ · λₜ,   β_prior = (1 − μ_prior) · κ · λₜ
ψ_posterior = (s + α_prior) / (n_eff + α_prior + β_prior),   s = Σ wᵢ · yᵢ

As ECEₜ → 1, λₜ → 0 and the prior recedes — Proposition 3. As n_eff → ∞, the posterior collapses to the data likelihood — Proposition 2.
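
A short sketch of the shrinkage step, written straight from the four formulas above. The function name and the object-style argument are illustrative choices.

```javascript
const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

// s = weighted success count, nEff = sum of weights, ece = tracked ECE_t.
function betaBinomialPosterior({ s, nEff, deltaInt, kappa, ece }) {
  const muPrior = clamp(0.5 + deltaInt, 0.3, 0.7);
  const lambdaT = Math.max(0, 1 - ece); // calibration-aware weight
  const alphaPrior = muPrior * kappa * lambdaT;
  const betaPrior = (1 - muPrior) * kappa * lambdaT;
  return (s + alphaPrior) / (nEff + alphaPrior + betaPrior);
}
```

As a worked case, take s = 8, n_eff = 10, δ_int = 0.1 and κ = 10. With ECEₜ = 0 the prior contributes α = 6 and β = 4, giving (8 + 6) / (10 + 10) = 0.7. With ECEₜ = 1 the prior vanishes and the result is the raw weighted frequency 8 / 10 = 0.8.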

Component 4 — Log-Score Scenario Aggregation

LS(Mᵢ) = Σ_fold log P(y | x, Mᵢ)
wᵢ* = exp(LS(Mᵢ)) / Σⱼ exp(LS(Mⱼ)) // softmax over CV log-scores
ψ_scenario(x*) = Σᵢ wᵢ* · σ(β̂ᵢᵀx*)

Three scenarios — best (+20% β), base (β̂), worst (−20% β) — trained on orthogonal feature subsets.
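
The softmax weighting can be sketched as below. Subtracting the maximum log-score before exponentiating is a standard numerical safeguard added here, not something the text specifies; representing each scenario as a full-length coefficient vector (with zeros for features outside its subset) is likewise an assumption.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const dot = (a, b) => a.reduce((s, ai, i) => s + ai * b[i], 0);

// Softmax over cross-validated log-scores, shifted by the max for stability.
function logScoreWeights(logScores) {
  const m = Math.max(...logScores);
  const exps = logScores.map((ls) => Math.exp(ls - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / z);
}

// psi_scenario(x*) = sum_i w_i * sigma(beta_i^T x*)
function scenarioPrediction(weights, betas, xStar) {
  return weights.reduce(
    (acc, w, i) => acc + w * sigmoid(dot(betas[i], xStar)),
    0
  );
}
```

Because summed log-scores are large negative numbers, exponentiating them directly would underflow to zero; the shift leaves the weights unchanged while keeping them finite.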

Final Synthesis

P_final = α_s · ψ_scenario(x*) + (1 − α_s) · ψ_posterior

A convex combination of test-specific discrimination (scenario aggregation) and calibrated global anchoring (Beta-Binomial posterior). Proposition 1 follows trivially.
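
In code the synthesis is a single line. The range check on α_s is an addition for this sketch; it makes the boundedness argument explicit, since a convex combination of two values in [0, 1] stays in [0, 1].

```javascript
// P_final = alphaS * psiScenario + (1 - alphaS) * psiPosterior
function synthesize(psiScenario, psiPosterior, alphaS) {
  if (alphaS < 0 || alphaS > 1) {
    throw new RangeError("alphaS must lie in [0, 1]");
  }
  return alphaS * psiScenario + (1 - alphaS) * psiPosterior;
}
```

With α_s = 0 the output equals ψ_posterior, which corresponds to switching scenario aggregation off.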

Phase 7 — Monte Carlo Uncertainty (optional)

P̄_MC, σ²_MC, CI₉₅ over N = 500 trials perturbing (γ, σ_U, δ_int) ~ Uniform(±5%)

Enables interval forecasting without changing the point estimate.
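
A rough sketch of Phase 7 follows. It assumes the ±5% perturbation is multiplicative and independent per parameter, and that CI₉₅ is a percentile interval over the sorted draws; neither detail is stated in the text. The `predict` callback is a hypothetical stand-in for the full pipeline.

```javascript
// predict(params) -> probability; params must contain gamma, sigmaU, deltaInt.
function monteCarloInterval(predict, params, trials = 500, rand = Math.random) {
  const jitter = () => 1 + 0.05 * (2 * rand() - 1); // Uniform(0.95, 1.05)
  const draws = [];
  for (let t = 0; t < trials; t++) {
    draws.push(
      predict({
        ...params,
        gamma: params.gamma * jitter(),
        sigmaU: params.sigmaU * jitter(),
        deltaInt: params.deltaInt * jitter(),
      })
    );
  }
  draws.sort((a, b) => a - b);
  const mean = draws.reduce((a, b) => a + b, 0) / trials;
  const variance =
    draws.reduce((a, b) => a + (b - mean) ** 2, 0) / (trials - 1);
  // Percentile lookup on the sorted draws.
  const q = (p) => draws[Math.min(trials - 1, Math.floor(p * trials))];
  return { mean, variance, ci95: [q(0.025), q(0.975)] };
}
```

The point estimate is computed separately from the unperturbed parameters, so running this step only adds the interval.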

PBIE Pro Hyperparameter Editor

Every hyperparameter in Table 2 is editable below. Defaults match §5 of the paper. The Phase column maps each knob to where it acts in the algorithmic pipeline.

Core Hyperparameters

Cross-validated in the paper · Adjustable here

Component Toggles (Ablation)

Disable components individually to reproduce Table 5

Time decay: set γ = 0 to disable
Uncertainty adjustment: set σ_U = 0 to disable
Scenario aggregation: set α_s = 0 to disable
Intuition prior: set δ_int = 0 (λ disabled)
Monte Carlo CI: Phase 7, adds ~500 forward passes

Baselines to Include

All nine baselines from §5.3 — toggle off if you want a faster run

Run Full Experiment

Trains every enabled model on the current dataset, evaluates on the held-out test set, and computes the full metric suite plus exact Brier-Score decomposition.

Run Settings

B · Bootstrap seeds: more seeds → tighter ±. Browser-bounded to 10.
α · Significance threshold: Holm-Bonferroni corrected
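
For readers who want to see the testing machinery, here is a hedged sketch of a one-sided paired bootstrap on per-sample losses followed by the Holm-Bonferroni step-down rule. The exact test statistic used in the paper is not given in this page, so `pairedBootstrapPValue` shows one common form rather than the authors' procedure; both function names are hypothetical.

```javascript
// Fraction of resamples in which model A is NOT better than model B
// (mean loss difference >= 0). lossA[i] and lossB[i] refer to the same test point.
function pairedBootstrapPValue(lossA, lossB, B = 10000, rand = Math.random) {
  const n = lossA.length;
  const diff = lossA.map((a, i) => a - lossB[i]);
  let notBetter = 0;
  for (let b = 0; b < B; b++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += diff[Math.floor(rand() * n)];
    if (sum >= 0) notBetter++;
  }
  return notBetter / B;
}

// Holm-Bonferroni: sort p-values ascending, compare the k-th smallest
// (k = 0, 1, ...) with alpha / (m - k), and stop at the first failure.
function holmBonferroni(pValues, alpha = 0.05) {
  const m = pValues.length;
  const order = pValues
    .map((_, i) => i)
    .sort((a, b) => pValues[a] - pValues[b]);
  const reject = new Array(m).fill(false);
  for (let k = 0; k < m; k++) {
    if (pValues[order[k]] > alpha / (m - k)) break;
    reject[order[k]] = true;
  }
  return reject;
}
```

For p-values [0.01, 0.04, 0.03] at α = 0.05, only the first hypothesis is rejected: 0.01 ≤ 0.05/3, but the next smallest value 0.03 exceeds 0.05/2, so the procedure stops there.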

Configuration Snapshot

What this run will execute


    

Results

After running an experiment, results appear here. Numbers will differ from the paper's exact figures (paper uses scikit-learn, 30 seeds, and ±20% scenario perturbations of the recovered β; this engine uses pure-JS implementations) but should reproduce the same qualitative ordering and the calibration-driven advantage.

Comprehensive Comparison (Table 4 analog)

All metrics on the test partition · ★ = PBIE Pro · † = best recalibrated baseline

Model | BS ↓ | ECE ↓ | NLL ↓ | AUROC ↑ | Tail-ECE ↓ | BS_cal ↓ | BS_ref ↓
Run an experiment to populate this table.
Reliability Diagram (Figure 3a)
Predicted probability vs empirical frequency · perfect calibration on the diagonal
ECE Comparison (Figure 3b)
Expected Calibration Error · target ≤ 0.05
Brier Score Comparison
Lower is better — jointly penalises miscalibration and inaccuracy
Brier Score Decomposition (Figure 4b)
BS = BS_calibration + BS_refinement · made possible by the known DGP
AUROC Comparison
Rank-ordering accuracy · calibration-independent
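
The headline metrics can be sketched in a few lines. Brier Score and equal-width-bin ECE follow their standard definitions (the choice of 10 bins is an assumption). The decomposition function shows one way a known DGP makes an exact split possible: conditional on x, the expected squared error of a forecast p against a label with true probability q equals (p − q)² + q(1 − q). Whether these two terms match the paper's BS_cal and BS_ref definitions is an assumption.

```javascript
// Mean squared difference between forecasts p and binary outcomes y.
function brierScore(p, y) {
  return p.reduce((s, pi, i) => s + (pi - y[i]) ** 2, 0) / p.length;
}

// ECE with equal-width bins: weighted mean of |accuracy - confidence| per bin.
function expectedCalibrationError(p, y, bins = 10) {
  const count = new Array(bins).fill(0);
  const sumP = new Array(bins).fill(0);
  const sumY = new Array(bins).fill(0);
  p.forEach((pi, i) => {
    const b = Math.min(bins - 1, Math.floor(pi * bins));
    count[b]++;
    sumP[b] += pi;
    sumY[b] += y[i];
  });
  let ece = 0;
  for (let b = 0; b < bins; b++) {
    if (count[b] === 0) continue;
    const gap = Math.abs(sumY[b] / count[b] - sumP[b] / count[b]);
    ece += (count[b] / p.length) * gap;
  }
  return ece;
}

// Expected Brier Score under known true probabilities q (assumed reading):
// cal = mean (p - q)^2, ref = mean q (1 - q), total = cal + ref.
function expectedBrierDecomposition(p, q) {
  const n = p.length;
  const cal = p.reduce((s, pi, i) => s + (pi - q[i]) ** 2, 0) / n;
  const ref = q.reduce((s, qi) => s + qi * (1 - qi), 0) / n;
  return { cal, ref, total: cal + ref };
}
```

The `ref` term depends only on the DGP, so it is the same for every model evaluated on the same test points; differences between models show up entirely in `cal`.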

Ablation Study

Remove each component in turn and re-evaluate. The paper's central ablation finding concerns additivity: the total improvement (0.027) is smaller than the linear sum of the individual contributions (0.034), so the combined effect is sub-additive and the components' gains partly overlap.

Component Contributions

Configuration | BS ↓ | ECE ↓ | NLL ↓ | ΔBS
Run the ablation to populate this table.
Ablation Bar Chart (Figure 5)
Each removal incrementally degrades performance · base-only reaches uncalibrated NN level
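
The additivity question reduces to simple arithmetic on Brier Scores, sketched below. How the paper defines an "individual contribution" is not spelled out here; this example assumes it is the increase in BS when that single component is removed from the full model. The function name and argument layout are hypothetical.

```javascript
// bsFull: BS of the full model; bsBaseOnly: BS with every component removed;
// bsWithout: { componentName: BS of the full model minus that component }.
function additivityCheck(bsFull, bsBaseOnly, bsWithout) {
  const total = bsBaseOnly - bsFull;
  const individual = Object.fromEntries(
    Object.entries(bsWithout).map(([name, bs]) => [name, bs - bsFull])
  );
  const sum = Object.values(individual).reduce((a, b) => a + b, 0);
  // gap > 0: super-additive (total exceeds the sum); gap < 0: sub-additive.
  return { total, individual, sum, gap: total - sum };
}
```

Applied to the two figures quoted in this section, a total of 0.027 against a sum of 0.034 gives a gap of −0.007.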

Intuition Sensitivity Analysis

As intuition noise σ_noise grows, the λₜ mechanism automatically attenuates the prior. The framework should remain calibrated (ECE ≤ 0.05) across all tested noise levels.

Noise Profile

Configuration | BS ↓ | ECE ↓ | Mean λₜ | ΔBS
Run sensitivity to populate.
Brier Score vs Intuition Noise (Figure 6b)
λₜ automatically attenuates the prior — calibration-aware shrinkage in action

Interactive Single Prediction

Adjust a query vector x* and an expert intuition signal δ_int, then watch every intermediate quantity — ψ_base, ψ_adj, ψ_post, ψ_scen, P_final — update in real time.

Query Vector x*

20 features · adjust or randomise

Expert & Run

Component Breakdown

How each phase contributes to the final probability

Run a single prediction to see component contributions.