PBIE Pro · Interactive Framework

Overview

Abstract · Contributions · Hypotheses

Probabilistic forecasting under data scarcity is hard for three reasons that compound: gradient methods overfit with small n, post-hoc calibration tools starve when held-out data is thin, and no standard pipeline treats expert intuition as a learnable Bayesian quantity. PBIE Pro folds all three concerns into one interpretable architecture.

Brier Score (n=200): 0.192 (± 0.011 over 30 seeds)

ECE (n=200): 0.041 (below the 0.05 target)

AUROC (n=200): 0.741 (best discrimination at low data)

Six contributions, plainly

1. A feature-conditioned time-decay logistic base learner — recovers ordinary LR as γ → 0.

2. Expert intuition formalised as a Beta-Binomial shrinkage prior whose strength is governed by ECE-tracked calibration quality (λₜ).

3. Cross-validated log-score scenario aggregation that avoids BMA's selection bias and information double-counting.

4. Nine-baseline empirical comparison on a controlled synthetic dataset across three sample-size regimes with Holm-Bonferroni-corrected bootstrap testing.

5. Exact Brier-Score decomposition into calibration loss and refinement loss — uniquely possible because the DGP is known.

6. Full reproducibility: this very browser app reconstructs every experiment from scratch.

Hypotheses

Tested via paired bootstrap, B = 10,000

H₀ PBIE Pro achieves lower Brier Score than all uncalibrated baselines at n=200, and is statistically tied with LR + Isotonic.

H₁ PBIE Pro achieves ECE ≤ 0.05 in the low-data regime, beating all uncalibrated baselines.

H₂ Components interact synergistically — total improvement exceeds the linear sum of individual contributions.
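As an illustration of the paired bootstrap named above, here is a minimal sketch that resamples per-example Brier differences between two models. The function names, the seeded `mulberry32` generator, and the one-sided p-value definition (the share of resamples in which model A is not better) are assumptions made for this example, not details confirmed by the paper.

```javascript
// Small seeded PRNG (mulberry32) so the resampling is repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Paired bootstrap on d_i = (pA_i - y_i)^2 - (pB_i - y_i)^2.
// Negative mean difference means model A has the lower Brier Score.
function pairedBootstrap(pA, pB, y, B = 10000, seed = 7) {
  const rng = mulberry32(seed);
  const n = y.length;
  const d = y.map((yi, i) => (pA[i] - yi) ** 2 - (pB[i] - yi) ** 2);
  const observed = d.reduce((a, b) => a + b, 0) / n;
  let notBetter = 0;
  for (let b = 0; b < B; b++) {
    let sum = 0;
    // Resample example indices with replacement; pairing is kept because
    // each index carries both models' errors on the same example.
    for (let i = 0; i < n; i++) sum += d[Math.floor(rng() * n)];
    if (sum / n >= 0) notBetter++;
  }
  return { observed, pOneSided: notBetter / B };
}

// Toy check: model A is sharper and correct on every example.
const yToy = [1, 0, 1, 0, 1, 0, 1, 0];
const pAToy = yToy.map((v) => (v === 1 ? 0.8 : 0.2));
const pBToy = yToy.map((v) => (v === 1 ? 0.6 : 0.4));
const bootResult = pairedBootstrap(pAToy, pBToy, yToy);
```

With several baselines, the resulting p-values would then go through a multiple-comparison correction before any hypothesis is declared supported.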

Three formal propositions

Proved in §3.7 of the paper, executable here

P1 · Boundedness. P_final ∈ [0,1] for all admissible inputs.

P2 · Asymptotic consistency. As n_eff → ∞, P_final → σ(β*ᵀx*).

P3 · Calibration monotonicity. |P_final − ψ_adj| is monotonically non-increasing in ECEₜ.

Architectural Pipeline (Figure 1 reconstructed)

Seven phases — three inputs feed four model components into a final synthesis

INPUT · Historical data (D, t) → Phase 1–2 · Time-decay LR (ψ_base) → Phase 3 · Uncertainty adj. (ψ_adj) → OUTPUT · P_final ∈ [0,1]

INPUT · Expert intuition (δ_int) → Phase 4 · Beta-Binomial (ψ_post) ↘ Phase 6 · Synthesis: P = αₛψ_scen + (1−αₛ)ψ_post ↗ Phase 7 (opt.) · MC uncertainty (CI₉₅)

INPUT · Scenarios {S₁...Sₛ}, S = 3 → Phase 5 · Log-score agg. (ψ_scen)

Cross-validated log-score weights via softmax, scenarios trained on orthogonal feature subsets.

How to use this app. Start at Data Generator to build a synthetic dataset matching the paper's DGP, tune PBIE Pro hyperparameters in the Editor, then hit Run Experiment. Every chart, table, and number in the Results, Ablation, and Sensitivity tabs is recomputed from your data — nothing is hardcoded.

Synthetic Data Generator

§5.1 · Controlled DGP · Known β

A fixed coefficient vector β ∈ ℝ²⁰ with entries drawn from N(0,1) defines the latent signal. Features are sampled as x ~ N(0, I₂₀). The true conditional probability is P(Y=1|x) = σ(βᵀx + ε), ε ~ N(0, 0.1).
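The data-generating process above can be sketched in a few lines of JavaScript. This is an illustrative reconstruction, not the app's own generator: the `mulberry32` seeded PRNG, the Box-Muller normal sampler, the helper name `generateDataset`, the Bernoulli draw of the label from the true probability, and the reading of N(0, 0.1) as a standard deviation of 0.1 are all assumptions.

```javascript
// Small seeded PRNG (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function randn(rng) {
  const u = 1 - rng(); // in (0, 1], avoids log(0)
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Hypothetical helper: draws beta once, then n rows of (x, pTrue, y).
function generateDataset(n, seed, d = 20, noiseSd = 0.1) {
  const rng = mulberry32(seed);
  const beta = Array.from({ length: d }, () => randn(rng));
  const rows = [];
  for (let i = 0; i < n; i++) {
    const x = Array.from({ length: d }, () => randn(rng));
    const signal = x.reduce((acc, xj, j) => acc + beta[j] * xj, 0);
    const pTrue = sigmoid(signal + noiseSd * randn(rng)); // sigma(beta^T x + eps)
    const y = rng() < pTrue ? 1 : 0; // Bernoulli label (assumed)
    rows.push({ x, pTrue, y });
  }
  return { beta, rows };
}

const { beta, rows } = generateDataset(200, 42);
```

Because the generator keeps `pTrue` for every row, later diagnostics can compare predictions against the true conditional probability rather than only against the noisy labels.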

DGP Parameters

Reproducible · Seeded

Data Summary

Updated when you generate

Press Generate Dataset to see distribution statistics.

Latent signal distribution

True P(Y=1|x) over the synthetic population

Coefficient vector β

All 20 coordinates of the known DGP coefficient vector

Three Sample-Size Regimes (Table 3)

Stratified subsamples of the population — class balance preserved

Regime	n (train+test)	Train / test split	Class balance	DGP control
Low-data	200	160 / 40	~45–55%	Full (known β)
Medium-data	2,000	1,600 / 400	~45–55%	Full (known β)
High-data	≈100,000	80,000 / 20,000	~45–55%	Full (known β)

In the synthetic setting, δ_int is derived from the known DGP and corrupted with Gaussian noise σ_noise ∈ {0, 0.05, 0.10}. When σ_noise = 0 the expert is a DGP oracle; σ_noise = 0.10 simulates moderate expert uncertainty.

Theory & Equations

§3 · Mathematical Foundation

PBIE Pro is built from four interpretable components fused through one convex combination. Every equation below is implemented verbatim in the JavaScript engine running this page.

Component 1 — Time-Decay Logistic Base Learner

Observation recency matters in non-stationary environments. Sample weights decay exponentially with age:

wᵢ(γ) = exp(−γ · (t_query − tᵢ)), γ ≥ 0

Parameters β̂ are then estimated by weighted maximum likelihood with L2 regularisation:

β̂ = argmax_β Σᵢ wᵢ[ yᵢ log σ(βᵀxᵢ) + (1−yᵢ) log(1 − σ(βᵀxᵢ)) ] − (λ_L2 / 2)‖β‖²

When γ → 0, weights wᵢ → 1 and this reduces to standard LR. The base output is ψ_base(x*) = σ(β̂ᵀx*).
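A minimal sketch of Component 1, assuming plain gradient ascent as the optimiser. The paper's solver is not stated here, so `fitWeightedLR`, its `stepSize` and `iters` defaults, and the toy data are illustrative assumptions; only the weight formula and the penalised objective come from the equations above.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const dot = (a, b) => a.reduce((acc, ai, i) => acc + ai * b[i], 0);

// w_i(gamma) = exp(-gamma * (tQuery - t_i)); gamma = 0 gives all-ones weights.
function decayWeights(times, tQuery, gamma) {
  return times.map((ti) => Math.exp(-gamma * (tQuery - ti)));
}

// Weighted, L2-penalised logistic regression fitted by gradient ascent.
// stepSize and iters are illustrative defaults, not values from the paper.
function fitWeightedLR(X, y, w, lambdaL2 = 1.0, stepSize = 0.1, iters = 500) {
  const d = X[0].length;
  const wSum = w.reduce((a, b) => a + b, 0);
  let beta = new Array(d).fill(0);
  for (let it = 0; it < iters; it++) {
    const grad = beta.map((bj) => -lambdaL2 * bj); // gradient of the penalty term
    for (let i = 0; i < X.length; i++) {
      const resid = w[i] * (y[i] - sigmoid(dot(beta, X[i])));
      for (let j = 0; j < d; j++) grad[j] += resid * X[i][j];
    }
    // Dividing by the weight sum keeps the step size comparable across n.
    beta = beta.map((bj, j) => bj + (stepSize * grad[j]) / wSum);
  }
  return beta;
}

// Toy usage: one feature, newer observations (larger t) weigh more.
const X = [[-2], [-1], [1], [2], [-1.5], [1.5]];
const y = [0, 0, 1, 1, 1, 0];
const times = [0, 1, 2, 3, 4, 5];
const w = decayWeights(times, 5, 0.1);
const betaHat = fitWeightedLR(X, y, w);
const psiBase = sigmoid(dot(betaHat, [1.0]));
```

Calling `decayWeights(times, 5, 0)` returns a vector of ones, which is the ordinary-LR special case noted above.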

Component 2 — Marginal Uncertainty Adjustment

U = σ_U · √( ψ_base(1 − ψ_base) / n_eff ), n_eff = Σᵢ wᵢ
ψ_adj = clamp(ψ_base − U, 0, 1)

Controlled conservatism: predictions are shifted downward by U, which grows with estimation uncertainty (it is largest near ψ_base = 0.5 and for small n_eff), damping overconfident positive forecasts under sparse data.
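The two formulas for Component 2 translate almost line for line. The sketch below assumes unit sample weights and σ_U = 1.0 purely for illustration; `uncertaintyAdjust` is a hypothetical helper name.

```javascript
const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

// U = sigmaU * sqrt(psiBase * (1 - psiBase) / nEff), psiAdj = clamp(psiBase - U, 0, 1)
function uncertaintyAdjust(psiBase, weights, sigmaU) {
  const nEff = weights.reduce((a, b) => a + b, 0); // effective sample size
  const U = sigmaU * Math.sqrt((psiBase * (1 - psiBase)) / nEff);
  return { nEff, U, psiAdj: clamp(psiBase - U, 0, 1) };
}

// Same base prediction, two effective sample sizes.
const sparse = uncertaintyAdjust(0.7, new Array(25).fill(1), 1.0);
const dense = uncertaintyAdjust(0.7, new Array(2500).fill(1), 1.0);
```

With n_eff = 25 the adjustment is about 0.092, and with n_eff = 2500 it shrinks to about 0.009, so the correction fades as evidence accumulates.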

Component 3 — Formalised Intuition as Beta-Binomial Shrinkage

μ_prior = clamp(0.5 + δ_int, 0.3, 0.7), δ_int ∈ [−δ_max, +δ_max]
λₜ = max(0, 1 − ECEₜ) // calibration-aware weight
α_prior = μ_prior · κ · λₜ, β_prior = (1 − μ_prior) · κ · λₜ
ψ_posterior = (s + α_prior) / (n_eff + α_prior + β_prior), s = Σ wᵢ · yᵢ

As ECEₜ → 1, λₜ → 0 and the prior recedes (Proposition 3). As n_eff → ∞, the prior pseudo-counts are swamped and ψ_posterior tends to the weighted empirical rate s / n_eff (Proposition 2).
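A worked example of the shrinkage equations, written as a small function. The helper name `betaBinomialPosterior`, the value κ = 10, and the toy labels are assumptions chosen so the arithmetic is easy to follow by hand.

```javascript
const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

function betaBinomialPosterior({ weights, labels, deltaInt, kappa, ece }) {
  const muPrior = clamp(0.5 + deltaInt, 0.3, 0.7);
  const lambdaT = Math.max(0, 1 - ece); // calibration-aware weight
  const alphaPrior = muPrior * kappa * lambdaT;
  const betaPrior = (1 - muPrior) * kappa * lambdaT;
  const nEff = weights.reduce((a, b) => a + b, 0);
  const s = weights.reduce((acc, wi, i) => acc + wi * labels[i], 0);
  const psiPost = (s + alphaPrior) / (nEff + alphaPrior + betaPrior);
  return { muPrior, lambdaT, psiPost };
}

const weights = new Array(10).fill(1);
const labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]; // weighted empirical rate 0.8

// Well-calibrated expert (ECE = 0): prior mean 0.4 with 10 pseudo-counts.
const trusted = betaBinomialPosterior({ weights, labels, deltaInt: -0.1, kappa: 10, ece: 0.0 });
// Badly calibrated expert (ECE = 1): the prior vanishes.
const distrusted = betaBinomialPosterior({ weights, labels, deltaInt: -0.1, kappa: 10, ece: 1.0 });
```

In the first case the posterior is (8 + 4) / (10 + 10) = 0.6, halfway between the prior mean and the data; in the second it is 8 / 10 = 0.8, the data alone.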

Component 4 — Log-Score Scenario Aggregation

LS(Mᵢ) = Σ_fold log P(y | x, Mᵢ)
wᵢ* = exp(LS(Mᵢ)) / Σⱼ exp(LS(Mⱼ)) // softmax over CV log-scores
ψ_scenario(x*) = Σᵢ wᵢ* · σ(β̂ᵢᵀx*)

Three scenarios: best (+20% β̂), base (β̂), and worst (−20% β̂), trained on orthogonal feature subsets.
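The aggregation step can be sketched as a softmax over summed log-scores followed by a weighted average of scenario predictions. This sketch simplifies in two labelled ways: the three scenarios are plain rescalings of one hypothetical β̂ rather than models trained on orthogonal feature subsets, and the log-scores are made-up numbers. Subtracting the maximum before exponentiating is a standard numerical safeguard added here, and it leaves the weights unchanged.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const dot = (a, b) => a.reduce((acc, ai, i) => acc + ai * b[i], 0);

// Softmax over CV log-scores; subtracting the max avoids underflow in exp().
function logScoreWeights(logScores) {
  const m = Math.max(...logScores);
  const e = logScores.map((ls) => Math.exp(ls - m));
  const z = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / z);
}

// psiScenario(x*) = sum_i w_i* . sigmoid(beta_i^T x*)
function scenarioPrediction(betas, wStar, xStar) {
  return betas.reduce((acc, b, i) => acc + wStar[i] * sigmoid(dot(b, xStar)), 0);
}

const betaHat = [0.8, -0.5]; // hypothetical fitted coefficients
const betas = [1.2, 1.0, 0.8].map((scale) => betaHat.map((b) => scale * b)); // best, base, worst
const logScores = [-105.0, -100.0, -103.0]; // illustrative summed CV log-scores
const wStar = logScoreWeights(logScores);
const psiScen = scenarioPrediction(betas, wStar, [1.0, 0.5]);
```

Because log-scores are sums over folds, differences of a few units already concentrate most of the weight on the best-scoring scenario, as the base scenario does here.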

Final Synthesis

P_final = α_s · ψ_scenario(x*) + (1 − α_s) · ψ_posterior

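A convex combination of test-specific discrimination (scenario aggregation) and calibrated global anchoring (Beta-Binomial posterior). Proposition 1 follows directly: with α_s ∈ [0,1] and both ψ_scenario and ψ_posterior in [0,1], their convex combination also lies in [0,1].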
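For concreteness, the synthesis step is a one-liner. The function name `synthesize`, the range check, and the input values below are illustrative assumptions.

```javascript
// P_final = alphaS * psiScen + (1 - alphaS) * psiPost
function synthesize(psiScen, psiPost, alphaS) {
  if (alphaS < 0 || alphaS > 1) throw new RangeError("alphaS must lie in [0, 1]");
  return alphaS * psiScen + (1 - alphaS) * psiPost;
}

const pFinal = synthesize(0.72, 0.60, 0.5); // midpoint of the two inputs
const pNoScenario = synthesize(0.72, 0.60, 0); // alphaS = 0 keeps only the posterior
```

Setting α_s = 0 returns ψ_posterior unchanged, which is how the scenario component is switched off in an ablation.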

Phase 7 — Monte Carlo Uncertainty (optional)

P̄_MC, σ²_MC, CI₉₅ over N = 500 trials perturbing (γ, σ_U, δ_int) ~ Uniform(±5%)

Enables interval forecasting without changing the point estimate.
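A sketch of how Phase 7 might be wired, under several assumptions: the ±5% perturbation is read as a relative jitter on each hyperparameter, CI₉₅ is taken as the empirical 2.5% and 97.5% quantiles of the trial outputs, and `toyPredict` is a stand-in for the real pipeline, used only to make the example runnable.

```javascript
// Small seeded PRNG (mulberry32) so the trials are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function monteCarloInterval(predictFn, params, nTrials = 500, seed = 1) {
  const rng = mulberry32(seed);
  const jitter = (v) => v * (1 + 0.05 * (2 * rng() - 1)); // relative Uniform(-5%, +5%)
  const draws = [];
  for (let k = 0; k < nTrials; k++) {
    draws.push(predictFn({
      gamma: jitter(params.gamma),
      sigmaU: jitter(params.sigmaU),
      deltaInt: jitter(params.deltaInt),
    }));
  }
  const mean = draws.reduce((a, b) => a + b, 0) / nTrials;
  const variance = draws.reduce((a, b) => a + (b - mean) ** 2, 0) / (nTrials - 1);
  const sorted = [...draws].sort((a, b) => a - b);
  const q = (p) => sorted[Math.min(nTrials - 1, Math.floor(p * nTrials))];
  return { mean, variance, ci95: [q(0.025), q(0.975)] };
}

// Stand-in predictor, purely illustrative; the full pipeline would go here.
const toyPredict = ({ gamma, sigmaU, deltaInt }) =>
  Math.min(1, Math.max(0, 0.6 + deltaInt - 0.1 * sigmaU - 0.5 * gamma));

const mc = monteCarloInterval(toyPredict, { gamma: 0.05, sigmaU: 1.0, deltaInt: 0.1 });
```

The point estimate is still computed once from the unperturbed hyperparameters; the trials only supply the spread around it.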

PBIE Pro Hyperparameter Editor

Table 2 · Live

Every hyperparameter in Table 2 is editable below. Defaults match §5 of the paper. The Phase column maps each knob to where it acts in the algorithmic pipeline.

Core Hyperparameters

Cross-validated in the paper · Adjustable here

Component Toggles (Ablation)

Disable components individually to reproduce Table 5

Time decay
Set γ = 0 to disable

Uncertainty adjustment
Set σ_U = 0 to disable

Scenario aggregation
Set α_s = 0 to disable

Intuition prior
Set δ_int = 0, λ disabled

Monte Carlo CI
Phase 7 — adds ~500 forward passes

Baselines to Include

All nine baselines from §5.3 — toggle off if you want a faster run

Run Full Experiment

§6 · All Models · All Metrics

Trains every enabled model on the current dataset, evaluates on the held-out test set, and computes the full metric suite plus exact Brier-Score decomposition.

Run Settings

B · Bootstrap seeds · More seeds → tighter ±. Browser-bounded to 10.

α · Significance threshold · Holm-Bonferroni corrected
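For readers unfamiliar with the correction, the Holm-Bonferroni step-down procedure can be sketched as follows. The function name and the example p-values are illustrative; how the app feeds its bootstrap p-values into the procedure is not specified here.

```javascript
// Holm-Bonferroni step-down: sort p-values ascending, compare the k-th smallest
// (k = 0, 1, ...) with alpha / (m - k), and stop at the first failure.
function holmBonferroni(pValues, alpha = 0.05) {
  const m = pValues.length;
  const order = pValues.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p);
  const reject = new Array(m).fill(false);
  for (let k = 0; k < m; k++) {
    if (order[k].p > alpha / (m - k)) break; // all larger p-values are retained too
    reject[order[k].i] = true;
  }
  return reject; // reject[i] refers to pValues[i] in the original order
}

const decisions = holmBonferroni([0.001, 0.04, 0.03, 0.2], 0.05);
// decisions is [true, false, false, false]
```

Here 0.001 clears the first threshold 0.05 / 4 = 0.0125, but the next smallest value 0.03 exceeds 0.05 / 3, so testing stops and the remaining three hypotheses are retained.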

Configuration Snapshot

What this run will execute

Results

Table 4 · Reliability · BS Decomposition

After running an experiment, results appear here. Numbers will differ from the paper's exact figures (paper uses scikit-learn, 30 seeds, and ±20% scenario perturbations of the recovered β; this engine uses pure-JS implementations) but should reproduce the same qualitative ordering and the calibration-driven advantage.

Comprehensive Comparison (Table 4 analog)

All metrics on the test partition · ★ = PBIE Pro · † = best recalibrated baseline

Model	BS ↓	ECE ↓	NLL ↓	AUROC ↑	Tail-ECE ↓	BS_cal ↓	BS_ref ↓
Run an experiment to populate this table.

Reliability Diagram (Figure 3a)

Predicted probability vs empirical frequency · perfect calibration on the diagonal

ECE Comparison (Figure 3b)

Expected Calibration Error · target ≤ 0.05
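A minimal sketch of a binned ECE computation, assuming 10 equal-width bins over the predicted probability of the positive class. The bin count and the helper name are assumptions; the paper's exact binning is not stated on this page.

```javascript
// Binned ECE: sum over bins of (bin share) * |mean predicted prob - empirical frequency|.
function expectedCalibrationError(probs, labels, nBins = 10) {
  const bins = Array.from({ length: nBins }, () => ({ n: 0, sumP: 0, sumY: 0 }));
  probs.forEach((p, i) => {
    const b = Math.min(nBins - 1, Math.floor(p * nBins)); // p = 1 goes in the last bin
    bins[b].n += 1;
    bins[b].sumP += p;
    bins[b].sumY += labels[i];
  });
  const total = probs.length;
  return bins.reduce(
    (ece, { n, sumP, sumY }) =>
      n === 0 ? ece : ece + (n / total) * Math.abs(sumP / n - sumY / n),
    0
  );
}

const eceProbs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1];
const eceLabels = [1, 1, 1, 0, 0, 0, 0, 1];
const ece = expectedCalibrationError(eceProbs, eceLabels);
```

In this toy case each occupied bin holds half the data and is off by 0.15 (0.9 predicted against 0.75 observed, 0.1 against 0.25), giving an ECE of 0.15, well above the 0.05 target.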

Brier Score Comparison

Lower is better — jointly penalises miscalibration and inaccuracy

Brier Score Decomposition (Figure 4b)

BS = BS_calibration + BS_refinement · made possible by the known DGP

AUROC Comparison

Rank-ordering accuracy · calibration-independent
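The calibration-independence of AUROC is easy to check with a pairwise implementation: the score is the share of (positive, negative) pairs ranked correctly, with ties counted as half. This quadratic-time version is a sketch for small test sets; the function name and toy data are illustrative.

```javascript
// AUROC as the probability that a random positive outranks a random negative.
function auroc(scores, labels) {
  let pairs = 0;
  let wins = 0;
  for (let i = 0; i < scores.length; i++) {
    if (labels[i] !== 1) continue;
    for (let j = 0; j < scores.length; j++) {
      if (labels[j] !== 0) continue;
      pairs += 1;
      if (scores[i] > scores[j]) wins += 1;
      else if (scores[i] === scores[j]) wins += 0.5; // ties count as half
    }
  }
  return pairs === 0 ? NaN : wins / pairs; // undefined when one class is absent
}

const aucScores = [0.9, 0.8, 0.3, 0.2];
const aucLabels = [1, 0, 1, 0];
const aucRaw = auroc(aucScores, aucLabels);
// Squaring is monotone on [0, 1], so the ranking and the AUROC are unchanged.
const aucSquared = auroc(aucScores.map((s) => s * s), aucLabels);
```

Three of the four positive-negative pairs are ordered correctly, so both calls return 0.75 even though the squared scores are far less calibrated.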

Ablation Study

Table 5 · §6.3

Remove each component in turn and re-evaluate. The paper's central ablation finding is a sub-additive interaction: the total improvement (0.027) is smaller than the linear sum of individual contributions (0.034), so the components partly overlap rather than add independently.

Component Contributions

Configuration	BS ↓	ECE ↓	NLL ↓	ΔBS
Run the ablation to populate this table.

Ablation Bar Chart (Figure 5)

Each removal incrementally degrades performance · base-only reaches uncalibrated NN level

Intuition Sensitivity Analysis

Table 6 · §6.4

As intuition noise σ_noise grows, the λₜ mechanism automatically attenuates the prior. The framework should remain calibrated (ECE ≤ 0.05) across all tested noise levels.

Noise Profile

Configuration	BS ↓	ECE ↓	Mean λₜ	ΔBS
Run sensitivity to populate.

Brier Score vs Intuition Noise (Figure 6b)

λₜ automatically attenuates the prior — calibration-aware shrinkage in action

Interactive Single Prediction

Inspect the full component breakdown

Adjust a query vector x* and an expert intuition signal δ_int, then watch every intermediate quantity — ψ_base, ψ_adj, ψ_post, ψ_scen, P_final — update in real time.

Query Vector x*

20 features · adjust or randomise

Expert & Run

Component Breakdown

How each phase contributes to the final probability

Run a single prediction to see component contributions.