Overview
Probabilistic forecasting under data scarcity is hard for three reasons that compound: gradient methods overfit with small n, post-hoc calibration tools starve when held-out data is thin, and no standard pipeline treats expert intuition as a learnable Bayesian quantity. PBIE Pro folds all three concerns into one interpretable architecture.
Six contributions, plainly
1. A feature-conditioned time-decay logistic base learner that recovers ordinary logistic regression (LR) as γ → 0.
2. Expert intuition formalised as a Beta-Binomial shrinkage prior whose strength λₜ is governed by calibration quality, tracked via expected calibration error (ECE).
3. Cross-validated log-score scenario aggregation that avoids the selection bias and information double-counting of Bayesian model averaging (BMA).
4. Nine-baseline empirical comparison on a controlled synthetic dataset across three sample-size regimes with Holm-Bonferroni-corrected bootstrap testing.
5. Exact Brier-Score decomposition into calibration loss and refinement loss, possible here because the data-generating process (DGP) is known.
6. Full reproducibility: this very browser app reconstructs every experiment from scratch.
Hypotheses
Tested via paired bootstrap, B = 10,000
H0 PBIE Pro achieves lower Brier Score than all uncalibrated baselines at n=200, and is statistically tied with LR + Isotonic.
H1 PBIE Pro achieves ECE ≤ 0.05 in the low-data regime, beating all uncalibrated baselines.
H2 Components interact synergistically — total improvement exceeds the linear sum of individual contributions.
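The testing procedure named above (paired bootstrap with B = 10,000, Holm-Bonferroni correction) can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the test statistic is the mean difference in per-example loss between two models, and the function names `paired_bootstrap_pvalue` and `holm_adjust` are hypothetical.

```python
import numpy as np

def paired_bootstrap_pvalue(loss_a, loss_b, n_boot=10_000, seed=0):
    """Two-sided paired bootstrap p-value for H: mean(loss_a - loss_b) = 0.

    Assumption: both arrays hold per-example losses (e.g. squared errors)
    for the same test examples, in the same order.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    n = diff.size
    observed = diff.mean()
    # Centre the differences so that resampling happens under the null.
    centred = diff - observed
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        boot_means[b] = centred[rng.integers(0, n, size=n)].mean()
    # Add-one correction keeps the p-value strictly positive.
    exceed = np.sum(np.abs(boot_means) >= abs(observed))
    return float((1 + exceed) / (n_boot + 1))

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k).
    scaled = (m - np.arange(m)) * p[order]
    # Enforce monotonicity, then cap at 1.
    adjusted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    out = np.empty(m)
    out[order] = adjusted
    return out
```

With nine baselines, one would compute one p-value per baseline comparison and pass all nine to `holm_adjust` before comparing against the chosen significance level.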
Three formal propositions
Proved in §3.7 of the paper, executable here
P1 · Boundedness. P_final ∈ [0,1] for all admissible inputs.
P2 · Asymptotic consistency. As n_eff → ∞, P_final → σ(β*ᵀx*).
P3 · Calibration monotonicity. |P_final − ψ_adj| is monotonically non-increasing in ECEₜ.
Architectural Pipeline (Figure 1 reconstructed)
Seven phases — three inputs feed four model components into a final synthesis
Synthetic Data Generator
A fixed coefficient vector β ∈ ℝ²⁰ with entries drawn from N(0,1) defines the latent signal. Features are sampled as x ~ N(0, I₂₀). The true conditional probability is P(Y=1|x) = σ(βᵀx + ε), ε ~ N(0, 0.1).
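A minimal sketch of this generator, assuming N(0, 0.1) denotes a variance of 0.1 and that labels are Bernoulli draws from the true probability (neither is stated explicitly here). The function name and its arguments are hypothetical.

```python
import numpy as np

def generate_synthetic(n, d=20, noise_var=0.1, seed=0):
    """Draw (X, y, p_true, beta) from the logistic DGP described above."""
    rng = np.random.default_rng(seed)
    # beta is drawn first, so a given seed yields the same beta for any n.
    beta = rng.normal(0.0, 1.0, size=d)
    X = rng.normal(0.0, 1.0, size=(n, d))               # x ~ N(0, I_d)
    # Assumption: 0.1 is a variance, so the standard deviation is sqrt(0.1).
    eps = rng.normal(0.0, np.sqrt(noise_var), size=n)
    p_true = 1.0 / (1.0 + np.exp(-(X @ beta + eps)))    # sigma(beta^T x + eps)
    # Assumption: labels are Bernoulli(p_true) draws.
    y = (rng.random(n) < p_true).astype(int)
    return X, y, p_true, beta

X, y, p_true, beta = generate_synthetic(200)
print(X.shape, y.mean())
```

Returning `p_true` alongside the labels is what makes comparisons against the known DGP possible later on.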
DGP Parameters
Reproducible · Seeded
Data Summary
Updated when you generate
Three Sample-Size Regimes (Table 3)
Stratified subsamples of the population — class balance preserved
| Regime | n (train + test) | Train / test split | Class balance | DGP control |
|---|---|---|---|---|
| Low-data | 200 | 160 / 40 | ~45–55% | Full (known β) |
| Medium-data | 2,000 | 1,600 / 400 | ~45–55% | Full (known β) |
| High-data | ≈100,000 | 80,000 / 20,000 | ~45–55% | Full (known β) |
Theory & Equations
PBIE Pro is built from four interpretable components fused through one convex combination. Every equation below is implemented verbatim in the JavaScript engine running this page.
Component 1 — Time-Decay Logistic Base Learner
Observation recency matters in non-stationary environments. Sample weights decay exponentially with age:
Parameters β̂ are then estimated by weighted maximum likelihood with L2 regularisation:
When γ → 0, weights wᵢ → 1 and this reduces to standard LR. The base output is ψ_base(x*) = σ(β̂ᵀx*).
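The weight and objective formulas are not reproduced in this text, so the sketch below rests on assumptions: weights of the form wᵢ = exp(−γ · ageᵢ), a penalty of (l2/2)·‖β‖², no intercept, and a plain Newton solver. It does not model the "feature-conditioned" part of the learner. The names `decay_weights`, `fit_weighted_logistic`, `age` and `l2` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def decay_weights(age, gamma):
    # Assumed form: w_i = exp(-gamma * age_i); gamma = 0 gives w_i = 1.
    return np.exp(-gamma * np.asarray(age, dtype=float))

def fit_weighted_logistic(X, y, w, l2=1.0, n_iter=25):
    """Newton iterations on the weighted, L2-penalised negative log-likelihood."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (p - y)) + l2 * beta
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + l2 * np.eye(d)
        beta = beta - np.linalg.solve(hess, grad)
    return beta

# Small demonstration on simulated data (all values illustrative).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (rng.random(300) < sigmoid(X_demo @ np.array([1.5, -1.0, 0.5]))).astype(float)
age_demo = np.arange(300)[::-1]            # the oldest observation comes first
beta_hat = fit_weighted_logistic(X_demo, y_demo, decay_weights(age_demo, 0.01))
psi_base = sigmoid(X_demo[:5] @ beta_hat)  # base output for five query points
print(beta_hat, psi_base)
```

Under these assumptions, calling `decay_weights(age, 0.0)` returns all ones, so the fit coincides with unweighted L2-regularised logistic regression, which matches the γ → 0 limit stated above.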
Component 2 — Marginal Uncertainty Adjustment
ψ_adj = clamp(ψ_base − U, 0, 1)
Controlled conservatism: predictions are pulled toward 0.5 in proportion to estimation uncertainty, preventing overconfidence under sparse data.
Component 3 — Formalised Intuition as Beta-Binomial Shrinkage
λₜ = max(0, 1 − ECEₜ) // calibration-aware weight
α_prior = μ_prior · κ · λₜ, β_prior = (1 − μ_prior) · κ · λₜ
ψ_posterior = (s + α_prior) / (n_eff + α_prior + β_prior), s = Σ wᵢ · yᵢ
As ECEₜ → 1, λₜ → 0 and the prior recedes — Proposition 3. As n_eff → ∞, the posterior collapses to the data likelihood — Proposition 2.
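The three formulas above translate almost line for line into code. The sketch assumes n_eff = Σ wᵢ, which is not defined in this section; the function name is hypothetical.

```python
import numpy as np

def shrinkage_posterior(y, w, mu_prior, kappa, ece_t):
    """Beta-Binomial posterior mean with a calibration-aware prior strength."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    lam = max(0.0, 1.0 - ece_t)                   # lambda_t
    alpha_prior = mu_prior * kappa * lam
    beta_prior = (1.0 - mu_prior) * kappa * lam
    s = float(np.sum(w * y))                      # weighted success count
    n_eff = float(np.sum(w))                      # assumption: n_eff = sum of weights
    return (s + alpha_prior) / (n_eff + alpha_prior + beta_prior)

y_obs = [1, 1, 0, 1]
w_obs = [1.0, 1.0, 1.0, 1.0]
# ECE = 0.2 gives lambda = 0.8, alpha = beta = 4, posterior = 7/12.
print(shrinkage_posterior(y_obs, w_obs, mu_prior=0.5, kappa=10.0, ece_t=0.2))
# ECE = 1.0 gives lambda = 0, so the prior vanishes and the posterior is 3/4.
print(shrinkage_posterior(y_obs, w_obs, mu_prior=0.5, kappa=10.0, ece_t=1.0))
```

The two calls show the attenuation at work: with a well-calibrated expert the estimate is pulled from 0.75 toward the prior mean of 0.5, and with ECEₜ = 1 the data estimate is returned unchanged.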
Component 4 — Log-Score Scenario Aggregation
wᵢ* = exp(LS(Mᵢ)) / Σⱼ exp(LS(Mⱼ)) // softmax over CV log-scores
ψ_scenario(x*) = Σᵢ wᵢ* · σ(β̂ᵢᵀx*)
Three scenarios — best (+20% β), base (β̂), worst (−20% β) — trained on orthogonal feature subsets.
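A minimal sketch of the softmax weighting and the weighted average, assuming LS(Mᵢ) is the mean held-out log-likelihood of scenario model Mᵢ (higher is better). The function names are hypothetical.

```python
import numpy as np

def mean_log_score(y, p, eps=1e-12):
    """Mean Bernoulli log-likelihood of labels y under probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def scenario_weights(log_scores):
    """Softmax over log-scores; subtracting the maximum avoids overflow."""
    ls = np.asarray(log_scores, dtype=float)
    z = np.exp(ls - ls.max())
    return z / z.sum()

def aggregate(weights, scenario_probs):
    """psi_scenario = sum_i w_i* * p_i for one query point."""
    return float(np.dot(weights, scenario_probs))

# Illustrative held-out labels and predictions from three scenario models.
y_val = [1, 0, 1, 1]
preds = {
    "best":  [0.9, 0.3, 0.8, 0.7],
    "base":  [0.8, 0.2, 0.7, 0.6],
    "worst": [0.6, 0.5, 0.5, 0.4],
}
scores = [mean_log_score(y_val, p) for p in preds.values()]
w_star = scenario_weights(scores)
print(w_star, aggregate(w_star, [0.72, 0.65, 0.51]))
```

Because the weights are non-negative and sum to one, the aggregated probability always lies between the smallest and largest scenario prediction.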
Final Synthesis
A convex combination of test-specific discrimination (scenario aggregation) and calibrated global anchoring (Beta-Binomial posterior). Proposition 1 follows trivially.
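The synthesis formula itself is not reproduced in this text. One form consistent with the description, assuming α_s ∈ [0, 1] is the weight on the scenario term (an assumption about which term α_s multiplies), would be:

```latex
\[
P_{\text{final}}(x^*) \;=\; \alpha_s\,\psi_{\text{scenario}}(x^*) \;+\; (1-\alpha_s)\,\psi_{\text{posterior}},
\qquad \alpha_s \in [0,1].
\]
% Boundedness under this assumed form:
\[
\min\{\psi_{\text{scenario}},\,\psi_{\text{posterior}}\}
\;\le\; P_{\text{final}} \;\le\;
\max\{\psi_{\text{scenario}},\,\psi_{\text{posterior}}\}.
\]
```

Under that form, boundedness is immediate: ψ_scenario is a weighted average of sigmoid outputs and ψ_posterior is a ratio with 0 ≤ s ≤ n_eff (given yᵢ ∈ {0, 1} and n_eff = Σ wᵢ), so both lie in [0, 1], and a convex combination of two numbers in [0, 1] stays in [0, 1].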
Phase 7 — Monte Carlo Uncertainty (optional)
Enables interval forecasting without changing the point estimate.
PBIE Pro Hyperparameter Editor
Every hyperparameter in Table 2 is editable below. Defaults match §5 of the paper. The Phase column maps each knob to where it acts in the algorithmic pipeline.
Core Hyperparameters
Cross-validated in the paper · Adjustable here
Component Toggles (Ablation)
Disable components individually to reproduce Table 5
Set γ = 0 to disable
Set σ_U = 0 to disable
Set α_s = 0 to disable
Set δ_int = 0, λ disabled
Phase 7 — adds ~500 forward passes
Baselines to Include
All nine baselines from §5.3 — toggle off if you want a faster run
Run Full Experiment
Trains every enabled model on the current dataset, evaluates on the held-out test set, and computes the full metric suite plus exact Brier-Score decomposition.
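Two of the metrics can be sketched briefly, together with a split of the Brier score that becomes available when the true probabilities are known. The bin count (10, equal width) is an assumption, and how the two oracle terms map onto the BS_cal and BS_ref columns is not specified here; all function names are hypothetical.

```python
import numpy as np

def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))

def expected_calibration_error(y, p, n_bins=10):
    """ECE with equal-width bins: sum of (bin share) * |mean p - mean y|."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # p = 1.0 would land in bin n_bins, so fold it into the last bin.
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(ece)

def oracle_brier_terms(p, p_true):
    """Split E[(p - Y)^2] into E[(p - pi)^2] + E[pi * (1 - pi)].

    The identity holds in expectation over Y ~ Bernoulli(pi); it requires
    the true probabilities pi, which only a known DGP provides.
    """
    p = np.asarray(p, dtype=float)
    p_true = np.asarray(p_true, dtype=float)
    gap_to_truth = float(np.mean((p - p_true) ** 2))
    irreducible = float(np.mean(p_true * (1.0 - p_true)))
    return gap_to_truth, irreducible

print(brier_score([1, 0], [0.9, 0.9]), expected_calibration_error([1, 0], [0.9, 0.9]))
```

The second term of `oracle_brier_terms` does not depend on the model at all, so any difference in expected Brier score between two models comes from the first term.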
Run Settings
Configuration Snapshot
What this run will execute
Results
After running an experiment, results appear here. Numbers will differ from the paper's exact figures (the paper uses scikit-learn, 30 seeds, and ±20% scenario perturbations of the recovered β, whereas this engine uses pure-JS implementations), but the same qualitative ordering and the calibration-driven advantage should be reproduced.
Comprehensive Comparison (Table 4 analog)
All metrics on the test partition · ★ = PBIE Pro · † = best recalibrated baseline
| Model | BS ↓ | ECE ↓ | NLL ↓ | AUROC ↑ | Tail-ECE ↓ | BS_cal ↓ | BS_ref ↓ |
|---|---|---|---|---|---|---|---|
| Run an experiment to populate this table. | | | | | | | |
Ablation Study
Remove each component in turn and re-evaluate. The paper's central ablation finding is sub-additivity: the total improvement (0.027) is smaller than the linear sum of the individual contributions (0.034).
Component Contributions
| Configuration | BS ↓ | ECE ↓ | NLL ↓ | ΔBS |
|---|---|---|---|---|
| Run the ablation to populate this table. | | | | |
Intuition Sensitivity Analysis
As intuition noise σ_noise grows, the λₜ mechanism automatically attenuates the prior. The framework should remain calibrated (ECE ≤ 0.05) across all tested noise levels.
Noise Profile
| Configuration | BS ↓ | ECE ↓ | Mean λₜ | ΔBS |
|---|---|---|---|---|
| Run sensitivity to populate. | | | | |
Interactive Single Prediction
Adjust a query vector x* and an expert intuition signal δ_int, then watch every intermediate quantity — ψ_base, ψ_adj, ψ_post, ψ_scen, P_final — update in real time.
Query Vector x*
20 features · adjust or randomise
Expert & Run
Component Breakdown
How each phase contributes to the final probability