investment-sandbox/QUANT_ROADMAP.md

# QuantSandbox System Architecture & Quantitative Roadmap

This document is the permanent, centralized system architecture reference and master context for all future quantitative feature deployments.

---

## 1. Repository Status & Milestone Log

### Completed Phases & Integrated Silos

*   **Phase 1.0: Portfolio Sandbox**
    *   *Features*: Real-time volatility estimators, portfolio optimization mechanics, and Swamy-Arora random effects panel regression solvers.
    *   *Status*: **Fully Operational (Production Lock)**.
*   **Phase 2.0: Live GJR-GARCH Scanners**
    *   *Features*: Real-time rolling volatility forecasting engine that detects asymmetric leverage effects in equity volatility. Upgraded with an interactive catalyst diagnostic drawer (systemic selloff, supply chain, management changes, legal fines, earnings misses), live FMP Small-Cap screener integration, and dynamic Rebound Probability calculations.
    *   *Status*: **Fully Operational (Production Lock)**.
*   **Phase 3.0: Real FRED Macro Ingestion**
    *   *Features*: Real-time server-side API integration with Federal Reserve Economic Data (FRED). Ingests Personal Savings Rates, Credit Card Delinquencies, Housing Starts, and Case-Shiller indices.
    *   *Status*: **Fully Operational (Production Lock)**.
*   **Phase 3.0: Smart Money & Whale Satellite-Screener**
    *   *Features*: Tracks high-conviction institutional shifts (SEC Form 13F filings) for Scion (Michael Burry), Akre Capital, and Mairs & Power. Calculates Velocity of Conviction (VoC) weight deltas and integrates the `DEV_MODE` offline protection shield. Consolidated under the unified 'Smart Money' workstation with a sub-tab switcher alongside broad corporate and congressional flows.
    *   *Status*: **Fully Operational (Production Lock)**.
*   **Phase 4.7: AI & Tech Hyper-Leverage Silo**
    *   *Features*: Tracks the AI CapEx-Overinvestment Cycle for NVDA, MSFT, GOOGL, META, and AMD. Calculates ROI-to-CapEx (Monetization Gap), Nvidia Supply-Chain Velocity Index, and Tech Infrastructure Leverage, with a 60-minute caching layer.
    *   *Status*: **Fully Operational (Production Lock)**.
*   **Phase 5.0: Native KaTeX Rig & Dual-Handbook System**
    *   *Features*: Refactored all 8 sandbox modules to feature a unified twin-modal header layout (`📖 Quantitative Handbook` and `⚙️ Operational Blueprint`). Replaced all inline string-escaped LaTeX with native React `react-katex` calls to completely eliminate escaping anomalies. Authored 8 new functional operational blueprints explaining scanner, whale, and math mechanics.
    *   *Status*: **Fully Operational (Production Lock)**.

---

## 2. Master Backlog Architecture: The 6-Level Cockpit Matrix

The system tracks and synthesizes ~50 quantitative metrics divided into 6 distinct analytical levels to form a unified Market Regime Classifier.

### Level 1: Macro & Credit Layer (21 Metrics)
*   **Inflation Vectors**: CPI YoY, Core CPI, PPI.
*   **Sovereign Yields & Term Structure**: US 10Y Yield, US 2Y Yield, 2S10S Yield Spread, High-Yield Credit Spreads.
*   **Central Bank Liquidity**: Fed Balance Sheet Assets, ECB Refinancing Rate, Fed Funds Rate, M2 Money Supply, Reverse Repo (RRP) Volumes, Treasury General Account (TGA) levels.
*   **Macro Capacity**: S&P 500-to-GDP Ratio (Buffett Indicator Proxy).
*   **Labor Market Dynamics**: Non-Farm Payrolls (NFP), Unemployment Rate, Initial Jobless Claims.
*   **Housing & Credit Velocity**: Housing Starts, Mortgage Applications Index Proxy, S&P Case-Shiller Home Price Index.
*   **Consumer Stress Indexes**: Credit Card Delinquency Rates, Personal Savings Rate.

### Level 2: Market Breadth Layer (8 Metrics)
*   **Moving Average Spreads**: Percentage of S&P 500 constituents trading above their 50-day and 200-day Simple Moving Averages.
*   **Volume Accumulation**: Cumulative Advance-Decline Line (A/D Line) scaled by volume.
*   **McClellan Oscillator**: Index tracking short-term momentum shifts in net advances.
*   **High-Low Index**: Ratio of stocks making new 52-week highs to the combined count of new 52-week highs and new 52-week lows.
*   **Sector Rotational Momentum**: Relative strength vectors of Defensive (XLU, XLP, XLV) vs. Cyclical/Growth (XLK, XLY, XLI) sectors.
*   **Beta Distribution Spreads**: Dispersion of individual constituent betas relative to index beta.

### Level 3: Sentiment & Positioning Flow Layer (7 Metrics)
*   **Implied Volatility Structures**: VIX, VIX/VVIX term structure spreads.
*   **Option Flows**: CBOE Equity Put/Call Volume Ratio (10-day moving average).
*   **Retail Positioning**: AAII Bulls-Bears Spread, margin debt levels in retail brokerage accounts.
*   **Institutional Positioning**: NAAIM Exposure Index, CFTC Commitments of Traders (COT) net non-commercial positioning in S&P 500 futures.

### Level 4: Corporate Fundamental & Accruals Layer (6 Metrics)
*   **Accrual Integrity**: Sloan Ratio tracking earnings quality.
*   **Bankruptcy Probability**: Altman Z-Score for manufacturing and non-manufacturing firms.
*   **Earnings Manipulation**: Beneish M-Score tracking probability of financial statement manipulation.
*   **Financial Strength**: Piotroski F-Score (9-point fundamental health checklist).
*   **Margin Compression Dynamics**: Operating Margin YoY changes, Gross Margin trends.

### Level 5: Technical Momentum & Volatility Layer (5 Metrics)
*   **Vol Forecasts**: Rolling GJR-GARCH downside volatility forecast vectors.
*   **Relative Strength**: 14-day Relative Strength Index (RSI).
*   **Trend Vectors**: MACD Signal Line Spreads.
*   **Range Expansion**: Average True Range (ATR) normalized by price.
*   **Beta Expansion Multipliers**: Realized beta shifts in high-beta tech components.

### Level 6: Alternative Data Layer (3 Metrics)
*   **Supply Chain Disruption**: Supply-Chain Velocity Index (Aggregate buyer purchase obligations vs. hardware supplier inventories).
*   **Employment Demand**: Tech sector job postings scraped from aggregators.
*   **Credit Card Transactions**: Real-time consumer retail spending proxies.

---

## 3. Whale Reconnaissance Layer

Designed to track the equity holdings of institutional boutique Value and Small-Cap asset managers via SEC Form 13F filings.

```mermaid
graph TD
    A[SEC 13F Filings Ingestion] --> B{Filter Boutique Managers}
    B -- AUM < $5B & High Active Share --> C[Extract High-Conviction Long Positions]
    B -- Large Index Funds --> D[Discard]
    C --> E[Compute Quarterly Position Shifts]
    E --> F[Generate Whale Satellite-Screener Score]
```

### Screener Specifications
*   **Target Universe**: Boutique managers with Assets Under Management (AUM) between \$100M and \$5B, exhibiting an Active Share $> 80\%$.
*   **Quant Filters**:
    1.  **Concentration Index**: Top 10 holdings must exceed $50\%$ of the total reported portfolio value.
    2.  **Position Size Changes**: Track quarterly additions ($\Delta W_{i} > 2\%$) where the manager is actively building a stake.
    3.  **Co-ownership Clusters**: Identify stocks bought by 3 or more selected boutique managers simultaneously.
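
The three quant filters above can be sketched as follows. This is a minimal illustration, not the screener's actual code: the function names are hypothetical, weights are assumed to be expressed in percent of reported portfolio value, and the $2\%$ threshold is read as percentage points of portfolio weight.

```python
from collections import Counter

def concentration_index(weights_pct: dict[str, float], top_n: int = 10) -> float:
    """Sum of the top-N position weights (percent of reported portfolio value)."""
    return sum(sorted(weights_pct.values(), reverse=True)[:top_n])

def stake_builds(prev_pct: dict[str, float], curr_pct: dict[str, float],
                 min_delta: float = 2.0) -> dict[str, float]:
    """Tickers whose weight rose by more than `min_delta` percentage points.

    Assumption: a ticker missing from the previous quarter had weight 0.
    """
    deltas = {t: w - prev_pct.get(t, 0.0) for t, w in curr_pct.items()}
    return {t: d for t, d in deltas.items() if d > min_delta}

def co_ownership_clusters(buys_by_manager: dict[str, set[str]],
                          min_managers: int = 3) -> set[str]:
    """Tickers bought in the same quarter by at least `min_managers` managers."""
    counts = Counter(t for buys in buys_by_manager.values() for t in buys)
    return {t for t, n in counts.items() if n >= min_managers}
```

A manager would pass the concentration filter when `concentration_index(weights) > 50.0`.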

### Implemented Architecture
- **API Endpoint**: [/api/whale/screener](app/api/whale/screener/route.ts) fetches SEC Form 13F filings for Scion Asset Management (Michael Burry: `0001649339`), Akre Capital Management (`0001483348`), and Mairs & Power Small Cap Fund (`0001099684`).
- **Offline Fallback**: Respects `DEV_MODE=true` environment configurations, completely bypassing outbound FMP requests and serving structured `MOCK_WHALE_DATA` with `isShieldActive: true`.
- **Visual Interface**: Consolidated under the 'Smart Money' tab (`app/page.tsx`) with a sub-tab selection toggle to switch between broad corporate/congressional flows and whale conviction screeners.
- **Velocity of Conviction**:
  $$\text{VoC}_i = w_{i, t} - w_{i, t-1}$$
  $$w_{i, t} = \frac{V_{i, t}}{\sum_{j} V_{j, t}} \times 100$$
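
As a rough sketch of the two formulas above, assuming each quarter's holdings arrive as a mapping from ticker to reported position value (the function names are illustrative, and a position absent from a quarter is treated as a zero weight):

```python
def portfolio_weights(values: dict[str, float]) -> dict[str, float]:
    """w_{i,t}: each position's share of total reported value, in percent."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total reported value must be positive")
    return {ticker: v / total * 100.0 for ticker, v in values.items()}

def velocity_of_conviction(prev_values: dict[str, float],
                           curr_values: dict[str, float]) -> dict[str, float]:
    """VoC_i = w_{i,t} - w_{i,t-1}, in percentage points of portfolio weight."""
    w_prev = portfolio_weights(prev_values)
    w_curr = portfolio_weights(curr_values)
    tickers = set(w_prev) | set(w_curr)
    # Assumption: new or fully exited positions get weight 0 on the missing side.
    return {t: w_curr.get(t, 0.0) - w_prev.get(t, 0.0) for t in tickers}
```

Because the weights are normalized within each quarter, VoC reflects both the manager's trades and relative price moves between filings.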


---

## 4. Deep-Dive Corporate Terminal Specifications

When evaluating an individual equity ticker, the terminal computes three quantitative risk markers:

### I. The Sloan Ratio (Earnings Quality Indicator)
Measures the proportion of earnings backed by non-cash accruals. A high ratio indicates that earnings are driven by accounting accruals rather than real operating cash flows.

#### Mathematical Formulation:
$$\text{Accruals} = \text{Net Income} - (\text{Cash Flow From Operations} + \text{Cash Flow From Investing})$$
$$\text{Sloan Ratio} = \frac{\text{Accruals}}{\text{Total Assets}} \times 100$$

#### Regime Classifications & Thresholds:
*   **Safe Regime (High Earnings Quality)**: $-10\% \le \text{Sloan Ratio} \le 10\%$
*   **Anomaly Regime (Aggressive Accrual Expansion)**: $\text{Sloan Ratio} > 10\%$ or $\text{Sloan Ratio} < -10\%$ (Signals earnings manipulation or aggressive capitalization risk)
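
A minimal sketch of the Sloan Ratio and its regime thresholds as defined above. The function names and the `"safe"`/`"anomaly"` labels are hypothetical; all inputs are assumed to be in the same currency units for the same reporting period.

```python
def sloan_ratio(net_income: float, cfo: float, cfi: float,
                total_assets: float) -> float:
    """Accruals as a percentage of total assets.

    Accruals = Net Income - (Cash Flow From Operations + Cash Flow From Investing)
    """
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    accruals = net_income - (cfo + cfi)
    return accruals / total_assets * 100.0

def sloan_regime(ratio_pct: float, band: float = 10.0) -> str:
    """Classify as 'safe' inside [-band, +band] percent, otherwise 'anomaly'."""
    return "safe" if -band <= ratio_pct <= band else "anomaly"
```

For example, net income of 100, operating cash flow of 150, investing cash flow of -80, and total assets of 1,000 give accruals of 30 and a ratio of about $3\%$, which falls in the safe regime.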

---

### II. Analyst Revision Impulse (ARI)
Tracks the momentum of consensus earnings estimates over a rolling 14-day window to identify positive or negative structural inflections before earnings reports.

#### Mathematical Formulation:
$$\text{ARI}_{t} = \sum_{h=1}^{H} \frac{E_{t}(\text{EPS}_{h}) - E_{t-14}(\text{EPS}_{h})}{E_{t-14}(\text{EPS}_{h})}$$

Where:
*   $E_{t}(\text{EPS}_{h})$ is the consensus EPS estimate at day $t$ for fiscal period $h$.
*   $H$ represents the number of forward fiscal quarters modeled (standard $H=4$).
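
The ARI sum can be sketched as below, assuming the consensus estimates for the $H$ forward quarters are available as two equal-length lists (today and 14 days ago). The function name is illustrative. Note that the formula as written divides by the prior estimate itself, so a negative prior EPS flips the sign of that term; the sketch reproduces the formula and does not correct for this.

```python
def analyst_revision_impulse(est_now: list[float], est_prior: list[float]) -> float:
    """Sum of relative consensus EPS revisions over the forward quarters.

    est_now[h]   : consensus EPS for fiscal period h at day t
    est_prior[h] : consensus EPS for the same period at day t-14
    """
    if len(est_now) != len(est_prior):
        raise ValueError("estimate lists must cover the same fiscal periods")
    total = 0.0
    for now, prior in zip(est_now, est_prior):
        if prior == 0:
            raise ValueError("prior estimate of zero makes the revision undefined")
        total += (now - prior) / prior
    return total
```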

---

### III. GJR-GARCH Downside Buffer
Calculates the conditional Value-at-Risk (VaR) and Expected Shortfall (ES) at a $99\%$ confidence level over a 10-day forward horizon using volatility projections from Module 1.

#### Mathematical Formulation:
$$\sigma_{t}^2 = \omega + \left(\alpha + \gamma I_{t-1}\right) \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$

$$\text{VaR}_{99\%, 10D} = P_{t} \times \left(1 - e^{z_{0.01} \times \sqrt{10} \times \sigma_{t}}\right)$$

Where:
*   $I_{t-1} = 1$ if $\epsilon_{t-1} < 0$, and $0$ otherwise (asymmetric shock multiplier).
*   $z_{0.01}$ is the $1\%$ quantile of the standardized residual distribution (Student-t or Normal).

---

### IV. GJR-GARCH Rebound Probability Score (Scanner Module)
Computes a real-time rebound probability from the GJR-GARCH overreaction outlier score, damped by a catalyst-specific stress coefficient derived from the news-based cause of the drop.

#### Mathematical Formulation:
$$P_{\text{rebound}} = 0.6 \times S_{\text{overreact}} + 0.4 \times (100 - C_{\text{stress}})$$

Where:
*   $S_{\text{overreact}}$ is the Overreaction Outlier Score, derived from the ratio of the absolute price decline to the estimated conditional volatility:
    $$S_{\text{overreact}} = \text{clip}\left(\frac{\Delta P}{\sigma_{\text{GJR-GARCH}}} \times 30 + 30, 10, 95\right)$$
*   $C_{\text{stress}}$ is the catalyst-specific stress coefficient determined by the identified drop catalyst:
    *   *Systemic Selloff*: $15\%$ (liquidity shock, fast rebound)
    *   *Supply Chain Disruption*: $40\%$ (transitory capacity constraint)
    *   *Executive Shift*: $55\%$ (strategic and operational uncertainty)
    *   *Regulatory Issue / Fine*: $65\%$ (direct balance sheet / cash flow impact)
    *   *Earnings Miss*: $75\%$ (structural growth deceleration, slow rebound)
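
A small sketch of the score above, assuming the price decline and the GJR-GARCH volatility are expressed in the same units (for example, both as daily return fractions). The dictionary keys and function names are hypothetical labels for the five catalysts.

```python
# Catalyst stress coefficients (percent), taken from the list above.
CATALYST_STRESS = {
    "systemic_selloff": 15.0,
    "supply_chain": 40.0,
    "executive_shift": 55.0,
    "regulatory": 65.0,
    "earnings_miss": 75.0,
}

def overreaction_score(abs_decline: float, sigma: float) -> float:
    """clip(decline / sigma * 30 + 30, 10, 95)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    raw = abs_decline / sigma * 30.0 + 30.0
    return min(max(raw, 10.0), 95.0)

def rebound_probability(abs_decline: float, sigma: float, catalyst: str) -> float:
    """0.6 * overreaction score + 0.4 * (100 - catalyst stress), in percent."""
    s = overreaction_score(abs_decline, sigma)
    return 0.6 * s + 0.4 * (100.0 - CATALYST_STRESS[catalyst])
```

For a decline of two conditional standard deviations the overreaction score is 90, so a systemic selloff scores about 88 while an earnings miss scores about 64.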

---

### V. Crypto Bayesian Markov & Self-Correcting Engine

Integrates a two-stage predictive mapping pipeline for cryptocurrency assets (BTC, ETH, SOL) that combines on-chain derivatives data with online Bayesian updates.

#### 1. Machine Learning Random Forest Classifier
Ensemble of 10 decision trees mapping four features to forecast trend probabilities:
*   **Funding Rates (FR)**: Futures leverage balance indicators.
*   **Open Interest (OI) Volatility**: Contract buildup velocities.
*   **Long/Short (LS) Retail Skew**: Sentiment extreme markers.
*   **Whale Inflows (W)**: Cold-wallet transfer proxy metrics.

$$P_{\text{ML}} = \frac{1}{M} \sum_{m=1}^{M} T_m(FR_t, OI_t, LS_t, W_t)$$

#### 2. Conjugate Beta-Binomial Update
Continuous ML probability outputs ($P_{\text{ML}}$) are mapped into a discrete Binomial likelihood by defining the Trust-Weight Hyperparameter ($w=12$) as the Effective Sample Size (ESS):
*   **Prior distribution**: $\theta \sim \text{Beta}(\alpha_{\text{prior}}, \beta_{\text{prior}})$
*   **Binomial Likelihood pseudo-observations**: $k = P_{\text{ML}} \times w$ (successes), $w - k = (1 - P_{\text{ML}}) \times w$ (failures)
*   **Conjugate Posterior Update**:
    $$\alpha_{\text{post}} = \alpha_{\text{prior}} + k$$
    $$\beta_{\text{post}} = \beta_{\text{prior}} + (w - k)$$

#### 3. Posterior Mean Integration Proof
Integrating the continuous parameter $\theta$ out of the posterior distribution gives the mathematical expectation of the posterior:
$$\mathbb{E}[\theta \mid \text{Data}] = \int_{0}^{1} \theta \cdot P(\theta \mid \text{Data}) \, d\theta = \int_{0}^{1} \theta \cdot \frac{\theta^{\alpha_{\text{post}}-1}(1-\theta)^{\beta_{\text{post}}-1}}{\text{B}(\alpha_{\text{post}}, \beta_{\text{post}})} \, d\theta$$
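$$\mathbb{E}[\theta \mid \text{Data}] = \frac{\text{B}(\alpha_{\text{post}} + 1, \beta_{\text{post}})}{\text{B}(\alpha_{\text{post}}, \beta_{\text{post}})} = \frac{\alpha_{\text{post}}}{\alpha_{\text{post}} + \beta_{\text{post}}}$$

The first equality holds because $\int_{0}^{1} \theta^{\alpha_{\text{post}}}(1-\theta)^{\beta_{\text{post}}-1} \, d\theta = \text{B}(\alpha_{\text{post}} + 1, \beta_{\text{post}})$; the second follows from $\text{B}(a, b) = \Gamma(a)\Gamma(b) / \Gamma(a+b)$ together with $\Gamma(a+1) = a\,\Gamma(a)$.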

#### 4. Expanded Workstation Formula
$$P_{\text{Posterior}} = \frac{\alpha_{\text{prior}} + (P_{\text{ML}} \times w)}{\alpha_{\text{prior}} + \beta_{\text{prior}} + w}$$
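
The conjugate update and the expanded formula above can be checked with a short sketch. The function names are illustrative, and the prior parameters in the usage note are arbitrary example values, not the system's actual priors.

```python
def beta_binomial_update(alpha_prior: float, beta_prior: float,
                         p_ml: float, w: float = 12.0) -> tuple[float, float]:
    """Treat p_ml as k = p_ml * w pseudo-successes out of w pseudo-trials."""
    if not 0.0 <= p_ml <= 1.0:
        raise ValueError("p_ml must be a probability in [0, 1]")
    k = p_ml * w
    return alpha_prior + k, beta_prior + (w - k)

def posterior_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

For instance, a Beta(2, 2) prior combined with $P_{\text{ML}} = 0.75$ and $w = 12$ gives $k = 9$, a Beta(11, 5) posterior, and a posterior mean of $11/16 = 0.6875$, matching $(2 + 9) / (2 + 2 + 12)$ from the expanded formula.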

#### 5. Walk-Forward Validation & Multi-Model Ensemble
To prevent look-ahead bias and structural overfitting, the system deploys a Walk-Forward Validation framework on a fixed 365-day rolling window across a fleet of 5 machine learning estimators: Random Forest (RF), XGBoost/Gradient Boosting (GB), ElasticNet Logistic Regression (LR), Support Vector Machines (SVM), and Multi-Layer Perceptrons (MLP).

Predictions are generated across three distinct forecast horizons: $T+1$, $T+5$, and $T+10$. To keep the inputs as close to stationary as possible, all raw asset prices are stripped from the feature space, which uses only Log-Returns, Rolling Volatility, RSI, Distance to Moving Averages, and Daily Spreads.

##### Leakage Safeguards (Horizon Cutoff):
For a training window ending at index $T-1$ and forecasting horizon $H \in \{1, 5, 10\}$:
*   **T+1 Horizon**: Trains on features up to index $T-2$, using target labels resolved up to index $T-1$.
*   **T+5 Horizon**: The training set is truncated by $5$ steps, meaning the latest training features end at index $T-6$ to ensure that the target labels (which require a 5-day future window) do not extend past index $T-1$ (the window boundary).
*   **T+10 Horizon**: The training set is truncated by $10$ steps, ending features at index $T-11$ to ensure zero leakage of post-boundary price data.
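
The cutoff rule above reduces to "last training feature index = $(T-1) - H$". A minimal sketch, assuming integer day indices, a 365-day window covering indices $T-365$ through $T-1$, and a binary up/down label (the function names and the label definition are assumptions for illustration):

```python
def training_feature_range(t: int, horizon: int, window: int = 365) -> range:
    """Feature row indices that may be used to train the model forecasting from t.

    A row at index i has its label resolved at i + horizon, so the last usable
    row is (t - 1) - horizon; nothing at or after index t is ever read.
    """
    start = t - window
    if start < 0:
        raise ValueError("not enough history for the requested window")
    last_feature = (t - 1) - horizon
    return range(start, last_feature + 1)

def make_labels(prices: list[float], rows: range, horizon: int) -> list[int]:
    """Hypothetical target: 1 if the price is higher `horizon` days later, else 0."""
    return [1 if prices[i + horizon] > prices[i] else 0 for i in rows]
```

With `t = 400`, the three horizons end their training features at indices 398, 394, and 389 respectively, and in every case the furthest price consulted for a label is index 399.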

##### Multi-Tracker Online Learning:
The cockpit maintains 15 independent Beta-Posterior trackers (5 models $\times$ 3 horizons) persisted inside the client browser. Each tracker is initialized with historical priors and updated dynamically in the background. The expected accuracy is calculated as:
$$\mathbb{E}[\theta] = \frac{\alpha}{\alpha + \beta}$$
where $\alpha$ represents successes and $\beta$ represents false alarms, calculated independently for each estimator-horizon pair.
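
The tracker grid can be sketched as below. This is an in-memory Python illustration only: the real trackers are persisted in the browser, and the uniform Beta(1, 1) default used here is a placeholder assumption standing in for the historical priors.

```python
from dataclasses import dataclass

@dataclass
class BetaTracker:
    alpha: float = 1.0  # prior pseudo-count of correct calls (assumed default)
    beta: float = 1.0   # prior pseudo-count of incorrect calls (assumed default)

    def update(self, correct: bool) -> None:
        """Add one observation once a forecast's outcome has resolved."""
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def expected_accuracy(self) -> float:
        return self.alpha / (self.alpha + self.beta)

MODELS = ("RF", "GB", "LR", "SVM", "MLP")
HORIZONS = (1, 5, 10)

# One independent tracker per (estimator, horizon) pair: 5 x 3 = 15.
trackers = {(m, h): BetaTracker() for m in MODELS for h in HORIZONS}
```

Keying by `(model, horizon)` keeps each posterior independent, so a weak T+10 record for one estimator does not dilute its T+1 accuracy estimate.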

---

### VI. Sandbox Portfolio Cockpit & Kelly Sizing

Integrates fractional betting algorithms and asset weight models inside the active Portfolio Sandbox environment.

#### 1. Active Portfolio Weighting ($w_i$)
Calculates the dynamic percentage value allocation of constituent assets:
$$w_i = \frac{\text{Shares}_i \times P_{\text{current}, i}}{\sum_{j} \text{Shares}_j \times P_{\text{current}, j}}$$

#### 2. Synthetic Portfolio Return ($R_{pt}$)
Simulates active log returns of the combined holdings:
$$R_{pt} = \sum_{i} w_i \times \ln\left(\frac{P_{t, i}}{P_{t-1, i}}\right)$$
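
A minimal sketch of the weighting and return formulas above, assuming holdings and prices are keyed by ticker (function names are illustrative). Note that a weighted sum of log returns is an approximation of the portfolio's own log return, which is how the formula above defines the synthetic series.

```python
import math

def active_weights(shares: dict[str, float],
                   prices: dict[str, float]) -> dict[str, float]:
    """w_i: market value of each holding as a fraction of total portfolio value."""
    values = {t: shares[t] * prices[t] for t in shares}
    total = sum(values.values())
    if total <= 0:
        raise ValueError("portfolio value must be positive")
    return {t: v / total for t, v in values.items()}

def synthetic_log_return(weights: dict[str, float],
                         prices_t: dict[str, float],
                         prices_prev: dict[str, float]) -> float:
    """R_pt: weighted sum of each asset's one-period log return."""
    return sum(w * math.log(prices_t[t] / prices_prev[t])
               for t, w in weights.items())
```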

#### 3. Theoretical Kelly Sizing ($f^*$)
Calculates the optimal size fraction to maximize the log growth of capital:
$$f^* = \frac{p \cdot b - (1 - p)}{b}$$
Where $p$ is the probability of success, and $b$ is the payout odds ratio (average win/average loss).

#### 4. Half-Kelly Safety Buffer Sizing
Applies a fractional buffer to lower estimation variance and protect against drawdowns:
$$f_{\text{applied}} = \max\left(0, 0.5 \times f^*\right)$$
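
The two sizing rules can be sketched together. Function names are illustrative; $p$ and $b$ are assumed to be estimated elsewhere.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Theoretical Kelly fraction f* = (p*b - (1 - p)) / b."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if b <= 0:
        raise ValueError("payout ratio b must be positive")
    return (p * b - (1.0 - p)) / b

def half_kelly(p: float, b: float) -> float:
    """Applied size: half of f*, floored at zero so negative edges size to nothing."""
    return max(0.0, 0.5 * kelly_fraction(p, b))
```

With $p = 0.6$ and $b = 2$, $f^* = (1.2 - 0.4) / 2 = 0.4$ and the applied size is $0.2$; with $p = 0.3$ and $b = 1$, $f^*$ is negative and the applied size is $0$.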

---

## 5. Multi-Regime Transition Classifier

The regime classifier is the sandbox's central decision layer: it dynamically adjusts allocation weights across the portfolio modules based on estimated macroeconomic and market states.

```mermaid
graph LR
    A[Level 1-6 Inputs] --> B[Dynamic Z-Score Solver]
    B --> C[Markov-Switching Model]
    C --> D{Regime Output}
    D -->|Regime 0: Risk-On| E[Overweight Equities/Growth]
    D -->|Regime 1: Transition| F[Neutral / Hedge overlay]
    D -->|Regime 2: Risk-Off| G[Overweight Bonds/Cash/Short Vol]
```

### Model Specifications
1.  **Regime Estimation**: A 3-state Markov-Switching Vector Autoregressive (MS-VAR) model classifying the market into:
    *   **Regime 0 (Expansion/Risk-On)**: Low volatility, positive macro surprise, expanding supply-chain velocity.
    *   **Regime 1 (Late-Cycle/Transition)**: Softening breadth, rising credit spreads, negative monetization gaps.
    *   **Regime 2 (Contraction/Risk-Off)**: High realized volatility, yield curve un-inversion, consumer savings depletion.
2.  **Dynamic Weight Allocation**:
    $$\mathbf{W}_{t} = s_t \mathbf{W}_{\text{Risk-On}} + (1 - s_t) \mathbf{W}_{\text{Risk-Off}}$$
    Where $s_t \in [0, 1]$ represents the filtered probability of being in the expansionary regime at time $t$.
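
The blending rule above can be sketched as follows, assuming the filtered probability $s_t$ is already available from the MS-VAR step (not shown) and that each target allocation is a mapping from asset class to weight. The function name and the example allocations in the usage note are hypothetical.

```python
def blend_weights(s_t: float,
                  w_risk_on: dict[str, float],
                  w_risk_off: dict[str, float]) -> dict[str, float]:
    """W_t = s_t * W_risk_on + (1 - s_t) * W_risk_off, asset by asset."""
    if not 0.0 <= s_t <= 1.0:
        raise ValueError("s_t must be a probability in [0, 1]")
    assets = set(w_risk_on) | set(w_risk_off)
    # Assumption: an asset missing from one allocation has weight 0 there.
    return {a: s_t * w_risk_on.get(a, 0.0) + (1.0 - s_t) * w_risk_off.get(a, 0.0)
            for a in assets}
```

For example, with $s_t = 0.75$, a risk-on allocation of 80/20 equities/bonds and a risk-off allocation of 20/80 blend to roughly 65/35. If both inputs sum to one, the blended weights do as well, since the result is a convex combination.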