Why use an outer join instead of an inner join for facing comparison?

An inner join silently drops the two states that matter most: a SKU the planogram demands but the shelf never showed, which is a clean out-of-stock, and a SKU present on the shelf that the planogram never planned, which is an unplanned placement. An outer join preserves both single-sided rows so they can be classified rather than lost.

How do I stop low-count SKUs from being flagged for being off by one?

A percentage tolerance collapses toward zero on a one- or two-facing SKU, making any deviation a violation. Floor the tolerance with np.maximum against a minimum facing threshold so the band never rounds below 1, then tune the percentage per velocity tier rather than tightening it globally.

Why does the classifier test zero-expectation conditions before the tolerance check?

Order in np.select is significant. If the tolerance branch ran first, a planned-but-empty slot could be swept into a generic verdict instead of OUT_OF_STOCK, and an unplanned-but-present slot would be misread. Testing the zero-expectation and zero-observation branches first keeps each state operationally distinct.

How should the engine handle promotional displays that intentionally break the planogram?

Carry a promo_override_flag on slots that are intentional deviations such as secondary or end-of-aisle displays, and short-circuit those rows to COMPLIANT before classification so a sanctioned overstock is never scored as a violation.

Why are the variance calculations vectorized instead of using apply()?

Row-wise apply() and Python loops do not scale to thousands of bays. Using np.where for variance and np.select for classification assigns every row in one pass, which keeps the engine fast enough to run across an entire chain and is also easier to reason about and test deterministically.

Calculating Facing Discrepancies with Python

This walkthrough sits under Automating Facings vs Actuals Validation and solves one precise task: given a planogram’s expected facings and a captured count of what is physically on the shelf, compute a per-slot discrepancy that a category manager can act on without re-auditing the bay. The naive version — observed - expected — looks trivial and is wrong in production, because it fires phantom violations on identifier drift, treats a 0-expectation promo overstock as a compliance failure, masks true out-of-stocks behind rounding, and applies one rigid tolerance to a fast-moving beverage and a slow seasonal endcap alike. This page builds a deterministic, vectorized discrepancy engine in pandas and NumPy, step by step, where each step is independently verifiable and the output is an auditable record rather than a single ambiguous number.

Prerequisites & Context

Before applying this page, confirm the following are in place. This routine runs after detections have been resolved to a canonical SKU and assigned to a slot; if your inputs still carry raw vendor codes or unmatched boxes, the discrepancy will be noise no matter how careful the arithmetic is.

Runtime: Python 3.11+ with pandas and numpy on the host that builds compliance records.
Canonical identifiers: both inputs share one SKU taxonomy. UPC, internal merchandising SKU, and vendor code fragmentation must be reconciled upstream in Planogram Sync & SKU Mapping Strategies so the join key is trustworthy — a mismatched key produces a phantom OUT_OF_STOCK on one row and an UNPLANNED_PLACEMENT on its twin.
Slot-mapped actuals: observed facings already carry the slot_id they were assigned to, the output of the bipartite matching in Validating Shelf Position Tolerances in Retail.
Velocity and tier metadata: a per-SKU velocity_tier so the tolerance band can tighten on fast movers — the calibration of these bands is owned by Threshold Tuning for Compliance Accuracy.
Promo override flags: intentional planogram deviations (secondary displays, end-of-aisle features) marked so they bypass the standard gate rather than scoring as violations, per Promotional Display Alignment Checks.

The input contract is two DataFrames sharing a composite key. The planogram frame carries sku, slot_id, expected_facings, velocity_tier; the actuals frame carries sku, slot_id, observed_facings. Cast missing values to NaN explicitly rather than imputing them — a genuinely empty slot and an unread slot are different facts, and silently filling either to 0 destroys that distinction.
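To make the contract concrete, here is a minimal sketch of two frames that satisfy it. The SKU values and slot IDs are invented for illustration, and the use of pandas' nullable `Int64` dtype to keep an unread slot as a missing value is an assumption, not something the steps below require.

```python
import pandas as pd

# Planogram side: one row per planned (sku, slot_id) pair. Values are illustrative.
planogram_df = pd.DataFrame(
    {
        "sku": ["A100", "B200", "C300"],
        "slot_id": ["S-01-01", "S-01-02", "S-01-03"],
        "expected_facings": [4, 2, 1],
        "velocity_tier": ["fast", "standard", "slow"],
    }
)

# Actuals side: B200 was read and found empty (0); C300 could not be read,
# so its count stays missing (pd.NA) instead of being imputed to 0.
actuals_df = pd.DataFrame(
    {
        "sku": ["A100", "B200", "C300"],
        "slot_id": ["S-01-01", "S-01-02", "S-01-03"],
        "observed_facings": pd.array([4, 0, pd.NA], dtype="Int64"),
    }
)

# The two facts remain distinguishable until something fills the gap.
is_empty = (actuals_df["observed_facings"] == 0).fillna(False)
is_unread = actuals_df["observed_facings"].isna()
empty_skus = actuals_df.loc[is_empty, "sku"].tolist()
unread_skus = actuals_df.loc[is_unread, "sku"].tolist()
print(empty_skus, unread_skus)
```

Keeping the unread slot as a missing value lets you decide explicitly what happens to it, rather than having it look like an empty shelf.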

Step 1 — Align Schemas and Join on a Composite Key

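Merge planogram and actuals on the composite ['sku', 'slot_id'] key with an outer join. An inner join would silently drop the two cases you most need to catch: a SKU the planogram demands but the shelf never showed (a clean out-of-stock), and a SKU on the shelf that the planogram never planned (an unplanned placement). After the join, fill only the facing columns to 0 and lock both to integer type so downstream arithmetic never drifts into floats. The fill is safe for gaps the join itself created, where a missing side means nothing was planned or nothing was seen. It is not safe for an unread slot that already arrived as NaN in the actuals, so set those rows aside before this step; otherwise the fill turns them into out-of-stocks.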

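import pandas as pd

# Columns each input must supply, matching the input contract above.
PLANOGRAM_REQUIRED = {"sku", "slot_id", "expected_facings", "velocity_tier"}
ACTUALS_REQUIRED = {"sku", "slot_id", "observed_facings"}


def join_planogram_actuals(
    planogram_df: pd.DataFrame, actuals_df: pd.DataFrame
) -> pd.DataFrame:
    """Outer-join expectations and observations on the composite slot key."""
    # Validate each frame on its own: a column that sits in the wrong frame
    # would pass a joint check and then fail on the selection below.
    for name, frame, required in (
        ("planogram_df", planogram_df, PLANOGRAM_REQUIRED),
        ("actuals_df", actuals_df, ACTUALS_REQUIRED),
    ):
        missing = required - set(frame.columns)
        if missing:
            raise ValueError(f"{name} is missing columns: {sorted(missing)}")

    merged = pd.merge(
        planogram_df[["sku", "slot_id", "expected_facings", "velocity_tier"]],
        actuals_df[["sku", "slot_id", "observed_facings"]],
        on=["sku", "slot_id"],
        how="outer",
    )
    # A row the join created has no count on one side; treat that as 0 facings.
    merged[["expected_facings", "observed_facings"]] = (
        merged[["expected_facings", "observed_facings"]].fillna(0).astype(int)
    )
    # Rows that exist only in the actuals have no tier; default them to "standard".
    merged["velocity_tier"] = merged["velocity_tier"].fillna("standard")
    return merged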
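One way to see how many rows matched on only one side is the `indicator=True` option of `pd.merge`, which adds a `_merge` column marking each row as `both`, `left_only`, or `right_only`. The sketch below is an illustration with made-up SKUs; the helper name `single_sided_counts` is hypothetical and not part of the function above.

```python
import pandas as pd


def single_sided_counts(planogram_df: pd.DataFrame, actuals_df: pd.DataFrame) -> dict:
    """Count rows by which side of the outer join supplied them (hypothetical helper)."""
    merged = pd.merge(
        planogram_df[["sku", "slot_id"]],
        actuals_df[["sku", "slot_id"]],
        on=["sku", "slot_id"],
        how="outer",
        indicator=True,  # adds a categorical `_merge` column
    )
    return {
        "matched": int((merged["_merge"] == "both").sum()),
        "planogram_only": int((merged["_merge"] == "left_only").sum()),
        "actuals_only": int((merged["_merge"] == "right_only").sum()),
    }


# Illustrative inputs: A100 matches, B200 is planned but unseen, Z999 is unplanned.
planogram_df = pd.DataFrame({"sku": ["A100", "B200"], "slot_id": ["S1", "S2"]})
actuals_df = pd.DataFrame({"sku": ["A100", "Z999"], "slot_id": ["S1", "S2"]})

counts = single_sided_counts(planogram_df, actuals_df)
print(counts)
```

On a bay you know to be correctly stocked, large `planogram_only` and `actuals_only` counts that rise together usually point at an identifier mismatch rather than a real shelf problem.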

Step 2 — Compute Signed and Absolute Variance

Keep both the signed and absolute discrepancy. The sign carries the operational meaning — a discrepancy of -2 is a missing-facing replenishment ticket, +3 is an unauthorized expansion that needs a merchandising correction — while the absolute value is what the tolerance gate compares against. Compute the percentage variance with explicit zero-division protection so a 0-expectation slot never raises or silently yields inf.

import numpy as np


def add_variance(merged: pd.DataFrame) -> pd.DataFrame:
    merged["discrepancy"] = merged["observed_facings"] - merged["expected_facings"]
    merged["abs_discrepancy"] = merged["discrepancy"].abs()
    merged["variance_pct"] = np.where(
        merged["expected_facings"] == 0,
        np.where(merged["observed_facings"] > 0, 100.0, 0.0),
        (merged["discrepancy"] / merged["expected_facings"]) * 100.0,
    ).round(2)
    return merged
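As a worked example, the snippet below applies the same arithmetic as `add_variance` to four illustrative slots so the sign convention and the zero-expectation handling are visible side by side. The input numbers are invented for the illustration.

```python
import numpy as np
import pandas as pd

# Four illustrative slots: short by two, over by one, unplanned, and empty on both sides.
merged = pd.DataFrame(
    {
        "expected_facings": [4, 3, 0, 0],
        "observed_facings": [2, 4, 3, 0],
    }
)

merged["discrepancy"] = merged["observed_facings"] - merged["expected_facings"]
merged["abs_discrepancy"] = merged["discrepancy"].abs()

# Zero-expectation rows take a fixed value instead of the division result.
merged["variance_pct"] = np.where(
    merged["expected_facings"] == 0,
    np.where(merged["observed_facings"] > 0, 100.0, 0.0),
    (merged["discrepancy"] / merged["expected_facings"]) * 100.0,
).round(2)

print(merged)
```

The first row comes out as a discrepancy of -2 and a variance of -50.0, the second as +1 and 33.33, the unplanned slot as +3 and 100.0, and the empty-on-both-sides slot as 0 and 0.0.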

Step 3 — Apply a Velocity-Weighted Tolerance Gate

A static tolerance percentage rarely survives a real chain. High-velocity categories deserve a tight band; bulky slow movers need a wider buffer to absorb manual restock lag. Compute the tolerance as a percentage of expected facings, then floor it with np.maximum so a percentage band never collapses to an impossibly strict 0 on a one-facing SKU. Tighten the percentage per velocity_tier rather than hard-coding one global number.

TIER_TOLERANCE = {"fast": 0.05, "standard": 0.10, "slow": 0.20}


def add_tolerance_gate(merged: pd.DataFrame, min_facing_floor: int = 1) -> pd.DataFrame:
    pct = merged["velocity_tier"].map(TIER_TOLERANCE).fillna(0.10)
    merged["tolerance_limit"] = np.maximum(
        np.ceil(merged["expected_facings"] * pct), min_facing_floor
    ).astype(int)
    merged["within_tolerance"] = merged["abs_discrepancy"] <= merged["tolerance_limit"]
    return merged

The fast-mover band of 0.05 and the slow-mover band of 0.20 are starting points; treat them as a configuration surface, not constants, and let the tuning module move them as audit ground-truth accumulates.
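To see what the gate produces in practice, the sketch below evaluates the same formula on a handful of illustrative rows. The constant name `MIN_FACING_FLOOR` is introduced here only for the example and mirrors the default of `min_facing_floor` in `add_tolerance_gate`.

```python
import numpy as np
import pandas as pd

TIER_TOLERANCE = {"fast": 0.05, "standard": 0.10, "slow": 0.20}
MIN_FACING_FLOOR = 1  # assumed floor, mirroring the function's default

slots = pd.DataFrame(
    {
        "velocity_tier": ["fast", "fast", "slow", "slow", "standard"],
        "expected_facings": [1, 10, 1, 10, 0],
    }
)

pct = slots["velocity_tier"].map(TIER_TOLERANCE)
raw_band = slots["expected_facings"] * pct  # fractional facings before rounding
slots["tolerance_limit"] = np.maximum(
    np.ceil(raw_band), MIN_FACING_FLOOR
).astype(int)

print(slots)
```

Ten facings get a band of 1 on the fast tier and 2 on the slow tier, while a one-facing SKU gets 1 on either. Because `np.ceil` already rounds any positive fraction up to 1, the `np.maximum` floor mainly takes effect on zero-expectation rows (the last row here) and whenever you raise the floor above 1.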

Step 4 — Classify Compliance State

Reduce the numbers to a discrete state with np.select, which assigns vectorized conditions without the per-row penalty of apply(). Order matters: the zero-expectation branches must be tested before the tolerance branch, so a planned-but-empty slot resolves to OUT_OF_STOCK and an unplanned-but-present slot to UNPLANNED_PLACEMENT rather than being swept into a generic verdict.

def classify(merged: pd.DataFrame) -> pd.DataFrame:
    conditions = [
        (merged["expected_facings"] == 0) & (merged["observed_facings"] == 0),
        (merged["expected_facings"] > 0) & (merged["observed_facings"] == 0),
        (merged["expected_facings"] == 0) & (merged["observed_facings"] > 0),
        merged["within_tolerance"],
    ]
    choices = ["COMPLIANT", "OUT_OF_STOCK", "UNPLANNED_PLACEMENT", "COMPLIANT"]
    merged["status"] = np.select(conditions, choices, default="VIOLATION")
    return merged
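The classifier above does not yet handle sanctioned promotional deviations. A minimal sketch of one way to do that follows, assuming the frame carries a boolean `promo_override_flag` column; how that column reaches the frame is left open, and the function name `classify_with_promo` is hypothetical. Placing the promo condition first in `np.select` is what short-circuits those rows.

```python
import numpy as np
import pandas as pd


def classify_with_promo(merged: pd.DataFrame) -> pd.DataFrame:
    """Variant of classify() that resolves flagged promo slots to COMPLIANT first."""
    # .eq(True) treats a missing flag as False rather than as a truthy value.
    promo = merged["promo_override_flag"].eq(True)
    conditions = [
        promo,  # sanctioned deviation: tested first so no later branch claims it
        (merged["expected_facings"] == 0) & (merged["observed_facings"] == 0),
        (merged["expected_facings"] > 0) & (merged["observed_facings"] == 0),
        (merged["expected_facings"] == 0) & (merged["observed_facings"] > 0),
        merged["within_tolerance"],
    ]
    choices = [
        "COMPLIANT",
        "COMPLIANT",
        "OUT_OF_STOCK",
        "UNPLANNED_PLACEMENT",
        "COMPLIANT",
    ]
    out = merged.copy()
    out["status"] = np.select(conditions, choices, default="VIOLATION")
    return out


# Illustrative rows: a flagged endcap overstock, then one row per remaining state.
demo = pd.DataFrame(
    {
        "expected_facings": [0, 4, 0, 4, 10],
        "observed_facings": [6, 0, 2, 4, 3],
        "within_tolerance": [False, False, False, True, False],
        "promo_override_flag": [True, False, False, False, False],
    }
)
statuses = classify_with_promo(demo)["status"].tolist()
print(statuses)
```

Note that with this ordering a flagged slot that is completely empty also resolves to COMPLIANT. If an empty promo display should still raise a replenishment ticket, move the out-of-stock condition ahead of the promo condition.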

Step 5 — Emit an Auditable Facing-Variance Record

The pipeline’s output is not a number, it is a record you can defend in a vendor dispute months later. Roll the per-slot frame up into a typed bay-level struct that carries provenance — the planogram revision, the capture timestamp, and the rolled-up flags downstream dashboards key on. Persist the raw inputs alongside this output so any score is reproducible.

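from datetime import datetime, timezone


def build_record(
    merged: pd.DataFrame,
    planogram_id: str,
    fixture_id: str,
    captured_at: datetime | None = None,
) -> dict:
    """Roll the per-slot frame up into one bay-level record."""
    total = len(merged)
    compliant = int((merged["status"] == "COMPLIANT").sum())
    # Prefer the time the shelf was captured. Falling back to "now" records
    # when the record was built, which is only a stand-in for the capture time.
    stamp = captured_at or datetime.now(timezone.utc)
    return {
        "planogram_id": planogram_id,
        "fixture_id": fixture_id,
        # isoformat() renders UTC as "+00:00" rather than a trailing "Z".
        "capture_timestamp": stamp.isoformat(),
        "compliance_percentage": round(100.0 * compliant / total, 1) if total else 0.0,
        "out_of_stock_flags": merged.loc[
            merged["status"] == "OUT_OF_STOCK", "sku"
        ].tolist(),
        "misplaced_sku_list": merged.loc[
            merged["status"] == "UNPLANNED_PLACEMENT", "sku"
        ].tolist(),
        "slots": merged.to_dict(orient="records"),
    }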

A serialized record carries the fields the reporting layer expects:

{
  "planogram_id": "PG-2026-GROCERY-A14",
  "fixture_id": "BAY-014-SHELF-03",
  "capture_timestamp": "2026-06-28T07:42:11Z",
  "compliance_percentage": 88.6,
  "out_of_stock_flags": ["0007800011546"],
  "misplaced_sku_list": ["0001200000341"],
  "slots": [
    {"sku": "0007800010013", "slot_id": "S-03-07", "expected_facings": 4,
     "observed_facings": 4, "discrepancy": 0, "variance_pct": 0.0,
     "tolerance_limit": 1, "status": "COMPLIANT"}
  ]
}
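Depending on how the record was assembled, some values may still be NumPy scalars rather than plain Python types, and `json.dumps` rejects types such as `np.int64`. The sketch below shows one defensive way to serialize such a record. The helper name `to_json_safe` is hypothetical, and the record is a trimmed, hand-built stand-in for the output of `build_record`.

```python
import json

import numpy as np


def to_json_safe(value):
    """Fallback for json.dumps: convert NumPy scalars to plain Python values."""
    if isinstance(value, np.generic):
        return value.item()
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


# Hand-built stand-in record with NumPy scalars left in on purpose.
record = {
    "planogram_id": "PG-2026-GROCERY-A14",
    "fixture_id": "BAY-014-SHELF-03",
    "compliance_percentage": 88.6,
    "out_of_stock_flags": ["0007800011546"],
    "slots": [
        {
            "sku": "0007800010013",
            "expected_facings": np.int64(4),
            "observed_facings": np.int64(4),
            "within_tolerance": np.bool_(True),
        }
    ],
}

# `default` is only called for objects json cannot encode on its own.
payload = json.dumps(record, default=to_json_safe, indent=2)
restored = json.loads(payload)
print(restored["slots"][0])
```

Raising `TypeError` for anything the helper does not recognize keeps unexpected types from being written silently in a lossy form.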

For storage, write the per-slot frame to Parquet partitioned by store_id and audit_date; that layout keeps the time-series compliance queries the dashboard runs cheap.

Verification & Testing

Confirm each stage deterministically rather than eyeballing a summary number:

Outer join preserves single-sided rows. Feed a planogram SKU with no matching actual and an actual SKU with no matching planogram entry; assert both appear in the merged frame and resolve to OUT_OF_STOCK and UNPLANNED_PLACEMENT respectively.
Zero-division is contained. Pass a slot with expected_facings of 0 and observed_facings of 3; assert variance_pct == 100.0 and that no warning is raised. Then pass a 0/0 slot and assert it returns 0.0.
Tolerance floor holds. With expected_facings of 1 and a fast tier, assert tolerance_limit == 1 (not 0), so a single-facing SKU is never impossible to satisfy.
Velocity band bites. Give a fast and a slow SKU the same expected_facings of 10 and the same abs_discrepancy of 2; assert the fast SKU is a VIOLATION and the slow SKU is COMPLIANT.
Classification order is correct. Assert a planned-but-empty slot returns OUT_OF_STOCK, never a generic VIOLATION, proving the zero branches are tested before the tolerance branch.

A healthy run shows a compliance_percentage that matches a hand-counted sample bay within rounding, an out_of_stock_flags list that contains only genuinely empty planned slots, and zero rows landing on the np.select default for inputs that have a defined expectation and observation.
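Several of these checks can be expressed as plain assertions. The sketch below uses a compact restatement of Steps 1 to 4 in a single function, `run_engine`, so that it runs on its own; that function name and the SKU labels are hypothetical, and in a real suite you would import the step functions instead of restating them.

```python
import numpy as np
import pandas as pd

TIER_TOLERANCE = {"fast": 0.05, "standard": 0.10, "slow": 0.20}


def run_engine(planogram_df: pd.DataFrame, actuals_df: pd.DataFrame) -> pd.DataFrame:
    """Compact restatement of Steps 1 to 4 so the checks below are self-contained."""
    m = pd.merge(planogram_df, actuals_df, on=["sku", "slot_id"], how="outer")
    facings = ["expected_facings", "observed_facings"]
    m[facings] = m[facings].fillna(0).astype(int)
    m["velocity_tier"] = m["velocity_tier"].fillna("standard")
    m["discrepancy"] = m["observed_facings"] - m["expected_facings"]
    pct = m["velocity_tier"].map(TIER_TOLERANCE).fillna(0.10)
    m["tolerance_limit"] = np.maximum(
        np.ceil(m["expected_facings"] * pct), 1
    ).astype(int)
    within = m["discrepancy"].abs() <= m["tolerance_limit"]
    conditions = [
        (m["expected_facings"] == 0) & (m["observed_facings"] == 0),
        (m["expected_facings"] > 0) & (m["observed_facings"] == 0),
        (m["expected_facings"] == 0) & (m["observed_facings"] > 0),
        within,
    ]
    choices = ["COMPLIANT", "OUT_OF_STOCK", "UNPLANNED_PLACEMENT", "COMPLIANT"]
    m["status"] = np.select(conditions, choices, default="VIOLATION")
    return m.set_index("sku")


planogram_df = pd.DataFrame(
    {
        "sku": ["FAST10", "SLOW10", "ONE", "GONE"],
        "slot_id": ["S1", "S2", "S3", "S4"],
        "expected_facings": [10, 10, 1, 3],
        "velocity_tier": ["fast", "slow", "fast", "standard"],
    }
)
actuals_df = pd.DataFrame(
    {
        "sku": ["FAST10", "SLOW10", "ONE", "EXTRA"],
        "slot_id": ["S1", "S2", "S3", "S5"],
        "observed_facings": [8, 8, 1, 2],
    }
)

result = run_engine(planogram_df, actuals_df)

# Outer join preserves single-sided rows and classifies them distinctly.
assert result.loc["GONE", "status"] == "OUT_OF_STOCK"
assert result.loc["EXTRA", "status"] == "UNPLANNED_PLACEMENT"
# Tolerance floor holds on a one-facing fast mover.
assert result.loc["ONE", "tolerance_limit"] == 1
# Velocity band bites: same shortfall of 2, different verdicts per tier.
assert result.loc["FAST10", "status"] == "VIOLATION"
assert result.loc["SLOW10", "status"] == "COMPLIANT"
```

Each assertion maps to one item in the list above, so a failure names the stage that regressed instead of surfacing only as a shifted summary percentage.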

Troubleshooting

| Symptom | Likely root cause | Remediation |
| --- | --- | --- |
| Phantom `OUT_OF_STOCK` and `UNPLANNED_PLACEMENT` on the same product | SKU key differs between inputs (UPC vs internal code), so the outer join never matches | Reconcile identifiers upstream before joining; assert the unmatched-row count is near zero on a known-good bay |
| Low-count SKUs flagged `VIOLATION` for being off by one | Percentage tolerance collapsed below `1` with no floor | Confirm the `np.maximum` floor against `min_facing_floor`; never let the band round to `0` |
| Promo overstock scored as a violation | Secondary-display facings run through the standard gate | Carry a `promo_override_flag` and short-circuit those rows to `COMPLIANT` before classification |
| Observed facings arrive fractional (e.g. `3.8`) and are silently truncated by the int cast | Occlusion or angled packaging yields partial counts from the vision stage | Round per merchandising policy (`np.floor` for conservative scoring, `np.round` for reconciliation) before `astype(int)` |
| `compliance_percentage` drifts batch to batch on a stable shelf | Tolerance bands too tight for category velocity | Re-tune `TIER_TOLERANCE` against audit ground truth rather than tightening globally |

Related

Automating Facings vs Actuals Validation: the parent stage and the facing-variance record this engine emits
Threshold Tuning for Compliance Accuracy: how the velocity-weighted tolerance bands used in Step 3 are calibrated
Validating Shelf Position Tolerances in Retail: the slot assignment that produces the actuals this page consumes
