Vision Model Routing for Shelf Detection

Within the Image Parsing & Computer Vision Workflows section, vision model routing is the control plane that sits between ingestion and detection — the stage that decides which detector ever sees a given frame, before a single expensive tensor is allocated on a GPU. A national banner captures shelf imagery across wildly heterogeneous conditions: a fixed gondola camera under even LED light, a field auditor’s smartphone shot of a refrigerated glass door, a robotic scanner’s motion-blurred aisle pass. Forcing every one of those frames through a single monolithic model inflates inference cost, blows latency budgets on captures that needed a cheaper path, and quietly degrades planogram-compliance accuracy on the captures that needed a harder one. Routing solves that by reading lightweight metadata and image statistics, then dispatching each frame to the detector variant whose training distribution actually matches it. This page defines the data contract the router enforces, the scoring architecture underneath it, the configuration that tunes its decisions, the failure modes you will hit in production, and the throughput numbers to design against.

Concept & Data Contract

The router has two hard boundaries, and everything between them is replaceable. At the inbound boundary it consumes a validated image envelope — the same normalized payload the ingestion layer produces, carrying store_id, fixture_id, fixture_class, planogram_id, capture_timestamp, a decodable image_uri, a device_class, and the sharpness_score that already gated the frame. At the outbound boundary it does not produce detections; it produces a routing decision record that names the chosen detector endpoint, the score that selected it, and the full feature trace that justified the choice. That separation is deliberate: the router stays stateless and auditable, and the bounding box extraction & SKU localization stage downstream receives both the frame and an explicit statement of which model is responsible for it.

A routing decision record looks like this:

{
  "capture_id": "8f3c2a1e-7b4d-4e2a-9c11-5a6b7c8d9e0f",
  "fixture_id": "COOLER-D07-DOOR2",
  "fixture_class": "refrigerated_glass_door",
  "planogram_id": "PLN-2026Q2-DAIRY-D07",
  "capture_timestamp": "2026-06-28T07:14:32Z",
  "device_class": "field_smartphone",
  "features": {
    "long_edge_px": 3024,
    "mean_luminance": 0.41,
    "sharpness_score": 0.78,
    "glare_ratio": 0.22
  },
  "routed_to": "detector-glare-robust-v4",
  "routing_score": 0.91,
  "routing_mode": "deterministic",
  "shadow_to": "detector-standard-v7"
}

The contract guarantee is that routed_to always resolves to a registered endpoint. In the normal case that endpoint declares it can serve this fixture_class at this resolution; when no endpoint is eligible, routed_to is the registered fallback and routing_score is 0.0, so the degradation is visible in the record itself. Together, routing_score and features are sufficient to replay the decision offline. Nothing downstream is allowed to depend on how the score was computed, only on these typed fields, so the decision engine can evolve from rules to a learned router without breaking a single consumer.
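As a rough illustration of what "depending only on typed fields" could look like for a consumer, the sketch below parses one decision record and rejects it when a contract field is absent. The names RoutingDecision, parse_decision and REQUIRED_FIELDS are hypothetical, and treating every field in the example record (except shadow_to) as required is an assumption of this sketch.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Mapping

# Field names are taken from the example record; which of them are
# mandatory is an assumption made for illustration.
REQUIRED_FIELDS = (
    "capture_id", "fixture_id", "fixture_class", "planogram_id",
    "capture_timestamp", "device_class", "features",
    "routed_to", "routing_score", "routing_mode",
)


@dataclass(frozen=True)
class RoutingDecision:
    """The subset of the record a replay job needs."""
    capture_id: str
    fixture_class: str
    routed_to: str
    routing_score: float
    features: Mapping[str, float]


def parse_decision(raw: str) -> RoutingDecision:
    """Parse one JSON record; raise ValueError if a contract field is missing."""
    doc = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in doc]
    if missing:
        raise ValueError(f"routing decision missing fields: {missing}")
    return RoutingDecision(
        capture_id=doc["capture_id"],
        fixture_class=doc["fixture_class"],
        routed_to=doc["routed_to"],
        routing_score=float(doc["routing_score"]),
        features=dict(doc["features"]),
    )
```

Because the consumer reads only these fields, swapping the scoring logic behind them should not require any change on this side.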

Implementation Architecture

The router runs as a stateless microservice. It receives an image envelope, performs fast pre-inference triage on the CPU, computes a feature vector, evaluates that vector against a model registry, and emits the routing decision record. Triage is deliberately cheap — resolution and aspect ratio come from the image header without a full decode, luminance and glare come from a downsampled thumbnail, and fixture_class is a dictionary lookup. The whole pass must finish in single-digit milliseconds, because anything heavier defeats the purpose: you would be spending the compute you were trying to save.
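To make the thumbnail step concrete, here is a minimal sketch that derives mean_luminance and glare_ratio from an already-downsampled 8-bit grayscale thumbnail. The page does not define glare_ratio precisely; this sketch assumes it is the fraction of near-saturated pixels, and both the function name thumbnail_features and the glare_level cutoff of 240 are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def thumbnail_features(
    gray: Sequence[Sequence[int]],
    glare_level: int = 240,  # assumed saturation cutoff, not a documented value
) -> tuple[float, float]:
    """Return (mean_luminance, glare_ratio), both in 0..1, for an 8-bit
    grayscale thumbnail given as rows of pixel values."""
    pixels = [p for row in gray for p in row]
    if not pixels:
        raise ValueError("thumbnail is empty")
    mean_luminance = sum(pixels) / (255 * len(pixels))
    # Assumed definition: share of pixels at or above the saturation cutoff.
    glare_ratio = sum(1 for p in pixels if p >= glare_level) / len(pixels)
    return round(mean_luminance, 4), round(glare_ratio, 4)
```

A single pass over a small thumbnail like this stays on the CPU and never touches the full-resolution image, which is the property the latency budget depends on.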

The registry is the heart of the design. Each detector endpoint is declared with the fixture classes it serves, the resolution band it expects, a confidence floor, and a relative cost weight. The decision engine scores every eligible endpoint against the frame’s features and picks the best, rather than hard-coding a fixture-to-model mapping, so adding a new model is a config change rather than a code change.

from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ImageEnvelope:
    capture_id: str
    fixture_class: str
    long_edge_px: int
    mean_luminance: float   # 0..1
    sharpness_score: float  # 0..1
    glare_ratio: float      # 0..1


@dataclass(frozen=True)
class ModelEndpoint:
    model_id: str
    fixture_classes: frozenset[str]
    min_long_edge_px: int
    max_long_edge_px: int
    cost_weight: float                    # lower is cheaper
    affinity: Callable[[ImageEnvelope], float]  # 0..1 fitness for this frame


class NoEligibleModelError(RuntimeError):
    """Raised when no registered endpoint can serve a frame."""


class VisionModelRouter:
    """Stateless, deterministic router. Scores eligible endpoints and
    dispatches to the highest scorer; falls back explicitly, never silently."""

    def __init__(
        self,
        registry: Sequence[ModelEndpoint],
        fallback_model_id: str,
        cost_penalty: float = 0.15,
    ) -> None:
        if not registry:
            raise ValueError("registry must declare at least one endpoint")
        self._registry = tuple(registry)
        self._fallback = fallback_model_id
        self._cost_penalty = cost_penalty

    def _eligible(self, env: ImageEnvelope) -> list[ModelEndpoint]:
        return [
            m for m in self._registry
            if env.fixture_class in m.fixture_classes
            and m.min_long_edge_px <= env.long_edge_px <= m.max_long_edge_px
        ]

    def route(self, env: ImageEnvelope) -> tuple[str, float]:
        candidates = self._eligible(env)
        if not candidates:
            # Explicit, auditable degradation — the fallback must still exist.
            if not any(m.model_id == self._fallback for m in self._registry):
                raise NoEligibleModelError(
                    f"no endpoint for fixture_class={env.fixture_class!r} "
                    f"at {env.long_edge_px}px and fallback is unregistered"
                )
            return self._fallback, 0.0

        def score(m: ModelEndpoint) -> float:
            return m.affinity(env) - self._cost_penalty * m.cost_weight

        best = max(candidates, key=score)
        return best.model_id, round(score(best), 4)

The affinity callables are where domain knowledge lives. A glare-robust endpoint scores high when glare_ratio is elevated; a quantized light-tier endpoint scores high on bright, sharp, low-complexity captures where a cheap model is enough; a dense-packing endpoint scores high on high-resolution front-facing gondola shots. Keeping affinity as a function per endpoint — rather than one giant branching tree — means each model’s routing logic is owned and versioned alongside the model itself.
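Two per-endpoint affinity functions might be sketched as follows. Frame is a hypothetical stand-in for the ImageEnvelope fields that affinity reads, and the ramp endpoints (0.15 and 0.40) and the min-based scoring are assumptions chosen for illustration, not values from any real deployment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for the envelope fields affinity functions read."""
    mean_luminance: float
    sharpness_score: float
    glare_ratio: float


def clamp01(x: float) -> float:
    """Clamp a value into the 0..1 range affinity is expected to return."""
    return max(0.0, min(1.0, x))


def glare_robust_affinity(frame: Frame, onset: float = 0.15, full: float = 0.40) -> float:
    """Ramp linearly from 0 at `onset` glare to 1 at `full` glare."""
    return clamp01((frame.glare_ratio - onset) / (full - onset))


def light_tier_affinity(frame: Frame) -> float:
    """Favour bright, sharp, low-glare frames; the weakest property caps the score."""
    return clamp01(min(frame.mean_luminance, frame.sharpness_score, 1.0 - frame.glare_ratio))
```

Each function is small enough to live next to the model it describes and to be unit-tested against a handful of representative frames.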

Deterministic scoring remains the production default for retail because it is auditable, its latency is predictable, and a category-management or vendor-chargeback dispute can be reconstructed from the recorded features and routing_score. A learned router (a gradient-boosted classifier or a small neural net trained on historical inference outcomes) is a legitimate upgrade once you have enough labelled routing outcomes, but it slots in behind the exact same route() contract so the rest of the pipeline never notices the swap.
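One way to picture "the same route() contract" is a structural interface that both routers satisfy. The sketch below is an assumption-laden illustration: Router, LearnedRouter and dispatch are hypothetical names, the predictor is any callable returning one score per model_id, and the eligibility filter that a real learned router would still need is omitted for brevity.

```python
from __future__ import annotations

from typing import Callable, Protocol, Sequence


class Router(Protocol):
    """The only surface downstream code is allowed to rely on."""

    def route(self, env: object) -> tuple[str, float]: ...


class LearnedRouter:
    """Wraps a predictor that returns one score per registered model_id."""

    def __init__(
        self,
        model_ids: Sequence[str],
        predict: Callable[[object], Sequence[float]],
    ) -> None:
        if not model_ids:
            raise ValueError("model_ids must not be empty")
        self._model_ids = tuple(model_ids)
        self._predict = predict

    def route(self, env: object) -> tuple[str, float]:
        scores = self._predict(env)
        if len(scores) != len(self._model_ids):
            raise ValueError("predictor must return one score per model_id")
        best = max(range(len(scores)), key=lambda i: scores[i])
        return self._model_ids[best], round(float(scores[best]), 4)


def dispatch(router: Router, env: object) -> str:
    """Downstream code sees only route(); it cannot tell rules from a model."""
    model_id, _score = router.route(env)
    return model_id
```

Any object with a matching route() method type-checks as a Router, so the deterministic and learned implementations can be swapped by configuration.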

Production Configuration & Tuning

The registry and its thresholds live in version-controlled config, not in code, so a new store format or a model rollout is a reviewable diff. A representative configuration looks like this:

router:
  fallback_model_id: detector-standard-v7
  cost_penalty: 0.15
  min_sharpness: 0.45        # frames below this are quarantined, not routed
  shadow_sample_rate: 0.05   # fraction of traffic mirrored to an alt model

endpoints:
  - model_id: detector-light-int8-v3
    fixture_classes: [endcap_promo, empty_shelf_audit]
    min_long_edge_px: 1280
    max_long_edge_px: 9000
    cost_weight: 0.30
    triggers: { min_luminance: 0.55, max_glare_ratio: 0.10 }

  - model_id: detector-standard-v7
    fixture_classes: [gondola, endcap_promo, refrigerated_glass_door]
    min_long_edge_px: 1600
    max_long_edge_px: 9000
    cost_weight: 1.00

  - model_id: detector-glare-robust-v4
    fixture_classes: [refrigerated_glass_door, freezer_door]
    min_long_edge_px: 1600
    max_long_edge_px: 9000
    cost_weight: 1.20
    triggers: { min_glare_ratio: 0.15 }

The settings that move the needle in practice: min_sharpness (default 0.45) keeps unrecoverable frames out of the model entirely by sending them to the quarantine path owned by the error handling in computer vision pipelines workflows; cost_penalty (default 0.15) sets how aggressively the router prefers a cheaper model when two endpoints are nearly tied on affinity (raise it to cut spend, lower it to favour accuracy); and shadow_sample_rate (default 0.05) controls how much live traffic is mirrored to a challenger model for evaluation. Endpoint triggers are the thresholds that feed each affinity function: min_glare_ratio: 0.15, for instance, is the boundary above which a refrigerated-door capture is treated as glare-dominated and steered to the specular-robust detector rather than the standard one.
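How triggers turn into an affinity function is not spelled out here, so the following is only one plausible reading: a hard gate that returns 1.0 when every trigger is satisfied and 0.0 otherwise. The helper name affinity_from_triggers, the TRIGGER_RULES table, and the mapping from trigger keys to feature names (for example min_luminance to mean_luminance) are all assumptions.

```python
from __future__ import annotations

from typing import Callable, Mapping

# Trigger key -> (feature it constrains, +1 for a lower bound, -1 for an upper bound).
# Key names come from the config; the feature mapping is assumed for this sketch.
TRIGGER_RULES = {
    "min_luminance": ("mean_luminance", 1),
    "max_glare_ratio": ("glare_ratio", -1),
    "min_glare_ratio": ("glare_ratio", 1),
}


def affinity_from_triggers(
    triggers: Mapping[str, float],
) -> Callable[[Mapping[str, float]], float]:
    """Build a gate: 1.0 when every trigger holds, otherwise 0.0."""
    unknown = sorted(set(triggers) - set(TRIGGER_RULES))
    if unknown:
        raise ValueError(f"unknown trigger keys: {unknown}")

    def affinity(features: Mapping[str, float]) -> float:
        for key, bound in triggers.items():
            feature, sign = TRIGGER_RULES[key]
            # A value exactly on the bound counts as satisfying the trigger.
            if sign * (features[feature] - bound) < 0:
                return 0.0
        return 1.0

    return affinity
```

With the example record's features (mean_luminance 0.41, glare_ratio 0.22), the glare-robust triggers pass and the light-tier triggers fail, which matches the routed_to value shown earlier. An endpoint with no triggers passes every frame under this reading.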

Every threshold is read from config at startup and is hot-reloadable, so operators can swap model endpoints, run A/B tests, and stage gradual rollouts with no pipeline downtime. The discipline that keeps this stage debuggable is to tag every routed-frame record with the registry version that produced it; when compliance metrics shift, you can tell a routing-config change apart from a real shelf change.
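One simple way to obtain a registry version is to hash the parsed config in a canonical form, so that any threshold change produces a new tag. This is a sketch under assumptions: the registry_version field name, the 12-character truncation, and the helper names are illustrative, not part of the documented contract.

```python
from __future__ import annotations

import hashlib
import json


def registry_version(config: dict) -> str:
    """Derive a short, stable identifier from the parsed router config."""
    # Sorting keys makes the hash independent of key order in the file.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def tag_record(record: dict, version: str) -> dict:
    """Return a copy of a routing decision record carrying the version tag."""
    return {**record, "registry_version": version}
```

On a hot reload the service would recompute the version once and stamp it on every record emitted until the next reload.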

Failure Modes & Debugging Workflow

Misrouting is insidious because it rarely throws — it produces plausible detections from the wrong model, which surface much later as systematic compliance gaps that contaminate category management and vendor-chargeback reconciliation. These are the failure modes you will actually hit, with how to reproduce and fix each:

Silent fallback storm. Symptom: a sudden spike in records where routing_score is 0.0 and routed_to equals the fallback model. Root cause is almost always a new fixture_class that no endpoint declares, or a resolution band that excludes a newly deployed camera generation. Reproduce by replaying a day of envelopes through route() and counting fallback hits per fixture_class; fix by adding the class or widening the offending endpoint’s resolution band in the registry.
Glare misroute on refrigerated fixtures. Symptom: dairy and freezer doors show depressed SKU match rates while ambient aisles look fine. Cause: glare_ratio is being computed on too aggressive a thumbnail downscale, so specular highlights wash out below min_glare_ratio and frames route to the standard detector instead of the glare-robust one. Reproduce by scatter-plotting glare_ratio against downstream sku_confidence for that fixture class; fix by recalibrating the glare estimator and lowering the trigger.
Cost-penalty starvation. Symptom: the light quantized model is winning frames it should not, and localization false-negatives climb. Cause: cost_penalty is set so high that a small affinity edge for the accurate model is overwhelmed by its cost weight. Reproduce by re-scoring affected captures with cost_penalty set to 0.0 and confirming the routing flips; fix by lowering the penalty or raising the accurate endpoint’s affinity floor.
Triage/inference distribution skew. Symptom: routing decisions look correct but accuracy still drifts over weeks. Cause: the model’s training distribution has moved away from live captures — a problem that belongs to debugging vision model drift in retail environments, not to routing logic. Reproduce by comparing the feature distribution of recent traffic against the registry’s assumed bands; fix by retraining or re-banding, not by re-tuning the router.
Registry/endpoint version mismatch. Symptom: a frame is routed to a model_id that the inference fleet has already retired, producing dispatch errors or, worse, a stale model silently still serving. Cause: the config rollout and the model deployment fell out of sync. Reproduce by asserting every routed_to in a recent window resolves to a live endpoint; fix by gating config promotion on a liveness check of the endpoints it references.
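The replay check described for the fallback storm can be sketched as a per-class fallback rate over stored decision records. It assumes records are dictionaries with the fixture_class, routed_to and routing_score fields from the data contract; the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def fallback_rate_by_class(
    records: Iterable[Mapping[str, object]],
    fallback_model_id: str,
) -> dict[str, float]:
    """Fraction of records per fixture_class that hit the explicit fallback."""
    totals: Counter[str] = Counter()
    fallbacks: Counter[str] = Counter()
    for rec in records:
        fixture_class = str(rec["fixture_class"])
        totals[fixture_class] += 1
        # An explicit fallback is recorded as the fallback model with score 0.0.
        if rec["routed_to"] == fallback_model_id and rec["routing_score"] == 0.0:
            fallbacks[fixture_class] += 1
    return {cls: fallbacks[cls] / totals[cls] for cls in totals}
```

A class whose rate jumps from near zero to near one after a hardware rollout points at a missing fixture_class or a too-narrow resolution band rather than at the detectors.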

The technique that catches most of these before they reach a report is payload shadowing: mirror shadow_sample_rate of production frames to an alternate model, run both through localization, and compare outputs against ground-truth planogram audits. The routing decision trace — input features, selected model_id, score, and the downstream extraction metrics — is the audit record that makes a SOX-grade compliance review and a vendor SLA dispute resolvable after the fact.
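Because the router is stateless, the shadow sample is easiest to reproduce if it is a pure function of the frame. The sketch below hashes capture_id into a bucket and compares it with shadow_sample_rate; hashing on capture_id and the name should_shadow are assumptions, not something this page prescribes.

```python
from __future__ import annotations

import hashlib


def should_shadow(capture_id: str, sample_rate: float) -> bool:
    """Deterministically select roughly `sample_rate` of frames for shadowing."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be within 0..1")
    digest = hashlib.sha256(capture_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform over 0 .. 2**64 - 1
    # Integer comparison avoids float rounding at the top of the range.
    return bucket < int(sample_rate * 2**64)
```

The same capture_id always gets the same answer, so an offline replay selects exactly the frames that were mirrored in production.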

Bringing a new detector endpoint into the registry without misrouting live traffic follows a fixed order:

1. Register in shadow only. Add the endpoint to the registry with its fixture classes and bands, set its affinity, and route it nothing live; only mirror shadow_sample_rate of eligible frames to it.
2. Compare against ground truth. Run both the incumbent and challenger outputs through localization and score both against hand-audited planograms for the target fixture class.
3. Tune affinity and triggers. Adjust the endpoint’s triggers and affinity function until it wins exactly the frames it should and the score margin over the incumbent is stable, not marginal.
4. Ramp deterministically. Promote the endpoint from shadow to live for a single fixture_class, watch SKU match rate and tail latency, then widen class by class.
5. Gate and version. Assert every routed_to resolves to a live endpoint, tag records with the new registry version, and enable per-fixture routing-score and fallback-rate alerting before full rollout.
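The liveness gate in the final step could be as small as a set difference between what the config references and what the inference fleet reports as live. This sketch assumes the config has already been parsed into the router/endpoints shape shown earlier and that the set of live model ids comes from some fleet inventory call, which is outside the scope of this page; unresolved_endpoints is a hypothetical name.

```python
from __future__ import annotations


def unresolved_endpoints(config: dict, live_model_ids: set[str]) -> list[str]:
    """List every model_id the config references that the fleet is not serving."""
    referenced = {endpoint["model_id"] for endpoint in config.get("endpoints", [])}
    # The fallback must be live too, or an explicit fallback becomes a dispatch error.
    referenced.add(config["router"]["fallback_model_id"])
    return sorted(referenced - live_model_ids)
```

A promotion pipeline would refuse to ship the config whenever this list is non-empty.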

Scaling & Performance Benchmarks

The numbers worth designing against are operational, and they are dominated by the fact that triage runs on the CPU while inference runs on the GPU. Hold P95 routing latency (validated envelope in, routing decision out) under 8 milliseconds, because the router fronts every frame and any cost here multiplies across the whole store-walk volume. A single CPU worker can sustain routing for thousands of frames per second, so the router is rarely the bottleneck; the detectors behind it are, which is exactly why getting the dispatch right matters.
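Checking the latency budget needs nothing more than wall-clock timing around the routing call and a percentile over the samples. The sketch below uses the nearest-rank definition of P95, which is one of several common conventions; the helper names and the generic route callable are assumptions.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable, Sequence


def time_route(route: Callable[[object], object], envelopes: Iterable[object]) -> list[float]:
    """Call route() once per envelope and return per-call latency in milliseconds."""
    latencies_ms = []
    for env in envelopes:
        start = time.perf_counter()
        route(env)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

Replaying a recorded morning burst through time_route and asserting p95(...) < 8.0 in a pre-deploy check is one way to keep the budget from eroding unnoticed.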

Concurrency is bounded by the broker ahead of this stage, not by the router itself. The broker-side batching and backpressure tuning that keeps tail latency flat through a morning store-walk burst is the subject of async image batching for high-volume stores, and a regional pipeline that must keep scoring when connectivity drops leans on the fallback routing for offline store scenarios patterns from the core-architecture section. The single most effective cost lever is the routing mix itself: steering empty-shelf audits and standardized promotional displays to a quantized int8 model at cost_weight: 0.30 while reserving the full-capacity detector for high-variance, compliance-critical captures typically cuts GPU spend 30–50% against a single-model baseline, with no measurable accuracy loss on the easy frames. The design target is to keep the fallback rate below 2% and the share of frames routed to the most expensive tier below 40%; those two ratios, not raw model speed, are what govern both cost and the accuracy of the planogram sync & SKU mapping strategies that consume this pipeline’s output.
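The effect of the routing mix can be approximated with simple arithmetic if cost_weight is read as relative GPU cost per frame, which is an assumption of this sketch rather than something the config guarantees. The traffic shares below are invented for illustration; only the cost weights and the two target ratios come from this page.

```python
from __future__ import annotations

from typing import Mapping


def blended_cost(mix: Mapping[str, tuple[float, float]]) -> float:
    """Average relative cost per frame for a mix of (traffic_share, cost_weight)."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1.0")
    return sum(share * weight for share, weight in mix.values())


def within_targets(fallback_rate: float, top_tier_share: float) -> bool:
    """Design targets: fallback rate below 2%, most expensive tier below 40%."""
    return fallback_rate < 0.02 and top_tier_share < 0.40


# Hypothetical traffic shares paired with the cost weights from the config.
example_mix = {
    "detector-light-int8-v3": (0.50, 0.30),
    "detector-standard-v7": (0.35, 1.00),
    "detector-glare-robust-v4": (0.15, 1.20),
}
# 0.50*0.30 + 0.35*1.00 + 0.15*1.20 = 0.68, about 32% below an
# all-standard baseline at cost_weight 1.00.
saving = 1.0 - blended_cost(example_mix)
```

The same calculation run against real traffic shares from the decision records shows how far a proposed cost_penalty change is likely to move spend before anything is deployed.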

Frequently Asked Questions

Why route before inference instead of running one strong model on everything?
Because a single model forces a single trade-off onto frames with very different needs: it either over-spends GPU time on easy captures (empty-shelf audits, clean promo endcaps) or under-performs on hard ones (glare-dominated refrigerated doors, low-light smartphone shots). Routing reads cheap CPU-side features and matches each frame to a detector whose training distribution fits it, so you get the accuracy of a specialist on hard frames and the cost of a quantized model on easy ones, without retraining a single monolith to be good at everything at once.

Should the routing decision engine be rules or a learned model?
Start with deterministic rules. They are auditable, their latency is flat, and a vendor-chargeback or category dispute can be replayed from the recorded features and routing_score. A learned router is a legitimate upgrade once you have enough labelled routing outcomes to train on, but it must slot in behind the same route() contract so nothing downstream changes. Most retail deployments never need more than well-tuned rules plus per-endpoint affinity functions.

How do I keep the router itself from becoming a latency bottleneck?
Keep triage off the GPU and off a full image decode. Read resolution and aspect ratio from the image header, compute luminance and glare_ratio from a downsampled thumbnail, and resolve fixture_class from a dictionary, so the whole pass finishes under 8 milliseconds. If triage ever needs the model’s own features to decide, you have collapsed routing into inference and lost the cost saving the router exists to deliver.

What stops a frame from being routed to a model that can’t handle it?
The eligibility filter. An endpoint is only a candidate if it declares the frame’s fixture_class and the frame’s resolution falls inside the endpoint’s band. If no endpoint is eligible the router falls back explicitly, recording a routing_score of 0.0 and the fallback model_id, and never silently forwards to an incompatible model. A spike in those explicit fallbacks is your earliest signal that a new camera generation or fixture type needs a registry update.

How do I safely roll out a new detector without risking live compliance data?
Register it in shadow mode first and mirror shadow_sample_rate (default 0.05) of eligible traffic to it without letting it serve any live result. Compare its localization output against hand-audited planograms, tune its affinity and triggers until its score margin over the incumbent is stable, then ramp it one fixture_class at a time while watching SKU match rate and tail latency. Tag every record with the registry version so any metric shift is traceable to the rollout.

Related

Optimizing YOLOv8 for Grocery Shelf Detection — tuning the dense-packing detector that the router dispatches high-resolution gondola frames to
Bounding Box Extraction & SKU Localization — the stage that consumes the routed frame and turns it into planogram-ready coordinates
Async Image Batching for High-Volume Stores — the broker batching and backpressure that feed the router under burst load
Error Handling in Computer Vision Pipelines — quarantine, dead-letter, and drift-detection paths the router hands degraded frames to
Image Parsing & Computer Vision Workflows — the workflow section this routing layer belongs to
