EPP Data Layer Architecture¶
The EPP Data Layer is a pluggable subsystem responsible for hydrating `Endpoint` objects with real-time signals (metrics, metadata) from external sources. It follows a highly decoupled architecture defined in `pkg/epp/framework/interface/datalayer`.
Pattern: Driver-Based Extraction (DataSource Push)¶
Unlike a traditional "pull" model where consumers request data, the Data Layer uses a Driver-Based Extraction pattern:
- The Driver (DataSource): An implementation of `fwkdl.DataSource` (e.g., `HTTPDataSource`) is the active component. Its `Collect` method is triggered by the framework on a schedule.
- The Payload: The `DataSource` fetches raw data (e.g., a Prometheus `/metrics` payload or a local status file).
- The Push: The `DataSource` then iterates through all registered `Extractor` plugins and "pushes" the raw data to them via their `Extract(ctx, data, ep)` method.
- Wiring via Configuration: Unlike scheduling plugins that are grouped in profiles, Extractors are explicitly associated with a DataSource in the `data` section of the configuration using `PluginRefs`. The configuration loader (`pkg/epp/config/loader/configloader.go`) resolves these references from the global `plugins` registry.
- Type Validation: During initialization, `AddExtractor` validates that the `DataSource` output type matches the `Extractor`'s `ExpectedInputType()`, ensuring runtime safety (sketched below).
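To make the pattern concrete, here is a minimal Go sketch of the driver-and-extractor shape. It is illustrative only: the real interfaces in `pkg/epp/framework/interface/datalayer` carry additional methods, and the `Endpoint` type here is a stand-in for the framework's endpoint abstraction.

```go
package datalayer

import (
	"context"
	"fmt"
	"reflect"
)

// Endpoint stands in for the framework's endpoint abstraction; the real
// type also exposes metadata, address, and the full metrics state.
type Endpoint interface {
	UpdateCustomMetric(name string, value float64)
}

// Extractor consumes the raw payload pushed to it by a DataSource.
type Extractor interface {
	Name() string
	ExpectedInputType() reflect.Type
	Extract(ctx context.Context, data any, ep Endpoint)
}

// DataSource is the active driver: it fetches a payload on each Collect
// call and fans it out to every registered Extractor.
type DataSource interface {
	AddExtractor(ex Extractor) error
	Collect(ctx context.Context, ep Endpoint)
}

// httpDataSource is a toy driver that produces []byte payloads.
type httpDataSource struct {
	extractors []Extractor
}

// outputType is the payload type this source hands to its extractors.
func (s *httpDataSource) outputType() reflect.Type {
	return reflect.TypeOf([]byte(nil))
}

// AddExtractor is the type-validation step: the source's output type
// must match the extractor's declared ExpectedInputType().
func (s *httpDataSource) AddExtractor(ex Extractor) error {
	if ex.ExpectedInputType() != s.outputType() {
		return fmt.Errorf("extractor %q expects %v, source produces %v",
			ex.Name(), ex.ExpectedInputType(), s.outputType())
	}
	s.extractors = append(s.extractors, ex)
	return nil
}

// Collect performs one (stubbed) fetch and pushes the same payload to
// all registered extractors.
func (s *httpDataSource) Collect(ctx context.Context, ep Endpoint) {
	payload := []byte("vllm:num_requests_waiting 3\n") // stand-in for a real HTTP scrape
	for _, ex := range s.extractors {
		ex.Extract(ctx, payload, ep)
	}
}
```

Note that the type check lives in `AddExtractor`, so a mis-wired pipeline fails at startup rather than at scrape time.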
Pattern: Driver-Based Push (Collector Loop)¶
The Data Layer follows a Push Model orchestrated by the Collector (`pkg/epp/datalayer/collector.go`):
1. The `Collector` runs a periodic loop for each endpoint.
2. On each tick, it calls `Collect()` on all registered `DataSource` instances.
3. The `DataSource` fetches data and "pushes" it to its registered Extractors.
This ensures that all extractors update their state (e.g., populating `Metrics.Custom`) in a synchronized fashion, driven by the central collection cycle.
This architecture allows a single expensive network fetch (e.g., scraping vLLM metrics) to be shared by dozens of independent extraction plugins (Queue Depth, KV Cache, Custom Metrics) without redundant IO.
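A sketch of that loop, reusing the illustrative `DataSource` and `Endpoint` types from the sketch above (plus a `time` import); the real `Collector` in `pkg/epp/datalayer/collector.go` adds lifecycle management and error handling beyond this:

```go
// runCollector drives one periodic collection loop for a single endpoint.
func runCollector(ctx context.Context, ep Endpoint, sources []DataSource, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Each source performs its (possibly expensive) fetch once,
			// then fans the payload out to all of its extractors.
			for _, src := range sources {
				src.Collect(ctx, ep)
			}
		}
	}
}
```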
The Metric Scoring Pipeline: Hose, Filter, Consumer¶
To configure scoring based on an arbitrary metric, it is helpful to visualize the three distinct roles in the pipeline:
- The Hose (DataSource): Responsible for fetching the raw data blob (e.g., a Prometheus `/metrics` payload). In the configuration, this is a plugin like `prometheus-data-source` (or a custom HTTP source). It doesn't know about specific metrics; it just knows how to connect to the model server and pull the full text.
- The Filter (Extractor): A plugin like `prometheus-metric` that lives under the DataSource in the `data:` section. It parses the raw blob for one specific metric (e.g., `vllm:lora_requests_info`). It then populates the internal `Endpoint` object's `Metrics.Custom` map with that value.
- The Consumer (Scorer): A scheduling plugin like `metric-scorer` (referenced in a `schedulingProfile`). It doesn't know about DataSources or Prometheus. It simply looks at the `Endpoint.Metrics.Custom` map for a key that matches its configured `metricName` and uses that value to produce a score (0.0 - 1.0); see the sketch below the wiring rule.
Key Wiring Rule: For the pipeline to work, both the DataSource and the Extractor must be explicitly defined in the top-level `plugins:` section of the config, even though they are primarily used within the `data:` section.
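To make the Consumer role concrete, here is a hypothetical scorer in the spirit of `metric-scorer`. The real plugin's interface differs; this sketch only shows the consumption pattern: read one key from the hydrated `Custom` map and normalize it. The `maxValue` bound and the linear normalization are assumptions for illustration.

```go
// metricScorer is a hypothetical consumer. It never touches DataSources;
// it only reads the already-hydrated Custom metric map.
type metricScorer struct {
	metricName string  // key to look up, e.g. "vllm:num_requests_waiting"
	maxValue   float64 // assumed normalization bound (> 0); illustrative
}

// Score maps the raw metric onto 0.0 - 1.0, where a lower raw value
// (less load) yields a higher score.
func (s *metricScorer) Score(custom map[string]float64) float64 {
	v, ok := custom[s.metricName]
	if !ok {
		return 0.0 // metric absent: treat as worst case rather than guessing
	}
	score := 1.0 - v/s.maxValue
	switch {
	case score < 0.0:
		return 0.0
	case score > 1.0:
		return 1.0
	}
	return score
}
```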
Pluggable Inventory¶
The Data Layer itself is fully pluggable. While the framework provides a robust `HTTPDataSource`, the architecture supports:
- Custom DataSources: Non-HTTP sources like local Unix sockets or shared memory.
- Custom Extractors: Plugins that parse arbitrary formats (JSON, Protobuf, custom text); see the sketch after this list.
- Metric Plugins: Specialized Extractors that populate the `Custom` metric map. The Prometheus Metric Plugin is the primary implementation, allowing operators to extract any scalar gauge or counter from model servers without code changes.
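As an example of this extension surface, a custom Extractor for a JSON status payload might look like the sketch below (continuing the illustrative package from earlier, with an added `encoding/json` import; the `queue_depth` field and plugin name are hypothetical):

```go
// jsonQueueExtractor is a hypothetical custom Extractor that parses a
// JSON status payload rather than Prometheus text. It satisfies the
// illustrative Extractor interface from the first sketch.
type jsonQueueExtractor struct{}

func (e *jsonQueueExtractor) Name() string { return "json-queue-extractor" }

// ExpectedInputType declares the payload type this extractor accepts;
// AddExtractor checks it against the DataSource's output type.
func (e *jsonQueueExtractor) ExpectedInputType() reflect.Type {
	return reflect.TypeOf([]byte(nil))
}

// Extract parses one field out of the raw blob and hydrates the endpoint.
func (e *jsonQueueExtractor) Extract(ctx context.Context, data any, ep Endpoint) {
	raw, ok := data.([]byte)
	if !ok {
		return // a mismatch here should have been rejected at wiring time
	}
	var status struct {
		QueueDepth float64 `json:"queue_depth"` // hypothetical field name
	}
	if err := json.Unmarshal(raw, &status); err != nil {
		return // keep the previous value; it will age into staleness
	}
	ep.UpdateCustomMetric("queue_depth", status.QueueDepth)
}
```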
Validation and Consistency¶
To ensure reliable data flow, the Data Layer follows a strict initialization lifecycle:
1. Validation (Construction Time): The `ConfigLoader` (`pkg/epp/config/loader`) parses and validates the configuration structure.
    - Plugin Existence: Verifies that all referenced `pluginRef`s are defined in the global `plugins` section.
    - Interface Compliance: Verifies that plugins implementing `fwkdl.DataSource` and `fwkdl.Extractor` are correctly typed.
    - Runtime Safety: During the subsequent initialization phase (in `Runner`), the framework enforces type safety by verifying that a generic `DataSource` producing type `T` is only connected to `Extractor`s that accept type `T`.
2. Hydration (Scrape Cycle): Data is updated by the `Collector` on an independent, per-endpoint periodic loop.
    - The Update: When a scrape succeeds, the Extractor populates the `Metrics.Custom` map and sets a fresh `UpdateTime`.
    - The Stale State: If a scrape fails or is delayed beyond the `metricsStalenessThreshold`, the data remains in the map but becomes stale (indicated by an old `UpdateTime`).
3. Consumption (Scheduling Cycle): Data is "consumable" at any point, but is most critical during the Scoring Phase of a request.
    - Freshness Check: Specialized plugins (like `UtilizationDetector`) can check the `UpdateTime` to decide whether the metric is reliable.
    - Fallback Logic: If a metric is missing (e.g., first-time initialization or persistent scrape failure), plugins like `MetricScorer` operate on a default/worst-case value to avoid scheduling blind (see the sketch below).
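Put together, the consumption-phase check can be sketched as follows; `scoreOrFallback` and its parameters are illustrative names, not framework API (assumes a `time` import):

```go
// scoreOrFallback sketches the consumption-phase freshness logic: trust
// the metric only while it is fresh, otherwise fall back to a worst-case
// default so the scheduler is never blind.
func scoreOrFallback(custom map[string]float64, updateTime time.Time,
	stalenessThreshold time.Duration, metricName string, worstCase float64) float64 {
	v, ok := custom[metricName]
	if !ok || time.Since(updateTime) > stalenessThreshold {
		// Never scraped, or the last successful scrape is too old:
		// assume the worst instead of scheduling on stale data.
		return worstCase
	}
	return v
}
```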