X-Ray Integration¶
mlflow-dynamodbstore integrates with AWS X-Ray to provide span-level observability for MLflow traces. When you instrument your application with OpenTelemetry, spans are exported to X-Ray and lazily cached in DynamoDB when accessed through the MLflow API.
How It Works¶
The key insight is that trace metadata (experiment, tags, timestamps) flows through MLflow, while span data (inputs, outputs, timing, attributes) flows through X-Ray. On first access, spans are fetched from X-Ray, converted, and cached in DynamoDB.
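The fetch-convert-cache flow can be illustrated with a minimal sketch. The function and store names below are illustrative, not the plugin's actual internals; a dict stands in for the DynamoDB item and a callable stands in for the X-Ray API call:

```python
def get_spans(trace_id, cache, fetch_from_xray):
    """Return spans for a trace, fetching from X-Ray only on a cache miss.

    `cache` is any dict-like store (standing in for the DynamoDB item);
    `fetch_from_xray` is a callable standing in for the X-Ray API call.
    """
    if trace_id in cache:
        return cache[trace_id]          # cache hit: no X-Ray call
    spans = fetch_from_xray(trace_id)   # cache miss: fetch once
    cache[trace_id] = spans             # persist for subsequent reads
    return spans


calls = []

def fake_xray(trace_id):
    calls.append(trace_id)
    return [{"name": "root", "trace_id": trace_id}]

cache = {}
first = get_spans("tr-1", cache, fake_xray)
second = get_spans("tr-1", cache, fake_xray)
# X-Ray is queried only once; the second read is served from the cache
```

This is why repeated reads of the same trace do not incur repeated X-Ray API calls: only the first access pays the fetch cost.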
Dual-Export OTel Setup¶
Configure OpenTelemetry to export spans to both MLflow and X-Ray simultaneously:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# X-Ray exporter (OTLP over gRPC)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter as GrpcExporter

# MLflow exporter
from mlflow.tracing.export.mlflow import MlflowSpanExporter

# Annotation processor (maps MLflow attributes to X-Ray annotations)
from mlflow_dynamodbstore.otel.annotation_processor import AnnotationSpanProcessor

provider = TracerProvider()

# Add the annotation processor FIRST so annotations are set before export
provider.add_span_processor(AnnotationSpanProcessor())

# Export to X-Ray via OTLP (e.g., through the ADOT collector)
provider.add_span_processor(
    BatchSpanProcessor(GrpcExporter(endpoint="http://localhost:4317"))
)

# Export to MLflow
provider.add_span_processor(
    BatchSpanProcessor(MlflowSpanExporter())
)

trace.set_tracer_provider(provider)
```
ADOT Collector
The AWS Distro for OpenTelemetry (ADOT) collector can receive OTLP spans and forward them to X-Ray. This is the recommended setup for production.
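A minimal collector configuration for this pipeline might look like the following. The endpoint and region are placeholders; consult the ADOT documentation for your deployment:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  awsxray:
    region: us-east-1

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
```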
Annotation Processor¶
The AnnotationSpanProcessor copies MLflow span attributes into X-Ray annotation attributes. X-Ray annotations are indexed key-value pairs that enable filter queries.
Default Annotation Mapping¶
| MLflow Attribute | X-Ray Annotation | Description |
|---|---|---|
| `mlflow.spanType` | `mlflow_spanType` | Span type (e.g., `LLM`, `RETRIEVER`) |
| `name` (span name) | `mlflow_spanName` | The span's display name |
| `status` (span status) | `mlflow_spanStatus` | Status code (`OK`, `ERROR`, `UNSET`) |
| `mlflow.chat_model` | `mlflow_chatModel` | Chat model identifier |
| `mlflow.invocation_params.model_name` | `mlflow_modelName` | Model name from invocation params |
Custom Annotation Config¶
Pass a custom mapping to the processor:
```python
from mlflow_dynamodbstore.otel.annotation_processor import AnnotationSpanProcessor

custom_config = {
    "mlflow.spanType": "mlflow_spanType",
    "name": "mlflow_spanName",
    "status": "mlflow_spanStatus",
    "mlflow.chat_model": "mlflow_chatModel",
    "my.custom.attribute": "my_custom_annotation",
}

processor = AnnotationSpanProcessor(config=custom_config)
```
Warning
X-Ray annotation names must use only alphanumeric characters and underscores. The default config already follows this convention.
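If you map custom attributes whose names contain other characters, you may want to sanitize the annotation names first. This helper is illustrative and not part of the plugin:

```python
import re

def sanitize_annotation_key(key: str) -> str:
    """Replace anything outside [A-Za-z0-9_] so X-Ray accepts the key."""
    return re.sub(r"[^A-Za-z0-9_]", "_", key)

sanitize_annotation_key("my.custom.attribute")  # -> "my_custom_attribute"
```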
Span Search via X-Ray¶
When you search for traces with span-level filters, the plugin translates supported predicates into X-Ray filter expressions:
```python
import mlflow

# This filter is translated to an X-Ray annotation query
traces = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="span.mlflow.spanType = 'LLM'",
)
```
The plugin translates equality filters on mapped annotation attributes into X-Ray filter expressions such as `annotation.mlflow_spanType = "LLM"`.
Filters that cannot be translated to X-Ray expressions are applied as post-filters on the results.
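A simplified sketch of this translation step follows. It is illustrative only: the plugin's actual parser handles more operators, escaping, and combinations, and the mapping shown is a subset of the default config:

```python
import re

# Subset of the default attribute-to-annotation mapping
ANNOTATION_MAP = {
    "mlflow.spanType": "mlflow_spanType",
    "mlflow.chat_model": "mlflow_chatModel",
}

def translate(filter_string):
    """Translate `span.<attr> = '<value>'` into an X-Ray filter expression.

    Returns None when the predicate is not translatable, signalling that
    it must be applied as a post-filter on the results instead.
    """
    m = re.fullmatch(r"span\.([\w.]+)\s*=\s*'([^']*)'", filter_string.strip())
    if not m:
        return None
    attr, value = m.groups()
    annotation = ANNOTATION_MAP.get(attr)
    if annotation is None:
        return None
    return f'annotation.{annotation} = "{value}"'

translate("span.mlflow.spanType = 'LLM'")
# -> 'annotation.mlflow_spanType = "LLM"'
```

Returning `None` for untranslatable predicates mirrors the fallback behaviour described above: those filters are evaluated client-side after the X-Ray query.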
Lazy Caching¶
Spans are cached in DynamoDB the first time a trace is accessed via get_trace(). The cached spans item contains the full span data as a JSON blob and inherits the same TTL as the trace META item.
On cache, the plugin also denormalizes span attributes onto the trace META item:
- `span_types` -- set of all span types in the trace
- `span_statuses` -- set of all span statuses
- `span_names` -- set of all span names
This enables filtering traces by span attributes without reading the full span data.
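The denormalization step can be sketched as a pure function over the cached spans. The span dicts and field names here are hypothetical, chosen to mirror the three sets listed above:

```python
def denormalize(spans):
    """Collect the span-level sets written onto the trace META item."""
    return {
        "span_types": {s["type"] for s in spans if s.get("type")},
        "span_statuses": {s["status"] for s in spans},
        "span_names": {s["name"] for s in spans},
    }

spans = [
    {"name": "chat", "type": "LLM", "status": "OK"},
    {"name": "lookup", "type": "RETRIEVER", "status": "OK"},
]
denormalize(spans)
# -> {"span_types": {"LLM", "RETRIEVER"}, "span_statuses": {"OK"},
#     "span_names": {"chat", "lookup"}}
```

Because these sets live on the META item, a trace-level filter like "has an LLM span" can be answered without deserializing the full span blob.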
Pre-Caching with the CLI¶
For bulk operations or to ensure spans are cached before X-Ray retention expires, use the CLI:
```bash
mlflow-dynamodbstore cache-spans \
  --table my-table \
  --region us-east-1 \
  --experiment-id 1 \
  --experiment-id 2
```
The command:
- Queries all trace META items for the given experiments
- Skips traces that already have cached spans
- Calls `get_trace()` for each uncached trace, triggering the lazy cache
- Reports counts of cached and already-cached traces
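The command's control flow amounts to the following loop. This is a conceptual sketch with stubbed store calls, not the CLI's implementation:

```python
def cache_spans(trace_ids, has_cached_spans, get_trace):
    """Walk traces, triggering the lazy span cache for uncached ones."""
    cached, skipped = 0, 0
    for trace_id in trace_ids:
        if has_cached_spans(trace_id):
            skipped += 1          # spans already in DynamoDB
            continue
        get_trace(trace_id)       # first access populates the cache
        cached += 1
    return {"cached": cached, "already_cached": skipped}

seen = set()
result = cache_spans(
    ["tr-1", "tr-2", "tr-3"],
    has_cached_spans=lambda t: t == "tr-2",
    get_trace=seen.add,
)
# result == {"cached": 2, "already_cached": 1}
```

Skipping already-cached traces makes the command safe to re-run on a schedule: each run only pays the X-Ray fetch cost for traces created since the last run.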
Run Before X-Ray Expiry
X-Ray retains trace data for 30 days by default. If your trace retention in DynamoDB is longer than 30 days, run cache-spans periodically to ensure spans are cached before X-Ray deletes them.
Retention Considerations¶
X-Ray and DynamoDB have independent retention policies:
| System | Default Retention | Configurable? |
|---|---|---|
| AWS X-Ray | 30 days | No (AWS-managed) |
| DynamoDB trace items | 30 days | Yes (MLFLOW_DYNAMODB_TRACE_RETENTION_DAYS) |
| DynamoDB cached spans | Same as trace | Inherits trace TTL |
Alignment Matters
If your DynamoDB trace retention exceeds 30 days, you must pre-cache spans before X-Ray deletes them. Otherwise, get_trace() will return traces without span data.
Recommended approach:
- Set DynamoDB trace retention to match or be less than X-Ray's 30 days, or
- Run `cache-spans` on a schedule (e.g., weekly) to pre-cache before X-Ray expiry