Monitoring
PolySimulator exposes health endpoints and Prometheus metrics for production observability.
Health Endpoints
PolySimulator provides three health endpoints, each returning more detail than the previous one:
GET /v1/health
Basic liveness probe that returns immediately.
curl http://localhost:8000/v1/health
{
"status": "ok",
"version": "1.0.0"
}
GET /v1/health/ready
Readiness probe that checks database and Redis connectivity.
curl http://localhost:8000/v1/health/ready
{
"status": "ok",
"timestamp": "2025-01-15T10:30:00Z",
"checks": {
"database": { "status": "ok", "latency_ms": 1.2 },
"redis": { "status": "ok", "latency_ms": 0.8, "cached_markets": 150 }
}
}
Use /v1/health for Kubernetes liveness probes and /v1/health/ready for readiness probes.
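As a rough illustration of how a deployment script or custom probe might interpret the readiness response, the sketch below lists the dependency checks that are not reporting `ok`. The helper name `failed_checks` is hypothetical and not part of PolySimulator. It assumes only the response shape shown above; the `degraded` and `error` status values in the sample are invented for the example, since the documented response only shows `ok`.

```python
import json


def failed_checks(payload: dict) -> list[str]:
    """Return the names of dependency checks whose status is not "ok"."""
    checks = payload.get("checks", {})
    return sorted(
        name for name, result in checks.items()
        if result.get("status") != "ok"
    )


# Sample body shaped like the /v1/health/ready response, with Redis marked
# as failing. The "degraded" and "error" values are assumptions.
sample = json.loads("""
{
  "status": "degraded",
  "checks": {
    "database": {"status": "ok", "latency_ms": 1.2},
    "redis": {"status": "error"}
  }
}
""")

print(failed_checks(sample))  # ['redis']
```

An empty list means every listed dependency reported `ok`.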
GET /v1/status
Detailed system status including market sync and price feed info.
curl http://localhost:8000/v1/status
{
"status": "healthy",
"trading_mode": "virtual",
"markets_synced": 847,
"hot_markets_tracked": 142,
"price_feed_active": true,
"last_sync": "2025-01-15T10:28:00Z",
"uptime_seconds": 86400
}
Prometheus Metrics
The API exposes a Prometheus-compatible metrics endpoint:
curl http://localhost:8000/metrics
Available Metrics
| Metric | Type | Description |
|---|---|---|
| http_requests_total | Counter | Total HTTP requests by method, path, status |
| http_request_duration_seconds | Histogram | Request latency distribution |
| orders_placed_total | Counter | Total orders placed by side and type |
| orders_filled_total | Counter | Total orders filled |
| active_websocket_connections | Gauge | Current WebSocket connections |
| price_cache_hits_total | Counter | Redis cache hits |
| price_cache_misses_total | Counter | Redis cache misses |
| market_sync_duration_seconds | Histogram | Market sync job duration |
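The two cache counters are most useful as a ratio. The sketch below derives a cache hit ratio from a scrape of `/metrics` in the Prometheus text exposition format. It assumes that `price_cache_hits_total` and `price_cache_misses_total` are exposed without labels, which the table does not confirm. The function names and sample values are illustrative only.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Parse label-free samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue  # labelled series are ignored in this simplified sketch
        samples[parts[0]] = float(parts[1])
    return samples


def cache_hit_ratio(samples: dict[str, float]) -> float:
    """Return hits / (hits + misses), or 0.0 when there is no traffic yet."""
    hits = samples.get("price_cache_hits_total", 0.0)
    misses = samples.get("price_cache_misses_total", 0.0)
    total = hits + misses
    return hits / total if total else 0.0


# Invented scrape output used only to exercise the functions.
scrape = """\
# TYPE price_cache_hits_total counter
price_cache_hits_total 900
# TYPE price_cache_misses_total counter
price_cache_misses_total 100
"""

print(cache_hit_ratio(parse_samples(scrape)))  # 0.9
```

Because counters only ever increase, this gives the ratio since process start. For a recent window, a `rate()` over both counters in PromQL is the more usual approach.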
Grafana Dashboard
Add Prometheus as a Grafana data source, then use PromQL queries such as these for dashboard panels:
# Request rate (5m average)
rate(http_requests_total[5m])
# P95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate (share of 5xx responses across all series)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# Order fill rate
rate(orders_filled_total[1h])
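To make the P95 query less opaque, here is a simplified re-implementation of the idea behind `histogram_quantile`: find the bucket that contains the requested rank, then interpolate linearly inside it. This is an illustrative sketch, not Prometheus's actual code, and the bucket boundaries and counts are invented.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound, end with a +Inf bucket,
    and contain at least one finite bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - below) / (cumulative - below)


# Invented latency buckets in seconds: (upper bound, cumulative count).
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))  # 0.8125
```

The estimate is only as precise as the bucket layout: a P95 that lands in a wide bucket can be off by a large fraction of that bucket's width.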
Alerting Rules
Recommended alert thresholds:
| Condition | Threshold | Severity |
|---|---|---|
| API down | /v1/health fails for 30s | Critical |
| Database unreachable | /v1/health/ready reports DB down | Critical |
| Redis unreachable | /v1/health/ready reports Redis down | Warning |
| High error rate | >5% 5xx responses (5m window) | Warning |
| High latency | P95 > 2s (5m window) | Warning |
| Price feed stale | No updates for > 60s | Warning |
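The thresholds in the table can be expressed as simple predicates. The sketch below mirrors them in plain Python, for example for a smoke test or a small watchdog script. The `Snapshot` type and its field names are hypothetical, and a production setup would more likely encode these conditions as Prometheus alerting rules.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time readings gathered by a monitoring script."""
    health_failing_seconds: float      # how long /v1/health has been failing
    database_ok: bool                  # from /v1/health/ready
    redis_ok: bool                     # from /v1/health/ready
    error_rate_5m: float               # fraction of 5xx responses, 0.0 to 1.0
    p95_latency_seconds: float         # P95 over a 5m window
    seconds_since_price_update: float  # age of the latest price update


def evaluate_alerts(s: Snapshot) -> list[tuple[str, str]]:
    """Return (condition, severity) pairs for every breached threshold."""
    rules = [
        ("API down", "Critical", s.health_failing_seconds >= 30),
        ("Database unreachable", "Critical", not s.database_ok),
        ("Redis unreachable", "Warning", not s.redis_ok),
        ("High error rate", "Warning", s.error_rate_5m > 0.05),
        ("High latency", "Warning", s.p95_latency_seconds > 2.0),
        ("Price feed stale", "Warning", s.seconds_since_price_update > 60),
    ]
    return [(name, severity) for name, severity, breached in rules if breached]


# Invented readings: Redis is down and 8% of responses are 5xx.
snapshot = Snapshot(0, True, False, 0.08, 1.4, 12)
print(evaluate_alerts(snapshot))
# [('Redis unreachable', 'Warning'), ('High error rate', 'Warning')]
```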
Logging
The API uses structured JSON logging in production:
{
"timestamp": "2025-01-15T10:30:00Z",
"level": "INFO",
"message": "Order placed",
"order_id": "abc-123",
"market_id": "0x1234...",
"side": "BUY",
"quantity": "10",
"price": "0.42",
"latency_ms": 23
}
Set the log level with the LOG_LEVEL environment variable:
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
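For readers who want to produce log lines of the same shape in their own tooling, the sketch below shows one way to do it with Python's standard `logging` module. It is not PolySimulator's actual logging setup: the `JsonFormatter` class, the `configure_logging` helper, and the logger name are all assumptions made for illustration.

```python
import json
import logging
import os
from datetime import datetime, timezone

# Attribute names present on every LogRecord; anything else came from `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Copy fields passed through `extra=` (order_id, side, and so on).
        entry.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        )
        return json.dumps(entry, default=str)


def configure_logging() -> logging.Logger:
    """Attach the JSON formatter and honour the LOG_LEVEL variable."""
    logger = logging.getLogger("polysimulator.example")  # hypothetical name
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    return logger


logger = configure_logging()
logger.info(
    "Order placed",
    extra={"order_id": "abc-123", "side": "BUY", "latency_ms": 23},
)
```

Passing context through `extra=` keeps each field as a separate JSON key, which makes the logs straightforward to filter in a log aggregator.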
Next Steps