Samuel Clay 7bc1b7144e Add database replication lag monitoring for Redis, Postgres, and Mongo
Implements Prometheus metrics and a Grafana dashboard with alerts for
monitoring replication lag across all database secondaries. Fixes the broken
Mongo endpoint, which was trying to iterate over scalar values.

- Add /replication-lag/ endpoint to flask_metrics_redis.py
- Create new flask_metrics_postgres.py service for Postgres replication metrics
- Fix and improve Mongo /mongo-replset-lag/ endpoint with per-secondary metrics
- Enable Mongo replication lag scraping in Prometheus (was commented out)
- Add Prometheus scrape jobs for Redis and Postgres replication lag
- Create unified Grafana replication dashboard with gauges and time series
- Add Grafana alerting rules for lag >5s and disconnected secondaries
- Update Ansible to deploy flask_metrics on Postgres servers
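
As an illustration of the Redis part of the list above, here is a minimal sketch of how per-secondary lag could be rendered in Prometheus text format. It is not the code from this commit. The function name and the metric names are hypothetical. The input is assumed to be the dict that redis-py returns for `INFO replication` on a primary, where each `slaveN` entry is parsed into a dict with `ip`, `port`, `state`, `offset`, and `lag`.

```python
def replication_lag_metrics(info):
    """Render Prometheus text lines from a parsed Redis INFO replication dict.

    Metric names here are hypothetical, chosen only for illustration.
    """
    master_offset = info.get("master_repl_offset", 0)
    lines = []
    for key, value in sorted(info.items()):
        # Only "slaveN" entries are dicts; scalar fields such as
        # "slave_repl_offset" are skipped rather than iterated over.
        if not (key.startswith("slave") and isinstance(value, dict)):
            continue
        labels = f'secondary="{value["ip"]}:{value["port"]}"'
        # Seconds since the secondary last acknowledged the primary.
        lines.append(
            f'redis_replication_lag_seconds{{{labels}}} {value.get("lag", 0)}'
        )
        # Bytes of the replication stream the secondary has not yet applied.
        behind = master_offset - value.get("offset", 0)
        lines.append(f'redis_replication_offset_bytes{{{labels}}} {behind}')
        # 1 when the secondary reports state=online, else 0.
        connected = 1 if value.get("state") == "online" else 0
        lines.append(f'redis_replication_connected{{{labels}}} {connected}')
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = {
        "role": "master",
        "connected_slaves": 1,
        "master_repl_offset": 1000,
        "slave0": {
            "ip": "10.0.0.2",
            "port": 6379,
            "state": "online",
            "offset": 900,
            "lag": 1,
        },
    }
    print(replication_lag_metrics(sample), end="")
```

In a Flask service, a string like this would presumably be returned from the `/replication-lag/` route with a `text/plain` content type so that Prometheus can scrape it. An alert on lag >5s would then compare the lag gauge against 5.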

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 14:41:33 -08:00