Using time series as alert thresholds - Robust Perception | Prometheus Monitoring Experts
This is neat, I _must_ try this out one day to let our dev teams define their own alerting thresholds per service. #sre #alerting #prometheus
So... I've spent the last couple of days at my day job figuring out how to tier our various environments and set up some kind of tiered Prometheus.
Federation was never going to work well for me, because scraping 10-20k time series from another Prometheus server makes the scrape loop time out before it can consume that much data.
Turns out there is a fairly new feature called "Agent mode" (introduced in Prometheus v2.32.0) that lets you run a Prometheus server so that it does two very important things:
- Sends all the time series it scrapes locally to another Prometheus server via Remote Write
- Keeps a WAL (write-ahead log) so that samples are not lost when a write fails, and only deletes WAL entries after a successful write to the remote Prometheus.
Combined with appropriate external labels, this gives you a very nice tiered Prometheus setup: a central Prometheus server (with no scrape configuration except for Prometheus itself) acts as the Remote Write receiver for the Prometheus agents.
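A minimal sketch of what one agent's config might look like in a setup like this. The file name, hostnames, label names and values, scrape interval, and the scrape job are all assumptions for illustration, not my actual config. The `--enable-feature=agent` and `--web.enable-remote-write-receiver` flags are the documented ones for recent Prometheus 2.x releases, but check them against the version you run.

```yaml
# prometheus-agent.yml (hypothetical file name): config for one agent in one tier.
# Start the agent with:
#   prometheus --enable-feature=agent --config.file=prometheus-agent.yml
global:
  scrape_interval: 30s            # assumed interval
  external_labels:
    # Assumed label names and values; use whatever identifies the tier.
    # These are attached to every sample sent to the central server.
    environment: staging
    region: eu-west-1

scrape_configs:
  - job_name: node                # hypothetical job
    static_configs:
      - targets: ["node-exporter.staging.internal:9100"]   # hypothetical target

remote_write:
  # Hypothetical central server. It needs to be started with
  # --web.enable-remote-write-receiver so that it accepts writes
  # on /api/v1/write.
  - url: http://prometheus-central.example.internal:9090/api/v1/write
```

The external labels are what let the central server tell the tiers apart, so each agent should get a distinct set.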
#SRE #DevOps #Prometheus #Monitoring