This adds the ability to filter rules in the Prometheus-compatible API using:
1. `receiver_name` to filter by contact point name
2. `health` to filter by the health status of the rule (one of `ok`, `error`, `nodata`, or `unknown`)
This also ensures that groups with no rules (due to filters) are not returned.
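The filtering behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `Rule` and `Group` types and the `matches` and `filterGroups` functions are hypothetical, simplified stand-ins for the API's real types.

```go
package main

// Rule and Group are simplified, hypothetical stand-ins for the API's rule types.
type Rule struct {
	Name      string
	Health    string   // one of "ok", "error", "nodata", "unknown"
	Receivers []string // contact point names the rule notifies
}

type Group struct {
	Name  string
	Rules []Rule
}

// matches reports whether a rule passes both filters.
// An empty filter value matches every rule.
func matches(r Rule, receiverName, health string) bool {
	if health != "" && r.Health != health {
		return false
	}
	if receiverName == "" {
		return true
	}
	for _, name := range r.Receivers {
		if name == receiverName {
			return true
		}
	}
	return false
}

// filterGroups keeps only the matching rules in each group and
// omits groups that are left with no rules after filtering.
func filterGroups(groups []Group, receiverName, health string) []Group {
	var out []Group
	for _, g := range groups {
		var kept []Rule
		for _, r := range g.Rules {
			if matches(r, receiverName, health) {
				kept = append(kept, r)
			}
		}
		if len(kept) == 0 {
			continue // group is empty because of the filters: do not return it
		}
		out = append(out, Group{Name: g.Name, Rules: kept})
	}
	return out
}
```

For example, calling `filterGroups(groups, "slack", "error")` would return only groups that contain at least one rule in the `error` health state that notifies a contact point named `slack`.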
* Alerting: Optimize prometheus api permission checks
This improves the performance of the Prometheus API by checking rule read permission once per folder upfront, rather than for each rule group individually. This reduces the number of permission checks and should speed up the API response time.
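A rough sketch of the idea, under stated assumptions: `FolderGroup`, `visibleGroups`, and the `canReadFolder` callback are hypothetical names used for illustration only, and the real access control service is not shown.

```go
package main

// FolderGroup is a hypothetical, simplified rule group that only
// records which folder it belongs to.
type FolderGroup struct {
	FolderUID string
	Name      string
}

// visibleGroups resolves read permission once per folder and then
// filters the groups using the cached result, so the (assumed
// expensive) canReadFolder callback runs at most once per folder.
func visibleGroups(groups []FolderGroup, canReadFolder func(folderUID string) bool) []FolderGroup {
	allowed := make(map[string]bool)
	for _, g := range groups {
		if _, seen := allowed[g.FolderUID]; !seen {
			allowed[g.FolderUID] = canReadFolder(g.FolderUID)
		}
	}

	var out []FolderGroup
	for _, g := range groups {
		if allowed[g.FolderUID] {
			out = append(out, g)
		}
	}
	return out
}
```

With many groups per folder, the number of permission checks in this sketch scales with the number of folders instead of the number of groups.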
* refactor vars
---------
Co-authored-by: Konrad Lalik <konradlalik@gmail.com>
* Alerting: Add extended definition to prometheus alert rules api
This adds `isPaused` and `notificationSettings` to the paginated rules api to enable the paginated view of GMA rules.
refactor: make alert rule status and state retrieval extensible
This lets us get status from sources other than the local ruler.
* update swagger spec
* add safety checks in test
* Move filtering code to generators for performance reasons
Discarding rules and groups early in the iterable chain limits the number of promises we need to wait for, which improves performance significantly.
* Add error handling for generators
* Add support for data source filter for GMA rules
* search WIP fix
* Fix datasource filter
* Move filtering back to filtered rules hook, use paged groups for improved performance
* Add queriedDatasources field to grafana managed rules and update filtering logic to rely on it
- Introduced a new field `queriedDatasources` in the AlertingRule struct to track data sources used in rules.
- Updated the Prometheus API to populate `queriedDatasources` when creating alerting rules.
- Modified filtering logic in the ruleFilter function to utilize the new `queriedDatasources` field for improved data source matching.
- Adjusted related tests to reflect changes in rule structure and filtering behavior.
* Add FilterView performance logging
* Improve GMA Prometheus types, rename queried datasources property
* Use custom generator helpers for flattening and filtering rule groups
* Fix lint errors, add missing translations
* Revert test condition
* Refactor api prom changes
* Fix lint errors
* Update backend tests
* Refactor rule list components to improve error handling and data source management
- Enhanced error handling in FilterViewResults by logging errors before returning an empty iterable.
- Simplified conditional rendering in GrafanaRuleLoader for better readability.
- Updated data source handling in PaginatedDataSourceLoader and PaginatedGrafanaLoader to use new individual rule group generator.
- Renamed toPageless function to toIndividualRuleGroups for clarity in prometheusGroupsGenerator.
- Improved filtering logic in useFilteredRulesIterator to utilize a dedicated function for data source type validation.
- Added isRulesDataSourceType utility function for better data source type checks.
- Removed commented-out code in PromRuleDTOBase for cleaner interface definition.
* Fix abort controller on FilterView
* Improve generators filtering
* fix abort controller
* refactor cancelSearch
* make states exclusive
* Load full page in one loadResultPage call
* Update tests, update translations
* Refactor filter status into separate component
* hoist hook
* Use the new function for supported rules source type
---------
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
What is this feature?
This PR introduces a new alert rule configuration option, keep_firing_for (Prometheus documentation).
`keep_firing_for` prevents alerts from resolving immediately after the alert condition returns to normal. Instead, they transition into a "Recovering" state and are not considered resolved by the Alertmanager. Once the recovery period ends (or after the next evaluation, if the evaluation interval is longer than `keep_firing_for`), the alert transitions to "Normal", provided it has not started alerting again:
Before
+----------+     +----------+
| Alerting |---->|  Normal  |
+----------+     +----------+
-----
After
+----------+      +------------+     +----------+
| Alerting |----->| Recovering |---->|  Normal  |
+----------+      +------------+     +----------+
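The transitions in the diagram above can be sketched as a small state function. This is a simplified illustration and not the actual state manager: the `AlertState` type, the `nextState` function, and the choice to measure the recovery period from the first non-firing evaluation are all assumptions made for this example.

```go
package main

import "time"

// AlertState is a hypothetical, reduced set of alert states.
type AlertState string

const (
	Alerting   AlertState = "Alerting"
	Recovering AlertState = "Recovering"
	Normal     AlertState = "Normal"
)

// nextState returns the state after one evaluation.
//   - firing:        whether the alert condition is true in this evaluation
//   - recoveringFor: time spent in Recovering so far (assumed to be measured
//     from the first non-firing evaluation)
//   - keepFiringFor: the configured recovery period; zero disables it
func nextState(current AlertState, firing bool, recoveringFor, keepFiringFor time.Duration) AlertState {
	if firing {
		// The condition is true (again): the alert is Alerting.
		return Alerting
	}
	switch current {
	case Alerting:
		if keepFiringFor > 0 {
			// The condition just cleared: hold the alert in Recovering.
			return Recovering
		}
	case Recovering:
		if recoveringFor < keepFiringFor {
			// The recovery period has not elapsed yet.
			return Recovering
		}
	}
	return Normal
}
```

In this sketch, an evaluation interval longer than `keepFiringFor` still yields one evaluation in Recovering before the alert becomes Normal, which mirrors the "after the next evaluation" case described above.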
Why do we need this feature?
This feature prevents flapping alerts by adding a recovery period. This helps avoid false resolutions caused by brief alert