**What is this feature?**
This PR implements a new Prometheus historian backend that allows Grafana alerting to write alert state history as Prometheus-compatible `ALERTS` metrics to remote Prometheus-compatible data sources.
The metric includes a few additional labels:
* `grafana_alertstate`: Grafana's full alert state, more granular than Prometheus.
* `grafana_rule_uid`: Grafana's alert rule UID.
Grafana states are included in the `grafana_alertstate` label also mapped to Prometheus-compatible `alertstate` values:
| Grafana alert state | `alertstate` | `grafana_alertstate` |
|---------------------|-----------------------|-----------------------|
| `Alerting` | `firing` | `alerting` |
| `Recovering` | `firing` | `recovering` |
| `Pending` | `pending` | `pending` |
| `Error` | `firing` | `error` |
| `NoData` | `firing` | `nodata` |
| `Normal` | _(no metric emitted)_ | _(no metric emitted)_ |
* Remote Alertmanager: Consider auto-gen routes when flagging a config as 'default'
* remove always-nil error from isDefaultConfiguration
* remove unnecessary context.Background() in test
* pass orgID to autogenFn call during AM creation
* fix test
* make update-workspace
* Add FiredAt field to the State
* Update featuretoggle files
* Fix lint errors
* Fix test compilation
* Remove random print line + formatting
* Address PR comments
* remove feature flag
* remove feature flag in state manager
* make sure no data with empty results is handled
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
---------
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
The remote write path differs based on whether the data source is actually
Prometheus, Mimir, Cortex, or an older version of Cortex. We do not want
users to have to specify the path, so this change determines the path as
best it can.
It may be in the future we have to make this configurable per-datasource
to cater for setups where it's impossible to determine the correct path.
* Alerting: Allow selection of recording rule write target on per-rule basis.
Introduces a new feature flag (`grafanaManagedRecordingRulesDatasources`),
disabled by default, to enable the ability to write recording rules data using
data source settings, and selecting the data source to use on a per-rule basis.
To cope with the scenario of users upgrading, a configuration file option
allows setting the default data source to use, if none is specified in the rule,
emulating the behaviour of recording rules without the flag enabled.
* Lint
* Update conf/sample.ini
Co-authored-by: Alexander Akhmetov <me@alx.cx>
---------
Co-authored-by: Alexander Akhmetov <me@alx.cx>
* Alerting: Refactor NewPrometheusWriter function.
In order to re-use PrometheusWriter, changing the function take a
PrometheusWriterConfig instead of RecordingRulesSettings, and adapt the old
interface onto the new interface.
* Make linter happy
* Add health fields to rules and an aggregator method to the scheduler
* Move health, last error, and last eval time in together to minimize state processing
* Wire up a readonly scheduler to prom api
* Extract to exported function
* Use health in api_prometheus and fix up tests
* Rename health struct to status
* Fix tests one more time
* Several new tests
* Handle inactive rules
* Push state mapping into state manager
* rename to StatusReader
* Rectify cyclo complexity rebase
* Convert existing package local status implementation to models one
* fix tests
* undo RuleDefs rename
* Replace global authz abstraction with one compatible with uid scope
* Replace GettableApiReceiver with models.Receiver in receiver_svc
* GrafanaIntegrationConfig -> models.Integration
* Implement Create/Update methods
* Add optimistic concurrency to receiver API
* Add scope to ReceiversRead & ReceiversReadSecrets
migrates existing permissions to include implicit global scope
* Add receiver create, update, delete actions
* Check if receiver is used by rules before delete
* On receiver name change update in routes and notification settings
* Improve errors
* Linting
* Include read permissions are requirements for create/update/delete
* Alias ngalert/models to ngmodels to differentiate from v0alpha1 model
* Ensure integration UIDs are valid, unique, and generated if empty
* Validate integration settings on create/update
* Leverage UidToName to GetReceiver instead of GetReceivers
* Remove some unnecessary uses of simplejson
* alerting.notifications.receiver -> alerting.notifications.receivers
* validator -> provenanceValidator
* Only validate the modified receiver
stops existing invalid receivers from preventing modification of a valid
receiver.
* Improve error in Integration.Encrypt
* Remove scope from alert.notifications.receivers:create
* Add todos for receiver renaming
* Use receiverAC precondition checks in k8s api
* Linting
* Optional optimistic concurrency for delete
* make update-workspace
* More specific auth checks in k8s authorize.go
* Add debug log when delete optimistic concurrency is skipped
* Improve error message on authorizer.DecisionDeny
* Keep error for non-forbidden errutil errors
* Add success case and tests for writer using metrics
* Use testable version of clock
* Assert a specific series was written
* Fix linter
* Fix manually constructed writer
* Check if a time interval is used in alert rules before deleting it
* Add time interval to parameters of ListAlertRulesQuery and ListNotificationSettings of DbStore
== Refacorings ==
* refactor isMuteTimeInUse to accept a single route
* update getMuteTiming to not return err
* update delete to get the mute timing from config first
* add method CanReadAllRules to rule authorization service
* add alias type Namespace for Folder in ngalert's models package. It implements the Namespacer interface that is used by authz logic
* update state history's backends to authorize access to rules.
* update Loki to add folders UIDs to query.
* Update BuildLogQuery to drop filter by folders if it's too long and fall back to in-memory filtering.