244 Commits

Author SHA1 Message Date
894944dcb0 Alerting: Refactor remote alertmanager to use Crypto interface (#107228) 2025-06-26 12:39:06 +02:00
478f9bf597 Alerting: Emit metrics from prometheus state history backend (#107121)
Alerting: Emit metrics from prometheus historian backend
2025-06-24 18:27:52 +02:00
78bec77ca1 Alerting: Fix NewDatasourceWriter initialization (#106869)
Fix NewDatasourceWriter initialization
2025-06-18 07:53:48 +02:00
ad683f83ff Alerting: Add state history backend to write ALERTS metric (#104361)
**What is this feature?**

This PR implements a new Prometheus historian backend that allows Grafana alerting to write alert state history as Prometheus-compatible `ALERTS` metrics to remote Prometheus-compatible data sources.

The metric includes a few additional labels:

* `grafana_alertstate`: Grafana's full alert state, which is more granular than the Prometheus `alertstate`.
* `grafana_rule_uid`: Grafana's alert rule UID.

Grafana states are included in the `grafana_alertstate` label and are also mapped to Prometheus-compatible `alertstate` values:

| Grafana alert state | `alertstate`          | `grafana_alertstate`  |
|---------------------|-----------------------|-----------------------|
| `Alerting`          | `firing`              | `alerting`            |
| `Recovering`        | `firing`              | `recovering`          |
| `Pending`           | `pending`             | `pending`             |
| `Error`             | `firing`              | `error`               |
| `NoData`            | `firing`              | `nodata`              |
| `Normal`            | _(no metric emitted)_ | _(no metric emitted)_ |
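
The mapping in the table above can be sketched as a small lookup. This is only an illustration, not the code from the PR: the function name `promAlertState` and its signature are hypothetical, and only the state names and label values are taken from the table.

```go
package main

import "fmt"

// promAlertState is a hypothetical helper that maps a Grafana alert state
// name to the `alertstate` and `grafana_alertstate` label values listed in
// the table. ok is false when no metric should be emitted (Normal) or when
// the state is not one of the listed states.
func promAlertState(grafanaState string) (alertstate, grafanaAlertstate string, ok bool) {
	switch grafanaState {
	case "Alerting":
		return "firing", "alerting", true
	case "Recovering":
		return "firing", "recovering", true
	case "Pending":
		return "pending", "pending", true
	case "Error":
		return "firing", "error", true
	case "NoData":
		return "firing", "nodata", true
	default:
		// Normal (and anything unknown) emits no ALERTS sample.
		return "", "", false
	}
}

func main() {
	for _, s := range []string{"Alerting", "Recovering", "Pending", "Error", "NoData", "Normal"} {
		as, gs, ok := promAlertState(s)
		fmt.Printf("%s: alertstate=%q grafana_alertstate=%q emit=%v\n", s, as, gs, ok)
	}
}
```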
2025-06-18 07:17:57 +02:00
e92baba748 Alerting: Support PDC in Grafana-managed recording rules (#106677) 2025-06-17 11:46:34 +02:00
3fe73b8de9 Remote Alertmanager: Send SMTP config (#106337)
* (WIP) Remote Alertmanager: Send SMTP config

* send SMTP configs separately

* bring back deleted fields

* actually send stuff over

* remove redundant type, fix comments

* smtp -> smtpConfig

* also send SmtpFrom and StaticHeaders separately

* tests

* restore defaults.ini
2025-06-13 12:44:39 -03:00
e256f2d5e2 Alerting: Enable recording rules by default (#105603) 2025-06-02 10:56:05 +02:00
04f7f2451d Alerting: Add custom headers support for recording rules custom datasource writer (#105618)
Alerting: Add custom headers support for recording rules datasource writer
2025-05-19 19:42:22 +02:00
c58ac15031 Alerting: Remove grafanaManagedRecordingRules feature flag (#105569) 2025-05-19 12:15:49 +02:00
6c3d89f390 Remote Alertmanager: Add timeouts to the HTTP client (#105279)
* Remote Alertmanager: Add timeouts to the HTTP client

* code review suggestions
2025-05-13 13:25:56 +02:00
15be9861d0 Remote Alertmanager: Remove code for remote only mode (#105184) 2025-05-12 14:25:43 +02:00
51d7aa2bef Remote Alertmanager: Configure SMTP From address (#104925)
* Remote Alertmanager: Configure SMTP From address

* include smtp from address in config comparison

* update tests

* trigger build

* make linter happy

* trigger build

* fix test
2025-05-12 10:37:27 +02:00
0ceea29787 Alerting: Remove alertingSimplifiedRouting feature toggle (#104980)
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2025-05-09 16:30:56 +03:00
57640e40a2 Remote Alertmanager: Consider auto-gen routes when flagging a config as "default" (#105120)
* Remote Alertmanager: Consider auto-gen routes when flagging a config as 'default'

* remove always-nil error from isDefaultConfiguration

* remove unnecessary context.Background() in test

* pass orgID to autogenFn call during AM creation

* fix test

* make update-workspace
2025-05-08 21:04:30 +02:00
5a589bb51a Alerting: Enable the remote Alertmanager feature using only feature toggles (#101410)
* Alerting: Enable the remote Alertmanager feature using only feature toggles

* Trigger build
2025-04-30 12:18:47 +02:00
3a054d5e00 Alerting: Add FiredAt field to State (#104046)
* Add FiredAt field to the State

* Update featuretoggle files

* Fix lint errors

* Fix test compilation

* Remove random print line + formatting

* Address PR comments
2025-04-22 12:16:38 +01:00
a8f60de620 Alerting: Remove feature toggles relating to Loki Alert State History (#103540)
* Remove feature toggles relating to Loki Alert State History
2025-04-08 09:50:27 -04:00
e30034a42a Alerting: Remove feature flag alertingNoDataErrorExecution (#102156)
* remove feature flag

* remove feature flag in state manager

* make sure no data with empty results is handled

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-03-14 14:51:58 -04:00
bbab62ce39 Alerting: Select remote write path dependent on metrics backend type. (#101891)
The remote write path differs based on whether the data source is actually
Prometheus, Mimir, Cortex, or an older version of Cortex. We do not want
users to have to specify the path, so this change determines the path as
best it can.

In the future we may have to make this configurable per-datasource
to cater for setups where it's impossible to determine the correct path.
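
A rough sketch of the idea, assuming the conventional push endpoints of each backend (`/api/v1/write` for Prometheus, `/api/v1/push` for Mimir and current Cortex, `/api/prom/push` for older Cortex). The function `remoteWritePath`, its parameters, and the backend strings are hypothetical and do not reflect how Grafana actually detects the backend.

```go
package main

import (
	"fmt"
	"strings"
)

// remoteWritePath is a hypothetical helper that returns a conventional
// remote-write path for the given backend type. Unknown backends fall back
// to the Prometheus path as a best guess.
func remoteWritePath(backend string, legacyCortex bool) string {
	switch strings.ToLower(backend) {
	case "mimir":
		return "/api/v1/push"
	case "cortex":
		if legacyCortex {
			return "/api/prom/push"
		}
		return "/api/v1/push"
	default:
		return "/api/v1/write"
	}
}

func main() {
	fmt.Println(remoteWritePath("Prometheus", false))
	fmt.Println(remoteWritePath("Mimir", false))
	fmt.Println(remoteWritePath("Cortex", true))
}
```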
2025-03-11 13:45:16 +01:00
14ebec527c Alerting: Allow selection of recording rule write target on per-rule basis. (#101778)
* Alerting: Allow selection of recording rule write target on per-rule basis.

Introduces a new feature flag (`grafanaManagedRecordingRulesDatasources`),
disabled by default, to enable the ability to write recording rules data using
data source settings, and selecting the data source to use on a per-rule basis.

To cope with the scenario of users upgrading, a configuration file option
allows setting the default data source to use, if none is specified in the rule,
emulating the behaviour of recording rules without the flag enabled.

* Lint

* Update conf/sample.ini
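
The fallback described above (use the rule's data source if set, otherwise the configured default) might look roughly like this. It is a sketch under assumptions: `resolveTargetDatasource` and the UID strings are hypothetical names, not identifiers from the change.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveTargetDatasource is a hypothetical helper. It prefers the data
// source UID set on the rule and falls back to the default UID from the
// configuration file, which emulates the behaviour without the flag.
func resolveTargetDatasource(ruleTargetUID, defaultUID string) (string, error) {
	if ruleTargetUID != "" {
		return ruleTargetUID, nil
	}
	if defaultUID != "" {
		return defaultUID, nil
	}
	return "", errors.New("no target data source set on the rule and no default configured")
}

func main() {
	uid, err := resolveTargetDatasource("", "default-prom")
	fmt.Println(uid, err)
}
```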

Co-authored-by: Alexander Akhmetov <me@alx.cx>
2025-03-07 14:30:40 +01:00
eed07cf503 Alerting: Refactor NewPrometheusWriter function. (#101706)
* Alerting: Refactor NewPrometheusWriter function.

In order to re-use PrometheusWriter, change the function to take a
PrometheusWriterConfig instead of RecordingRulesSettings, and adapt the old
interface onto the new interface.

* Make linter happy
2025-03-06 16:13:22 +01:00
807f94b2c7 Alerting: Remove feature toggle alertingNoNormalState (#99905) 2025-02-03 17:32:50 +02:00
d6c1e3bb45 Alerting: Use org store to read organization IDs (#99938) 2025-02-03 15:38:16 +01:00
f45265b5f7 Alerting: Read from both proto and simple DB instance stores on startup (#99855) 2025-01-31 23:34:00 +01:00
d71904cb27 Alerting: Expose updated_by in rules GET APIs (#99525)
---------

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-01-27 14:31:40 -05:00
cb43f4b696 Alerting: Add compressed protobuf-based alert state storage (#99193) 2025-01-27 18:47:33 +01:00
1f8f9a45d7 Alerting: Add state_periodic_save_batch_size config option (#98019)
* Alerting: Add state_periodic_save_batch_size config option

---------

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
2024-12-16 15:30:38 +01:00
1fdc48faba Alerting: Make context deadline on AlertNG service startup configurable (#96053)
* Make alerting context deadline configurable

* Remove debug logs

* Change default timeout

* Update tests
2024-11-07 18:23:55 +00:00
d0481bb568 Alerting: Refactor state manager Warm method to accept instance store as an argument (#95098) 2024-10-22 09:45:50 +02:00
18e66d22b1 Alerting: Add more tracing for receivers service (#94572) 2024-10-11 11:41:13 -04:00
393faa8732 Alerting: Move rule evaluation status logic out of prometheus API and into scheduler (#89141)
* Add health fields to rules and an aggregator method to the scheduler

* Move health, last error, and last eval time together to minimize state processing

* Wire up a readonly scheduler to prom api

* Extract to exported function

* Use health in api_prometheus and fix up tests

* Rename health struct to status

* Fix tests one more time

* Several new tests

* Handle inactive rules

* Push state mapping into state manager

* rename to StatusReader

* Rectify cyclo complexity rebase

* Convert existing package local status implementation to models one

* fix tests

* undo RuleDefs rename
2024-09-30 16:52:49 -05:00
e86929eb0a Alerting: Managed receiver resource permission in config api (#93632)
* Alerting: Managed receiver resource permission in config api
2024-09-25 09:39:36 -04:00
e699348d39 Alerting: Managed receiver resource permission in provisioning (#93631)
* Alerting: Managed receiver resource permission in provisioning
2024-09-23 17:52:14 -04:00
6652233493 Alerting: Managed receiver resource permission in receiver_svc (#93556)
* Alerting: Managed receiver resource permission in receiver_svc
2024-09-23 21:12:25 +03:00
1ede1e32b8 Alerting: Receiver resource permissions service (#93552) 2024-09-20 18:31:42 -04:00
32f06c6d9c Alerting: Receiver API complete core implementation (#91738)
* Replace global authz abstraction with one compatible with uid scope

* Replace GettableApiReceiver with models.Receiver in receiver_svc

* GrafanaIntegrationConfig -> models.Integration

* Implement Create/Update methods

* Add optimistic concurrency to receiver API

* Add scope to ReceiversRead & ReceiversReadSecrets

migrates existing permissions to include implicit global scope

* Add receiver create, update, delete actions

* Check if receiver is used by rules before delete

* On receiver name change update in routes and notification settings

* Improve errors

* Linting

* Include read permissions as requirements for create/update/delete

* Alias ngalert/models to ngmodels to differentiate from v0alpha1 model

* Ensure integration UIDs are valid, unique, and generated if empty

* Validate integration settings on create/update

* Leverage UidToName to GetReceiver instead of GetReceivers

* Remove some unnecessary uses of simplejson

* alerting.notifications.receiver -> alerting.notifications.receivers

* validator -> provenanceValidator

* Only validate the modified receiver

stops existing invalid receivers from preventing modification of a valid
receiver.

* Improve error in Integration.Encrypt

* Remove scope from alert.notifications.receivers:create

* Add todos for receiver renaming

* Use receiverAC precondition checks in k8s api

* Linting

* Optional optimistic concurrency for delete

* make update-workspace

* More specific auth checks in k8s authorize.go

* Add debug log when delete optimistic concurrency is skipped

* Improve error message on authorizer.DecisionDeny

* Keep error for non-forbidden errutil errors
2024-08-26 10:47:53 -04:00
ac5ebe6e4d Alerting: Add enablement flag for recording rules (#92032)
* Add enablement flag

* Disable if toggle not enabled
2024-08-19 12:01:00 -05:00
b2eeb0dd6e Alerting: update rule versions on folder move (#88376)
* Alerting: update rule versions on folder move (#88361)
* Add tracing to folder.Move and folder.Update
2024-08-13 12:26:26 +02:00
53cfdf0ef8 Alerting: Remove option to return settings from api/v1/receivers and restrict provisioning action access (#90861)
* Remove provisioning action access to v1/receivers api

* Separate ListOnly functionality to its own method without decryption
2024-08-05 11:49:23 -04:00
a1ee84f757 Alerting: Remove duplicate tracing middleware from prom writer (#91353)
Remove duplicate tracing middleware from prom writer
2024-08-01 11:57:14 -04:00
4c71cadd5f Alerting: Detach condition validator from condition evaluator (#91150)
* Detach validator from evaluator

* Drop unnecessary interface and type
2024-07-30 10:55:37 -05:00
62f67e38b8 Alerting: Implement receiver auth service (#90857) 2024-07-29 15:49:10 -04:00
a1f0b599a7 Alerting: Refactor receiver_svc and provisioning config store into legacy_storage package (#90856)
* Add more receivers api tests

* Move provisioning config store to new legacy_storage package
2024-07-26 17:45:33 -04:00
418b077c59 Alerting: Integration testing for recording rules including writes (#90390)
* Add success case and tests for writer using metrics

* Use testable version of clock

* Assert a specific series was written

* Fix linter

* Fix manually constructed writer
2024-07-18 17:14:49 -05:00
0e269db8a9 Alerting: Expose recordingWriter on ngalert (#90573)
Expose recordingWriter on ngalert
2024-07-18 13:24:06 -05:00
970cafa20f Alerting: Time interval Delete API to check for usages in alert rules (#90500)
* Check if a time interval is used in alert rules before deleting it
* Add time interval to parameters of ListAlertRulesQuery and ListNotificationSettings of DbStore

== Refactorings ==
* refactor isMuteTimeInUse to accept a single route
* update getMuteTiming to not return err
* update delete to get the mute timing from config first
2024-07-17 10:53:54 -04:00
fce03cd724 Alerting: Send static headers to the remote Alertmanager (#89846) 2024-07-01 17:48:40 +02:00
06d5850396 Alerting: Update alerting state history API to authorize access using RBAC (#89579)
* add method CanReadAllRules to rule authorization service

* add alias type Namespace for Folder in ngalert's models package. It implements the Namespacer interface that is used by authz logic

* update state history's backends to authorize access to rules.
* update Loki to add folders UIDs to query. 
    * Update BuildLogQuery to drop filter by folders if it's too long and fall back to in-memory filtering.
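
The fallback in the last item can be sketched as follows. This is not the real `BuildLogQuery`: the function `buildQuery`, the base selector, the `folderUID` field name, and the length limit are assumptions, and the sketch assumes folder UIDs contain no regex metacharacters or quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// baseSelector is an assumed stream selector, used only for illustration.
const baseSelector = `{from="state-history"} | json`

// buildQuery adds a folder filter to the query. If the result would exceed
// maxLen, it drops the filter and reports that the caller must filter the
// results in memory instead.
func buildQuery(folderUIDs []string, maxLen int) (query string, filterInMemory bool) {
	if len(folderUIDs) == 0 {
		return baseSelector, false
	}
	q := fmt.Sprintf(`%s | folderUID=~"%s"`, baseSelector, strings.Join(folderUIDs, "|"))
	if len(q) > maxLen {
		return baseSelector, true
	}
	return q, false
}

// entry is a hypothetical decoded log line.
type entry struct {
	FolderUID string
	Line      string
}

// filterByFolder keeps only entries whose folder is in the allowed set.
func filterByFolder(entries []entry, allowed []string) []entry {
	set := make(map[string]struct{}, len(allowed))
	for _, uid := range allowed {
		set[uid] = struct{}{}
	}
	var out []entry
	for _, e := range entries {
		if _, ok := set[e.FolderUID]; ok {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	folders := []string{"folder-a", "folder-b"}
	q, inMemory := buildQuery(folders, 40) // small limit to force the fallback
	fmt.Println(q, inMemory)
	if inMemory {
		all := []entry{{"folder-a", "fired"}, {"folder-z", "resolved"}}
		fmt.Println(filterByFolder(all, folders))
	}
}
```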
2024-06-26 10:25:37 -04:00
fe1309dd96 Alerting: Send external URL to the remote Alertmanager (#89701)
* Alerting: Send external URL to the remote Alertmanager

* test that the URL is sent to the remote Alertmanager

* AppURL -> ExternalURL
2024-06-26 14:02:02 +02:00
fcfa89f864 Alerting: Implement Prometheus remote write for recording rules (#89189)
* Fix timestamp recorded by rule

* Implement prometheus remote write

* Create http client instead of transport

* Address PR comments

* Remove status code label
2024-06-25 17:23:42 +03:00