44 Commits

Author SHA1 Message Date
894944dcb0 Alerting: Refactor remote alertmanager to use Crypto interface (#107228) 2025-06-26 12:39:06 +02:00
5520f48153 Alerting: Use new receiver models for encrypt/decrypt in remote AM (#107042)
Several niche bugs have surfaced as a result of the decrypt code Grafana uses in receivers API being different than what is used to decrypt secrets before sending to remote AM. Example:
- Dingding notifier not abiding by new Patching added to local AM, thus causing missing url errors.

* noop refactor to simplify decryptConfiguration

* Move compat function package

* Use new receiver models to encrypt/decrypt in remote AM
2025-06-20 11:45:01 -04:00
3fe73b8de9 Remote Alertmanager: Send SMTP config (#106337)
* (WIP) Remote Alertmanager: Send SMTP config

* send SMTP configs separately

* bring back deleted fields

* actually send stuff over

* remove redundant type, fix comments

* smtp -> smtpConfig

* also send SmtpFrom an StaticHeaders separately

* tests

* restore defaults.ini
2025-06-13 12:44:39 -03:00
29128f7ae4 Alerting: Copy alertmanager configuration before decrypting (#105271) 2025-05-13 11:46:17 +02:00
638972c787 Alerting: fix tests (#105240) 2025-05-12 14:00:12 +03:00
51d7aa2bef Remote Alertmanager: Configure SMTP From address (#104925)
* Remote Alertmanager: Configure SMTP From address

* include smtp from address in config comparison

* updte tests

* trigger build

* make linter happy

* trigger build

* fix test
2025-05-12 10:37:27 +02:00
57640e40a2 Remote Alertmanager: Consider auto-gen routes when flagging a config as "default" (#105120)
* Remote Alertmanager: Consider auto-gen routes when flagging a config as 'default'

* remove always-nil error from isDefaultConfiguration

* remove unnecessary context.Background() in test

* pass orgID to autogenFn call during AM creation

* fix test

* make update-workspace
2025-05-08 21:04:30 +02:00
371ea5cda7 Alerting: Fix loss of TimeInterval location on remote AM apply (#102510)
* Alerting: Fix loss of TimeInterval location on remote AM apply

deepcopy.Copy does not correctly copy PostableUserConfig because it ignores
unexported fields. As a result, TimeInterval locations default to UTC instead
of retaining their original values.

* make update-workspace
2025-03-20 09:54:33 +01:00
9e408f842c Alerting: Skip sanitizing labels when sending alerts to the remote Alertmanager (#96251)
* Alerting: Skip sanitizing labels when sending alerts to the remote Alertmanager

* fix drone image name
2024-11-13 11:21:44 -03:00
4c15266a77 Alerting: Add X-Remote-Alertmanager header to the remote AM client (#94913) 2024-10-17 22:38:13 +02:00
80611b381c Alerting: Decrypt secure settings when testing receivers in the remote Alertmanager (#93864)
* Alerting: Decrypt secure settings when testing receivers in the remote Alertmanager

* go work sync

* make update-workspace

* point to latest main in grafana/alerting

* unit test

* import definitions only once
2024-09-30 13:28:30 -03:00
4a124469fa Check is config is default by comparing hashes (#92296) 2024-08-23 11:22:06 +02:00
8d725a641c Alerting: Integration test for testing template via remote alertmanager (#92147)
* Add integration test for testing template

* Update drone signature
2024-08-21 13:06:01 +01:00
5487ea444a Alerting: Update remote Alertmanager config marshalling in test (#91791) 2024-08-12 13:45:22 +02:00
e097ffc771 Alerting: Update grafana/alerting dependency (#90365)
* update grafana/alerting to latest main
* update alertmanager to  66ec17e3aa45
2024-07-12 14:05:17 -04:00
3bb861b9f0 Alerting: Remove empty/namespace labels when sending alerts to the remote Alertmanager (#90284)
* Alerting: Remove empty/namespace labels when sending alerts to the remote Alertmanager

* update tests

* fix typo in comment
2024-07-11 15:20:12 +02:00
fce03cd724 Alerting: Send static headers to the remote Alertmanager (#89846) 2024-07-01 17:48:40 +02:00
fe1309dd96 Alerting: Send external URL to the remote Alertmanager (#89701)
* Alerting: Send external URL to the remote Alertmanager

* test that the URL is sent to the remote Alertmanager

* AppURL -> ExternalURL
2024-06-26 14:02:02 +02:00
8eabef1f91 Alerting: Update remote alertmanager to use extended receivers API. (#89253)
* Alerting: Update remote alertmanager to use extended receivers API.

* Update integration test and Mimir image

* Update Mimir image in more places
2024-06-19 12:40:22 +03:00
8491e02caf Alerting: Instrument outbound requests for Loki Historian and Remote Alertmanager with tracing (#89185)
* Add TracedClient

* Handle errors and status codes

* Wire up tracing to normal ASH and loki annotation mapping

* Add tracing to remote alertmanager

* one more spot

* and not or

* More consistency with other grafana traces, lower cardinality name
2024-06-14 13:24:12 -05:00
dd3c3b5857 Alerting: Update grafana/alerting. (#88914) 2024-06-14 09:19:04 +02:00
12d5251c12 Alerting: Alertmanager configuration sync loop (#88822)
* make the config sync happen on each call to ApplyConfig(), fix tests

* send autogen config

* add fake autogen function for tests

* update stale comments, tidy things up, make linter happy

* add auto-gen routes only if the feature toggle is enabled

* remove unnecessary fake autogen function

* throttle configuration syncs

* restore pkg/services/store/entity/sqlstash/sql_storage_server.go

* test sync loop in ApplyConfig, skip invalid autogen routes

* restore conf/defaults.ini

* restore conf/defaults.ini

* avoid skipping invalid auto-gen routes in SaveAndApplyConfig

* test that autogenFn is called and its errors are returned

* add debug message about the sync interval not having elapsed

* collapse two log lines into one
2024-06-12 10:13:34 +02:00
9f9928d41a Alerting: Update grafana/alerting (#88363)
* Alerting: Update grafana/alerting

* make tests pass by implementing yaml unmarshallers and deleting fields with omitempty in their yaml tags

* go mod tidy

* fix tests by implementing not calling GettableApiAlertingConfig.UnmarshalYAML from GettableApiAlertingConfig.UnmarshalJSON

* cleanup, reduce diff

* fix more tests

* update grafana/alerting to latest commit, delete global section from configs in tests

* bring back YAML unmarshaller for GettableApiAlertingConfig

* update alerting package dependency to point to main

* skip test for sns notifier
2024-06-04 20:29:37 +02:00
e41434c332 Alerting: Promote configuration in the remote Alertmanager (#87388) 2024-05-16 12:06:03 +02:00
babfa2beac Alerting: Hook up GMA silence APIs to new authentication handler (#86625)
This PR connects the new RBAC authentication service to existing alertmanager API silence endpoints.
2024-05-03 15:32:30 -04:00
b76a9e4d31 Alerting: Implement GetStatus in the remote Alertmanager struct (#84887)
* Alerting: Implement GetStatus in the remote Alertmanager struct

* update tests

* fix tests, extract AlertmanagerConfig from PostableConfig

* get the remote AM config instead of the Grafana one from the remote AM

* pass grafana AM config in test

* return error in GetStatus instead of logging it (internal AM)
2024-05-03 13:59:02 +02:00
c77ab53819 Alerting: implement SaveAndApplyConfig in the remote Alertmanager struct (#84642)
* implement SaveAndApplyConfig in the remote Alertmanager struct

* remove ID from CreateGrafanaAlertmanagerConfig call

* decrypt, test that we decrypt, refactor

* fix duplicated declaration in test

* rephrase comment, remove unnecessary conversion to slice of bytes

* fix test
2024-04-23 14:37:10 +02:00
309a7e7684 Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct (#85005)
* Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct

* send the hash of the encrypted configuration

* tests, default config hash in AM struct

* add missing default config to test

* restore build directory

* go work file...

* fix broken test

* remove unnecessary conversion to []byte

* go work again...

* make things work again with latest main branch changes

* update error messages in tests for decrypting config
2024-04-19 15:11:07 +02:00
a2ce8fefed Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager (#86451)
* Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager

* remove '-distroless' from mimir image name
2024-04-19 13:04:18 +02:00
f79dd7c7f9 Alerting: Persist silence state immediately on Create/Delete (#84705)
* Alerting: Persist silence state immediately on Create/Delete

Persists the silence state to the kvstore immediately instead of waiting for the
 next maintenance run. This is used after Create/Delete to prevent silences from
 being lost when a new Alertmanager is started before the state has persisted.
 This can happen, for example, in a rolling deployment scenario.

* Fix test that requires real data

* Don't error if silence state persist fails, maintenance will correct
2024-04-09 13:39:34 -04:00
0c3c5c5607 Alerting: Stop persisting silences and nflog to disk (#84706)
With this change, we no longer need to persist silence/nflog states to disk in addition to the kvstore
2024-03-23 00:37:33 +02:00
4ad6d66479 Alerting: Remove ID from UserGrafanaConfig struct (#84602)
* Alerting: Remove ID from UserGrafanaConfig struct

* user custom mimir image withoud id in grafana config

* change mimir image name
2024-03-19 12:47:13 +01:00
c9bb18101c Alerting: Decrypt secrets before sending configuration to the remote Alertmanager (#83640)
* (WIP) Alerting: Decrypt secrets before sending configuration to the remote Alertmanager

* refactor, fix tests

* test decrypting secrets

* tidy up

* test SendConfiguration, quote keys, refactor tests

* make linter happy

* decrypt configuration before comparing

* copy configuration struct before decrypting

* reduce diff in TestCompareAndSendConfiguration

* clean up remote/alertmanager.go

* make linter happy

* avoid serializing into JSON to copy struct

* codeowners
2024-03-19 12:12:03 +01:00
9e78faa7ba Alerting: Add metrics to the remote Alertmanager struct (#79835)
* Alerting: Add metrics to the remote Alertmanager struct

* rephrase http_requests_failed description

* make linter happy

* remove unnecessary metrics

* extract timed client to separate package

* use histogram collector from dskit

* remove weaveworks dependency

* capture metrics for all requests to the remote Alertmanager (both clients)

* use the timed client in the MimirAuthRoundTripper

* HTTPRequestsDuration -> HTTPRequestDuration, clean up mimir client factory function

* refactor

* less git diff

* gauge for last readiness check in seconds

* initialize LastReadinesCheck to 0, tweak metric names and descriptions

* add counters for sync attempts/errors

* last config sync and last state sync timestamps (gauges)

* change latency metric name

* metric for remote Alertmanager mode

* code review comments

* move label constants to metrics package
2024-01-10 11:18:24 +01:00
a77ba40ed4 Alerting: Use the forked Alertmanager for remote secondary mode (#79646)
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode

* fall back to using internal AM in case of error

* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct

* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only

* extract logic to decide remote Alertmanager mode to a separate function, switch on mode

* tests

* make linter happy

* remove func to decide remote Alertmanager mode

* refactor factory function and options

* add default case to switch statement

* remove ineffectual assignment
2023-12-21 15:26:31 +01:00
9945514baa Alerting: Validate configuration for the remote Alertmanager struct (#79691)
* Alerting: Validate configuration for the remote Alertmanager struct

* add TenantID to test

* add OrgID to config struct in tests
2023-12-19 18:41:48 +01:00
cc3c0a2cc2 Alerting: Refactor readiness check (#78799)
* Alerting: Refactor readiness check

Moves the readiness check to the mimir client and removes the need to assert that we have senders - it already has a queue and can hold notifications until we're ready to send them.
---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-12-12 15:34:54 +00:00
73776f37eb Alerting: Send state to the remote Alertmanager (#78538)
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager

Mimir client that understands the new APIs developed for mimir. Very much a WIP still.

* more wip

* appease the linter

* more linting

* add more code

* get state from kvstore, encode, send

* send state to the remote Alertmanager, extract fullstate logic into its own function

* pass kvstore to remote.NewAlertmanager()

* refactor

* add fake kvstore to tests

* tests

* use FileStore to get state

* always log 'completed state upload'

* refactor compareRemoteConfig

* base64-encode the state in the file store

* export silences and nflog filenames, refactor

* log 'completed state/config upload...' regardless of outcome

* add values to the state store in tests

* address code review comments

* log error from filestore

---------

Co-authored-by: gotjosh <josue.abreu@gmail.com>
2023-11-29 12:49:39 +01:00
01d274852c Alerting: Add GetFullState method to FileStore (#78701)
* Alerting: Add GetFullState method to FileStore

* make tests compile, create stateStore in NewAlertmanager

* return errors instead of logging, accept an arbitrary number of strings

* make NewAlertmanager() accept a stateStore
2023-11-28 15:34:45 +01:00
8120306fea Remote Alertmanager(refactor): Only parse the URL once (#78631)
* Remote Alertmanager(refactor): Only parse the URL once

Exactly what it says in the tin.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* use the existing tests

Signed-off-by: gotjosh <josue.abreu@gmail.com>

---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-11-24 11:05:13 +00:00
23fe8f4e9c Alerting: Introduce a Mimir client as part of the Remote Alertmanager (#78357)
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager

This is our first attempt at making Grafana communicate use Mimir as a backend - it uses a new set of APIs that we've developed on the Mimir side to upload the grafana configuration and alertmanager state so that it can then be ported over.

Codewise, we've introduced a couple of things:

A client to isolate in its own package all the communication that happens with Mimir
A few changes to the remote/alertmanager to include uploading the configuration and state when it starts
A few refactors that align a bit better with the design approach that we're thinking
An integration tests again these newly developed APIs using a custom image

---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2023-11-23 16:59:36 +00:00
a6b9b27673 Alerting: Remove OrgID() from the Alertmanager interface (#77398) 2023-10-31 10:58:47 +01:00
01add144b8 Alerting: Send alerts to the remote Alertmanager (#77034)
* Alerting: Rename remote.ExternalAlertmanager to remote.Alertmanager

* Alerting: Send alerts to the remote Alertmanager

* add ticker to readiness check, add tests

* use options when creating a new sender.ExternaAlertmanager

* unexport defaultMaxQueueCapacity

* delete unused defaultConfig field

* add debug log line when sending alerts to the remote alertmanager

* move and refactor readiness check

* update tests to not include defaultConfig
2023-10-25 11:52:48 +02:00
488a60aee6 Alerting: Rename remote.ExternalAlertmanager to remote.Alertmanager (#76956) 2023-10-23 15:37:14 +02:00