40 Commits

Author SHA1 Message Date
a77ba40ed4 Alerting: Use the forked Alertmanager for remote secondary mode (#79646)
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode

* fall back to using internal AM in case of error

* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct

* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only

* extract logic to decide remote Alertmanager mode to a separate function, switch on mode

* tests

* make linter happy

* remove func to decide remote Alertmanager mode

* refactor factory function and options

* add default case to switch statement

* remove ineffectual assignment
2023-12-21 15:26:31 +01:00
c46da8ea9b Alerting: Update alerting package and imports from cluster and clusterpb (#79786)
* Alerting: Update alerting package

* update to latest commit

* alias for imports
2023-12-21 12:34:48 +01:00
0c9356a3c7 Unified Alerting: Set max_attempts to 1 by default (#79095)
* Unified Alerting: Set `max_attempts` to 1 by default

The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it.

I see two cons with this approach:

- Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating.
- Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-12-05 17:42:34 +00:00
5a80962de9 Alerting: Add clean_upgrade config and deprecate force_migration (#78324)
* Alerting: Add clean_upgrade config and deprecate force_migration

Upgrading to UA and rolling back will no longer delete any data by default. 
Instead, each set of tables will remain unchanged when switching between 
legacy and UA. As such, the force_migration config has been deprecated 
and no extra configuration is required to roll back to legacy anymore.

If clean_upgrade is set to true when upgrading from legacy alerting to Unified
Alerting, grafana will first delete all existing Unified Alerting resources,
thus re-upgrading all organizations from scratch. If false or unset,
organizations that have previously upgraded will not lose their existing Unified
 Alerting data when switching between legacy and Unified Alerting.

 Similar to force_migration, it should be kept false when not needed as it may
 cause unintended data-loss if left enabled.

---------

Co-authored-by: Christopher Moyer <35463610+chri2547@users.noreply.github.com>
2023-11-30 11:01:11 -05:00
7a34cdb3a2 Alerting: Add configuration options to migrate to an external Alertmanager (#71318)
* add configuration options to .ini file and parse them

* updates on config options, add external AM config to the main config struct

* separate external AM configs from general alerting configs, naming

* comments about usage of tenantID in basic auth & not using config options yet
2023-09-05 11:24:35 -03:00
dfba94e052 Alerting: Limit redis pool size to 5 and make configurable (#74057)
* Limit redis pool size to 5 and expose it in config ini

* Coerce negative pool sizes to the default
2023-08-29 14:59:12 -05:00
c7598cc6fb Alerting: Add ability to control scheduler tick interval via config (#71980)
* add ability to control scheduler interval via config
* add feature flag `configurableSchedulerTick`
2023-07-26 12:44:12 -04:00
7edbe72483 Alerting: Support concurrent queries for saving alert instances (#70525)
This commit adds support for concurrent queries when saving alert
instances to the database. This is an experimental feature in
response to some customers experiencing delays between rule evaluation
and sending alerts to Alertmanager, resulting in flapping. It is
disabled by default.
2023-06-23 11:36:07 +01:00
8bb62a8316 Alerting: Add option for memberlist label (#67982) 2023-05-09 10:32:23 +02:00
bc11a484ed Alerting: Add support for running HA using Redis (#65267)
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
2023-04-19 17:05:26 +02:00
b2abb63286 Alerting: Introduce proper feature toggles for common state history backend combinations (#65497)
* define 3 feature toggles for rollout phases

* Pass feature toggles along

* Implement first feature toggle

* Try a different strategy with fall-throughs to specific configurations

* Apply toggle overrides once outside of backend composition

* Emit log messages when we coerce backends

* Run code generator for feature toggle files

* Improve wording in flag descs

* Re-run generator

* Use code-generated constants instead of plain strings

* Use converted enum values rather than strings for pre-parsing
2023-03-30 13:53:21 -05:00
a31672fa40 Alerting: Create new state history "fanout" backend that dispatches to multiple other backends at once (#64774)
* Rename RecordStatesAsync to Record

* Rename QueryStates to Query

* Implement fanout writes

* Implement primary queries

* Simplify error joining

* Add test for query path

* Add tests for writes and error propagation

* Allow fanout backend to be configured

* Touch up log messages and config validation

* Consistent documentation for all backend structs

* Parse and normalize backend names more consistently against an enum

* Touch-ups to documentation

* Improve clarity around multi-record blocking

* Keep primary and secondaries more distinct

* Rename fanout backend to multiple backend

* Simplify config keys for multi backend mode
2023-03-17 12:41:18 -05:00
e7ace4ed62 Alerting: Allow separate read and write path URLs for Loki state history (#62268)
Extract config parsing and add tests
2023-01-30 16:30:05 -06:00
b4682fe3cb Alerting: Configurable externalLabels for Loki state history (#62404)
* Add config option for external labels

* Remove redundant nilcheck
2023-01-30 14:24:45 -06:00
aebcecf538 Chore: Fix goimports grouping in other backend platform packages (#62422)
* fix goimports

* fix goimports order

* fix goimports order

* fix goimports order

* fix goimports order

* fix goimports order
2023-01-30 08:26:42 +00:00
44b11d3228 Alerting: support basic auth for the state history loki client (#61696) 2023-01-18 20:24:40 +01:00
1ac89ea040 Alerting: Add client configuration for remote Loki historian backend and test connection (#61114)
* Create loki client type and ping method

* Expose TestConnection on client

* Configure and ping Loki URL

* Close response body reader if present

* Add 30 second timeout

* Remove duplicate close
2023-01-17 13:58:52 -06:00
eb960d9725 Alerting: Add un-documented toggle for changing state history backend, add shells for remote loki and sql (#61072)
* Add toggle for state history backend and shells

* Extract some shared logic and add tests
2023-01-06 12:06:01 -06:00
8c3a5f6da0 Alerting: Allow state history to be disabled through configuration (#61006)
* Add configuration option for if state history should be enabled

* Inject no-op when history is disabled
2023-01-05 12:21:07 -06:00
9af7adef76 Alerting: Support customizable timeout for screenshots (#60981)
This commit adds a customizable timeout for screenshots called
capture_timeout. The default value is 10 seconds, and the maximum
value is 30 seconds. This timeout should be less than the minimum
Interval of all Evaluation Groups to avoid back pressure on alert
rule evaluation.
2023-01-05 16:07:46 +00:00
c02003af3c Refactor time durations (#58484)
This change uses `time.Second` in place of `1000 * time.Millisecond` and `time.Minute` in place of `60*time.Second`.
2022-11-22 15:09:15 +08:00
28dd413c1d Alerting: Add config disabled_labels to disable reserved labels (#51832)
* Alerting: Add config disabled_labels to disable reserved labels

[unified_alerting.reserved_labels]
disabled_labels

* Replace IsGrafanaFolderDisabled with more generic IsReservedLabelDisabled

* Simplify SchedulerCfg by including UnifiedAlertingSettings
2022-07-11 12:41:40 -04:00
434e94ef2b Alerting: Update default route groupBy to [grafana_folder, alertname] (#50052)
* Alerting: Update default route groupBy to [grafana_folder, alertname]

Default group by for new routes and migrations is now [grafana_folder, alertname]
2022-07-11 12:24:43 -04:00
bda47df4ad Alerting: Invalid setting of enabled for unified alerting should return error (#49876) 2022-06-10 08:59:58 +01:00
fd664e4beb Alerting: replace a duplicated configuration key (#50350)
This PR renames the configuration key enabled to capture. This is needed as we already have a configuration key with the name enabled.

Fixes #50328

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
2022-06-08 11:04:51 +08:00
3b7f871bf4 Alerting: Enable Unified Alerting for open source and enterprise (#49834)
This commit enables Unified Alerting for open source and enterprise unless disabled in configuration.
2022-05-30 16:47:15 +01:00
687e79538b Alerting: Add a general screenshot service and alerting-specific image service. (#49293)
This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service.

The screenshot package has the following services, most of which can be composed:

BrowserScreenshotService (Takes screenshots with headless Chrome)
CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService)
NoopScreenshotService (A no-op screenshot service for tests)
SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel)
ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable)
UploadingScreenshotService (A screenshot service that uploads taken screenshots)

The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296

This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.
2022-05-22 22:33:49 +08:00
e44ea3d589 Alerting: Rename setting AlertForDuration to DefaultRuleEvaluationInterval (#45569)
* fix AlertForDuration to DefaultRuleEvaluationInterval

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2022-02-18 10:05:06 -05:00
095ea44e97 Alerting: Move BaseInterval and MinInterval to UnifiedAlerting config (#45107)
* use base interval if legacy value is less than the base interval
2022-02-11 16:13:49 -05:00
5d66194ec5 FeatureFlags: define features outside settings.Cfg (take 3) (#44443) 2022-01-26 09:44:20 -08:00
65bdb3a899 FeatureFlags: Revert managing feature flags outside of settings.Cfg (#44382)
* Revert "FeatureToggles: register all enterprise feature toggles (#44336)"

This reverts commit f53b3fb0071c0d6d16a80d5e172a425aa3f72ca9.

* Revert "FeatureFlags: manage feature flags outside of settings.Cfg (#43692)"

This reverts commit f94c0decbd302140fffe351db200634a5c728545.
2022-01-24 16:08:05 +01:00
f94c0decbd FeatureFlags: manage feature flags outside of settings.Cfg (#43692) 2022-01-20 13:42:05 -08:00
005c8f8894 Alerting: Disable unified alerting by default in Enterprise Grafana (#42476)
* fallback to enable false if Enterprise is true

* anyBoolean
2021-11-29 20:51:15 +01:00
6523486122 Alerting: Make Unified Alerting enabled by default for those who do not use legacy alerting (#42200)
* update AlertingEnabled and UnifiedAlertingSettings.Enabled to be pointers
* add a pseudo migration to fix the AlertingEnabled and UnifiedAlertingSettings.Enabled if the latter is not defined
* update the default configuration file to make default value for both 'enabled' flags be undefined

Misc
* update Migrator to expose DB engine. This is needed for a ualert migration to access the database while the list of migrations is created.
* add more verbose failure when migrations do not match

Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2021-11-24 14:56:07 -05:00
012d4f0905 Alerting: Remove ngalert feature toggle and introduce two new settings for enabling Grafana 8 alerts and disabling them for specific organisations (#38746)
* Remove `ngalert` feature toggle

* Update frontend

Remove all references of ngalert feature toggle

* Update docs

* Disable unified alerting for specific orgs

* Add backend tests

* Apply suggestions from code review

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Disabled unified alerting by default

* Ensure backward compatibility with old ngalert feature toggle

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-29 16:16:40 +02:00
f6f3a54742 Alerting: tune rule evaluation via configuration (#35623)
* Alerting: Configure max evaluation retries

* Alerting: Enforce minimum rule evaluation interval

* Alerting: Disable rule evaluation from configuration

* Update docs

* Alerting: Configure rule evaluation timeout

* Move options on unified_alerting config section

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-28 13:00:16 +03:00
05eb30e323 Alerting: Move alertmanager default config to UnifiedAlertingSettings (#39597) 2021-09-23 13:52:20 -04:00
64c8d32fe7 Use sdk pkg for gtime (#39354) 2021-09-21 13:08:52 +02:00
2ad82b9354 Alerting: Move the unified alerting settings to its own struct (#39350) 2021-09-20 10:12:21 +03:00
7db97097c9 Alerting: Support Unified Alerting with Grafana HA (#37920)
* Alerting: Support Unified Alerting in Grafana's HA mode.
2021-09-16 15:33:51 +01:00