* Alerting: Add extended definition to prometheus alert rules api
This adds `isPaused` and `notificationSettings` to the paginated rules api to enable the paginated view of GMA rules.
refactor: make alert rule status and state retrieval extensible
This lets us get status from other sources than the local ruler.
* update swagger spec
* add safety checks in test
What is this feature?
This PR introduces a new alert rule configuration option, keep_firing_for (Prometheus documentation).
keep_firing_for prevents alerts from resolving immediately after the alert condition returns to normal. Instead, they transition into a "Recovering" state and are not considered resolved by the Alertmanager. Once the recovery period ends (or after the next evaluation if it is bigger than keep_firing_for), the alert transitions to "Normal" if it doesn't start alerting again:
Before
+----------+ +----------+
| Alerting |---->| Normal |
+----------+ +----------+
-----
After
+----------+ +------------+ +----------+
| Alerting |----->| Recovering |---->| Normal |
+----------+ +------------+ +----------+
Why do we need this feature?
This feature prevents flapping alerts by adding a recovery period. This helps avoid false resolutions caused by brief alert
* add action set resolver
* rename variables
* some fixes and some tests
* more tests
* more tests, and put action set storing behind a feature toggle
* undo change from cfg to feature mgmt - will cover it in a separate PR due to the amount of test changes
* fix dependency cycle, update some tests
* add one more test
* fix for feature toggle check not being set on test configs
* linting fixes
* check that action set name can be split nicely
* clean up tests by turning GetActionSetNames into a function
* undo accidental change
* test fix
* more test fixes
* Alerting: Consistently return Prometheus-style responses from rules APIs.
This commit is part refactor and part fix. The /rules API occasionally returns
error responses which are inconsistent with other error responses. This fixes
that, and adds a function to map from Prometheus error type and HTTP code.
* Fix integration tests
* Linter happiness
* Make linter more happy
* Fix up one more place returning non-Prometheus responses
* replace sqlstore with db interface in a few packages
* remove from stats
* remove sqlstore in admin test
* remove sqlstore from api plugin tests
* fix another createUser
* remove sqlstore in publicdashboards
* remove sqlstore from orgs
* clean up orguser test
* more clean up in sso
* clean up service accounts
* further cleanup
* more cleanup in accesscontrol
* last cleanup in accesscontrol
* clean up teams
* more removals
* split cfg from db in testenv
* few remaining fixes
* fix test with bus
* pass cfg for testing inside db as an option
* set query retries when no opts provided
* revert golden test data
* rebase and rollback
* update storage's method InstertRules to return ids of added rules as slice to keep the same order as rules in the argument
* schematize response of update rule group endpoint, add created, updated, deleted fields that contain UID of affected rules.
* update integration tests to use the new fields
* add a feature toggle
* add the fields for attribute, kind and identifier to permission
Co-authored-by: Kalle Persson <kalle.persson@grafana.com>
* set the new fields when new permissions are stored
* add migrations
Co-authored-by: Kalle Persson <kalle.persson@grafana.com>
* remove comments
* Update pkg/services/accesscontrol/migrator/migrator.go
Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
* feedback: put column migrations behind the feature toggle, added an index, changed how wildcard scopes are split
* PR feedback: add a comment and revert an accidentally changed file
* PR feedback: handle the case with : in resource identifier
* switch from checking feature toggle through cfg to checking it through featuremgmt
* don't put the column migrations behind a feature toggle after all - this breaks permission queries from db
---------
Co-authored-by: Kalle Persson <kalle.persson@grafana.com>
Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
This commit adds support for limits and filters to the Prometheus Rules
API.
Limits:
It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:
- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.
Filters:
Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.
A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.
Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.
Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.
The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:
{
"name": "test",
"value": "value1",
"isRegex": false,
"isEqual": true
}
The `isRegex` and `isEqual` options work as follows:
| IsEqual | IsRegex | Operator |
| ------- | -------- | -------- |
| true | false | = |
| true | true | =~ |
| false | true | !~ |
| false | false | != |
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:
1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule
It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
* copy AlertQuery from ngmodels to the definition package
* replaces usages of ngmodels.AlertQuery in API models
* create a converter between models of AlertQuery
---------
Co-authored-by: Alex Moreno <alexander.moreno@grafana.com>
* Use suggested value for uid
* update the snapshot
* use __expr__
* replace all -100 with __expr__
* update snapshot
* more changes
* revert redundant change
* Use expr.DatasourceUID where it's possible
* generate files
* Rename file to store
* Move resource permission specific database functions to
resourcepermissions package
* Wire: Remove interface bind
* RBAC: Remove injection of resourcepermission Store
* RBAC: Export store constructor
* Tests: Use resource permission package to initiate store used in tests
* RBAC: Remove internal types package and move to resourcepermissions
package
* RBAC: Run database tests as itegration tests
* Move SignedInUser to user service and RoleType and Roles to org
* Use go naming convention for roles
* Fix some imports and leftovers
* Fix ldap debug test
* Fix lint
* Fix lint 2
* Fix lint 3
* Fix type and not needed conversion
* Clean up messages in api tests
* Clean up api tests 2
* make 'for' pointer to distinguish between missing field and 0
* set 'for' to -1 if the value is missing but not allow negative in the request + path -1 with the value from original rule
* update store validation to not allow negative 'for'
* update usages to use pointer
* Split Create User
* Use new create user and User from package user
* Add service to wire
* Making create user work
* Replace user from user pkg
* One more
* Move Insert to orguser Service/Store
* Remove unnecessary conversion
* Cleaunp
* Fix Get User and add fakes
* Fixing get org id for user logic, adding fakes and other adjustments
* Add some tests for ourguser service and store
* Fix insert org logic
* Add comment about deprecation
* Fix after merge with main
* Move orguser service/store to org service/store
* Remove orguser from wire
* Unimplement new Create user and use User from pkg user
* Fix wire generation
* Fix lint
* Fix lint - use only User and CrateUserCommand from user pkg
* Remove User and CreateUserCommand from models
* Fix lint 2
* Add RBAC section to settings
* Default to RBAC enabled settings to true
* Update tests to respect RBAC
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
* use common traceID context value for opentracing and opentelemetry
* support sampled trace IDs as well
* inject traceID into NormalResponse on errors
* Finally the test passed
* fix the test
* fix linter
* change the function parameter
Co-authored-by: Ying WANG <ying.wang@grafana.com>
* Alerting: Remove internal labels from prometheus compatible API responses
* Appease the linter
* Fix integration tests
* Fix API documentation & linter
* move removal of internal labels to the models
* Update API controller
- add validation of rules API model
- add function to calculate changes between the submitted alerts and existing alerts
- update RoutePostNameRulesConfig to validate input models, calculate changes and apply in a transaction
* Update DBStore
- delete unused storage method. All the logic is moved upstream.
- upsert to not modify fields of new by values from the existing alert
- if rule has UID do not try to pull it from db. (it is done upstream)
* Add rule generator
* Add providers to folder and dashboard services
* Refactor folder and dashboard services
* Move store implementation to its own file due wire cannot allow us to cast to SQLStore
* Add store in some places and more missing dependencies
* Bad merge fix
* Remove old functions from tests and few fixes
* Fix provisioning
* Remove store from http server and some test fixes
* Test fixes
* Fix dashboard and folder tests
* Fix library tests
* Fix provisioning tests
* Fix plugins manager tests
* Fix alert and org users tests
* Refactor service package and more test fixes
* Fix dashboard_test tets
* Fix api tests
* Some lint fixes
* Fix lint
* More lint :/
* Move dashboard integration tests to dashboards service and fix dependencies
* Lint + tests
* More integration tests fixes
* Lint
* Lint again
* Fix tests again and again anda again
* Update searchstore_test
* Fix goimports
* More go imports
* More imports fixes
* Fix lint
* Move UnprovisionDashboard function into dashboard service and remove bus
* Use search service instead of bus
* Fix test
* Fix go imports
* Use nil in tests
* Separate Tracer interface to TracerService and Tracer
* Fix lint
* Fix:Make it possible to start spans for both opentracing and opentelemetry in ds proxy
* Add span methods, use span interface for rest of tracing
* Fix logs in tracing
* Fix tests that are related to tracing
* Fix resourcepermissions test
* Fix some tests
* Fix more tests
* Add TracingService to wire cli runner
* Remove GlobalTracer from bus
* Renaming test function
* Remove GlobalTracer from TSDB
* Replace GlobalTracer in api
* Adjust tests to the InitializeForTests func
* Remove GlobalTracer from services
* Remove GlobalTracer
* Remove bus.NewTest
* Remove Tracer interface
* Add InitializeForBus
* Simplify tests
* Clean up tests
* Rename TracerService to Tracer
* Update pkg/middleware/request_tracing.go
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Initialize tracer before passing it to SQLStore initialization in commands
* Remove tests for opentracing
* Set span attributes correctly, remove unnecessary trace initiliazation form test
* Add tracer instance to newSQLStore
* Fix changes due to rebase
* Add modified tracing middleware test
* Fix opentracing implementation tags
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Fixes#30144
Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
* Alerting: modify table and accessors to limit org access appropriately
* Update migration to create multiple Alertmanager configs
* Apply suggestions from code review
Co-authored-by: gotjosh <josue@grafana.com>
* replace mg.ClearMigrationEntry()
mg.ClearMigrationEntry() would create a new session.
This commit introduces a new migration for clearing an entry from migration log for replacing mg.ClearMigrationEntry() so that all dashboard alert migration operations will run inside the same transaction.
It adds also `SkipMigrationLog()` in Migrator interface for skipping adding an entry in the migration_log.
Co-authored-by: gotjosh <josue@grafana.com>
* [Alerting]: forbid viewers for updating rules if viewers can edit
check for CanSave instead of CanEdit
* Clear ngalert tables when deleting the folder
* Apply suggestions from code review
* Log failure to check save permission
Co-authored-by: gotjosh <josue@grafana.com>
* Quota: Extend service to set limit on alerts
* Add test for applying quota to alert rules
* Apply suggestions from code review
Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>
* Get used alert quota only if naglert is enabled
* Set alert limit to zero if nglalert is not enabled
Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>