63 Commits

Author SHA1 Message Date
6262c56132 chore(perf): Pre-allocate where possible (enable prealloc linter) (#88952)
* chore(perf): Pre-allocate where possible (enable prealloc linter)

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>

* fix TestAlertManagers_buildRedactedAMs

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>

* prealloc a slice that appeared after rebase

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>

---------

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
2024-06-14 14:16:36 -04:00
58fdb24b0b Alerting: Recording rules appear as type=recording in Prometheus API + better abstraction for type (#88805)
* Wire status through to prom API

* Regenerate swagger
2024-06-07 11:24:06 -05:00
b66cd7ef79 Alerting: Add filters for RouteGetRuleStatuses (#88295)
* Placeholder commit with rule_uid change

* Add new filters to grafana rule state API

* Revert type change

* Split rule_group and rule_name params

* remove debug line

* Change how query params are parsed

* Comment
2024-06-04 10:57:55 +01:00
a6ad2380bf Alerting: Refactor api_prometheus.go request handlers. (#86639)
This splits the request handlers into two functions, one which is the actual
handler and one which is independent from the Grafana `ReqContext` object. This
is to make it easier to reuse the implementation in other code.

Part of the refactoring changes the functions which get query parameters from
the request to operate on a `url.Values` instead of the request object.

The change also makes the code consistently use `req.Form` instead of a
combination of `req.URL.Query()` and `req.Form`, though I have left
`api_ruler` as-is to avoid this PR growing too large.
2024-04-23 14:50:26 +02:00
6ea97e41fb Alerting: Consistently return Prometheus-style responses from rules APIs. (#86600)
* Alerting: Consistently return Prometheus-style responses from rules APIs.

This commit is part refactor and part fix. The /rules API occasionally returns
error responses which are inconsistent with other error responses. This fixes
that, and adds a function to map from Prometheus error type and HTTP code.

* Fix integration tests

* Linter happiness

* Make linter more happy

* Fix up one more place returning non-Prometheus responses
2024-04-19 21:03:20 +02:00
5f7612834e Alerting: Refactoring in api_prometheus.go to allow code reuse. (#86575)
Preparing these functions to be used by some other part of the codebase,
which does not have a `contextmodel.ReqContext`, only the normal request
structure (`url.Values`, etc). This is slightly messy because of how
Grafana allows url parameters to be in the URL or in the request body,
so we need to make sure to invoke the form parsing logic in `ReqContext`.
2024-04-19 12:52:01 +02:00
73873f5a8a Alerting: Optimize rule status gathering APIs when a limit is applied. (#86568)
* Alerting: Optimize rule status gathering APIs when a limit is applied.

The frontend very commonly calls the `/rules` API with `limit_alerts=16`. When
there are a very large number of alert instances present, this API is quite
slow to respond, and profiling suggests that a big part of the problem is
sorting the alerts by importance, in order to select the first 16.

This changes the application of the limit to use a more efficient heap-based
top-k algorithm. This maintains a slice of only the highest ranked items whilst
iterating the full set of alert instances, which substantially reduces the
number of comparisons needed. This is particularly effective, as the
`AlertsByImportance` comparison is quite complex.

I've included a benchmark to compare the new TopK function to the existing
Sort/limit strategy. It shows that for small limits, the new approach is
much faster, especially at high numbers of alerts, e.g.

100K alerts / limit 16: 1.91s vs 0.02s (-99%)

For situations where there is no effective limit, sorting is marginally faster,
therefore in the API implementation, if there is either a) no limit or b) no
effective limit, then we just sort the alerts as before. There is also a space
overhead using a heap which would matter for large limits.

* Remove commented test cases

* Make linter happy
2024-04-19 11:51:22 +02:00
47546a4c72 Alerting: Update API to use folders' full paths (#81214)
* update GetUserVisibleNamespaces to use FolderSeriver
* update GetNamespaceByUID to use FolderService.GetFolders
* update GetAlertRulesForScheduling to use FolderService.GetFolders 

* Update API and GetAlertRulesForScheduling to use the folder's full path
* get full path of folder in RouteTestGrafanaRuleConfig

* fix escaping of titles for MySQL
2024-02-06 17:12:13 -05:00
d1dab5828d Alerting: Update rule API to address folders by UID (#74600)
* Change ruler API to expect the folder UID as namespace

* Update example requests

* Fix tests

* Update swagger

* Modify FIle field in /api/prometheus/grafana/api/v1/rules

* Fix ruler export

* Modify folder in responses to be formatted as <parent UID>/<title>

* Add alerting test with nested folders

* Apply suggestion from code review

* Alerting: use folder UID instead of title in rule API (#77166)

Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>

* Drop a few more latent uses of namespace_id

* move getNamespaceKey to models package

* switch GetAlertRulesForScheduling to use folder table

* update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`.

* fi tests

* add tests for GetAlertRulesForScheduling when parent uid

* fix integration tests after merge

* fix test after merge

* change format of the namespace to JSON array

this is needed for forward compatibility, when we migrate to full paths

* update EF code to decode nested folder

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2024-01-17 11:07:39 +02:00
64feeddc23 Alerting: Update rule access control to return errutil errors (#78284)
* update rule access control to return errutil errors
* use alerting in msgID
2023-12-02 01:42:11 +02:00
7cec741bae Alerting: Extract alerting rules authorization logic to a service (#77006)
* extract alerting authorization logic to separate package
* convert authorization logic to service
2023-11-15 18:54:54 +02:00
Jo
580477bf8e NGAlerting: Use identity.Requester interface instead of SignedInUser (#76360)
* unfurl SignedInUserAttrs services

* replace signedInUser with Requester

replace signedInUser with requester

* fix tests

* linting

---------

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>
2023-11-14 14:47:34 +00:00
Jo
dcd0c6b11e Identity: Unfurl OrgID in pkg/services to allow using identity.Requester interface (#76113)
Unfurl OrgID in pkg/services to allow using identity.Requester interface
2023-10-09 10:40:19 +02:00
58f6648505 Chore: capitalise messages for alerting (#74335) 2023-09-04 18:46:34 +02:00
d98813796c RBAC: Remove legacy AC from HasAccess permission check (#68995)
* remove unused HasAdmin and HasEdit permission methods

* remove legacy AC from HasAccess method

* remove unused function

* update alerting tests to work with RBAC
2023-05-30 14:39:09 +01:00
eddd4f4508 Alerting: Add totalsFiltered to RuleResponse for hidden by filters count (#66883)
Alerting: Add totalsFiltered to RuleResponse to facilitate hidden by filters count

Currently, when both a limit_alerts and a matcher/state filter is applied, there is not enough information to determine how many alert instances were hidden by the filters. Only enough to determine the total hidden by the limit and filter combined.

This change adds a separate totalsFiltered field alongside the AlertRule totals that will contain the count of instances after filters but before limits.
2023-04-21 09:35:12 +01:00
19ebb079ba Alerting: Add limits and filters to Prometheus Rules API (#66627)
This commit adds support for limits and filters to the Prometheus Rules
API.

Limits:

It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:

- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule

It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.

Filters:

Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.

A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.

Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.

Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.

The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:

{
    "name": "test",
    "value": "value1",
    "isRegex": false,
    "isEqual": true
}

The `isRegex` and `isEqual` options work as follows:

| IsEqual | IsRegex  | Operator |
| ------- | -------- | -------- |
| true    | false    |    =     |
| true    | true     |    =~    |
| false   | true     |    !~    |
| false   | false    |    !=    |
2023-04-17 17:45:06 +01:00
bd29071a0d Revert "Alerting: Add limits to the Prometheus Rules API" (#65842) 2023-04-03 15:20:37 +00:00
d96b0a71d3 Alerting: Add limits to the Prometheus Rules API (#65169)
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:

1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule

It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
2023-04-03 10:17:02 +01:00
0beb768427 Chore: Remove result fields from ngalert (#65410)
* remove result fields from ngalert

* remove duplicate imports
2023-03-28 10:34:35 +02:00
d6d4097567 Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
6c5a573772 Chore: Move ReqContext to contexthandler service (#62102)
* Chore: Move ReqContext to contexthandler service

* Rename package to contextmodel

* Generate ngalert files

* Remove unused imports
2023-01-27 08:50:36 +01:00
080ea88af7 Nested Folders: Support getting of nested folder in folder service wh… (#58597)
* Nested Folders: Support getting of nested folder in folder service when feature flag is set

* Fix lint

* Fix some tests

* Fix ngalert test

* ngalert fix

* Fix API tests

* Fix some tests and lint

* Fix lint 2

* Fix library elements and panels

* Add access control to get folder

* Cleanup and minor test change
2022-11-11 14:28:24 +01:00
cc8c1380e2 Alerting: Persist annotations from multidimensional rules in batches (#56575)
* Reduce piecemeal state fields

* Read data directly off state instead of rule

* Unify state and context into single struct

* Expose contextual information to layer above setNextState

* Work in terms of ContextualState and call historian in batches

* Call annotations service in batches

* Export format state and reason and remove workaround in unrelated test package

* Add new method to annotation service for batch inserting

* Fix loop variable aliasing bug caught by linter, didn't change behavior

* Incl timerange on annotation tests

* Insert one at a time if tags are present

* Point to rule from ContextualState rather than copy fields

* Build annotations and copy data prior to starting goroutine

* Rename to StateTransition

* Use new bulk-insert utility

* Remove rule from StateTransition and pass in directly to historian

* Simplify annotations logic since we have only one rule

* Fix logs and context, nilcheck, simplify method name

* Regenerate mock
2022-11-04 10:39:26 -05:00
3ddb28bad9 Find-and-replace 'err' logs to 'error' to match log search conventions (#57309) 2022-10-19 17:36:54 -04:00
d17ab82b98 Alerting: Break up store.RuleStore interface, delete dead code (#55776)
* Refactor state manager to not depend on rule store interface

* Refactor grafana and proxied ruler APIs to not depend on store.RuleStore

* Refactor folder subscription logic to not use store.RuleStore

* Delete dead code

* Delete store.RuleStore
2022-09-27 08:56:30 -05:00
a14621fff6 Chore: Add user service method SetUsingOrg and GetSignedInUserWithCacheCtx (#53343)
* Chore: Add user service method SetUsingOrg

* Chore: Add user service method GetSignedInUserWithCacheCtx

* Use method GetSignedInUserWithCacheCtx from user service

* Fix lint after rebase

* Fix lint

* Fix lint error

* roll back some changes

* Roll back changes in api and middleware

* Add xorm tags to SignedInUser ID fields
2022-08-11 13:28:55 +02:00
4d02f73e5f Alerting: Persist rule position in the group (#50051)
Migrations:
* add a new column alert_group_idx to alert_rule table
* add a new column alert_group_idx to alert_rule_version table
* re-index existing rules during migration

API:
* set group index on update. Use the natural order of items in  the array as group index
* sort rules in the group on GET
* update the version of all rules of all affected groups. This will make optimistic lock work in the case of multiple concurrent request touching the same groups.

UI:
* update UI to keep the order of alerts in a group
2022-06-22 10:52:46 -04:00
0cde283505 Alerting: Logs should not be capitalized and the errors key should be "err" (#50333)
* Alerting: decapitalize log lines and use "err" as the key for errors

Found using (logger|log).(Warn|Debug|Info|Error)\([A-Z] and (logger|log).(Warn|Debug|Info|Error)\(.+"error"
2022-06-07 19:54:23 +02:00
ad25e2a20c Alerting: Update RBAC for alert rules to consider access to rule as access to group it belongs (#49033)
* update authz to exclude entire group if user does not have access to rule
* change rule update authz to not return changes because if user does not have access to any rule in group, they do not have access to the rule
* a new query that returns alerts in group by UID of alert that belongs to that group
* collect all affected groups during calculate changes
* update authorize to check access to groups
* update tests for calculateChanges to assert new fields
* add authorization tests
2022-06-01 10:23:54 -04:00
1cc034d960 Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
f7f2253072 Alerting: Fix anonymous access to alerting (#49203)
* introduce a fallback handler that checks that role is Viewer.
* update UI nav links to allow alerting tabs for anonymous user
* update rule api to check for Viewer role instead of SignedIn when RBAC is disabled
2022-05-19 09:22:26 -04:00
952cb4fc0b Alerting: introduce AlertRuleGroupKey and use it in API handlers (#48945)
* create AlertGroupKey structure
* update PrometheusSrv.
  - extract creation of RuleGroup to a separate method. Use group key for grouping
* update RuleSrv 
 - update calculateChanges to use groupKey
 - authorize to use groupkey
2022-05-16 15:45:45 -04:00
c5547123bc Remove redundant queries in GetAlertRules and GetOrgAlertRules and replace with ListAlertRules (#48108) 2022-04-25 11:42:42 +01:00
af9353caec Alerting: Add check for datasource permission in alert rule read API (#47087)
* add check for access to rule's data source in GET APIs

* use more general method GetAlertRules instead of GetNamespaceAlertRules.
* remove unused GetNamespaceAlertRules.

Tests:
* create a method to generate permissions for rules
* extract method to create RuleSrv
* add tests for RouteGetNamespaceRulesConfig
2022-04-11 17:37:44 -04:00
48519f9ebb Alerting: reduce database calls in prometheus-comptible rules API (#47080)
* move validation at the beginning of method
* remove usage of GetOrgRuleGroups because it is not necessary. All information is already available in memory.
* remove unused method
2022-04-11 10:54:29 -04:00
cb6124c921 Alerting: Accurately set value for prom-compatible APIs (#47216)
* Alerting: Accurately set value for prom-compatible APIs

Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames.

* Fix an extra test

* Ensure a consitent ordering

* Address review comments

* address review comments
2022-04-05 19:36:42 +01:00
51114527dc Alerting: handle folder permissions when fine-grained access enabled (#47035)
* Use alert:create action for folder search with edit permissions. This matches the action that is used to query dashboards (the update will be addressed later)
* Update rule store to use FindDashboards instead of folder service to list folders the user has access to view alerts. Folder service does not support query type and additional filters. 
* Do not check whether the user can save to folder if FGAC is enabled because it is checked on API level.
2022-04-01 19:33:26 -04:00
a338c78ca8 Alerting: Remove internal labels from prometheus compatible API responses (#46548)
* Alerting: Remove internal labels from prometheus compatible API responses

* Appease the linter

* Fix integration tests

* Fix API documentation & linter

* move removal of internal labels to the models
2022-03-16 16:04:19 +00:00
a75d4fcbd8 Alerting: Display query from grafana-managed alert rules on /api/v1/rules (#45969)
* Aleting: Extract query from alerting rule model for api/v1/rules

* more changes and fixtures

* appease the linter
2022-03-14 10:39:20 +00:00
8d4a0a0396 Alerting: Include annotations in prometheus Alert response. (#45970)
* Alerting: Include annotations in prometheus Alert response.

* add tests

* re-order depedencies
2022-03-09 18:20:29 +00:00
a9399ab3cd Alerting: Add context.Context to RuleStore (#45004)
Alerting: Add context.Context to RuleStore
2022-02-08 08:52:03 +00:00
984c95de63 Do not store EvaluationString in Evaluation. (#44606)
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state
2022-02-02 19:18:20 +01:00
6220872633 Alerting: fix bug where user is able to access rules from namespaces user is not part of (#41403)
* Add fix
* Add tests
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
2021-11-08 14:26:08 +01:00
2a4c1b1aa6 You can now get alert rules for a dashboard or a panel using /api/v1/rules endpoints. (#39476)
Get alert rules for a dashboard and panel in /api/v1/rules
2021-10-04 16:33:55 +01:00
e343b62665 Alerting: make /api/prometheus/grafana/api/v1/rules faster (#39660) 2021-10-01 16:39:04 +03:00
fa9857499b Chore: GetDashboardQuery should be dispatched using DispatchCtx (#36877)
* Chore: GetDashboardQuery should be dispatched using DispatchCtx

* Fix after merge

* Changes after review

* Various fixes

* Use GetDashboardCtx function instead of GetDashboard
2021-09-14 16:08:04 +02:00
7815ed511f Alerting: Refactor API endpoints for fetching alert rules (#37055)
* Refactor ruler API endpoint for listing rules

* Refactor prometheus API endpoint for listing rules

* Update HTTP API docs
2021-07-22 09:53:14 +03:00
fa0bed7118 do not over write alerting rule duration (#36930) 2021-07-20 11:49:35 +05:30
8a3edf280e Alerting: Fix prometheus API to check folder permissions (#36301) 2021-07-05 10:49:14 +03:00