mirror of
https://github.com/grafana/grafana.git
synced 2025-08-03 06:12:20 +08:00
Alerting docs: update Introduction > Notification policies
(#88656)
* Alerting docs: Notification policies and grouping
* Update docs/sources/alerting/fundamentals/notifications/group-alert-notifications.md
  Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
* Update docs/sources/alerting/fundamentals/notifications/group-alert-notifications.md
  Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
* Update docs/sources/alerting/fundamentals/notifications/group-alert-notifications.md
  Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
* Update docs/sources/alerting/fundamentals/notifications/notification-policies.md
  Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
* Update docs/sources/alerting/fundamentals/notifications/notification-policies.md
  Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
* Change `alt` text in a diagram
* Clarify `siblings` and `child` policies
* Fix spelling error
* minor change
* Rewrite routing
* Update docs/sources/alerting/fundamentals/notifications/notification-policies.md
  Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
* Update docs/sources/alerting/fundamentals/notifications/notification-policies.md
  Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
* Rewrite Routing
* extend routing
* Minor `Group by` example
* Clarify how Grafana groups alerts by the alert rule
* Skip bold style for `group` options

---------

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
@ -20,6 +20,11 @@ refs:
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rule-evaluation/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rule-evaluation/
|
||||
group-alert-notifications:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/group-alert-notifications/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/group-alert-notifications/
|
||||
templates:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/templates/
|
||||
@ -37,9 +42,9 @@ refs:
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/notification-policies/
|
||||
notification-timings:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/notification-policies/#timing-options
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/group-alert-notifications/#timing-options
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/notification-policies/#timing-options
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/group-alert-notifications/#timing-options
|
||||
silences:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/configure-notifications/create-silence/
|
||||
@ -95,21 +100,21 @@ The notification policy tree is responsible for:
|
||||
Each notification policy handles specific tasks:
|
||||
|
||||
- Deciding which contact point receives the alert notification.
|
||||
- Controlling when to send notifications based on its [notification timings](ref:notification-timings).
|
||||
- [Grouping multiple alerts](#group-alert-notifications) into a single notification to reduce alert noise.
|
||||
- Controlling when to send notifications based on its notification timing options.
|
||||
- Grouping multiple alerts into a single notification to reduce alert noise.
|
||||
|
||||
{{< figure src="/media/docs/alerting/alerting-notification-policy-diagram-v2.png" max-width="750px" alt="A diagram of the notification policy component" >}}
|
||||
{{< figure src="/media/docs/alerting/alerting-notification-policy-diagram-v5.png" max-width="750px" alt="A diagram of the notification policy component" >}}
|
||||
|
||||
### Group alert notifications
|
||||
|
||||
When something fails in our system, our alerting setup can easily trigger hundreds or even thousands of alert instances (notifications). Several alert rules often fail simultaneously. Additionally, each alert rule may generate multiple alert instances.
|
||||
|
||||
Grouping alert notifications is commonly necessary to avoid bombarding our alert inbox. Grouping combines similar alert instances in a given period into one single notification.
|
||||
[Grouping alert notifications](ref:group-alert-notifications) is commonly necessary to avoid bombarding our alert inbox. Grouping combines similar alert instances in a given period into one single notification.
|
||||
|
||||
Notification grouping uses:
|
||||
|
||||
- **Matching labels**: Group alert instances of the same type by matching their labels.
|
||||
- **[Notification timings](ref:notification-timings)**: Wait for a specified period before sending the notification, allowing for the grouping of incoming alert instances.
|
||||
- **Labels**: Group alert instances of the same type by using labels.
|
||||
- **Timing options**: Wait for a specified period before sending the notification, allowing for the grouping of incoming alert instances.
|
||||
|
||||
### Templates, silences and mute timings
|
||||
|
||||
|
@ -0,0 +1,90 @@
|
||||
---
|
||||
canonical: https://grafana.com/docs/grafana/latest/alerting/fundamentals/notifications/group-alert-notifications/
|
||||
description: Learn about how notification policies group alert notifications
|
||||
keywords:
|
||||
- grafana
|
||||
- alerting
|
||||
- notification policies
|
||||
labels:
|
||||
products:
|
||||
- cloud
|
||||
- enterprise
|
||||
- oss
|
||||
title: Group alert notifications
|
||||
menuTitle: Grouping
|
||||
weight: 114
|
||||
refs:
|
||||
alert-labels:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/annotation-label/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/annotation-label/
|
||||
notification-policies:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/notification-policies/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/notification-policies/
|
||||
silences:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/configure-notifications/create-silence/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/configure-notifications/create-silence/
|
||||
---
|
||||
|
||||
# Group alert notifications
|
||||
|
||||
Grouping is an important feature of Grafana Alerting because it lets you batch related alerts into a smaller number of notifications. This is particularly important when notifications are delivered to first responders, such as on-call engineers: receiving many notifications in a short period can be overwhelming and, in some cases, can impair a first responder's ability to respond to an incident. For example, consider a large outage where many of your systems are down. In this case, grouping can be the difference between receiving 1 phone call and 100 phone calls.
|
||||
|
||||
## Group notifications
|
||||
|
||||
Grouping combines similar alert instances within a specific period into a single notification, reducing alert noise.
|
||||
|
||||
In the notification policy, you can configure how to group multiple alerts into a single notification:
|
||||
|
||||
- The `Group by` option specifies the criteria for grouping incoming alerts within the policy. The default is by alert rule.
|
||||
- [Timing options](#timing-options) determine when to send the notification.
|
||||
|
||||
{{< figure src="/media/docs/alerting/alerting-notification-policy-diagram-with-labels-v3.png" max-width="750px" alt="A diagram about the components of a notification policy, including labels and groups" >}}
|
||||
|
||||
Alert instances are grouped together if they have the same exact label values for the labels configured in the `Group by` option.
|
||||
|
||||
For example, given the `Group by` option set to the `team` label:
|
||||
|
||||
- `alertname=foo, team=frontend` and `alertname=bar, team=frontend` are in one group.
|
||||
- `alertname=foo, team=backend` and `alertname=qux, team=backend` are in another group.
|
||||
|
||||
### Group by alert rule or labels
|
||||
|
||||
By default, notification policies in Grafana group alerts by the alert rule. Specifically, they are grouped using the `alertname` and `grafana_folder` labels, as alert rule names are not unique across folders.
|
||||
|
||||
To group alerts by something other than the alert rule, change the `Group by` option to any other combination of labels.
|
||||
|
||||
### A single group for all alerts
|
||||
|
||||
To place all alerts handled by the notification policy in a single group (rather than grouping by alert rule or other labels), leave `Group by` empty.
|
||||
|
||||
### Disable grouping
|
||||
|
||||
To receive every alert as a separate notification, group by the special label `...`.
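
The three `Group by` modes described above can be sketched in a few lines of Python. This is a minimal illustration, not Grafana's implementation: the function names `group_key` and `group_alerts` are hypothetical, alerts are modeled as plain label dictionaries, and treating a missing label as an empty string is an assumption.

```python
from collections import defaultdict

ALL_LABELS = "..."  # the special `Group by` value that disables grouping


def group_key(labels, group_by):
    """Return a hashable key identifying the group of one alert instance."""
    if ALL_LABELS in group_by:
        # Grouping disabled: the full label set is the key, so every
        # distinct alert instance ends up in its own group.
        return tuple(sorted(labels.items()))
    # An empty group_by produces the empty key: one group for all alerts.
    # Assumption: a missing label is treated as an empty value.
    return tuple((name, labels.get(name, "")) for name in group_by)


def group_alerts(alerts, group_by):
    """Bucket alert instances by the label values named in group_by."""
    groups = defaultdict(list)
    for labels in alerts:
        groups[group_key(labels, group_by)].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "foo", "team": "frontend"},
    {"alertname": "bar", "team": "frontend"},
    {"alertname": "foo", "team": "backend"},
    {"alertname": "qux", "team": "backend"},
]

by_team = group_alerts(alerts, ["team"])   # two groups: frontend, backend
one_group = group_alerts(alerts, [])       # a single group for everything
ungrouped = group_alerts(alerts, ["..."])  # one group per alert instance
```

With the four alerts from the `team` example, grouping by `team` yields two groups, an empty `Group by` yields one, and `...` yields four.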
|
||||
|
||||
## Timing options
|
||||
|
||||
The timing options decide how often notifications are sent for each group of alerts. There are three timers that you need to know about: Group wait, Group interval, and Repeat interval.
|
||||
|
||||
#### Group wait
|
||||
|
||||
Group wait is the amount of time Grafana waits before sending the first notification for a new group of alerts. The longer Group wait is, the more time other alerts have to arrive. The shorter Group wait is, the earlier the first notification is sent, but at the risk of sending incomplete notifications. Choose a Group wait that makes the most sense for your use case.
|
||||
|
||||
**Default** 30 seconds
|
||||
|
||||
#### Group interval
|
||||
|
||||
Once the first notification has been sent for a new group of alerts, the Group interval timer starts. This is how long Grafana waits before sending notifications about changes to the group. For example, another firing alert might have just been added to the group while an existing alert might have resolved. If an alert arrived too late to be included in the first notification due to Group wait, it is included in subsequent notifications after Group interval. Once Group interval has elapsed, Grafana resets the Group interval timer. This repeats until there are no more alerts in the group, after which the group is deleted.
|
||||
|
||||
**Default** 5 minutes
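
The interaction between Group wait and Group interval can be sketched as a small timeline calculation. This is a simplified model under stated assumptions, not Grafana's scheduler: it only tracks new firing alerts joining a single group, ignores resolved alerts and Repeat interval, and the function name `notification_times` is hypothetical. Times are in seconds.

```python
def notification_times(arrivals, group_wait=30, group_interval=300):
    """Return (send_time, new_alert_count) pairs for one group of alerts.

    arrivals: times (seconds) at which new alert instances join the group.
    """
    if not arrivals:
        return []
    arrivals = sorted(arrivals)
    sent = []
    reported = 0
    # The first notification goes out Group wait after the group is created.
    flush = arrivals[0] + group_wait
    while reported < len(arrivals):
        known = sum(1 for t in arrivals if t <= flush)
        if known > reported:
            # The group changed since the last notification, so send an update.
            sent.append((flush, known - reported))
            reported = known
        # The Group interval timer restarts after every flush.
        flush += group_interval
    return sent


timeline = notification_times([0, 10, 45, 400])
# timeline == [(30, 2), (330, 1), (630, 1)]
```

In this run, the alerts arriving at 0 s and 10 s share the first notification at 30 s, while the alert at 45 s misses Group wait and is reported at the next Group interval.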
|
||||
|
||||
#### Repeat interval
|
||||
|
||||
Repeat interval decides how often notifications are repeated if the group has not changed since the last notification. You can think of these as reminders that some alerts are still firing. Repeat interval is tied to Group interval: it must be greater than or equal to Group interval, and it must also be a multiple of Group interval. If Repeat interval is not a multiple of Group interval, it is coerced into one. For example, if your Group interval is 5 minutes and your Repeat interval is 9 minutes, the Repeat interval is rounded up to the nearest multiple of 5, which is 10 minutes.
|
||||
|
||||
**Default** 4 hours
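
The rounding rule for Repeat interval amounts to a ceiling division. The helper below is a sketch of that arithmetic only; the name `effective_repeat_interval` is hypothetical, and both arguments are assumed to be whole seconds.

```python
def effective_repeat_interval(group_interval, repeat_interval):
    """Round repeat_interval up to the nearest multiple of group_interval."""
    if group_interval <= 0:
        raise ValueError("group_interval must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    multiples = -(-repeat_interval // group_interval)
    return multiples * group_interval


# Group interval of 5 minutes, Repeat interval of 9 minutes.
rounded = effective_repeat_interval(5 * 60, 9 * 60)
# rounded == 600 seconds, that is, 10 minutes
```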
|
@ -18,130 +18,94 @@ labels:
|
||||
title: Notification policies
|
||||
weight: 113
|
||||
refs:
|
||||
alert-labels:
|
||||
contact-points:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/annotation-label/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/contact-points/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/annotation-label/
|
||||
notification-policies:
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/contact-points/
|
||||
notification-timings:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/notification-policies/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/group-alert-notifications/#timing-options
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/notification-policies/
|
||||
silences:
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/group-alert-notifications/#timing-options
|
||||
mute-timings:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/configure-notifications/create-silence/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/configure-notifications/mute-timings/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/configure-notifications/create-silence/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/configure-notifications/mute-timings/
|
||||
group-alert-notifications:
|
||||
- pattern: /docs/grafana/
|
||||
destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/group-alert-notifications/
|
||||
- pattern: /docs/grafana-cloud/
|
||||
destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/group-alert-notifications/
|
||||
---
|
||||
|
||||
# Notification policies
|
||||
|
||||
Notification policies provide you with a flexible way of routing alerts to various different receivers. Using label matchers, you can modify alert notification delivery without having to update every individual alert rule.
|
||||
Notification policies provide you with a flexible way of designing how to handle notifications and minimize alert noise.
|
||||
|
||||
Learn more about how notification policies work and are structured, so that you can make the most out of setting up your notification policies.
|
||||
Using label matchers, alert instances are [routed to notification policies](#routing). The notification policy can then [group multiple alert instances into a single notification](ref:group-alert-notifications) and deliver it to the contact point.
|
||||
|
||||
## Policy tree
|
||||
{{< figure src="/media/docs/alerting/how-alerting-works.png" max-width="750px" alt="How Alerting works" >}}
|
||||
|
||||
Notification policies are _not_ a list, but rather are structured according to a [tree structure](https://en.wikipedia.org/wiki/Tree_structure). This means that each policy can have child policies, and so on. The root of the notification policy tree is called the **Default notification policy**.
|
||||
Notification policies are _not_ a list, but rather are structured according to a [tree structure](https://en.wikipedia.org/wiki/Tree_structure):
|
||||
|
||||
Each policy consists of a set of label matchers (0 or more) that specify which labels they are or aren't interested in handling.
|
||||
- The root of the notification policy tree is the **Default notification policy**.
|
||||
- Each policy can have child policies.
|
||||
- Each policy can have sibling policies, sharing the same parent and hierarchical level.
|
||||
|
||||
Each policy consists of a set of label matchers (0 or more) that specify which alerts they are or aren't interested in handling. A matching policy refers to a notification policy with label matchers that match the alert instance’s labels.
|
||||
|
||||
{{< docs/shared lookup="alerts/how_label_matching_works.md" source="grafana" version="<GRAFANA_VERSION>" >}}
|
||||
|
||||
{{% admonition type="note" %}}
|
||||
If you haven't configured any label matchers for your notification policy, your notification policy matches _all_ alert instances. This may prevent child policies from being evaluated unless you have enabled **Continue matching siblings** on the notification policy.
|
||||
{{% /admonition %}}
|
||||
{{< figure src="/media/docs/alerting/notification-routing.png" max-width="750px" caption="Matching alert instances with notification policies" alt="Example of a notification policy tree" >}}
|
||||
|
||||
## Routing
|
||||
|
||||
To determine which notification policy handles which alert instances, you have to start by looking at the existing set of notification policies, starting with the default notification policy.
|
||||
To determine which notification policies handle an alert instance, the system looks for matching policies from the top of the tree, beginning with the default notification policy.
|
||||
|
||||
If no policies other than the default policy are configured, the default policy handles the alert instance.
|
||||
If a matching policy is found, the system continues to evaluate its child policies in the order they are displayed. If a child policy matches the alert, the system then evaluates its child policies recursively until no more matching child policies are found. In this case, only the deepest matching child policy handles the alert instance.
|
||||
|
||||
If policies other than the default policy are defined, it evaluates those notification policies in the order they are displayed.
|
||||
By default, once a matching policy is found, the system does not continue to look for sibling policies. If you want sibling policies of one matching policy to handle the alert instance as well, then enable **Continue matching siblings** on the particular matching policy.
|
||||
|
||||
If a notification policy has label matchers that match the labels of the alert instance, it descends into its child policies, if there are any, and continues to look for child policies whose label matchers further narrow down the set of labels, and so forth until no more child policies are found.
|
||||
{{% admonition type="note" %}}
|
||||
|
||||
If no child policies are defined in a notification policy or if none of the child policies have any label matchers that match the alert instance's labels, the default notification policy is used.
|
||||
The default notification policy matches all alert instances. It always handles alert instances if there are no child policies or if none of the child policies match the alert instance's labels—this prevents any alerts from being missed.
|
||||
|
||||
As soon as a matching policy is found, the system does not continue to look for other matching policies. If you want to continue to look for other policies that may match, enable **Continue matching siblings** on that particular policy.
|
||||
{{% /admonition %}}
|
||||
|
||||
Lastly, if none of the notification policies are selected the default notification policy is used.
|
||||
{{< collapse title="Routing example" >}}
|
||||
|
||||
### Routing example
|
||||
|
||||
Here is an example of a relatively simple notification policy tree and some alert instances.
|
||||
|
||||
{{< figure src="/media/docs/alerting/notification-routing.png" max-width="750px" caption="Notification policy routing" >}}
|
||||
|
||||
Here's a breakdown of how these policies are selected:
|
||||
Here's a breakdown of the previous example:
|
||||
|
||||
**Pod stuck in CrashLoop** does not have a `severity` label, so none of its child policies are matched. It does have a `team=operations` label, so the first policy is matched.
|
||||
|
||||
The `team=security` policy is not evaluated since a match was already found and **Continue matching siblings** was not configured for that policy.
|
||||
The `team=security` policy is not a match and **Continue matching siblings** was not configured for that policy.
|
||||
|
||||
**Disk Usage – 80%** has both a `team` and `severity` label, and matches a child policy of the operations team.
|
||||
|
||||
**Unauthorized log entry** has a `team` label but does not match the first policy (`team=operations`) since the values are not the same, so it will continue searching and match the `team=security` policy. It does not have any child policies, so the additional `severity=high` label is ignored.
|
||||
|
||||
{{< /collapse >}}
|
||||
|
||||
This routing and tree structure make it easy to organize and handle alerts for dedicated teams, while also narrowing down specific cases within the team by applying additional labels.
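
The routing rules above can be sketched as a short recursive function. This is an illustrative model rather than Grafana's code: the `Policy` class, the `route` function, equality-only matchers, and the `severity=critical` child policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    matchers: dict = field(default_factory=dict)  # label -> required value
    children: list = field(default_factory=list)
    continue_matching: bool = False  # "Continue matching siblings"


def matches(policy, labels):
    """A policy with no matchers matches every alert instance."""
    return all(labels.get(k) == v for k, v in policy.matchers.items())


def route(policy, labels):
    """Return the policies that handle an alert with the given labels."""
    if not matches(policy, labels):
        return []
    handlers = []
    for child in policy.children:
        found = route(child, labels)
        handlers.extend(found)
        # Stop at the first matching child unless it continues to siblings.
        if found and not child.continue_matching:
            break
    # If no child matched, this policy is the deepest match and handles it.
    return handlers or [policy]


tree = Policy("default", children=[
    Policy("operations", {"team": "operations"}, children=[
        Policy("operations-critical", {"severity": "critical"}),
    ]),
    Policy("security", {"team": "security"}),
])


def handled_by(labels):
    return [p.name for p in route(tree, labels)]
```

For example, `handled_by({"team": "operations"})` returns `["operations"]`, an alert that also carries `severity=critical` is handled by the child policy, and an alert with an unmatched `team` label falls back to the default policy.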
|
||||
|
||||
## Inheritance
|
||||
|
||||
In addition to child policies being a useful concept for routing alert instances, they also inherit properties from their parent policy. This also applies to any policies that are child policies of the default notification policy.
|
||||
In addition to child policies being a useful concept for routing alert instances, they also inherit properties from their parent policy. This also applies to child policies of the default notification policy.
|
||||
|
||||
The following properties are inherited by child policies:
|
||||
By default, a child policy inherits the following notification properties from its parent:
|
||||
|
||||
- Contact point
|
||||
- Grouping options
|
||||
- Timing options
|
||||
- Mute timings
|
||||
- [Contact point](ref:contact-points)
|
||||
- [Grouping options](ref:group-alert-notifications)
|
||||
- [Timing options](ref:notification-timings)
|
||||
|
||||
Each of these properties can be overwritten by an individual policy if you want to override the inherited properties.
|
||||
Then, each policy can overwrite these properties if needed.
|
||||
|
||||
To inherit a contact point from the parent policy, leave it blank. To override the inherited grouping options, enable **Override grouping**. To override the inherited timing options, enable **Override general timings**.
|
||||
The inheritance of notification properties, together with the routing process, is an effective method for grouping related notifications and handling specific cases through child policies.
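
Inheritance can be pictured as filling in every property a child policy leaves unset with the value from its parent. The sketch below assumes a hypothetical `NotificationSettings` class in which `None` means "inherit"; the contact point name and timing values are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NotificationSettings:
    contact_point: Optional[str] = None  # None means "inherit from parent"
    group_by: Optional[list] = None
    timings: Optional[dict] = None


def inherit(parent, child):
    """Resolve a child's effective settings against its parent's."""
    resolved = {}
    for f in fields(NotificationSettings):
        own = getattr(child, f.name)
        # Only unset properties are inherited; an explicit value, even an
        # empty group_by list, overrides the parent.
        resolved[f.name] = own if own is not None else getattr(parent, f.name)
    return NotificationSettings(**resolved)


parent = NotificationSettings(
    contact_point="ops-email",
    group_by=["alertname", "grafana_folder"],
    timings={"group_wait": "30s", "group_interval": "5m", "repeat_interval": "4h"},
)
child = NotificationSettings(group_by=["team"])  # overrides grouping only

effective = inherit(parent, child)
```

Here the child keeps its own `group_by` while taking the contact point and timing options from the parent.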
|
||||
|
||||
### Inheritance example
|
||||
**Inheritance example**
|
||||
|
||||
The example below shows how the notification policy tree from the previous example allows the child policies of the `team=operations` to inherit its contact point.
|
||||
{{< figure src="/media/docs/alerting/notification-inheritance.png" max-width="750px" alt="Simple example inheriting notification settings" >}}
|
||||
|
||||
In this way, you can avoid having to specify the same contact point multiple times for each child policy.
|
||||
|
||||
{{< figure src="/media/docs/alerting/notification-inheritance.png" max-width="750px" caption="Notification policy inheritance" >}}
|
||||
|
||||
## Additional configuration options
|
||||
|
||||
### Grouping
|
||||
|
||||
Grouping is an important feature of Grafana Alerting as it allows you to batch relevant alerts together into a smaller number of notifications. This is particularly important if notifications are delivered to first-responders, such as engineers on-call, where receiving lots of notifications in a short period of time can be overwhelming and in some cases can negatively impact a first-responders ability to respond to an incident. For example, consider a large outage where many of your systems are down. In this case, grouping can be the difference between receiving 1 phone call and 100 phone calls.
|
||||
|
||||
Choose how alerts are grouped together using the Group by option in a notification policy. By default, notification policies in Grafana group alerts together by alert rule using the `alertname` and `grafana_folder` labels (since alert names are not unique across multiple folders). If you want to group alerts by something other than the alert rule, change the grouping to any other combination of labels.
|
||||
|
||||
#### Disable grouping
|
||||
|
||||
If you want to receive every alert as a separate notification, you can do so by grouping by a special label called `...`. This is useful when your alerts are being delivered to an automated system instead of a first-responder.
|
||||
|
||||
#### A single group for all alerts
|
||||
|
||||
If you want to receive all alerts together in a single notification, you can do so by leaving Group by empty.
|
||||
|
||||
### Timing options
|
||||
|
||||
The timing options decide how often notifications are sent for each group of alerts. There are three timers that you need to know about: Group wait, Group interval, and Repeat interval.
|
||||
|
||||
#### Group wait
|
||||
|
||||
Group wait is the amount of time Grafana waits before sending the first notification for a new group of alerts. The longer Group wait is the more time you have for other alerts to arrive. The shorter Group wait is the earlier the first notification is sent, but at the risk of sending incomplete notifications. You should always choose a Group wait that makes the most sense for your use case.
|
||||
|
||||
**Default** 30 seconds
|
||||
|
||||
#### Group interval
|
||||
|
||||
Once the first notification has been sent for a new group of alerts, the Group interval timer starts. This is the amount of wait time before notifications about changes to the group are sent. For example, another firing alert might have just been added to the group while an existing alert might have resolved. If an alert was too late to be included in the first notification due to Group wait, it is included in subsequent notifications after Group interval. Once Group interval has elapsed, Grafana resets the Group interval timer. This repeats until there are no more alerts in the group after which the group is deleted.
|
||||
|
||||
**Default** 5 minutes
|
||||
|
||||
#### Repeat interval
|
||||
|
||||
Repeat interval decides how often notifications are repeated if the group has not changed since the last notification. You can think of these as reminders that some alerts are still firing. Repeat interval is closely related to Group interval, which means your Repeat interval must not only be greater than or equal to Group interval, but also must be a multiple of Group interval. If Repeat interval is not a multiple of Group interval it is coerced into one. For example, if your Group interval is 5 minutes, and your Repeat interval is 9 minutes, the Repeat interval is rounded up to the nearest multiple of 5 which is 10 minutes.
|
||||
|
||||
**Default** 4 hours
|
||||
This example shows how the notification policy tree from the previous example allows the child policies of the `team=operations` policy to inherit its contact point. In this way, you can avoid specifying the same contact point for each child policy.
|
||||
|
@ -17,7 +17,7 @@ labels:
|
||||
- enterprise
|
||||
- oss
|
||||
title: Templates
|
||||
weight: 114
|
||||
weight: 115
|
||||
refs:
|
||||
variables-label-annotation:
|
||||
- pattern: /docs/grafana/
|
||||
|