Feedback Link: https://github.com/grafana/tutorials/issues/new
authors:
  - melori_arellano
categories:
  - alerting
description: Create alerts with Logs
id: grafana-alerts-with-loki
labels:
  products:
    - enterprise
    - oss
    - cloud
tags:
  - loki
  - alerting
  - advanced
title: How to create alerts with log data
weight: 70
killercoda:
  title: How to create alerts with log data
  description: Learn how to use Loki with Grafana Alerting to keep track of what's happening in your environment with real log data.
  preprocessing:
    substitutions:
      - regexp: docker compose
        replacement: docker-compose
  backend:
    imageid: ubuntu

How to create alert rules with log data

Loki stores your logs and only indexes labels for each log stream. Using Loki with Grafana Alerting is a powerful way to keep track of what's happening in your environment. You can create metric alert rules based on content in your log lines to notify your team. What's even better is that you can add label data from the log message directly into your alert notification.

In this tutorial, you'll:

  • Generate sample logs and ship them to Loki with Promtail so that you can query them from Grafana.
  • Create an alert rule based on a Loki query (LogQL).
  • Create a Webhook contact point to send alert notifications to.

{{< admonition type="tip" >}} Check out our advanced alerting tutorial to explore advanced topics such as alert instances and notification routing. {{< /admonition >}}

Before you begin

Grafana Cloud users

As a Grafana Cloud user, you don't have to install anything.

Continue to Generate sample logs.

Grafana OSS users

To run a Grafana stack locally, make sure Docker Compose is installed. The steps below use the docker compose command.

To try out the Grafana stack with sample log data, download the following files to your local machine.

  1. Download and save a Docker Compose file to run Grafana, Loki, and Promtail.

    wget https://raw.githubusercontent.com/grafana/loki/v2.8.0/production/docker-compose.yaml -O docker-compose.yaml
    
  2. Run the Grafana stack.

    docker compose up -d
    

The first time you run docker compose up -d, Docker downloads all the necessary resources for the tutorial. This might take a few minutes, depending on your internet connection.

{{< admonition type="note" >}}

If you already have Grafana, Loki, or Prometheus running on your system, you might see errors because the Docker containers try to use ports that your local installations are already using. If this is the case, stop those services, then run the command again.

{{< /admonition >}}
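
If you want to check for port conflicts before starting the stack, a minimal sketch like the one below can help. It only tries to open a TCP connection to each port on your machine. Port 3000 is the Grafana port used in this tutorial. Port 3100 for Loki is an assumption based on Loki's usual default, so check your docker-compose.yaml for the ports it actually maps.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the TCP connection succeeds,
        # which means another process is listening on that port.
        return sock.connect_ex((host, port)) == 0


# 3000 is the Grafana port from this tutorial.
# 3100 is an assumed Loki port; verify it in your docker-compose.yaml.
PORTS_TO_CHECK = {"Grafana": 3000, "Loki (assumed)": 3100}

if __name__ == "__main__":
    for name, port in PORTS_TO_CHECK.items():
        state = "already in use" if port_in_use(port) else "free"
        print(f"{name} port {port}: {state}")
```

If a port is reported as already in use, stop the service that owns it before running docker compose up -d.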

{{< admonition type="tip" >}} Alternatively, you can try out this example in our interactive learning environment: Get started with Grafana Alerting.

It's a fully configured environment with all the dependencies already installed.

Provide feedback, report bugs, and raise issues in the Grafana Killercoda repository. {{< /admonition >}}

Generate sample logs

  1. Download and save a Python file that generates logs.

    wget https://raw.githubusercontent.com/grafana/tutorial-environment/master/app/loki/web-server-logs-simulator.py
    
  2. Execute the log-generating Python script.

    python3 ./web-server-logs-simulator.py | sudo tee -a /var/log/web_requests.log
    

Troubleshooting the script

If you don't see the sample logs in Explore:

  • Check that the output file exists: open /var/log/web_requests.log and confirm that it contains logs.
  • If the file is missing or empty, check that you followed the steps above to create it.
  • If the file exists and contains logs, verify that the Promtail container is running.
  • In Grafana Explore, check that the time range is set to the last 5 minutes.
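
The first two checks can be scripted. The sketch below is an illustration rather than part of the tutorial environment: the function name check_log_file is hypothetical, and the expected line shape is inferred from the sample log line shown later in this tutorial (a timestamp followed by level, method, url, status, and duration fields).

```python
import os
import re

# Expected shape of a sample line, inferred from the example log line in
# this tutorial: a timestamp followed by key=value pairs.
LINE_RE = re.compile(
    r"^\S+ level=\S+ method=\S+ url=\S+ status=\d+ duration=\d+ms$"
)


def check_log_file(path: str) -> str:
    """Return 'missing', 'empty', 'unexpected format', or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    with open(path, encoding="utf-8") as handle:
        # Keep only non-blank lines, without the trailing newline.
        lines = [line.strip() for line in handle if line.strip()]
    if not lines:
        return "empty"
    # Only the most recent line is inspected.
    if LINE_RE.match(lines[-1]):
        return "ok"
    return "unexpected format"


if __name__ == "__main__":
    print(check_log_file("/var/log/web_requests.log"))
```

A result of ok only tells you that the file has content in the expected shape. It does not confirm that Promtail is reading the file.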

Create a contact point

Besides being an open-source observability tool, Grafana has its own built-in alerting service. This means that you can receive notifications whenever there is an event of interest in your data, and even see these events graphed in your visualizations.

In this step, we'll set up a new contact point that uses the Webhook integration. For this to work, we also need an endpoint that can receive the alert notification. We will use Webhook.site to quickly set up a test endpoint, so that we can confirm that our alert rule actually sends a notification somewhere.

  1. In your browser, sign in to your Grafana Cloud account.

    OSS users: To log in, navigate to http://localhost:3000, where Grafana is running.

  2. In another tab, go to Webhook.site.

  3. Copy Your unique URL.

{{< docs/ignore >}}

  1. Navigate to http://localhost:3000, where Grafana is running.
  2. In another tab, go to Webhook.site.
  3. Copy Your unique URL. {{< /docs/ignore >}}

Your webhook endpoint is now waiting for the first request.

Next, let's configure a contact point in Grafana's Alerting UI to send notifications to our webhook endpoint.

  1. Return to Grafana. In Grafana's sidebar, hover over the Alerting (bell) icon and then click Contact points.

  2. Click + Add contact point.

  3. In Name, write Webhook.

  4. In Integration, choose Webhook.

  5. In URL, paste the URL of your webhook endpoint.

  6. Click Test, and then click Send test notification to send a test alert to your webhook endpoint.

  7. Navigate back to Webhook.site. On the left side, there's now a POST / entry. Click it to see what information Grafana sent.

    {{< figure src="/media/docs/alerting/alerting-webhook-detail.png" max-width="1200px" caption="A POST entry in Webhook.site" >}}

  8. Return to Grafana and click Save contact point.

We have created a dummy Webhook endpoint and created a new Alerting contact point in Grafana. Now, we can create an alert rule and link it to this new integration.
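
To see what a webhook endpoint does with a notification, here is a minimal sketch of a local receiver built on Python's standard library. It is not part of the tutorial setup and does not replace Webhook.site. The names WebhookHandler, start_receiver, and received are hypothetical, and port 8080 is an arbitrary choice. The receiver accepts any POST request, stores the JSON body, and prints it.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # payloads collected by the handler, newest last


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"{}")
        except json.JSONDecodeError:
            # Keep non-JSON bodies as text instead of failing.
            payload = {"raw": body.decode("utf-8", errors="replace")}
        received.append(payload)
        print(json.dumps(payload, indent=2))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request log line


def start_receiver(port: int = 8080) -> HTTPServer:
    """Start the receiver in a background thread and return the server."""
    server = HTTPServer(("0.0.0.0", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage (keeps running until you press Enter):
#   server = start_receiver(8080)
#   input("Listening on port 8080, press Enter to stop\n")
#   server.shutdown()
```

If Grafana runs in a container, localhost inside that container refers to the container itself, so a contact point URL for a receiver like this must use an address of your host that the container can reach.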

Create an alert rule

Next, we'll create an alert rule in Grafana Alerting that notifies us whenever the rule fires or resolves.

  1. In Grafana, navigate to Alerting > Alert rules.
  2. Click on New alert rule.
  3. Enter a name for your alert rule. Make it short and descriptive, because it appears in your alert notification. For instance, web-requests-logs.

Define query and alert condition

In this section, we define queries, expressions (used to manipulate the data), and the condition that must be met for the alert to be triggered.

  1. Select the Loki data source from the drop-down.

  2. In the Query editor, switch to Code mode by clicking the button on the right.

  3. Paste the query below.

    sum by (message)(count_over_time({filename="/var/log/web_requests.log"} != "status=200" | pattern "<_> <message> duration<_>" [10m]))
    

This query counts the log lines that do not contain status=200, that is, requests with a status code other than 200 (OK), over the 10-minute range indicated in brackets. It then sums the result set by message type using an instant query. It uses the LogQL pattern parser to add a new label called message that contains the level, method, url, and status from the log line.

You can use the explain query toggle button for a full explanation of the query syntax. The optional log-generating script creates a sample log line similar to the one below:

2023-04-22T02:49:32.562825+00:00 level=info method=GET url=test.com status=200 duration=171ms
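
To make the query logic concrete, the sketch below reproduces its three stages in plain Python on a handful of lines: the line filter, the pattern extraction, and the count per message. It is an approximation, not how Loki evaluates LogQL. The regular expression is a rough stand-in for the pattern "<_> <message> duration<_>", the sample lines other than the first are hypothetical, and the 10-minute range is not modeled.

```python
import re
from collections import Counter

# Rough regex stand-in for the LogQL pattern "<_> <message> duration<_>":
# skip the first space-delimited field (the timestamp), then capture
# everything up to the first " duration".
PATTERN = re.compile(r"^\S+ (?P<message>.*?) duration.*$")


def count_non_200_by_message(lines):
    """Count lines without status=200, grouped by the extracted message."""
    counts = Counter()
    for line in lines:
        if "status=200" in line:
            continue  # mirrors the line filter: != "status=200"
        match = PATTERN.match(line)
        if match is None:
            continue  # lines that do not match are skipped in this sketch
        # mirrors: sum by (message)(count_over_time(...))
        counts[match.group("message")] += 1
    return counts


# Hypothetical lines in the same format as the example above.
sample = [
    "2023-04-22T02:49:32.562825+00:00 level=info method=GET url=test.com status=200 duration=171ms",
    "2023-04-22T02:49:33.101112+00:00 level=error method=POST url=test.com status=500 duration=93ms",
    "2023-04-22T02:49:34.222333+00:00 level=error method=POST url=test.com status=500 duration=40ms",
    "2023-04-22T02:49:35.444555+00:00 level=warn method=GET url=test.com status=404 duration=12ms",
]

for message, count in count_non_200_by_message(sample).items():
    print(f"{count} x {message}")
```

Each distinct message value becomes its own series in the query result, which is why the alert rule can produce one alert instance per message.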

{{% admonition type="note" %}}

If you're using your own logs, modify the LogQL query to match your own log message. Refer to the Loki docs to understand the pattern parser.

{{% /admonition %}}

  4. Remove the B Reduce expression (click the bin icon). The Reduce expression is added by default, but it isn't needed here because the queried data is already reduced. Note that the Threshold expression is now your Alert condition.

  5. In the C Threshold expression:

    • Change the Input to A so that the threshold is evaluated against the query result.
    • Enter 0 as the threshold value. This is the value above which the alert rule should trigger.

  6. Click Preview to run the queries.

    It should return alert instances for log lines with a status code other than 200 (OK) that have met the alert condition. The alert rule fires on any value above the threshold of 0. Since the Loki query returned values greater than zero, the alert rule is Firing.

    {{< figure src="/media/docs/alerting/expression-loki-alert.png" max-width="1200px" caption="Preview of firing alert instances" >}}
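
The threshold step can be pictured as a simple comparison applied to each series returned by the query. The sketch below is an illustration under stated assumptions: the function name evaluate_threshold and the query result values are hypothetical, and the real evaluation happens inside Grafana.

```python
def evaluate_threshold(series, threshold=0):
    """Return the series whose value is strictly above the threshold."""
    return {label: value for label, value in series.items() if value > threshold}


# Hypothetical query result: one count per message label.
query_result = {
    "level=error method=POST url=test.com status=500": 2,
    "level=warn method=GET url=test.com status=404": 1,
}

# With a threshold of 0, every series that counted at least one line fires.
firing = evaluate_threshold(query_result, threshold=0)
for label in firing:
    print(f"Firing: {label}")
```

Raising the threshold, for example to 1, would leave only the series with more than one matching log line in the firing set.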

Set evaluation behavior

Evaluation behavior defines when an alert rule fires, and it's based on two settings:

  • Evaluation group: how frequently the alert rule is evaluated.
  • Pending period: how long the condition must be met to start firing. This allows your data time to stabilize before triggering an alert, helping to reduce the frequency of unnecessary notifications.

To set up the evaluation:

  1. In Folder, click + New folder and enter a name. For example: web-server-alerts. This folder will contain our alerts.
  2. In the Evaluation group, repeat the above step to create a new evaluation group. We will name it 1m-evaluation.
  3. Choose an Evaluation interval (how often the alert will be evaluated). For example, every 1m (1 minute).
  4. Set the pending period to 0s (zero seconds), so the alert rule fires the moment the condition is met.
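
The interaction between the evaluation interval and the pending period can be sketched as a small simulation. This is a simplified model for illustration, not Grafana's actual state machine: the function name simulate_states is hypothetical, and it only distinguishes the Normal, Pending, and Firing states.

```python
def simulate_states(condition_met, interval_s=60, pending_s=0):
    """Return the alert state after each evaluation.

    condition_met: one boolean per evaluation, in chronological order.
    interval_s:    seconds between evaluations (evaluation interval).
    pending_s:     seconds the condition must hold before firing.
    """
    states = []
    breached_for = None  # seconds the condition has been continuously met
    for met in condition_met:
        if not met:
            breached_for = None  # any healthy evaluation resets the timer
            states.append("Normal")
            continue
        breached_for = 0 if breached_for is None else breached_for + interval_s
        states.append("Firing" if breached_for >= pending_s else "Pending")
    return states


# Settings from this tutorial: 1m interval, 0s pending period.
print(simulate_states([False, True, True], interval_s=60, pending_s=0))

# For comparison, a 2m pending period delays firing by two evaluations.
print(simulate_states([True, True, True, False], interval_s=60, pending_s=120))
```

With a pending period of 0s, the first evaluation that meets the condition moves the rule straight to Firing, which is the behavior this tutorial relies on.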

Configure labels and notifications

Choose the contact point where you want to receive your alert notifications.

  1. Under Contact point, select Webhook from the drop-down menu.
  2. Click Save rule and exit at the top right corner.

Trigger the alert rule

Since the Python script will continue to generate log data that matches the alert rule condition, once the evaluation interval has concluded, you should receive an alert notification in the Webhook endpoint.

{{< figure src="/media/docs/alerting/alerting-webhook-firing-alert.png" max-width="1200px" caption="Firing alert notification details" >}}
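
If you later forward notifications to your own service instead of Webhook.site, you might want to pull the message label out of each alert in the payload. The sketch below assumes a simplified payload with a top-level alerts list whose entries carry status and labels fields. Treat these field names, the sample values, and the function name summarize as assumptions, and compare them with the payload you actually see in Webhook.site.

```python
import json

# Simplified, hypothetical payload. Compare the field names with the real
# notification shown in Webhook.site before relying on them.
payload_text = """
{
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "web-requests-logs",
        "message": "level=error method=POST url=test.com status=500"
      }
    }
  ]
}
"""


def summarize(payload):
    """Return one 'status: message' line per alert in the payload."""
    lines = []
    for alert in payload.get("alerts", []):
        status = alert.get("status", "unknown")
        message = alert.get("labels", {}).get("message", "(no message label)")
        lines.append(f"{status}: {message}")
    return lines


for line in summarize(json.loads(payload_text)):
    print(line)
```

The message label is available here because the LogQL query groups its result by message, so each alert instance carries that label into the notification.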

{{< admonition type="tip" >}}

Advance your skills by exploring alert instances and notification routing in Part 2 of your learning journey.

{{< /admonition >}}
