mirror of
https://github.com/grafana/loki.git
synced 2026-03-13 09:33:58 +08:00
feat(operator): Add warning alert for when LokiStack is not getting ready (#19258)
Signed-off-by: Joao Marcal <jmarcal@redhat.com>
This commit is contained in:
@@ -411,3 +411,48 @@ The schema configuration does not contain the most recent schema version and nee
|
||||
### Steps
|
||||
|
||||
- Add a new object storage schema V13 with a future EffectiveDate
|
||||
|
||||
## Lokistack Components Not Ready Warning
|
||||
|
||||
### Impact
|
||||
|
||||
One or more LokiStack components are not ready, which can disrupt ingestion or querying and lead to degraded service.
|
||||
|
||||
### Summary
|
||||
|
||||
The LokiStack reports that some components have not reached the `Ready` state. This might be related to Kubernetes resources (Pods/Deployments), configuration, or external dependencies.
|
||||
|
||||
### Severity
|
||||
|
||||
`Warning`
|
||||
|
||||
### Access Required
|
||||
|
||||
- Console access to the cluster
|
||||
- Edit or view access in the namespace where the LokiStack is deployed:
|
||||
- OpenShift
|
||||
- `openshift-logging` (LokiStack)
|
||||
|
||||
### Steps
|
||||
|
||||
- Inspect the LokiStack conditions and events
|
||||
- Describe the LokiStack resource and review status conditions:
|
||||
- `kubectl -n <namespace> describe lokistack <name>`
|
||||
- Check for conditions that would lead to some pods not being in the `Ready` state
|
||||
- Check operator and reconciliation status
|
||||
- Ensure the Loki Operator is running and not reporting errors:
|
||||
- `kubectl -n <operator-namespace> logs deploy/loki-operator-controller-manager`
|
||||
- Look for reconcile errors related to missing permissions, invalid fields, or failed rollouts.
|
||||
- Verify component Pods and Deployments
|
||||
- Ensure all core components are running and Ready in the LokiStack namespace:
|
||||
- `distributor`, `ingester`, `querier`, `query-frontend`, `index-gateway`, `compactor`, `gateway`
|
||||
- Check Pod readiness and recent restarts:
|
||||
- `kubectl -n <namespace> get pods`
|
||||
- `kubectl -n <namespace> describe pod <pod>`
|
||||
- Examine Kubernetes events for failures
|
||||
- `kubectl -n <namespace> get events --sort-by=.lastTimestamp`
|
||||
- Common causes: image pull backoffs, failed mounts, readiness probe failures, or insufficient resources
|
||||
- Validate configuration and referenced resources
|
||||
- Confirm referenced `Secrets` and `ConfigMaps` exist and have correct keys
|
||||
- Look into the Pod logs of the component that still not `Ready`:
|
||||
- `kubectl -n <namespace> logs <pod>`
|
||||
@@ -227,3 +227,20 @@ groups:
|
||||
for: 1m
|
||||
labels:
|
||||
severity: warning
|
||||
- alert: LokistackComponentsNotReadyWarning
|
||||
annotations:
|
||||
description: |-
|
||||
The LokiStack "{{ $labels.stack_name }}" in namespace "{{ $labels.namespace }}" has components that are not ready.
|
||||
summary: "One or more LokiStack components are not ready."
|
||||
runbook_url: "[[ .RunbookURL ]]#Lokistack-Components-Not-Ready-Warning"
|
||||
expr: |
|
||||
sum (
|
||||
label_replace(
|
||||
lokistack_status_condition{reason="ReadyComponents", status="false"},
|
||||
"namespace", "$1", "stack_namespace", "(.+)"
|
||||
)
|
||||
) by (stack_name, namespace)
|
||||
> 0
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
|
||||
@@ -66,6 +66,9 @@ tests:
|
||||
- series: 'loki_discarded_samples_total{namespace="my-ns", tenant="application", reason="line_too_long"}'
|
||||
values: '0x5 0+120x25 3000'
|
||||
|
||||
- series: 'lokistack_status_condition{stack_name="mystack", stack_namespace="my-ns", reason="ReadyComponents", status="false"}'
|
||||
values: '1+0x25'
|
||||
|
||||
- series: 'loki_ingester_chunks_flush_failures_total{namespace="my-ns", pod="ingester-0"}'
|
||||
values: '0+25x20'
|
||||
- series: 'loki_ingester_chunks_flush_requests_total{namespace="my-ns", pod="ingester-0"}'
|
||||
@@ -200,6 +203,17 @@ tests:
|
||||
summary: Loki is discarding samples during ingestion because they fail validation.
|
||||
runbook_url: "[[ .RunbookURL]]#Loki-Discarded-Samples-Warning"
|
||||
- eval_time: 16m
|
||||
alertname: LokistackComponentsNotReadyWarning
|
||||
exp_alerts:
|
||||
- exp_labels:
|
||||
namespace: my-ns
|
||||
stack_name: mystack
|
||||
severity: warning
|
||||
exp_annotations:
|
||||
description: 'The LokiStack "mystack" in namespace "my-ns" has components that are not ready.'
|
||||
summary: "One or more LokiStack components are not ready."
|
||||
runbook_url: "[[ .RunbookURL ]]#Lokistack-Components-Not-Ready-Warning"
|
||||
- eval_time: 16m
|
||||
alertname: LokiIngesterFlushFailureRateCritical
|
||||
exp_alerts:
|
||||
- exp_labels:
|
||||
|
||||
Reference in New Issue
Block a user