**What this PR does / why we need it**:
Previously we relied only on the index of the tenant's queue to read the
requests from the same tenant's queue.
However, as long as the queue is aggressively used by the consumers in
parallel, there are a few edge cases when knowing the index of last used
tenant's queue is not enough to guarantee that we dequeue items from
exactly the same tenant's queue.
**Special notes for your reviewer**:
**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e3ec)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a4b0)
---------
Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
**What this PR does / why we need it**:
Just a minor spelling fix in a comment, no functional change.
**Which issue(s) this PR fixes**:
Fixes #<issue number>
**Special notes for your reviewer**:
**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e3ec)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a4b0)
**What this PR does / why we need it**:
We would like to know how long a querier worker is idle to understand if
workstealing would have an impact. The original metric was too noisy and
its cardinality was too high. Instead, we are going to log the wait
time.
**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e3ec)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a4b0)
---------
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
**What this PR does / why we need it**:
Adds a new config `frontend.max-query-capacity` that allows users to
configure what portion of the the available querier replicas can be used
by a tenant. `max_query_capacity` is the corresponding YAML option that
can be configured in limits or runtime overrides.
For example, setting this to 0.5 would allow a tenant to use half of the
available queriers.
This complements the existing `frontend.max-queriers-per-tenant`. When
both are configured, the smaller value of the resulting querier replica
count is considered:
```
min(frontend.max-queriers-per-tenant, ceil(querier_replicas * frontend.max-query-capacity))
```
*All* queriers will handle requests for a tenant if neither limits are
applied.
**Which issue(s) this PR fixes**:
Fixes #<issue number>
**Special notes for your reviewer**:
noticed that we don't pass down the shuffle sharding limits for frontend
(only using it with schedulers)
26f097162a/pkg/loki/modules.go (L895)
but the
[docs](26f097162a/pkg/validation/limits.go (L276))
mention that`frontend.max-queriers-per-tenant` applies to frontend as
well.
```
This option only works with queriers connecting to the query-frontend / query-scheduler, not when using downstream URL.
```
**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [x] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e3ec)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a4b0)
---------
Co-authored-by: J Stickler <julie.stickler@grafana.com>
Co-authored-by: Danny Kopping <danny.kopping@grafana.com>
This change adds an internal request queue to the bloom gateway. Instead of executing every single request individually, which involves resolving bloom blocks, downloading them if needed and executing the chunk filtering, requests are now enqueued to the internal, per-tenant queue. The queue implements the same shuffle sharding mechanism as the queue in the query scheduler component.
Workers then dequeue a batch of requests for a single tenant and multiplex them into a single processing task for each day. This has the big advantage that the chunks of multiple requests can be processed in a single sequential scan through a set a bloom blocks, without needing to skip back and forth within the binary stream of the block.
---------
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**:
Use the metrics namespace setting instead of hardcoding to `cortex`.
This is a follow up to (and based on)
https://github.com/grafana/loki/pull/11014.
**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [X] Tests updated
- [x] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e3ec)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory. <!--
TODO(salvacorts): Add example PR -->
---------
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
Co-authored-by: Ashwanth <iamashwanth@gmail.com>
Instead of calling the bloom store directly on each and every request to filter chunk refs based on the given filters, we want to queue requests in per-tenant queues and process batches of requests that can be multiplexed to avoid excessive seeking in the bloom block queriers when checking chunk matches.
This PR re-uses the request queue implementation used in the query scheduler. To do so, it moves the queue related code from the scheduler into a separate package `pkg/queue` and renames occurrences of "querier" to "consumer" to be more generic.
The bloom gateway instantiates the request queue when starting the service. The gRPC method `FilterChunkRefs` then enqueues incoming requests to that queue.
**Special notes for your reviewer**:
For testing purposes, this PR also contains a dummy implementation of the workers. The worker implementation - which includes multiplexing of multiple tasks - is subject to a separate PR.
---------
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>