Add a temporary translation layer []dataset.Row -> columnar.RecordBatch -> arrow.RecordBatch.
- reader_adapter.go is temporary, which is why I put it into the old columnar package
- I added builders to reader_adapter.go and not to pkg/columnar.... as I'm not sure whether we will really need them anywhere except in this adapter. If we do, I'm happy to move them to pkg/columnar.
- I only updated the indexpointers reader for now, but will add support for more readers in upcoming PRs
Changing the metastore implementation to work with arrow.RecordBatch without intermediate materialization to structs.
metastore.Sections now uses the new metastore.GetIndexes, metastore.IndexSectionsReader and metastore.CollectSections.
We'd like to run metastore queries in a distributed manner using engine v2. To achieve that, we need to split the work that metastore.Sections does internally, schedule it on different workers, and collect the results. The engine does this via pipelines that produce arrow.RecordBatch-es. In this PR, metastore.Sections is updated in a way that makes it easier to integrate the metastore with the engine.
This PR is a continuation of https://github.com/grafana/loki/pull/19992. I use `pointers.Reader` to refine the sections that match bloom filters.
I also fix minor issues that I introduced in the previous PR:
- convert the columns only once per column (instead of doing it for every row)
- ignore sections that are of the wrong "type"
Adding a columnar reader for the (log)pointers section and using it to read the stream section pointers.
The new reader is slower than the existing RowReader due to double translation of the data (columns->rows->rows vs columns->rows->columns->rows). The idea, though, is to get rid of the double translation in the future.
This PR changes the evaluation of regex-match binary expressions so that the regexp on the right-hand side of the expression is compiled only once per batch evaluation, rather than once per row.
This saves a ton of allocations and improves general performance of queries with a regexp line or label filter.
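To illustrate the difference, here is a minimal sketch using only the standard `regexp` package. The function names `matchPerRow` and `matchPerBatch` are hypothetical and do not correspond to the engine's actual evaluator; they only contrast compiling inside the row loop with compiling once per batch.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchPerRow recompiles the pattern for every row. Each iteration
// allocates a new *regexp.Regexp, which is the cost the PR avoids.
func matchPerRow(pattern string, rows []string) ([]bool, error) {
	out := make([]bool, len(rows))
	for i, row := range rows {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, err
		}
		out[i] = re.MatchString(row)
	}
	return out, nil
}

// matchPerBatch compiles the pattern once and reuses it for all rows
// of the batch.
func matchPerBatch(pattern string, rows []string) ([]bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	out := make([]bool, len(rows))
	for i, row := range rows {
		out[i] = re.MatchString(row)
	}
	return out, nil
}

func main() {
	rows := []string{"level=error msg=boom", "level=info msg=ok"}
	got, err := matchPerBatch(`level=(error|warn)`, rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // prints [true false]
}
```

Both functions return the same result; only the number of compilations differs (one per row versus one per batch).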
This PR introduces a new configuration option `-dataobj-consumer.dataobj-sort-order` which controls the sort order of logs in the logs section of the dataobj when building new objects.
There are two available sort options:
1. `timestamp DESC, streamID ASC` (configuration value `timestamp-desc`)
2. `streamID ASC, timestamp DESC` (configuration value `stream-asc`)
The new setting is undocumented and may be removed in the future without notice.
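The two orderings can be sketched as comparison functions. This is an illustration only: the `record` type and the `lessFor` and `sortRecords` helpers are hypothetical and are not the dataobj builder's API; only the two configuration values come from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical stand-in for a log record.
type record struct {
	StreamID  int64
	Timestamp int64 // Unix nanoseconds
}

// lessFor returns a comparison function over recs for the given
// configuration value.
func lessFor(order string, recs []record) (func(i, j int) bool, error) {
	switch order {
	case "timestamp-desc": // timestamp DESC, streamID ASC
		return func(i, j int) bool {
			if recs[i].Timestamp != recs[j].Timestamp {
				return recs[i].Timestamp > recs[j].Timestamp
			}
			return recs[i].StreamID < recs[j].StreamID
		}, nil
	case "stream-asc": // streamID ASC, timestamp DESC
		return func(i, j int) bool {
			if recs[i].StreamID != recs[j].StreamID {
				return recs[i].StreamID < recs[j].StreamID
			}
			return recs[i].Timestamp > recs[j].Timestamp
		}, nil
	default:
		return nil, fmt.Errorf("unknown sort order %q", order)
	}
}

// sortRecords sorts recs in place according to the given order.
func sortRecords(order string, recs []record) error {
	less, err := lessFor(order, recs)
	if err != nil {
		return err
	}
	sort.SliceStable(recs, less)
	return nil
}

func main() {
	recs := []record{
		{StreamID: 1, Timestamp: 10},
		{StreamID: 2, Timestamp: 30},
		{StreamID: 1, Timestamp: 20},
		{StreamID: 2, Timestamp: 10},
	}
	if err := sortRecords("stream-asc", recs); err != nil {
		panic(err)
	}
	fmt.Println(recs) // prints [{1 20} {1 10} {2 30} {2 10}]
}
```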
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Sorting logs globally (object-wide per tenant) removes overlapping time ranges of sections in the objects.
Sections contain logs from multiple streams, and the ingest lag of streams may vary. This means that although the logs within each individual section are sorted by timestamp, the overall ordering of logs across sections is not guaranteed. Sections are therefore merged with a k-way merge (SortMerge), which over-queries data when the query reaches its result limit early.
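A minimal sketch of such a k-way merge, using the standard `container/heap` package, is shown below. The `cursor` type and `mergeDesc` function are hypothetical and reduce each record to its timestamp; this is not the SortMerge implementation. It does show where the over-querying comes from: every section has to be opened and contribute its first row before the first result can be emitted, even if the limit is reached after only a few rows.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the read position within one sorted section.
type cursor struct {
	section []int64 // timestamps, sorted descending
	pos     int
}

// cursorHeap is a max-heap of cursors keyed by their current timestamp.
type cursorHeap []*cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
	return h[i].section[h[i].pos] > h[j].section[h[j].pos]
}
func (h cursorHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)   { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeDesc merges sections that are each sorted by timestamp
// descending and returns up to limit timestamps in global descending order.
func mergeDesc(sections [][]int64, limit int) []int64 {
	h := &cursorHeap{}
	for _, s := range sections {
		if len(s) > 0 {
			// Every non-empty section is opened up front.
			*h = append(*h, &cursor{section: s})
		}
	}
	heap.Init(h)

	var out []int64
	for h.Len() > 0 && len(out) < limit {
		c := (*h)[0] // cursor with the largest current timestamp
		out = append(out, c.section[c.pos])
		c.pos++
		if c.pos < len(c.section) {
			heap.Fix(h, 0) // restore heap order after advancing
		} else {
			heap.Pop(h) // section exhausted
		}
	}
	return out
}

func main() {
	sections := [][]int64{{90, 50, 10}, {80, 70}, {60}}
	fmt.Println(mergeDesc(sections, 4)) // prints [90 80 70 60]
}
```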
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
This PR introduces an option for the column builder to limit the number of rows in a page.
It can be set for logs objects and index objects separately using `-dataobj-consumer.max-page-rows` and `-dataobj-index-builder.max-page-rows` respectively.
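As a rough sketch of what a row limit per page means for a builder, consider the simplified example below. The `columnBuilder` type, its fields, and the convention that `0` disables the limit are assumptions made for illustration; they are not the actual column builder in this PR.

```go
package main

import "fmt"

// columnBuilder is a hypothetical, simplified column builder that cuts
// a new page once the current page holds maxPageRows rows.
type columnBuilder struct {
	maxPageRows int // assumption: 0 means no row limit
	current     []string
	pages       [][]string
}

// Append adds a value to the current page and cuts the page when the
// row limit is reached.
func (b *columnBuilder) Append(v string) {
	b.current = append(b.current, v)
	if b.maxPageRows > 0 && len(b.current) >= b.maxPageRows {
		b.flush()
	}
}

// flush moves the current page, if non-empty, to the finished pages.
func (b *columnBuilder) flush() {
	if len(b.current) == 0 {
		return
	}
	b.pages = append(b.pages, b.current)
	b.current = nil
}

// Pages flushes any remaining rows and returns all pages.
func (b *columnBuilder) Pages() [][]string {
	b.flush()
	return b.pages
}

func main() {
	b := &columnBuilder{maxPageRows: 2}
	for _, v := range []string{"a", "b", "c", "d", "e"} {
		b.Append(v)
	}
	for i, p := range b.Pages() {
		fmt.Printf("page %d: %d rows\n", i, len(p))
	}
}
```

With a limit of 2 and five appended rows, this sketch produces three pages holding 2, 2, and 1 rows.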
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
dataset.readerDownloader was originally introduced in #16429 as an attempt to
balance peak memory usage of reading a section with read times by downloading a
configurable size of pages in advance.
In practice, each roundtrip to object storage adds too much of a latency hit,
and we've started to set the cache limit high enough to ensure that each reader
only needs a single prefetch. Given what we've found, it no longer makes sense
to control peak memory usage via the prefetch size. Other options, such as
downloading directly to disk, may be explored in the future.
In the meantime, this PR removes the ability to specify a cache size. All
non-pruned pages will be bulk requested using the range reader (#19067) on the
first read call. Pages which have left the potential read window will continue
to be eagerly removed for garbage collection.
However, we don't want to prefetch when the dataset is entirely in memory,
which is the case when the logs section builder is performing k-way merge over
in-memory sections. To lower the memory usage of builders, prefetching is
configurable. For this initial PR, prefetching is only disabled for the logs
section builder; all other reads force prefetching.
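The behavior described above can be sketched roughly as follows. All names here (`page`, `fetchFunc`, `reader`, `newCountingFetch`) are hypothetical and do not reflect dataset.readerDownloader; dropping a page right after it is read is also a simplification of the "read window" eviction. The sketch only shows the control flow: with prefetching enabled, all non-pruned pages are requested in one bulk call on the first read; with prefetching disabled, pages are fetched individually.

```go
package main

import "fmt"

// page is a hypothetical page descriptor.
type page struct {
	id     int
	pruned bool // pruned pages are never requested by the prefetch
}

// fetchFunc is a hypothetical stand-in for a bulk range request. It
// returns the data for each requested page ID.
type fetchFunc func(ids []int) map[int][]byte

// reader is a simplified reader with configurable prefetching.
type reader struct {
	pages    []page
	prefetch bool
	fetch    fetchFunc
	cache    map[int][]byte
	started  bool
}

// Read returns the data for one page. On the first call with prefetch
// enabled, it bulk-requests every non-pruned page in a single fetch.
func (r *reader) Read(id int) []byte {
	if r.prefetch && !r.started {
		var ids []int
		for _, p := range r.pages {
			if !p.pruned {
				ids = append(ids, p.id)
			}
		}
		r.cache = r.fetch(ids)
	}
	r.started = true

	if data, ok := r.cache[id]; ok {
		// Simplification: drop the page once read so it can be
		// garbage collected.
		delete(r.cache, id)
		return data
	}
	return r.fetch([]int{id})[id]
}

// newCountingFetch returns a fetchFunc that counts its invocations.
func newCountingFetch(calls *int) fetchFunc {
	return func(ids []int) map[int][]byte {
		*calls++
		out := make(map[int][]byte, len(ids))
		for _, id := range ids {
			out[id] = []byte{byte(id)}
		}
		return out
	}
}

func main() {
	pages := []page{{id: 0}, {id: 1}, {id: 2, pruned: true}, {id: 3}}
	for _, prefetch := range []bool{true, false} {
		calls := 0
		r := &reader{pages: pages, prefetch: prefetch, fetch: newCountingFetch(&calls)}
		for _, id := range []int{0, 1, 3} {
			r.Read(id)
		}
		fmt.Printf("prefetch=%v fetch calls=%d\n", prefetch, calls)
	}
}
```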
Signed-off-by: Robert Fratto <robertfratto@gmail.com>