Changing metastore implementation to work with arrow.RecordBatch without intermediate materialization to structs.
metastore.Sections now uses new metastore.GetIndexes, metastore.IndexSectionsReader and metastore.CollectSections.
We'd like to run metastore queries in a distributed manner using engine v2. To achieve that, we need to split the work that metastore.Sections does internally, schedule it on different workers, and collect the results. The engine does this via pipelines that produce arrow.RecordBatch values. This PR updates metastore.Sections in a way that makes it easier to integrate the metastore with the engine.
This PR is a continuation of https://github.com/grafana/loki/pull/19992. I use `pointers.Reader` to refine the sections that match bloom filters.
I also fix minor issues that I introduced in the previous PR:
- convert the columns only once per column (instead of doing it for every row)
- ignore the sections that are of the wrong "type"
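
A minimal sketch of the first fix, using plain Go slices rather than Arrow arrays. The `record` type and the `asInt64s` helper are hypothetical stand-ins and not the metastore code; the counter only exists to make the number of conversions visible.

```go
package main

// record is a hypothetical stand-in for a columnar batch: each entry in cols
// is an untyped column that must be converted before its values can be read.
type record struct {
	cols    []any
	numRows int
}

// asInt64s converts an untyped column to its concrete type and counts how
// often the conversion runs.
func asInt64s(col any, conversions *int) []int64 {
	*conversions++
	return col.([]int64)
}

// sumPerRow converts every column again for each row, so the conversion
// runs rows x columns times.
func sumPerRow(r record, conversions *int) int64 {
	var total int64
	for row := 0; row < r.numRows; row++ {
		for _, col := range r.cols {
			total += asInt64s(col, conversions)[row]
		}
	}
	return total
}

// sumPerColumn converts each column exactly once and then iterates over the
// rows of the already typed columns.
func sumPerColumn(r record, conversions *int) int64 {
	typed := make([][]int64, len(r.cols))
	for i, col := range r.cols {
		typed[i] = asInt64s(col, conversions)
	}
	var total int64
	for row := 0; row < r.numRows; row++ {
		for _, col := range typed {
			total += col[row]
		}
	}
	return total
}
```

Both functions return the same result; for a batch with 3 rows and 2 columns the per-row variant performs 6 conversions and the per-column variant performs 2.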
Adding a columnar reader for the (log)pointers section. Using it to read the stream section pointers.
The new reader is slower than the existing RowReader due to double translation of the data (columns->rows->rows vs columns->rows->columns->rows). The idea, though, is to get rid of the double translation in the future.
This PR introduces an option for the column builder to limit the number of rows in a page.
It can be set separately for logs objects and index objects using `-dataobj-consumer.max-page-rows` and `-dataobj-index-builder.max-page-rows` respectively.
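
The effect of a row limit can be sketched as follows. This is an illustration only: `columnBuilder`, `page`, and the `maxPageRows` field are hypothetical names, and treating 0 as "no row limit" is an assumption, not necessarily how the real builder behaves.

```go
package main

// page holds the values of one flushed page.
type page struct {
	values []int64
}

// columnBuilder is a hypothetical, simplified column builder. maxPageRows
// mirrors the idea of the new option; 0 is assumed to mean "no row limit".
type columnBuilder struct {
	maxPageRows int
	current     []int64
	pages       []page
}

// Append adds a value and cuts a new page once the row limit is reached.
func (b *columnBuilder) Append(v int64) {
	b.current = append(b.current, v)
	if b.maxPageRows > 0 && len(b.current) >= b.maxPageRows {
		b.Flush()
	}
}

// Flush closes the in-progress page if it contains any rows.
func (b *columnBuilder) Flush() {
	if len(b.current) == 0 {
		return
	}
	b.pages = append(b.pages, page{values: b.current})
	b.current = nil
}
```

With `maxPageRows` set to 3, appending 7 values and flushing yields pages of 3, 3, and 1 rows; without a limit the same input ends up in a single page.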
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
dataset.readerDownloader was originally introduced in #16429, an attempt to
balance peak memory usage of reading a section with read times by downloading a
configurable size of pages in advance.
In practice, each roundtrip to object storage adds too much of a latency hit,
and we've started to set the cache limit high enough to ensure that each reader
only needs a single prefetch. Given what we've found, it no longer makes sense
to control peak memory usage via the prefetch size. Other options, such as
downloading directly to disk, may be explored in the future.
In the meantime, this PR removes the ability to specify a cache size. All
non-pruned pages will be bulk requested using the range reader (#19067) on the
first read call. Pages which have left the potential read window will continue
to be eagerly removed for garbage collection.
However, we don't want to prefetch when the dataset is entirely in memory,
which is the case when the logs section builder is performing k-way merge over
in-memory sections. To lower the memory usage of builders, prefetching is
configurable. For this initial PR, prefetching is only disabled for the logs
section builder; all other reads force prefetching.
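
A rough sketch of the behavior described above, under simplifying assumptions: pages are identified by index, reads only move forward, and `fetchFunc` stands in for a bulk range read. The names `pageReader`, `fetchFunc`, and `prefetch` are hypothetical and do not reflect the actual dataset API.

```go
package main

// fetchFunc is a hypothetical stand-in for a bulk range read: it returns the
// data for all requested page indexes in a single round trip.
type fetchFunc func(pages []int) map[int][]byte

// pageReader is a simplified, hypothetical reader over numPages pages.
type pageReader struct {
	fetch    fetchFunc
	prefetch bool         // false for datasets that are already in memory
	pruned   map[int]bool // pages excluded from the read
	numPages int

	cache   map[int][]byte
	started bool
	fetches int // number of round trips, tracked for illustration only
}

// load requests the given pages in one round trip and caches the results.
func (r *pageReader) load(pages []int) {
	if len(pages) == 0 {
		return
	}
	r.fetches++
	for p, data := range r.fetch(pages) {
		r.cache[p] = data
	}
}

// Read returns page i. Reads are assumed to move forward only.
func (r *pageReader) Read(i int) []byte {
	if !r.started {
		r.started = true
		r.cache = make(map[int][]byte)
		if r.prefetch {
			// First read call: bulk request every non-pruned page.
			var want []int
			for p := 0; p < r.numPages; p++ {
				if !r.pruned[p] {
					want = append(want, p)
				}
			}
			r.load(want)
		}
	}
	// Pages before i have left the read window; drop them so they can be
	// garbage collected.
	for p := range r.cache {
		if p < i {
			delete(r.cache, p)
		}
	}
	// Without prefetching, pages are loaded one at a time on demand.
	if _, ok := r.cache[i]; !ok {
		r.load([]int{i})
	}
	return r.cache[i]
}
```

In this sketch a prefetching reader performs one round trip regardless of how many pages are read, while a reader with `prefetch` disabled performs one per page, which is only acceptable when the "round trip" is a cheap in-memory access.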
Signed-off-by: Robert Fratto <robertfratto@gmail.com>