Commit Graph

34 Commits

Author SHA1 Message Date
Ashwanth
c24e1b27e8 chore: End index reader span on Close (#20854) 2026-02-18 20:20:08 +05:30
Robert Fratto
0437dabfb9 chore(metastore): fix metastore.indexSectionsReader Open semantics (#20835) 2026-02-17 13:32:34 -05:00
Ashwanth
73961f18a3 chore(xcap): leaner xcap API (#20771) 2026-02-17 19:42:33 +05:30
benclive
cd16b029db chore: fix panic on read err in batchDecoratorReader (#20787) 2026-02-13 16:13:17 +00:00
Robert Fratto
d85fb0317f chore(dataobj): Add Open semantics to section readers (#20790)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2026-02-12 13:50:28 -05:00
Robert Fratto
7e03400309 chore(dataobj): rename dataset.ReaderOptions to dataset.RowReaderOptions (#20756) 2026-02-10 13:15:18 -05:00
benclive
a029934ef2 chore: resolve predicate columns in distributed metastore (#20504) 2026-02-04 12:30:11 +00:00
Robert Fratto
736ccdbf93 chore(dataobj): rename dataset.Reader to RowReader (#20629) 2026-01-30 16:43:44 -05:00
Robert Fratto
747d75aa78 chore: make memory, columnar APIs more idiomatic (#20622) 2026-01-30 10:04:51 -05:00
Ivan Kalita
86f87e447c chore(dataobj): use reader adapter (#20572)
Continuation of #20525: I'm updating the logs, pointers, and streams readers to use the reader adapter. Also adding binary support to arrowconv.
2026-01-27 10:49:32 +01:00
Ivan Kalita
e4ec8441d3 feat(metastore): use arrow for scanning and blooms (#20234)
Changing metastore implementation to work with arrow.RecordBatch without intermediate materialization to structs.
metastore.Sections now uses new metastore.GetIndexes, metastore.IndexSectionsReader and metastore.CollectSections.

We'd like to run metastore queries in a distributed manner using engine v2. To achieve that, we need to split the work that metastore.Sections does internally, schedule it on different workers, and collect the results. The engine achieves that via pipelines that produce arrow.RecordBatch-es. In this PR, metastore.Sections is updated in a way that makes it easier to integrate the metastore with the engine.
2025-12-18 13:30:15 +01:00
Ivan Kalita
ee58c333b3 chore(metastore): use column reader for blooms (#20053)
This PR is a continuation of https://github.com/grafana/loki/pull/19992. I use `pointers.Reader` to refine the sections that match bloom filters.

I also fix minor issues that I introduced in the previous PR:
- convert the columns only once per column (instead of doing it for every row)
- ignore the sections that are of a wrong "type"
2025-12-01 14:08:58 +01:00
Ivan Kalita
3559d4bd42 chore(metastore): add column reader for pointers section (#19992)
Adding columnar reader for (log)pointers section. Using it to read the stream section pointers.

The new reader is slower than the existing RowReader due to double-translation of the data (columns->rows->rows vs columns->rows->columns->rows). The idea, though, is to get rid of the double-translation in the future.
2025-11-28 13:47:59 +01:00
benclive
dfb60dcaf0 fix: Metastore timerange filter (#19171) 2025-09-11 12:02:43 +00:00
Christian Haudum
4c81ebc7c3 chore(dataobj): Add option to set target row limit for pages (#19128)
This PR introduces an option for the column builder to limit the number of rows in a page.

It can be set for logs objects and index objects separately using `-dataobj-consumer.max-page-rows` and `-dataobj-index-builder.max-page-rows` respectively.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
2025-09-09 17:55:36 +02:00
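As a rough illustration of the page row limit described in #19128, here is a minimal Go sketch of a builder that cuts a new page once a target row count is reached. The `pageBuilder` type, its fields, and the convention that a limit of 0 means "no limit" are all assumptions made for this example; they are not taken from the dataobj package.

```go
package main

import "fmt"

// pageBuilder is a hypothetical stand-in for a column builder. It is not the
// dataobj API; it only illustrates cutting pages at a target row count.
type pageBuilder struct {
	maxPageRows int        // assumption: 0 means "no row limit"
	current     []string   // rows buffered for the page being built
	pages       [][]string // completed pages
}

// Append buffers a row and cuts the page once the row limit is reached.
func (b *pageBuilder) Append(row string) {
	b.current = append(b.current, row)
	if b.maxPageRows > 0 && len(b.current) >= b.maxPageRows {
		b.cut()
	}
}

// cut moves the buffered rows into a completed page, if there are any.
func (b *pageBuilder) cut() {
	if len(b.current) == 0 {
		return
	}
	b.pages = append(b.pages, b.current)
	b.current = nil
}

// Flush closes any partially filled page and returns all pages.
func (b *pageBuilder) Flush() [][]string {
	b.cut()
	return b.pages
}

func main() {
	b := &pageBuilder{maxPageRows: 3}
	for i := 0; i < 7; i++ {
		b.Append(fmt.Sprintf("row-%d", i))
	}
	for i, p := range b.Flush() {
		fmt.Printf("page %d: %d rows\n", i, len(p))
	}
}
```

A real builder would presumably also account for page size in bytes; this sketch covers only the row-count limit.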
Robert Fratto
f6091a67d1 chore(engine): move to toggleable section prefetching (#19142)
dataset.readerDownloader was originally introduced in #16429, an attempt to
balance peak memory usage of reading a section with read times by downloading a
configurable size of pages in advance.

In practice, each roundtrip to object storage adds too much of a latency hit,
and we've started to set the cache limit high enough to ensure that each reader
only needs a single prefetch. Given what we've found, it no longer makes sense
to control peak memory usage via the prefetch size. Other options, such as
downloading directly to disk, may be explored in the future.

In the meantime, this PR removes the ability to specify a cache size. All
non-pruned pages will be bulk requested using the range reader (#19067) on the
first read call. Pages which have left the potential read window will continue
to be eagerly removed for garbage collection.

However, we don't want to prefetch when the dataset is entirely in memory,
which is the case when the logs section builder is performing a k-way merge over
in-memory sections. To lower the memory usage of builders, prefetching is
configurable. For this initial PR, prefetching is only disabled for the logs
section builder; all other reads force prefetching.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2025-09-09 11:54:34 -04:00
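The toggleable prefetching described in #19142 can be sketched roughly as follows. This is a simplified model under stated assumptions, not the dataset package's implementation: `pageSource`, `reader`, and the `prefetch` field are hypothetical names, and pages are modelled as strings. With prefetching enabled, all non-pruned pages are requested in one bulk call on the first read; with it disabled, pages are fetched one at a time. In both modes a page is dropped from the cache once it has been read.

```go
package main

import "fmt"

// pageSource is a hypothetical stand-in for wherever pages live
// (object storage, or memory in the builder case).
type pageSource struct {
	pages      map[int]string
	roundtrips int // number of fetch calls made
}

// fetch returns the requested pages and counts one roundtrip per call.
func (s *pageSource) fetch(ids []int) map[int]string {
	s.roundtrips++
	out := make(map[int]string, len(ids))
	for _, id := range ids {
		out[id] = s.pages[id]
	}
	return out
}

// reader is a hypothetical page reader with toggleable prefetching.
type reader struct {
	src      *pageSource
	wanted   []int // non-pruned page IDs, in read order
	prefetch bool  // bulk-request all wanted pages on the first Read
	cache    map[int]string
	next     int
}

// Read returns the next wanted page, or false when none remain.
func (r *reader) Read() (string, bool) {
	if r.next >= len(r.wanted) {
		return "", false
	}
	if r.cache == nil {
		r.cache = map[int]string{}
		if r.prefetch {
			// One bulk request for every non-pruned page.
			r.cache = r.src.fetch(r.wanted)
		}
	}
	id := r.wanted[r.next]
	if _, ok := r.cache[id]; !ok {
		// Prefetching is off: fetch only the page needed now.
		for k, v := range r.src.fetch([]int{id}) {
			r.cache[k] = v
		}
	}
	page := r.cache[id]
	delete(r.cache, id) // the page has left the read window
	r.next++
	return page, true
}

func main() {
	for _, prefetch := range []bool{true, false} {
		src := &pageSource{pages: map[int]string{0: "p0", 1: "p1", 2: "p2", 3: "p3"}}
		r := &reader{src: src, wanted: []int{0, 2, 3}, prefetch: prefetch}
		for {
			if _, ok := r.Read(); !ok {
				break
			}
		}
		fmt.Printf("prefetch=%v roundtrips=%d\n", prefetch, src.roundtrips)
	}
}
```

The roundtrip counter only makes the trade-off visible: bulk prefetching costs a single request but holds every wanted page in memory at once, which is the cost the commit message avoids for in-memory datasets.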
benclive
0b9b4c1ee5 chore: Support multi-tenancy in indexobj builder (#18880) 2025-08-26 14:05:14 +01:00
Robert Fratto
595715146f chore(dataobj)!: introduce new centralized dataset encoding format (#18871)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2025-08-15 16:29:07 -04:00
benclive
19d90ef023 chore: Add sorting & tenant info to remaining section types (#18861) 2025-08-14 16:00:45 +00:00
Robert Fratto
642d826a00 chore(dataobj): make section tenancy a global construct (#18857)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
2025-08-14 14:29:04 +00:00
Ashwanth
5203a0c9ad chore(dataobj): add reader stats (#18694) 2025-08-14 18:29:02 +05:30
Robert Fratto
100e259a8d chore(dataobj): introduce the concept of section info "extensions" (#18832) 2025-08-13 10:14:04 -04:00
Robert Fratto
2ca8e58b0e chore(dataobj): add ability to buffer pending sections to disk (#18780)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2025-08-12 11:57:04 -04:00
Robert Fratto
40057f5ce2 chore(dataobj): permit reading subset of section metadata (#18815)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2025-08-12 10:12:56 -04:00
Robert Fratto
d4b027ae0c chore(dataobj): return Object on Builder.Flush (#18622) 2025-08-06 12:52:28 -04:00
Robert Fratto
061d1c1d22 chore(dataobj): make copy of pages from downloaded window (#18566) 2025-07-24 10:55:28 -04:00
Ashwanth
da4e59f8c6 chore(dataobj): store sort order information in section metadata (#18499) 2025-07-24 18:25:36 +05:30
benclive
41d93f8a86 fix: Fix empty streamIDs in multi-section index objects (#18556) 2025-07-24 13:30:54 +01:00
benclive
077e4196ef chore: Add indexes to LogQL benchmarks (#18500) 2025-07-18 19:03:41 +00:00
benclive
517c317c97 chore: Integrate indexes with the new query engine (#18427) 2025-07-17 15:04:40 +00:00
benclive
d0a3cd86a7 chore: Remove lookback from stats in unsupported sections (#18489) 2025-07-17 16:00:37 +01:00
benclive
1f0edcd4ac feat: Initial index-builder implementation (#18297) 2025-07-11 15:21:38 +01:00
benclive
a238816751 perf: Use map for InPredicate when reading dataobj (#18325) 2025-07-07 09:45:11 +01:00
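For context on #18325, a map-backed membership test for an IN-style predicate might look like the sketch below. The `inSet` type and helper names are hypothetical, and the commit title does not say what the map replaced; the linear slice scan is shown only as an assumed baseline for comparison.

```go
package main

import "fmt"

// inSet is a hypothetical map-backed value set for an IN predicate.
type inSet map[string]struct{}

// newInSet builds the set once so each lookup is a single map access.
func newInSet(values []string) inSet {
	s := make(inSet, len(values))
	for _, v := range values {
		s[v] = struct{}{}
	}
	return s
}

// Contains reports whether v is one of the predicate's values.
func (s inSet) Contains(v string) bool {
	_, ok := s[v]
	return ok
}

// containsLinear is an assumed baseline that scans every value per lookup.
func containsLinear(values []string, v string) bool {
	for _, candidate := range values {
		if candidate == v {
			return true
		}
	}
	return false
}

func main() {
	values := []string{"a", "b", "c"}
	set := newInSet(values)
	fmt.Println(set.Contains("b"), containsLinear(values, "b"))
	fmt.Println(set.Contains("z"), containsLinear(values, "z"))
}
```

Both functions return the same answers; the difference is that the map lookup does not grow with the number of values, which matters when the predicate is evaluated once per row.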
benclive
9642af3ac6 chore: Add new pointers section type for dataobj (#18243) 2025-06-27 14:00:51 +01:00