78 Commits

Author SHA1 Message Date
Ashwanth
c24e1b27e8 chore: End index reader span on Close (#20854) 2026-02-18 20:20:08 +05:30
Robert Fratto
0437dabfb9 chore(metastore): fix metastore.indexSectionsReader Open semantics (#20835) 2026-02-17 13:32:34 -05:00
Ashwanth
73961f18a3 chore(xcap): leaner xcap API (#20771) 2026-02-17 19:42:33 +05:30
benclive
cd16b029db chore: fix panic on read err in batchDecoratorReader (#20787) 2026-02-13 16:13:17 +00:00
Robert Fratto
d85fb0317f chore(dataobj): Add Open semantics to section readers (#20790)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2026-02-12 13:50:28 -05:00
Robert Fratto
7e03400309 chore(dataobj): rename dataset.ReaderOptions to dataset.RowReaderOptions (#20756) 2026-02-10 13:15:18 -05:00
benclive
a029934ef2 chore: resolve predicate columns in distributed metastore (#20504) 2026-02-04 12:30:11 +00:00
Robert Fratto
9d9d8ce14d chore(engine): introduce experimental expression engine (#20626) 2026-02-02 09:31:41 -05:00
Sandeep Sukhani
7f1450d8c5 feat(dataobj-compactor): add a planner for compaction of data objects (#20527) 2026-02-02 16:37:24 +05:30
Robert Fratto
736ccdbf93 chore(dataobj): rename dataset.Reader to RowReader (#20629) 2026-01-30 16:43:44 -05:00
Robert Fratto
747d75aa78 chore: make memory, columnar APIs more idiomatic (#20622) 2026-01-30 10:04:51 -05:00
Robert Fratto
08aaaa3754 chore(columnar): add scalars and builders (#20583) 2026-01-27 17:50:59 +00:00
Robert Fratto
3e3a293215 chore(columnar): switch to generic Number type (#20580) 2026-01-27 09:17:02 -05:00
Ivan Kalita
86f87e447c chore(dataobj): use reader adapter (#20572)
Continuation of #20525. This updates the logs, pointers, and streams readers to use the reader adapter, and adds binary support to arrowconv.
2026-01-27 10:49:32 +01:00
Ivan Kalita
aab681617b chore(dataobj): add reader_adapter (#20525)
Add a temporary translation layer []dataset.Row -> columnar.RecordBatch -> arrow.RecordBatch.

- reader_adapter.go is temporary, which is why I put it into the old columnar package
- I added builders to reader_adapter.go and not to pkg/columnar.... as I'm not sure we would need them anywhere except this adapter. If we do, I'm happy to move them to pkg/columnar.
- I only updated the indexpointers reader for now but will add support for more readers in the next PRs
2026-01-23 10:51:09 +01:00
yy
0574159216 chore: typos in comments (#20481) 2026-01-20 08:24:56 -05:00
Stas Spiridonov
c2116618bd chore: Improving memory allocations for new query engine (#20321) 2026-01-07 12:34:15 -05:00
Ivan Kalita
e4ec8441d3 feat(metastore): use arrow for scanning and blooms (#20234)
Changing metastore implementation to work with arrow.RecordBatch without intermediate materialization to structs.
metastore.Sections now uses new metastore.GetIndexes, metastore.IndexSectionsReader and metastore.CollectSections.

We'd like to run metastore queries in a distributed manner using engine v2. To achieve that, we need to split the work that metastore.Sections does internally, schedule it on different workers, and collect the results. The engine achieves that via pipelines that produce arrow.RecordBatch values. In this PR, metastore.Sections is updated to make it easier to integrate the metastore with the engine.
2025-12-18 13:30:15 +01:00
Ashwanth
7da7f000ff chore: use projection for indexobj streams reader (#20205) 2025-12-16 07:31:10 +00:00
Ashwanth
54f9723af1 chore(xcap): adds exporter to summarise capture as a structured log line (#20099) 2025-12-04 13:34:55 +00:00
Ashwanth
728b308edf chore: update read stats to use xcap (#20095) 2025-12-04 18:34:58 +05:30
Ivan Kalita
827292d8ed chore(metastore): add columnar indexpointers reader (#20068)
Similar to https://github.com/grafana/loki/pull/20053 and https://github.com/grafana/loki/pull/19992 I add a columnar reader for index pointers sections.
2025-12-03 13:19:40 +01:00
Ivan Kalita
ee58c333b3 chore(metastore): use column reader for blooms (#20053)
This PR is a continuation of https://github.com/grafana/loki/pull/19992. I use `pointers.Reader` to refine the sections that match bloom filters.

I also fix minor issues that I introduced in the previous PR:
- convert the columns only once per column (instead of doing it for every row)
- ignore sections that are of the wrong "type"
2025-12-01 14:08:58 +01:00
Ivan Kalita
3559d4bd42 chore(metastore): add column reader for pointers section (#19992)
Adding columnar reader for (log)pointers section. Using it to read the stream section pointers.

New reader is slower than existing RowReader due to double-translation of the data (columns->rows->rows vs columns->rows->columns->rows). The idea though is to get rid of the double-translation in the future.
2025-11-28 13:47:59 +01:00
benclive
2ce207fbdb chore: Use shared zstd encoders during dataobj build (#20044) 2025-11-28 10:31:40 +00:00
renovate-sh-app[bot]
d76b3bf495 fix(deps): update module github.com/apache/arrow-go/v18 to v18.4.1 (main) (#19750)
Signed-off-by: renovate-sh-app[bot] <219655108+renovate-sh-app[bot]@users.noreply.github.com>
Co-authored-by: renovate-sh-app[bot] <219655108+renovate-sh-app[bot]@users.noreply.github.com>
Co-authored-by: Paul Rogers <paul.rogers@grafana.com>
2025-11-09 11:53:00 -05:00
Christian Haudum
73bc30d9a8 perf(engine): Improve regexp expression evaluation (#19644)
This PR changes the evaluation of regex-match binary expressions so that the regexp on the right-hand side of the expression is compiled only once per batch evaluation, rather than once per row.

This saves a ton of allocations and improves general performance of queries with a regexp line or label filter.
2025-10-30 13:26:26 +01:00
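A minimal Go sketch of the compile-once idea described in #19644. The function name `matchBatch` and the `[]string` batch are hypothetical stand-ins for illustration, not the engine's actual types or API; only the standard `regexp` package is used.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchBatch is a hypothetical helper: it compiles the pattern once and
// then applies the compiled regexp to every row in the batch.
func matchBatch(pattern string, rows []string) ([]bool, error) {
	// Compile once per batch instead of once per row.
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}

	out := make([]bool, len(rows))
	for i, row := range rows {
		out[i] = re.MatchString(row)
	}
	return out, nil
}

func main() {
	rows := []string{"level=error msg=a", "level=info msg=b"}
	matches, err := matchBatch(`level=(error|warn)`, rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(matches)
}
```

Hoisting the compilation out of the row loop means the cost of building the regexp is paid once per batch, however many rows the batch holds.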
Ashwanth
5a06d44826 chore: table build fill all columns to max rows (#19614) 2025-10-28 14:19:21 +05:30
Stas Spiridonov
160dc2c493 chore: removed arrow-go allocators/retain/release (#19569) 2025-10-22 15:41:12 -04:00
Ashwanth
fa8bd848f6 chore: update buildTable to sort rows using BuilderOptions.SortOrder (#19515) 2025-10-16 14:53:14 +05:30
benclive
b0f5ca7f8c chore: Dedup log objects during build (#19378) 2025-10-02 17:10:07 +00:00
Christian Haudum
386d4e1653 chore(dataobj): Make logs sort order in dataobjects configurable (#19373)
This PR introduces a new configuration option `-dataobj-consumer.dataobj-sort-order` which controls the sort order of logs in the logs section of the dataobj when building new objects.
There are two available sort options:
1. `timestamp DESC, streamID ASC` (configuration value `timestamp-desc`)
2. `streamID ASC, timestamp DESC` (configuration value `stream-asc`)

The new setting is undocumented and may be removed in the future without notice.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
2025-10-02 13:25:27 +02:00
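The two sort orders from #19373 can be sketched as comparators in Go. The `record` type and `sortRecords` function are hypothetical and exist only for illustration; the only names taken from the commit message are the configuration values `timestamp-desc` and `stream-asc`.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical, simplified log record.
type record struct {
	StreamID  int64
	Timestamp int64
}

// sortRecords sorts recs in place according to the named order.
func sortRecords(recs []record, order string) error {
	var less func(a, b record) bool

	switch order {
	case "timestamp-desc": // timestamp DESC, streamID ASC
		less = func(a, b record) bool {
			if a.Timestamp != b.Timestamp {
				return a.Timestamp > b.Timestamp
			}
			return a.StreamID < b.StreamID
		}
	case "stream-asc": // streamID ASC, timestamp DESC
		less = func(a, b record) bool {
			if a.StreamID != b.StreamID {
				return a.StreamID < b.StreamID
			}
			return a.Timestamp > b.Timestamp
		}
	default:
		return fmt.Errorf("unknown sort order %q", order)
	}

	sort.SliceStable(recs, func(i, j int) bool { return less(recs[i], recs[j]) })
	return nil
}

func main() {
	recs := []record{{StreamID: 2, Timestamp: 30}, {StreamID: 1, Timestamp: 10}, {StreamID: 1, Timestamp: 20}}
	if err := sortRecords(recs, "stream-asc"); err != nil {
		panic(err)
	}
	fmt.Println(recs)
}
```

The two orders differ only in which key is compared first; the second key breaks ties.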
benclive
c5a0a3827d chore: Memory improvements in dataobj page downloads (#19301) 2025-10-01 12:13:54 +00:00
benclive
2dac5ef8a7 chore: Write empty message values in logs section writer (#19359) 2025-09-30 17:45:55 +01:00
Christian Haudum
a3017f2b13 chore(dataobj-consumer): Sort logs object-wide (#19231)
Sorting logs globally (object-wide per tenant) removes overlapping time ranges of sections in the objects.

Sections contain logs from multiple streams, and the ingest lag of streams may vary. This means that although the logs within each section are sorted by timestamp, the overall ordering of logs across sections is not guaranteed. Sections are therefore sorted with a k-way merge (SortMerge), which over-queries data when the query would reach its result limit early.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
2025-09-23 09:01:37 +02:00
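For readers unfamiliar with the k-way merge mentioned in #19231, here is a generic heap-based sketch in Go. It is not the SortMerge implementation from the repository: the `cursor` and `mergeSorted` names are hypothetical, the inputs are plain `int64` timestamps, and ascending order is assumed purely for simplicity.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the current position within one sorted input.
type cursor struct {
	src int   // index of the input slice
	pos int   // position within that slice
	val int64 // value at inputs[src][pos]
}

// minHeap orders cursors by their current value, smallest first.
type minHeap []cursor

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(cursor)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeSorted merges k individually sorted (ascending) inputs into one
// sorted output by repeatedly taking the smallest head element.
func mergeSorted(inputs [][]int64) []int64 {
	h := &minHeap{}
	total := 0
	for i, in := range inputs {
		total += len(in)
		if len(in) > 0 {
			*h = append(*h, cursor{src: i, pos: 0, val: in[0]})
		}
	}
	heap.Init(h)

	out := make([]int64, 0, total)
	for h.Len() > 0 {
		c := heap.Pop(h).(cursor)
		out = append(out, c.val)
		// Advance the input that the popped value came from.
		if next := c.pos + 1; next < len(inputs[c.src]) {
			heap.Push(h, cursor{src: c.src, pos: next, val: inputs[c.src][next]})
		}
	}
	return out
}

func main() {
	fmt.Println(mergeSorted([][]int64{{1, 4, 9}, {2, 3}, {}, {5}}))
}
```

The merge has to hold a cursor on every input at once, even if the caller only needs the first few results. This is the over-querying cost that object-wide sorting is meant to avoid.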
Sophie Waldman
98b411a649 chore(linter): Fix issues flagged by updated golangci-lint version (#19206) 2025-09-15 16:58:01 -04:00
benclive
dfb60dcaf0 fix: Metastore timerange filter (#19171) 2025-09-11 12:02:43 +00:00
Periklis Tsirakidis
a082b8a061 chore: Replace per section impl tenant function with generic one (#19139) 2025-09-10 11:43:32 +02:00
Christian Haudum
4c81ebc7c3 chore(dataobj): Add option to set target row limit for pages (#19128)
This PR introduces an option for the column builder to limit the number of rows in a page.

It can be set for logs objects and index objects separately using `-dataobj-consumer.max-page-rows` and `-dataobj-index-builder.max-page-rows` respectively.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
2025-09-09 17:55:36 +02:00
Robert Fratto
f6091a67d1 chore(engine): move to toggleable section prefetching (#19142)
dataset.readerDownloader was originally introduced in #16429, an attempt to
balance peak memory usage of reading a section with read times by downloading a
configurable size of pages in advance.

In practice, each roundtrip to object storage adds too much of a latency hit,
and we've started to set the cache limit high enough to ensure that each reader
only needs a single prefetch. Given what we've found, it no longer makes sense
to control peak memory usage via the prefetch size. Other options, such as
downloading directly to disk, may be explored in the future.

In the meantime, this PR removes the ability to specify a cache size. All
non-pruned pages will be bulk requested using the range reader (#19067) on the
first read call. Pages which have left the potential read window will continue
to be eagerly removed for garbage collection.

However, we don't want to prefetch when the dataset is entirely in memory,
which is the case when the logs section builder is performing k-way merge over
in-memory sections. To lower the memory usage of builders, prefetching is
configurable. For this initial PR, prefetching is only disabled for the logs
section builder; all other reads force prefetching.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2025-09-09 11:54:34 -04:00
Robert Fratto
ba4bd5dc95 chore: introduce efficient byte range reader in new engine (#19024)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2025-08-28 08:15:44 -04:00
benclive
0b9b4c1ee5 chore: Support multi-tenancy in indexobj builder (#18880) 2025-08-26 14:05:14 +01:00
Robert Fratto
515ebea4f2 chore(dataobj): track total download time of page metadata/data (#18896) 2025-08-18 15:32:33 +00:00
George Robinson
05f6ad1eb0 feat: add tenant to section builders (#18864) 2025-08-18 14:00:06 +01:00
Robert Fratto
595715146f chore(dataobj)!: introduce new centralized dataset encoding format (#18871)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2025-08-15 16:29:07 -04:00
benclive
19d90ef023 chore: Add sorting & tenant info to remaining section types (#18861) 2025-08-14 16:00:45 +00:00
Robert Fratto
642d826a00 chore(dataobj): make section tenancy a global construct (#18857)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
2025-08-14 14:29:04 +00:00
Ashwanth
5203a0c9ad chore(dataobj): add reader stats (#18694) 2025-08-14 18:29:02 +05:30
George Robinson
414538ae24 fix: mistake in error, should have said logs section (#18844) 2025-08-14 06:47:18 -04:00
George Robinson
36b2615156 feat: decode tenant when reading streams and logs sections (#18846) 2025-08-14 09:33:16 +00:00