21 Commits

Author SHA1 Message Date
47fe57c928 Fix "start" for D, Rust, etc
The new DWARF indexer broke "start" for some languages.

For D, it is broken because, while the code in cooked_index_shard::add
specifically excludes Ada, it fails to exclude D.  This means that the
C "main" will be detected as "main" here -- whereas what is intended
is for the code in find_main_name to use d_main_name to find the name.

The Rust compiler, on the other hand, uses DW_AT_main_subprogram.
However, the code in dwarf2_build_psymtabs_hard fails to create a
fully-qualified name, so the name always ends up as plain "main".

For D and Ada, a very simple approach suffices: remove the check
against "main" from cooked_index_shard::add.  This also has the
benefit of slightly speeding up DWARF indexing.  I assume this
approach will work for Pascal and Modula-2 as well, but I don't have a
way to test those at present.

For Rust, though, this is not sufficient.  And, computing the
fully-qualified name in dwarf2_build_psymtabs_hard will crash, because
cooked_index_entry::full_name uses the canonical name -- and that is
not computed until after canonicalization.

However, we don't want to wait for canonicalization to be done before
computing the main name.  That would remove any benefit from doing
canonicalization is the background.

This patch solves this dilemma by noticing that languages using
DW_AT_main_subprogram are, currently, disjoint from languages
requiring canonicalization.  Because of this, we can add a parameter
to full_name to let us avoid crashes, slowdowns, and races here.

This is kind of tricky and ugly, so I've tried to comment it
sufficiently.

While doing this, I had to change gdb.dwarf2/main-subprogram.exp.  A
different possibility here would be to ignore the canonicalization
needs of C in this situation, because those only affect certain types.
However, I chose this approach because the test case is artificial
anyhow.

A long time ago, in an earlier threading attempt, I changed the global
current_language to be a function (hidden behind a macro) to let us
attempt lazily computing the current language.  Perhaps this approach
could still be made to work.  However, that also seemed rather tricky,
more so than this patch.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30116
2023-02-18 15:41:38 -07:00
307733cc0f Let user C-c when waiting for DWARF index finalization
In PR gdb/29854, Simon pointed out that it would be good to be able to
use C-c when the DWARF cooked index is waiting for finalization.  The
idea here is to be able to interrupt a command like "break" -- not to
stop the finalization process itself, which runs in a worker thread.

This patch implements this idea, by changing the index wait functions
to, by default, allow a quit.  Polling is done, because there doesn't
seem to be a better way to interrupt a wait on a std::future.

For v2, I realized that the thread compatibility code in thread-pool.h
also needed an update.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29854
2023-02-09 07:21:52 -07:00
19455ee11d gdb/dwarf: rename cooked_index_vector to cooked_index
See previous patch's commit message for rationale.

Change-Id: I6b8cdc045dffccc1c01ed690ff258af09f6ff076
Approved-By: Tom Tromey <tom@tromey.com>
2023-01-31 22:03:40 -05:00
a8dc671839 gdb/dwarf: rename cooked_index to cooked_index_shard
I propose to rename cooked_index_vector and cooked_index such that the
"main" object, that is the entry point to the index, is called
cooked_index.  The fact that the cooked index is implemented as a vector
of smaller indexes is an implementation detail.

This patch renames cooked_index to cooked_index_shard.  The following
patch renames cooked_index_vector to cooked_index.

Change-Id: Id650f97dcb23c48f8409fa0974cd093ca0b75177
Approved-By: Tom Tromey <tom@tromey.com>
2023-01-31 22:03:40 -05:00
902d61e328 gdb: fix dwarf2/cooked-index.c compilation on 32-bit systems
The i386 builder shows:

    ../../binutils-gdb/gdb/dwarf2/cooked-index.c: In member function ‘void cooked_index_vector::dump(gdbarch*) const’:
    ../../binutils-gdb/gdb/dwarf2/cooked-index.c:492:40: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘std::__underlying_type_impl<sect_offset, true>::type’ {aka ‘long long unsigned int’} [-Werror=format=]
      492 |       gdb_printf ("    DIE offset: 0x%lx\n",
          |                                      ~~^
          |                                        |
          |                                        long unsigned int
          |                                      %llx
      493 |     to_underlying (entry->die_offset));
          |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                   |
          |                   std::__underlying_type_impl<sect_offset, true>::type {aka long long unsigned int}

The die_offset's underlying type is uint64, so use PRIx64 in the format
string.

Change-Id: Ibdde4c624ed1bb50eced9a514a4e37aec70a1323
2023-01-30 16:15:17 -05:00
7d82b08e9e gdb/dwarf: dump cooked index contents in cooked_index_functions::dump
As I am investigating a crash I see with the cooked index, I thought it
would be useful to have a way to dump the index contents.  For those not
too familiar with it (that includes me), it can help get a feel of what
it contains and how it is structured.

The cooked_index_functions::dump function is called as part of the
"maintenance print objfiles" command.  I tried to make the output
well structured and indented to help readability, as this prints a lot
of text.

The dump function first dumps all cooked index entries, like this:

    [25] ((cooked_index_entry *) 0x621000121220)
    name:       __ioinit
    canonical:  __ioinit
    DWARF tag:  DW_TAG_variable
    flags:      0x2 [IS_STATIC]
    DIE offset: 0x21a4
    parent:     ((cooked_index_entry *) 0x6210000f9610) [std]

Then the information about the main symbol:

    main: ((cooked_index_entry *) 0x621000123b40) [main]

And finally the address map contents:

    [1] ((addrmap *) 0x6210000f7910)

      [0x0] ((dwarf2_per_cu_data *) 0)
      [0x118a] ((dwarf2_per_cu_data *) 0x60c000007f00)
      [0x1cc7] ((dwarf2_per_cu_data *) 0)
      [0x1cc8] ((dwarf2_per_cu_data *) 0x60c000007f00)
      [0x1cdf] ((dwarf2_per_cu_data *) 0)
      [0x1ce0] ((dwarf2_per_cu_data *) 0x60c000007f00)

The display of address maps above could probably be improved, to show it
more as ranges, but I think this is a reasonable start.

Note that this patch depends on Pedro Alves' patch "enum_flags
to_string" [1].  If my patch is to be merged before Pedro's series, I
will cherry-pick this patch from his series and merge it before mine.

[1] https://inbox.sourceware.org/gdb-patches/20221212203101.1034916-8-pedro@palves.net/

Change-Id: Ida13e479fd4c8d21102ddd732241778bc3b6904a
2023-01-30 15:04:44 -05:00
c121e82c39 Fix comparator bug in cooked index
Simon pointed out that the cooked index template-matching patch
introduced a failure in libstdc++ debug mode.  In particular, the new
code violates the assumption of std::lower_bound and std::upper_bound
that the range is sorted with respect to the comparison.

When I first debugged this, I thought the problem was unfixable as-is
and that a second layer of filtering would have to be done.  However,
on irc, Simon pointed out that it could perhaps be solved if the
comparison function were assured that one operand always came from the
index, with the other always being the search string.

This patch implements this idea.

First, a new mode is introduced: a sorting mode for
cooked_index_entry::compare.  In this mode, strings are compared
case-insensitively, but we're careful to always sort '<' before any
other printable character.  This way, two names like "func" and
"func<param>" will be sorted next to each other -- i.e., "func1" will
not be seen between them.  This is important when searching.

Second, the compare function is changed to work in a strcmp-like way.
This makes it easier to test and (IMO) understand.

Third, the compare function is modified so that in non-sorting modes,
the index entry is always the first argument.  This allows consistency
in compares.

I regression tested this in libstdc++ debug mode on x86-64 Fedora 36.
It fixes the crash that Simon saw.

This is v2.  I believe it addresses the review comments, except for
the 'enum class' change, as I mentioned in email on the list.

Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-01-30 10:46:14 -07:00
70ca3a6bc9 Make addrmap const-correct in cooked index
After the cooked index is created, the addrmaps should be const.

Change-Id: I8234520ab346ced40a8dd6e478ba21fc438c2ba2
2023-01-30 11:55:07 -05:00
35e1763185 More const-correctness in cooked indexer
I noticed that iterating over the index yields non-const
cooked_index_entry objects.  However, after finalization, they should
not be modified.  This patch enforces this by adding const where
needed.

v2 makes the find, all_entries, and wait methods const as well.
2023-01-27 14:12:01 -07:00
ac37b79cc4 Fix parameter-less template regression in new DWARF reader
PR c++/29896 points out a regression in the new DWARF reader.  It does
not properly handle a case like "break fn", where "fn" is a template
function.

This happens because the new index uses strncasecmp to compare.
However, to make this work correctly, we need a custom function that
ignores template parameters.

This patch adds a custom comparison function and fixes the bug.  A new
test case is included.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29896
2023-01-17 07:03:26 -07:00
5a89072f36 Move hash_entry and eq_entry into cooked_index::do_finalize
I was briefly confused by the hash_entry and eq_entry functions in the
cooked index.  They are only needed in a single method, and that
method already has a couple of local lambdas for a different hash
table.  So, it seemed cleaner to move these there as well.
2023-01-17 07:03:26 -07:00
213516ef31 Update copyright year range in header of all files managed by GDB
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
2023-01-01 17:01:16 +04:00
55fc1623f9 Add name canonicalization for C
PR symtab/29105 shows a number of situations where symbol lookup can
result in the expansion of too many CUs.

What happens is that lookup_signed_typename will try to look up a type
like "signed int".  In cooked_index_functions::expand_symtabs_matching,
when looping over languages, the C++ case will canonicalize this type
name to be "int" instead.  Then this method will proceed to expand
every CU that has an entry for "int" -- i.e., nearly all of them.  A
crucial component of this is that the caller, objfile::lookup_symbol,
does not do this canonicalization, so when it tries to find the symbol
for "signed int", it fails -- causing the loop to continue.

This patch fixes the problem by introducing name canonicalization for
C.  The idea here is that, by making C and C++ agree on the canonical
name when a symbol name can have multiple spellings, we avoid the bad
behavior in objfile::lookup_symbol (and any other such code -- I don't
know if there is any).

Unlike C++, C only has a few situations where canonicalization is
needed.  And, in particular, due to the lack of overloading (thus
avoiding any issues in linespec) and due to the way c-exp.y works, I
think that no canonicalization is needed during symbol lookup -- only
during symtab construction.  This explains why lookup_name_info is not
touched.

The stabs reader is modified on a "best effort" basis.

The DWARF reader needed one small tweak in dwarf2_name to avoid a
regression in dw2-unusual-field-names.exp.  I think this is adequately
explained by the comment, but basically this is a scenario that should
not occur in real code, only the gdb test suite.

lookup_signed_typename is simplified.  It used to search for two
different type names, but now gdb can search just for the canonical
form.

gdb.dwarf2/enum-type.exp needed a small tweak, because the
canonicalizer turns "unsigned integer" into "unsigned int integer".
It seems better here to use the correct C type name.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
Tested-by: Simon Marchi <simark@simark.ca>
Reviewed-by: Andrew Burgess <aburgess@redhat.com>
2022-12-01 11:16:41 -07:00
bed34ce705 Refactor cooked_index::do_finalize
This refactors cooked_index::do_finalize, reordering an 'if' to make
it a little less redundant.  This change makes a subsequent patch
easier to read.

Reviewed-by: Andrew Burgess <aburgess@redhat.com>
2022-12-01 11:16:41 -07:00
2c474c4694 [gdb/symtab] Add get/set functions for per_cu->lang/unit_type
The dwarf2_per_cu_data fields lang and unit_type both have a dont-know
initial value (respectively language_unknown and (dwarf_unit_type)0), which
allows us to add certain checks, f.i. checking that that a field is not read
before written.

Add get/set member functions for the two fields as a convenient location to
add such checks, make the fields private to enforce using the member
functions, and add the m_ prefix.

Tested on x86_64-linux.
2022-07-04 10:28:42 +02:00
20a26f4e01 Finalize each cooked index separately
After DWARF has been scanned, the cooked index code does a
"finalization" step in a worker thread.  This step combines all the
index entries into a single master list, canonicalizes C++ names, and
splits Ada names to synthesize package names.

While this step is run in the background, gdb will wait for the
results in some situations, and it turns out that this step can be
slow.  This is PR symtab/29105.

This can be sped up by parallelizing, at a small memory cost.  Now
each index is finalized on its own, in a worker thread.  The cost
comes from name canonicalization: if a given non-canonical name is
referred to by multiple indices, there will be N canonical copies (one
per index) rather than just one.

This requires changing the users of the index to iterate over multiple
results.  However, this is easily done by introducing a new "chained
range" class.

When run on gdb itself, the memory cost seems rather low -- on my
current machine, "maint space 1" reports no change due to the patch.

For performance testing, using "maint time 1" and "file" will not show
correct results.  That approach measures "time to next prompt", but
because the patch only affects background work, this shouldn't (and
doesn't) change.  Instead, a simple way to make gdb wait for the
results is to set a breakpoint.

Before:

    $ /bin/time -f%e ~/gdb/install/bin/gdb -nx -q -batch \
        -ex 'break main' /tmp/gdb
    Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28.
    2.00

After:

    $ /bin/time -f%e ./gdb/gdb -nx -q -batch \
        -ex 'break main' /tmp/gdb
    Breakpoint 1 at 0x43ec30: file ../../binutils-gdb/gdb/gdb.c, line 28.
    0.65

Regression tested on x86-64 Fedora 34.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
2022-05-26 07:35:30 -06:00
72b580b8f4 Micro-optimize cooked_index_entry::full_name
I noticed that cooked_index_entry::full_name can return the canonical
string when there is no parent entry.

Regression tested on x86-64 Fedora 34.
2022-04-20 06:21:06 -06:00
a8b7a13911 gdb: fix "passing NULL to memcpy" UBsan error in dwarf2/cooked-index.c
Reading a simple file compiled with :

    $ gcc -DONE=1 -gdwarf-4 -g3  test.c
    $ gcc --version
    gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0

I get:

    Reading symbols from /tmp/cwd/a.out...
    /home/smarchi/src/binutils-gdb/gdb/dwarf2/cooked-index.c:332:11: runtime error: null pointer passed as argument 2, which is declared to never be null

It looks like even if the size is 0 (the size of the `entries` vector is
0), we shouldn't be passing a NULL pointer to memcpy.  And
`entries.data ()` returns NULL.

Fix that by using std::vector::insert to insert the items of entries
into m_entries.  I haven't checked, but it should essentially compile
down to a memcpy, since the vector elements are trivially copyiable.

Change-Id: I75f1c901e9b522e42e89eb5936e2c70d68eb21e5
2022-04-12 14:42:02 -04:00
7e75279093 "Finalize" the DWARF index in the background
After scanning the CUs, the DWARF indexer merges all the data into a
single vector, canonicalizing C++ names as it proceeds.  While not
necessarily single-threaded, this process is currently done in just
one thread, to keep memory costs lower.

However, this work is all done without reference to any data outside
of the indexes.  This patch improves the apparent performance of GDB
by moving it to the background.  All uses of the index are then made
to wait for this process to complete.

In our ongoing example, this reduces the scanning time on gdb itself
to 0.173937 (wall).  Recall that before this patch, the time was
0.668923; and psymbol reader does this in 1.598869.  That is, at the
end of this series, we see about a 10x speedup.
2022-04-12 09:31:16 -06:00
46114cb7be Parallelize DWARF indexing
This parallelizes the new DWARF indexer.  The indexer's storage was
designed so that each storage object and each indexer is fully
independent.  This setup makes it simple to scan different CUs
independently.

This patch creates a new cooked index storage object per thread, and
then scans a subset of all the CUs in each such thread, using gdb's
existing thread pool.

In the ongoing "gdb gdb" example, this patch reduces the wall time
down to 0.668923, from 0.903534.  (Note that the 0.903534 is the time
for the new index -- that is, when the "enable the new index" patch is
rebased to before this one.  However, in the final series, that patch
appears toward the end.  Hopefully this isn't too confusing.)
2022-04-12 09:31:16 -06:00
51f5a4b8e9 Introduce the new DWARF index class
This patch introduces the new DWARF index class.  It is called
"cooked" to contrast against a "raw" index, which is mapped from disk
without extra effort.

Nothing constructs a cooked index yet.  The essential idea here is
that index entries are created via the "add" method; then when all the
entries have been read, they are "finalize"d -- name canonicalization
is performed and the entries are added to a sorted vector.

Entries use the DWARF name (DW_AT_name) or linkage name, not the full
name as is done for partial symbols.

These two facets -- the short name and the deferred canonicalization
-- help improve the performance of this approach.  This will become
clear in later patches, when parallelization is added.

Some special code is needed for Ada, because GNAT only emits mangled
("encoded", in the Ada lingo) names, and so we reconstruct the
hierarchical structure after the fact.  This is also done in the
finalization phase.

One other aspect worth noting is that the way the "main" function is
found is different in the new code.  Currently gdb will notice
DW_AT_main_subprogram, but won't recognize "main" during reading --
this is done later, via explicit symbol lookup.  This is done
differently in the new code so that finalization can be done in the
background without then requiring a synchronization to look up the
symbol.
2022-04-12 09:31:16 -06:00