10265 Commits

Author SHA1 Message Date
1bcaeecb7f [gdb/testsuite] Add xfail case in gdb.python/py-record-btrace.exp
I came across:
...
gdb) PASS: gdb.python/py-record-btrace.exp: prepare record: stepi 100
python insn = r.instruction_history^M
warning: Non-contiguous trace at instruction 1 (offset = 0x3e10).^M
(gdb) FAIL: gdb.python/py-record-btrace.exp: prepare record: python insn = r.i\
nstruction_history
...

I'm assuming it's the same root cause as for the already present XFAIL.

Fix this by recognizing above warning in the xfail regexp.

Tested on x86_64-linux, although sofar I was not able to trigger the warning
again.

Approved-By: Markus T. Metzger <markus.t.metzger@intel.com>
2023-02-20 11:16:02 +01:00
13d4a4bd5a [gdb/testsuite] Fix gdb.threads/schedlock.exp for gcc 4.8.5
Since commit 9af467b8240 ("[gdb/testsuite] Fix gdb.threads/schedlock.exp on
fast cpu"), the test-case fails for gcc 4.8.5.

The problem is that for gcc 4.8.5, the commit turned a two-line loop:
...
(gdb) next
78          while (*myp > 0)
(gdb) next
81              MAYBE_CALL_SOME_FUNCTION(); (*myp) ++;
(gdb) next
78          while (*myp > 0)
...
into a three-line loop:
...
(gdb) next
83              MAYBE_CALL_SOME_FUNCTION(); (*myp) ++;
(gdb) next
84              cnt++;
(gdb) next
85            }
(gdb) next
83              MAYBE_CALL_SOME_FUNCTION(); (*myp) ++;
(gdb)
...
and the test-case doesn't expect this.

Fix this by reverting back to the original loop shape as much as possible by:
- removing the cnt++ line
- replacing "while (1)" with "while (one)", where one is a volatile variable
  set to 1.

Tested on x86_64-linux, using compilers:
- gcc 4.8.5, 7.5.0, 12.2.1
- clang 4.0.1, 13.0.1
2023-02-20 11:16:02 +01:00
47fe57c928 Fix "start" for D, Rust, etc
The new DWARF indexer broke "start" for some languages.

For D, it is broken because, while the code in cooked_index_shard::add
specifically excludes Ada, it fails to exclude D.  This means that the
C "main" will be detected as "main" here -- whereas what is intended
is for the code in find_main_name to use d_main_name to find the name.

The Rust compiler, on the other hand, uses DW_AT_main_subprogram.
However, the code in dwarf2_build_psymtabs_hard fails to create a
fully-qualified name, so the name always ends up as plain "main".

For D and Ada, a very simple approach suffices: remove the check
against "main" from cooked_index_shard::add.  This also has the
benefit of slightly speeding up DWARF indexing.  I assume this
approach will work for Pascal and Modula-2 as well, but I don't have a
way to test those at present.

For Rust, though, this is not sufficient.  And, computing the
fully-qualified name in dwarf2_build_psymtabs_hard will crash, because
cooked_index_entry::full_name uses the canonical name -- and that is
not computed until after canonicalization.

However, we don't want to wait for canonicalization to be done before
computing the main name.  That would remove any benefit from doing
canonicalization is the background.

This patch solves this dilemma by noticing that languages using
DW_AT_main_subprogram are, currently, disjoint from languages
requiring canonicalization.  Because of this, we can add a parameter
to full_name to let us avoid crashes, slowdowns, and races here.

This is kind of tricky and ugly, so I've tried to comment it
sufficiently.

While doing this, I had to change gdb.dwarf2/main-subprogram.exp.  A
different possibility here would be to ignore the canonicalization
needs of C in this situation, because those only affect certain types.
However, I chose this approach because the test case is artificial
anyhow.

A long time ago, in an earlier threading attempt, I changed the global
current_language to be a function (hidden behind a macro) to let us
attempt lazily computing the current language.  Perhaps this approach
could still be made to work.  However, that also seemed rather tricky,
more so than this patch.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30116
2023-02-18 15:41:38 -07:00
e8eca7a6b6 Fix crash in go_symbol_package_name
go_symbol_package_name package name asserts that it is only passed a
Go symbol, but this is not enforced by one caller.  It seems simplest
to just check and return early in this case.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17876
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-02-17 19:05:04 -05:00
733da2ced8 gdb: fix regression in gdb.xml/maint_print_struct.exp
A regression in gdb.xml/maint_print_struct.exp was introduced with
commit:

  commit 81b86eced24f905545b58aa6c27478104c364976
  Date:   Fri Jan 6 09:30:40 2023 -0700

      Do not record a rejected target description

The test relied on an invalid target description being stored within
the tdesc_info of the current inferior, the above commit stopped this
behaviour.

Update the test to check that the invalid architecture is NOT stored,
and then check printing the target description directly from the
file.

Approved-By: Tom Tromey <tromey@adacore.com>
2023-02-17 22:29:09 +00:00
ab3fdfe6e4 [gdb/testsuite] Simplify gdb.arch/amd64-disp-step-avx.exp
On SLE-11, with glibc 2.11.3, I run into:
...
(gdb) PASS: gdb.arch/amd64-disp-step-avx.exp: vex3: \
  var128 has expected value after
continue^M
Continuing.^M
^M
Program received signal SIGSEGV, Segmentation fault.^M
0x0000000000400283 in _exit (status=0) at \
  ../sysdeps/unix/sysv/linux/_exit.c:33^M
33      ../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.^M
(gdb) FAIL: gdb.arch/amd64-disp-step-avx.exp: \
  continue until exit at amd64-disp-step-avx
...

This is not related to gdb, we get the same result by just running the exec.

The problem is that the test-case:
- calls glibc's _exit, and
- uses -nostartfiles -static, putting the burden for any necessary
  initialization for calling glibc's _exit on the test-case itself.

So, when we get to the second insn in _exit:
...
000000000040acb0 <_exit>:
  40acb0:       48 63 d7                movslq %edi,%rdx
  40acb3:       64 4c 8b 14 25 00 00    mov    %fs:0x0,%r10
...
no glibc-related initialization is done, and we run into the segfault.

Adding this (borrowed from __libc_start_main) in _start in the .S file is
sufficient to fix it:
...
         .rept 200
         nop
+        call __pthread_initialize_minimal
         .endr
...
But that already doesn't compile with say glibc 2.31, and regardless I think
this sort of fix is too fragile.

We could of course fix this by simply not running to exit.  But ideally we'd
have an exec that doesn't segfault when you just run it.

Alternatively, we could hand-code an _exit syscall and bypass glibc
all together.  But I'd rather fix this in a way that simplifies the test-case.

Taking a step back, the -nostartfiles -static was added to address that the
xmm registers were not zero at main (which AFAICT is a valid thing to happen).

[ The change itself silently broke the test-case, needing further fixing by
commit 40310f30a51 ("gdb: make gdb.arch/amd64-disp-step-avx.exp actually test
displaced stepping"). ]

Instead, simplify things by reverting to the original situation:
- no -nostartfiles -static compilation flags,
- no _start in the .S file,
- use exit instead of _exit in the .S file,
and fix the original problem by setting the xmm registers to zero rather than
checking that they're zero.

Now that we're no longer forcing -static, add nopie to the flags to prevent
compilation failure with target board unix/-fPIE/-pie.

Tested on x86_64-linux.

PR testsuite/30132
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30132
2023-02-17 15:33:18 +01:00
141cd15842 Don't throw quit while handling inferior events, part II
I noticed that if Ctrl-C was typed just while GDB is evaluating a
breakpoint condition in the background, and GDB ends up reaching out
to the Python interpreter, then the breakpoint condition would still
fail, like:

  c&
  Continuing.
  (gdb) Error in testing breakpoint condition:
  Quit

That happens because while evaluating the breakpoint condition, we
enter Python, and end up calling PyErr_SetInterrupt (it's called by
gdbpy_set_quit_flag, in frame #0):

 (top-gdb) bt
 #0  gdbpy_set_quit_flag (extlang=0x558c68f81900 <extension_language_python>) at ../../src/gdb/python/python.c:288
 #1  0x0000558c6845f049 in set_quit_flag () at ../../src/gdb/extension.c:785
 #2  0x0000558c6845ef98 in set_active_ext_lang (now_active=0x558c68f81900 <extension_language_python>) at ../../src/gdb/extension.c:743
 #3  0x0000558c686d3e56 in gdbpy_enter::gdbpy_enter (this=0x7fff2b70bb90, gdbarch=0x558c6ab9eac0, language=0x0) at ../../src/gdb/python/python.c:212
 #4  0x0000558c68695d49 in python_on_memory_change (inferior=0x558c6a830b00, addr=0x555555558014, len=4, data=0x558c6af8a610 "") at ../../src/gdb/python/py-inferior.c:146
 #5  0x0000558c6823a071 in std::__invoke_impl<void, void (*&)(inferior*, unsigned long, long, unsigned char const*), inferior*, unsigned long, long, unsigned char const*> (__f=@0x558c6a8ecd98: 0x558c68695d01 <python_on_memory_change(inferior*, CORE_ADDR, ssize_t, bfd_byte const*)>) at /usr/include/c++/11/bits/invoke.h:61
 #6  0x0000558c68237591 in std::__invoke_r<void, void (*&)(inferior*, unsigned long, long, unsigned char const*), inferior*, unsigned long, long, unsigned char const*> (__fn=@0x558c6a8ecd98: 0x558c68695d01 <python_on_memory_change(inferior*, CORE_ADDR, ssize_t, bfd_byte const*)>) at /usr/include/c++/11/bits/invoke.h:111
 #7  0x0000558c68233e64 in std::_Function_handler<void (inferior*, unsigned long, long, unsigned char const*), void (*)(inferior*, unsigned long, long, unsigned char const*)>::_M_invoke(std::_Any_data const&, inferior*&&, unsigned long&&, long&&, unsigned char const*&&) (__functor=..., __args#0=@0x7fff2b70bd40: 0x558c6a830b00, __args#1=@0x7fff2b70bd38: 93824992247828, __args#2=@0x7fff2b70bd30: 4, __args#3=@0x7fff2b70bd28: 0x558c6af8a610 "") at /usr/include/c++/11/bits/std_function.h:290
 #8  0x0000558c6830a96e in std::function<void (inferior*, unsigned long, long, unsigned char const*)>::operator()(inferior*, unsigned long, long, unsigned char const*) const (this=0x558c6a8ecd98, __args#0=0x558c6a830b00, __args#1=93824992247828, __args#2=4, __args#3=0x558c6af8a610 "") at /usr/include/c++/11/bits/std_function.h:590
 #9  0x0000558c6830a620 in gdb::observers::observable<inferior*, unsigned long, long, unsigned char const*>::notify (this=0x558c690828c0 <gdb::observers::memory_changed>, args#0=0x558c6a830b00, args#1=93824992247828, args#2=4, args#3=0x558c6af8a610 "") at ../../src/gdb/../gdbsupport/observable.h:166
 #10 0x0000558c68309d95 in write_memory_with_notification (memaddr=0x555555558014, myaddr=0x558c6af8a610 "", len=4) at ../../src/gdb/corefile.c:363
 #11 0x0000558c68904224 in value_assign (toval=0x558c6afce910, fromval=0x558c6afba6c0) at ../../src/gdb/valops.c:1190
 #12 0x0000558c681e3869 in expr::assign_operation::evaluate (this=0x558c6af8e150, expect_type=0x0, exp=0x558c6afcfe60, noside=EVAL_NORMAL) at ../../src/gdb/expop.h:1902
 #13 0x0000558c68450c89 in expr::logical_or_operation::evaluate (this=0x558c6afab060, expect_type=0x0, exp=0x558c6afcfe60, noside=EVAL_NORMAL) at ../../src/gdb/eval.c:2330
 #14 0x0000558c6844a896 in expression::evaluate (this=0x558c6afcfe60, expect_type=0x0, noside=EVAL_NORMAL) at ../../src/gdb/eval.c:110
 #15 0x0000558c6844a95e in evaluate_expression (exp=0x558c6afcfe60, expect_type=0x0) at ../../src/gdb/eval.c:124
 #16 0x0000558c682061ef in breakpoint_cond_eval (exp=0x558c6afcfe60) at ../../src/gdb/breakpoint.c:4971
 ...


The fix is to disable cooperative SIGINT handling while handling
inferior events, so that SIGINT is saved in the global quit flag, and
not in the extension language, while handling an event.

This commit augments the testcase added by the previous commit to test
this scenario as well.

Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: Idf8ab815774ee6f4b45ca2d0caaf30c9b9f127bb
2023-02-15 20:58:10 +00:00
0ace6ace1b Don't throw quit while handling inferior events
This implements what I suggested here:

 https://inbox.sourceware.org/gdb-patches/ab97c553-f406-b094-cdf3-ba031fdea925@palves.net/

Here is the current default quit_handler, a function that ends up
called by the QUIT macro:

 void
 default_quit_handler (void)
 {
   if (check_quit_flag ())
     {
       if (target_terminal::is_ours ())
 	quit ();
       else
 	target_pass_ctrlc ();
      }
 }

As we can see above, when the inferior is running in the foreground,
then a Ctrl-C is translated into a call to target_pass_ctrlc().

The target_terminal::is_ours() case above is there to handle the
scenario where GDB has the terminal, meaning it is handling some
command the user typed, like "list", or "p a + b" or some such.

However, when the inferior is running on the background, say with
"c&", GDB also has the terminal.  Run control handling is now done in
the "background".  The CLI is responsive to user commands.  If users
type Ctrl-C, they're expecting it to interrupt whatever command they
next type in the CLI, which again, could be "list", "p a + b", etc.
It's as if background run control was handled by a separate thread,
and the Ctrl-C is meant to go to the main thread, handling the CLI.

However, when handling an event, inside fetch_inferior_event &
friends, a Ctrl-C _also_ results in a Quit exception, from the same
default_quit_handler function shown above.  This quit aborts run
control handling, breakpoint condition evaluation, etc., and may even
leave run control in an inconsistent state.

The testcase added by this patch illustrates this.  The test program
just loops a number of times calling the "foo" function.

The idea is to set a breakpoint in the "foo" function with a condition
that sends SIGINT to GDB, and then evaluates to false, which results
in the program being re-resumed in the background.  The SIGINT-sending
emulates pressing Ctrl-C just while GDB was evaluating the breakpoint
condition, except, it's more deterministic.

It looks like this:

  (gdb) p $counter = 0
  $1 = 0
  (gdb) b foo if $counter++ == 10 || $_shell("kill -SIGINT `pidof gdb`") != 0
  Breakpoint 2 at 0x555555555131: file gdb.base/bg-exec-sigint-bp-cond.c, line 21.
  (gdb) c&
  Continuing.
  (gdb)

After that background continue, the breakpoint should be hit 10 times,
and we should see 10 "Quit" being printed on the screen.  As if the
user typed Ctrl-C on the prompt a number of times with no inferior
running:

  (gdb)        <<< Ctrl-C
  (gdb) Quit   <<< Ctrl-C
  (gdb) Quit   <<< Ctrl-C
  (gdb)

However, here's what you see instead:

  (gdb) c&
  Continuing.
  (gdb) Quit
  (gdb)

Just one Quit, and nothing else.  If we look at the thread's state, we see:

  (gdb) info threads
    Id   Target Id                                            Frame
  * 1    Thread 0x7ffff7d6f740 (LWP 112192) "bg-exec-sigint-" foo () at gdb.base/bg-exec-sigint-bp-cond.c:21

So the thread stopped, but we didn't report a stop...

Issuing another continue shows the same immediate-and-silent-stop:

  (gdb) c&
  Continuing.
  (gdb) Quit
  (gdb) p $counter
  $2 = 2

As mentioned, since the run control handling, and breakpoint and
watchpoint evaluation, etc. are running in the background from the
perspective of the CLI, when users type Ctrl-C in this situation,
they're thinking of aborting whatever other command they were typing
or running at the prompt, not the run control side, not the previous
"c&" command.

So I think that we should install a custom quit_handler while inside
fetch_inferior_event, where we already disable pagination and other
things for a similar reason.  This custom quit handler does nothing if
GDB has the terminal, and forwards Ctrl-C to the inferior otherwise.

With the patch implementing that, and the same testcase, here's what
you see instead:

 (gdb) p $counter = 0
 $1 = 0
 (gdb) b foo if $counter++ == 10 || $_shell("kill -SIGINT `pidof gdb`") != 0
 Breakpoint 2 at 0x555555555131: file gdb.base/bg-exec-sigint-bp-cond.c, line 21.
 (gdb) c&
 Continuing.
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb) Quit
 (gdb)
 Breakpoint 2, foo () at gdb.base/bg-exec-sigint-bp-cond.c:21
 21        return 0;

Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: I1f10d99496a7d67c94b258e45963e83e439e1778
2023-02-15 20:58:10 +00:00
91265a7d7c Add new "$_shell(CMD)" internal function
For testing a following patch, I wanted a way to send a SIGINT to GDB
from a breakpoint condition.  And I didn't want to do it from a Python
breakpoint or Python function, as I wanted to exercise non-Python code
paths.  So I thought I'd add a new $_shell internal function, that
runs a command under the shell, and returns the exit code.  With this,
I could write:

  (gdb) b foo if $_shell("kill -SIGINT $gdb_pid") != 0 || <other condition>

I think this is generally useful, hence I'm proposing it here.

Here's the new function in action:

 (gdb) p $_shell("true")
 $1 = 0
 (gdb) p $_shell("false")
 $2 = 1
 (gdb) p $_shell("echo hello")
 hello
 $3 = 0
 (gdb) p $_shell("foobar")
 bash: line 1: foobar: command not found
 $4 = 127
 (gdb) help function _shell
 $_shell - execute a shell command and returns the result.
 Usage: $_shell (command)
 Returns the command's exit code: zero on success, non-zero otherwise.
 (gdb)

NEWS and manual changes included.

Approved-By: Andrew Burgess <aburgess@redhat.com>
Approved-By: Tom Tromey <tom@tromey.com>
Approved-By: Eli Zaretskii <eliz@gnu.org>
Change-Id: I7e36d451ee6b428cbf41fded415ae2d6b4efaa4e
2023-02-15 20:58:00 +00:00
751495be92 Make "ptype INTERNAL_FUNCTION" in Ada print like other languages
Currently, printing the type of an internal function in Ada shows
double <>s, like:

 (gdb) with language ada -- ptype $_isvoid
 type = <<internal function>>

while all other languages print it with a single <>, like:

 (gdb) with language c -- ptype $_isvoid
 type = <internal function>

I don't think there's a reason that Ada needs to be different.  We
currently print the double <>s because we take this path in
ada_print_type:

    switch (type->code ())
      {
      default:
	gdb_printf (stream, "<");
	c_print_type (type, "", stream, show, level, language_ada, flags);
	gdb_printf (stream, ">");
	break;

... and the type's name already has the <>s.

Fix this by simply adding an early check for
TYPE_CODE_INTERNAL_FUNCTION.

Approved-By: Andrew Burgess <aburgess@redhat.com>
Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: Ic2b6527b9240a367471431023f6e27e6daed5501
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30105
2023-02-15 20:56:57 +00:00
a975d4e6bc Fix "ptype INTERNAL_FUNC" (PR gdb/30105)
Currently, looking at the type of an internal function, like below,
hits an odd error:

 (gdb) ptype $_isvoid
 type = <internal function>type not handled in c_type_print_varspec_prefix()

That is an error thrown from
c-typeprint.c:c_type_print_varspec_prefix, where it reads:

    ...
    case TYPE_CODE_DECFLOAT:
    case TYPE_CODE_FIXED_POINT:
      /* These types need no prefix.  They are listed here so that
	 gcc -Wall will reveal any types that haven't been handled.  */
      break;
    default:
      error (_("type not handled in c_type_print_varspec_prefix()"));
      break;

Internal function types have type code TYPE_CODE_INTERNAL_FUNCTION,
which is not explicitly handled by that switch.

That comment quoted above says that gcc -Wall will reveal any types
that haven't been handled, but that's not actually true, at least with
modern GCCs.  You would need to enable -Wswitch-enum for that, which
we don't.  If I do enable that warning, then I see that we're missing
handling for the following type codes:

   TYPE_CODE_INTERNAL_FUNCTION,
   TYPE_CODE_MODULE,
   TYPE_CODE_NAMELIST,
   TYPE_CODE_XMETHOD

TYPE_CODE_MODULE and TYPE_CODE_NAMELIST and Fortran-specific, so it'd
be a little weird to handle them here.

I tried to reach this code with TYPE_CODE_XMETHOD, but couldn't figure
out how to.  ptype on an xmethod isn't treated specially, it just
complains that the method doesn't exist.  I've extended the
gdb.python/py-xmethods.exp testcase to make sure of that.

My thinking is that whatever type code we add next, the most likely
scenario is that it won't need any special handling, so we'd just be
adding another case to that "do nothing" list.  If we do need special
casing for whatever type code, I think that tests added at the same
time as the feature would uncover it anyhow.  If we do miss adding the
special casing, then it still looks better to me to print the type
somewhat incompletely than to error out and make it harder for users
to debug whatever they need.  So I think that the best thing to do
here is to just remove all those explicit "do nothing" cases, along
with the error default case.

After doing that, I decided to write a testcase that iterates over all
supported languages doing "ptype INTERNAL_FUNC".  That revealed that
Pascal has a similar problem, except the default case hits a
gdb_assert instead of an error:

 (gdb) with language pascal -- ptype $_isvoid
 type =
 ../../src/gdb/p-typeprint.c:268: internal-error: type_print_varspec_prefix: unexpected type
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.

That is fixed by this patch in the same way.

You'll notice that the new testcase special-cases the Ada expected
output:

	} elseif {$lang == "ada"} {
	    gdb_test "ptype \$_isvoid" "<<internal function>>"
	} else {
	    gdb_test "ptype \$_isvoid" "<internal function>"
	}

That will be subject of the following patch.

Approved-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I81aec03523cceb338b5180a0b4c2e4ad26b4c4db
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30105
2023-02-15 20:56:52 +00:00
5bed9dc992 [gdb/testsuite] Add xfail in gdb.python/py-record-btrace.exp
There's a HW bug affecting Processor Trace on some Intel processors
(Ice Lake to Raptor Lake microarchitectures).

The bug was exposed by linux kernel commit 670638477aed
("perf/x86/intel/pt: Opportunistically use single range output mode"),
added in version v5.5.0, and was worked around by commit ce0d998be927
("perf/x86/intel/pt: Fix sampling using single range output") in version
6.1.0.

The bug manifests (on a Performance-core of an i7-1250U, an Alder Lake cpu) in
a single test-case:
...
(gdb) python insn = r.instruction_history^M
warning: Decode error (-20) at instruction 33 (offset = 0x3d6a, \
  pc = 0x400501): compressed return without call.^M
(gdb) FAIL: gdb.python/py-record-btrace.exp: prepare record: \
  python insn = r.instruction_history
...

Add a corresponding XFAIL.

Note that the i7-1250U has both Performance-cores and Efficient-cores, and on
an Efficient-Core the test-case runs without any problems, so if the testsuite
run is not pinned to a specific cpu, the test may either PASS or XFAIL.

Tested on x86_64-linux:
- openSUSE Leap 15.4 with linux kernel version 5.14.21
- openSUSE Tumbleweed with linux kernel version 6.1.8

PR testsuite/30075
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30075
2023-02-14 13:15:49 +01:00
37d75d4552 [gdb/testsuite] Factor out proc linux_kernel_version
Factor out new proc linux_kernel_version from test-case
gdb.arch/i386-pkru.exp.

Tested on x86_64-linux.
2023-02-14 11:53:54 +01:00
382d927ffc Rename all fields of struct value
This renames all the fields of struct value, in preparation for the
coming changes.

Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-02-13 15:21:06 -07:00
9ae4519da9 gdb/python: deallocate tui window factories at Python shut down
The previous commit relied on spotting when a Python defined TUI
window factory was deleted.  I spotted that the window factories are
not deleted when GDB shuts down its Python environment, they are only
deleted when one window factory replaces another.  Consider this
example Python script:

  class TestWindowFactory:
      def __init__(self, msg):
          self.msg = msg
          print("Entering TestWindowFactory.__init__: %s" % self.msg)

      def __call__(self, tui_win):
          print("Entering TestWindowFactory.__call__: %s" % self.msg)
          return TestWindow(tui_win, self.msg)

      def __del__(self):
          print("Entering TestWindowFactory.__del__: %s" % self.msg)

  gdb.register_window_type("test_window", TestWindowFactory("A"))
  gdb.register_window_type("test_window", TestWindowFactory("B"))

And this GDB session:

  (gdb) source tui.py
  Entering TestWindowFactory.__init__: A
  Entering TestWindowFactory.__init__: B
  Entering TestWindowFactory.__del__: B
  (gdb) quit

Notice that when the 'B' window replaces the 'A' window we see the 'A'
object being deleted.  But, when Python is shut down (after the
'quit') the 'B' object is never deleted.

Instead, GDB retains a reference to the window factory object, which
forces the Python object to remain live even after the Python
interpreter itself has been shut down.

The references themselves are held in a dynamically allocated
std::unordered_map (in tui/tui-layout.c) which is never deallocated,
thus the underlying Python references are never decremented to zero,
and so GDB never tries to delete these Python objects.

This commit is the first half of the work to clean up this edge case.

All gdbpy_tui_window_maker objects (the objects that implement the
TUI window factory callback for Python defined TUI windows), are now
linked together into a global list using the intrusive list mechanism.

When GDB shuts down the Python interpreter we can now walk this global
list and release the reference that is held to the underlying Python
object.  By releasing this reference the Python object will now be
deleted.

I've added a new assert in gdbpy_tui_window_maker::operator(), this
will catch the case where we somehow end up in here after having
reset the reference to the underlying Python object.  I don't think
this should ever happen though as we only clear the references when
shutting down the Python interpreter, and the ::operator() function is
only called when trying to apply a new TUI layout - something that
shouldn't happen while GDB itself is shutting down.

This commit does not update the std::unordered_map in tui-layout.c,
that will be done in the next commit.

Reviewed-By: Tom Tromey <tom@tromey.com>
2023-02-13 14:50:46 +00:00
d159d87072 gdb/python: allow Python TUI windows to be replaced
The documentation for gdb.register_window_type says:

  "...  It's an error to try to replace one of the built-in windows,
  but other window types can be replaced. ..."

I take this to mean that if I imported a Python script like this:

  gdb.register_window_type('my_window', FactoryFunction)

Then GDB would have a new TUI window 'my_window', which could be
created by calling FactoryFunction().  If I then, in the same GDB
session imported a script which included:

  gdb.register_window_type('my_window', UpdatedFactoryFunction)

Then GDB would replace the old 'my_window' factory with my new one,
GDB would now call UpdatedFactoryFunction().

This is pretty useful in practice, as it allows users to iterate on
their window implementation within a single GDB session.

However, right now, this is not how GDB operates.  The second call to
register_window_type is basically ignored and the old window factory
is retained.

This is because in tui_register_window (tui/tui-layout.c) we use
std::unordered_map::emplace to insert the new factory function, and
emplace doesn't replace an existing element in an unordered_map.

In this commit, before the emplace call, I now search for an already
existing element, and delete any matching element from the map, the
emplace call will then add the new factory function.

Reviewed-By: Tom Tromey <tom@tromey.com>
2023-02-13 14:50:37 +00:00
97c1951915 gdb/testsuite: handle differences in guile error string output
A new guile test added in commit:

  commit 0a9ccb9dd79384f3ba3f8cd75940e8868f3b526f
  Date:   Mon Feb 6 13:04:16 2023 +0000

      gdb: only allow one of thread or task on breakpoints or watchpoints

fails for some versions of guile.  It turns out that some versions of
guile emit an error like this:

  (gdb) guile (set-breakpoint-thread! bp 1)
  ERROR: In procedure set-breakpoint-thread!:
  In procedure gdbscm_set_breakpoint_thread_x: cannot set both task and thread attributes
  Error while executing Scheme code.

while other versions of guile emit the error like this:

  (gdb) guile (set-breakpoint-thread! bp 1)
  ERROR: In procedure set-breakpoint-thread!:
  ERROR: In procedure gdbscm_set_breakpoint_thread_x: cannot set both task and thread attributes
  Error while executing Scheme code.

notice the extra 'ERROR: ' on the second line of output.  This commit
updates the test regexp to handle this optional 'ERROR: ' string.
2023-02-13 11:19:57 +00:00
f9767e607d gdb/testsuite: look for hipcc in env(ROCM_PATH)
If the hipcc compiler cannot be found in dejagnu's tool_root_dir, look
for it in $::env(ROCM_PATH) (if set).  If hipcc is still not found,
fallback to "hipcc" so the compiler will be searched in the PATH.  This
removes the fallback to the hard-coded "/opt/rocm/bin" prefix.

This change is done so ROCM tools are searched in a uniform manner.

Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-02-13 09:42:14 +00:00
39f6d7c6b0 gdb/testsuite: allow_hipcc_tests tests the hipcc compiler
Update allow_hipcc_tests so all gdb.rocm tests are skipped if we do not
have a working hipcc compiler available.

To achieve this, adjust gdb_simple_compile to ensure that the hip
program is saved in a ".cpp" file before calling hipcc otherwise
compilation will fail.

One thing to note is that it is possible to have a hipcc installed with
a CUDA backend.  Compiling with this back-end will successfully result
in an application, but GDB cannot debug it (at least for the offload
part). In the context of the gdb.rocm tests, we want to detect such
situation where gdb_simple_compile would give a false positive.

To achieve this, this patch checks that there is at least one AMDGPU
device available and that hipcc can compile for this or those targets.
Detecting the device is done using the rocm_agent_enumerator tool which
is installed with the all ROCm installations (it is used by hipcc to
detect identify targets if this is not specified on the comand line).

This patch also makes the allow_hipcc_tests proc a cached proc.

Co-Authored-By: Pedro Alves <pedro@palves.net>
Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-02-13 09:42:14 +00:00
310943c20c gdb/testsuite: require amd-dbgapi support to run rocm tests
Update allow_hipcc_tests to check that GDB has the amd-dbgapi support
built-in.  Without this support, all tests using hipcc and the rocm
stack will fail.

Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-02-13 09:42:13 +00:00
09ad7eb8cc gdb/testsuite: Rename skip_hipcc_tests to allow_hipcc_tests
Rename skip_hipcc_tests to allow_hipcc_tests so it can be used as a
"require" predicate in tests.

Use require in gdb.rocm/simple.exp.

Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-02-13 09:42:13 +00:00
f0bdf68d3f gdb/c++: fix handling of breakpoints on @plt symbols
This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
Additionally, PR gdb/17199 relates to this area of code, but is more
of a request to refactor some parts of GDB, this commit does not
address that request, but it is probably worth reading that PR when
looking at this commit.

When the current language is C++, and the user places a breakpoint on
a function in a shared library, GDB will currently find two locations
for the breakpoint, one location will be within the function itself as
we would expect, but the other location will be within the PLT table
for the call to the named function.  Consider this session:

  $ gdb -q /tmp/breakpoint-shlib-func
  Reading symbols from /tmp/breakpoint-shlib-func...
  (gdb) start
  Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
  Starting program: /tmp/breakpoint-shlib-func

  Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
  20	  int answer = foo ();
  (gdb) break foo
  Breakpoint 2 at 0x401030 (2 locations)
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   <MULTIPLE>
  2.1                         y   0x0000000000401030 <foo()@plt>
  2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20

This is not the expected behaviour.  If we compile the same test using
a C compiler then we see this:

  (gdb) break foo
  Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20

Here's what's happening.  When GDB parses the symbols in the main
executable and the shared library we see a number of different symbols
for foo, and use these to create entries in GDB's msymbol table:

  - In the main executable we see a symbol 'foo@plt' that points at
    the plt entry for foo, from this we add two entries into GDB's
    msymbol table, one called 'foo@plt' which points at the plt entry
    and has type mst_text, then we create a second symbol, this time
    called 'foo' with type mst_solib_trampoline which also points at
    the plt entry,

  - Then, when the shared library is loaded we see another symbol
    called 'foo', this one points at the actual implementation in the
    shared library.  This time GDB creates a msymbol called 'foo' with
    type mst_text that points at the implementation.

This means that GDB creates 3 msymbols to represent the 2 symbols
found in the executable and shared library.

When the user creates a breakpoint on 'foo' GDB eventually ends up in
search_minsyms_for_name (linespec.c), this function then calls
iterate_over_minimal_symbols passing in the name we are looking for
wrapped in a lookup_name_info object.

In iterate_over_minimal_symbols we iterate over two hash tables (using
the name we're looking for as the hash key), first we walk the hash
table of symbol linkage names, then we walk the hash table of
demangled symbol names.

When the language is C++ the symbols for 'foo' will all have been
mangled, as a result, in this case, the iteration of the linkage name
hash table will find no matching results.

However, when we walk the demangled hash table we do find some
results.  In order to match symbol names, GDB obtains a symbol name
matching function by calling the get_symbol_name_matcher method on the
language_defn class.  For C++, in this case, the matching function we
use is cp_fq_symbol_name_matches, which delegates the work to
strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
language set to language_cplus.

The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
skip any parameters in the demangled symbol name when checking for a
match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
is done is that the strings are matched character by character, but,
once the string we are looking for ('foo' here) is exhausted, if we
are looking at '(' then we consider the match a success.

Lets consider the 3 symbols GDB created.  If the function declaration
is 'void foo ()' then from the main executable we added symbols
'_Z3foov@plt' and '_Z3foov', while from the shared library we added
another symbol call '_Z3foov'.  When these are demangled they become
'foo()@plt', 'foo()', and 'foo()' respectively.

Now, the '_Z3foov' symbol from the main executable has the type
mst_solib_trampoline, and in search_minsyms_for_name, we search for
any symbols of type mst_solib_trampoline and filter these out of the
results.

However, the '_Z3foov@plt' symbol (from the main executable), and the
'_Z3foov' symbol (from the shared library) both have type mst_text.

During the demangled name matching, due to the use of MATCH_PARAMS
mode, we stop the comparison as soon as we hit a '(' in the demangled
name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.

By contrast, for C, there are no demangled hash table entries to be
iterated over (in iterate_over_minimal_symbols), we only consider the
linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
symbol obviously matches when we are looking for 'foo', but in this
case the 'foo@plt' will not match due to the '@plt' suffix.

And so, when the user asks for a breakpoint in 'foo', and the language
is C, search_minsyms_for_name, returns a single msymbol, the mst_text
symbol for foo in the shared library, while, when the language is C++,
we get two results, '_Z3foov' for the shared library function, and
'_Z3foov@plt' for the plt entry in the main executable.

I propose to fix this in strncmp_iw_with_mode.  When the mode is
MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
success, GDB will instead search forward for the matching, closing,
')', effectively skipping the parameter list, and then resume
matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
effectively compare against 'foo@plt' (skipping the parameter list),
and the match will fail, just as it does when the language is C.

There is one slight complication, which is revealed by the test
gdb.linespec/cpcompletion.exp, when searching for the symbol of a
const member function, the demangled symbol will have 'const' at the
end of its name, e.g.:

  struct_with_const_overload::const_overload_fn() const

Previously, the matching would stop at the '(' character, but after my
change the whole '()' is skipped, and the match resumes.  As a result,
the 'const' modifier results in a failure to match, when previously
GDB would have found a match.

To work around this issue, in strncmp_iw_with_mode, when mode is
MATCH_PARAMS, after skipping the parameter list, if the next character
is '@' then we assume we are looking at something like '@plt' and
return a value indicating the match failed, otherwise, we return a
value indicating the match succeeded, this allows things like 'const'
to be skipped.

With these changes in place I now see GDB correctly setting a
breakpoint only at the implementation of 'foo' in the shared library.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199

Tested-By: Bruno Larsen <blarsen@redhat.com>
Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-02-12 06:19:53 +00:00
0a9ccb9dd7 gdb: only allow one of thread or task on breakpoints or watchpoints
After this mailing list posting:

  https://sourceware.org/pipermail/gdb-patches/2023-February/196607.html

it seems to me that in practice an Ada task maps 1:1 with a GDB
thread, and so it doesn't really make sense to allow uses to give both
a thread and a task within a single breakpoint or watchpoint
condition.

This commit updates GDB so that the user will get an error if both
are specified.

I've added new tests to cover the CLI as well as the Python and Guile
APIs.  For the Python and Guile testing, as far as I can tell, this
was the first testing for this corner of the APIs, so I ended up
adding more than just a single test.

For documentation I've added a NEWS entry, but I've not added anything
to the docs themselves.  Currently we document the commands with a
thread-id or task-id as distinct command, e.g.:

  'break LOCSPEC task TASKNO'
  'break LOCSPEC task TASKNO if ...'
  'break LOCSPEC thread THREAD-ID'
  'break LOCSPEC thread THREAD-ID if ...'

As such, I don't believe there is any indication that combining 'task'
and 'thread' would be expected to work; it seems clear to me in the
above that those four options are all distinct commands.

I think the NEWS entry is enough that if someone is combining these
keywords (it's not clear what the expected behaviour would be in this
case) then they can figure out that this was a deliberate change in
GDB, but for a new user, the manual doesn't suggest combining them is
OK, and any future attempt to combine them will give an error.

Approved-By: Pedro Alves <pedro@palves.net>
2023-02-12 05:46:44 +00:00
f1f517e810 gdb: show task number in describe_other_breakpoints
I noticed that describe_other_breakpoints doesn't show the task
number, but does show the thread-id.  I can't see any reason why we'd
want to not show the task number in this situation, so this commit
adds this missing information, and extends gdb.ada/tasks.exp to check
this case.

Approved-By: Pedro Alves <pedro@palves.net>
2023-02-11 17:36:24 +00:00
ce068c5f45 gdb: don't print global thread-id to CLI in describe_other_breakpoints
I noticed that describe_other_breakpoints was printing the global
thread-id to the CLI.  For CLI output we should be printing the
inferior local thread-id (e.g. "2.1").  This can be seen in the
following GDB session:

  (gdb) info threads
    Id   Target Id                                Frame
    1.1  Thread 4065742.4065742 "bp-thread-speci" main () at /tmp/bp-thread-specific.c:27
  * 2.1  Thread 4065743.4065743 "bp-thread-speci" main () at /tmp/bp-thread-specific.c:27
  (gdb) break foo thread 2.1
  Breakpoint 3 at 0x40110a: foo. (2 locations)
  (gdb) break foo thread 1.1
  Note: breakpoint 3 (thread 2) also set at pc 0x40110a.
  Note: breakpoint 3 (thread 2) also set at pc 0x40110a.
  Breakpoint 4 at 0x40110a: foo. (2 locations)

Notice that GDB says:

  Note: breakpoint 3 (thread 2) also set at pc 0x40110a.

The 'thread 2' in here is using the global thread-id, we should
instead say 'thread 2.1' which corresponds to how the user specified
the breakpoint.

This commit fixes this issue and adds a test.

Approved-By: Pedro Alves <pedro@palves.net>
2023-02-11 17:35:14 +00:00
bb146a79c7 gdb: add test for readline handling very long commands
The test added in this commit tests for a long fixed readline issue
relating to long command lines.  A similar patch has existed in the
Fedora GDB tree for several years, but I don't see any reason why this
test would not be suitable for inclusion in upstream GDB.  I've
updated the patch to current testsuite standards.

The test is checking for an issue that was fixed by this readline
patch:

  https://lists.gnu.org/archive/html/bug-readline/2006-11/msg00002.html

Which was merged into readline 6.0 (released ~2010).  The issue was
triggered when the user enters a long command line, which wrapped over
multiple terminal lines.  The crash looks like this:

  free(): invalid pointer

  Fatal signal: Aborted
  ----- Backtrace -----
  0x4fb583 gdb_internal_backtrace_1
          ../../src/gdb/bt-utils.c:122
  0x4fb583 _Z22gdb_internal_backtracev
          ../../src/gdb/bt-utils.c:168
  0x6047b9 handle_fatal_signal
          ../../src/gdb/event-top.c:964
  0x7f26e0cc56af ???
  0x7f26e0cc5625 ???
  0x7f26e0cae8d8 ???
  0x7f26e0d094be ???
  0x7f26e0d10aab ???
  0x7f26e0d124ab ???
  0x7f26e1d32e12 rl_free_undo_list
          ../../readline-5.2/undo.c:119
  0x7f26e1d229eb readline_internal_teardown
          ../../readline-5.2/readline.c:405
  0x7f26e1d3425f rl_callback_read_char
          ../../readline-5.2/callback.c:197
  0x604c0d gdb_rl_callback_read_char_wrapper_noexcept
          ../../src/gdb/event-top.c:192
  0x60581d gdb_rl_callback_read_char_wrapper
          ../../src/gdb/event-top.c:225
  0x60492f stdin_event_handler
          ../../src/gdb/event-top.c:545
  0xa60015 gdb_wait_for_event
          ../../src/gdbsupport/event-loop.cc:694
  0xa6078d gdb_wait_for_event
          ../../src/gdbsupport/event-loop.cc:593
  0xa6078d _Z16gdb_do_one_eventi
          ../../src/gdbsupport/event-loop.cc:264
  0x6fc459 start_event_loop
          ../../src/gdb/main.c:411
  0x6fc459 captured_command_loop
          ../../src/gdb/main.c:471
  0x6fdce4 captured_main
          ../../src/gdb/main.c:1310
  0x6fdce4 _Z8gdb_mainP18captured_main_args
          ../../src/gdb/main.c:1325
  0x44f694 main
          ../../src/gdb/gdb.c:32
  ---------------------

I recreated the above crash by a little light hacking on GDB, and then
linking GDB against readline 5.2.  The above stack trace was generated
from the test included in this patch, and matches the trace that was
included in the original bug report.

It is worth acknowledging that without hacking things GDB has a
minimum requirement of readline 7.0.  This test is not about checking
whether GDB has been built against an older version of readline, it is
about checking that readline doesn't regress in this area.

Reviewed-By: Tom Tromey <tom@tromey.com>
2023-02-11 17:17:56 +00:00
a0c0791577 GDB: Introduce limited array lengths while printing values
This commit introduces the idea of loading only part of an array in
order to print it, what I call "limited length" arrays.

The motivation behind this work is to make it possible to print slices
of very large arrays, where very large means bigger than
`max-value-size'.

Consider this GDB session with the current GDB:

  (gdb) set max-value-size 100
  (gdb) p large_1d_array
  value requires 400 bytes, which is more than max-value-size
  (gdb) p -elements 10 -- large_1d_array
  value requires 400 bytes, which is more than max-value-size

notice that the request to print 10 elements still fails, even though 10
elements should be less than the max-value-size.  With a patched version
of GDB:

  (gdb) p -elements 10 -- large_1d_array
  $1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9...}

So now the print has succeeded.  It also has loaded `max-value-size'
worth of data into value history, so the recorded value can be accessed
consistently:

  (gdb) p -elements 10 -- $1
  $2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9...}
  (gdb) p $1
  $3 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    20, 21, 22, 23, 24, <unavailable> <repeats 75 times>}
  (gdb)

Accesses with other languages work similarly, although for Ada only
C-style [] array element/dimension accesses use history.  For both Ada
and Fortran () array element/dimension accesses go straight to the
inferior, bypassing the value history just as with C pointers.

Co-Authored-By: Maciej W. Rozycki <macro@embecosm.com>
2023-02-10 23:49:19 +00:00
a2fb245a4b GDB/testsuite: Add -nonl' option to gdb_test'
Add a `-nonl' option to `gdb_test' making it possible to match output
from commands such as `output' that do not produce a new line sequence
at the end, e.g.:

  (gdb) output 0
  0(gdb)
2023-02-10 23:49:19 +00:00
aaab5fce4f GDB: Only make data actually retrieved into value history available
While it makes sense to allow accessing out-of-bounds elements in the
debuggee and see whatever there might happen to be there in memory (we
are a debugger and not a programming rules enforcement facility and we
want to make people's life easier in chasing bugs), e.g.:

  (gdb) print one_hundred[-1]
  $1 = 0
  (gdb) print one_hundred[100]
  $2 = 0
  (gdb)

we shouldn't really pretend that we have any meaningful data around
values recorded in history (what these commands really retrieve are
current debuggee memory contents outside the original data accessed,
really confusing in my opinion).  Mark values recorded in history as
such then and verify accesses to be in-range for them:

  (gdb) print one_hundred[-1]
  $1 = <unavailable>
  (gdb) print one_hundred[100]
  $2 = <unavailable>

Add a suitable test case, which also covers integer overflows in data
location calculation.

Approved-By: Tom Tromey <tom@tromey.com>
2023-02-10 23:49:19 +00:00
bae19789c0 GDB: Ignore `max-value-size' setting with value history accesses
We have an inconsistency in value history accesses where array element
accesses cause an error for entries exceeding the currently selected
`max-value-size' setting even where such accesses successfully complete
for elements located in the inferior, e.g.:

  (gdb) p/d one
  $1 = 0
  (gdb) p/d one_hundred
  $2 = {0 <repeats 100 times>}
  (gdb) p/d one_hundred[99]
  $3 = 0
  (gdb) set max-value-size 25
  (gdb) p/d one_hundred
  value requires 100 bytes, which is more than max-value-size
  (gdb) p/d one_hundred[99]
  $7 = 0
  (gdb) p/d $2
  value requires 100 bytes, which is more than max-value-size
  (gdb) p/d $2[99]
  value requires 100 bytes, which is more than max-value-size
  (gdb)

According to our documentation the `max-value-size' setting is a safety
guard against allocating an overly large amount of memory.  Moreover a
statement in documentation says, concerning this setting, that: "Setting
this variable does not affect values that have already been allocated
within GDB, only future allocations."  While in the implementer-speak
the sentence may be unambiguous I think the outside user may well infer
that the setting does not apply to values previously printed.

Therefore rather than just fixing this inconsistency it seems reasonable
to lift the setting for value history accesses, under an implication
that by having been retrieved from the debuggee they have already passed
the safety check.  Do it then, by suppressing the value size check in
`value_copy' -- under an observation that if the original value has been
already loaded (i.e. it's not lazy), then it must have previously passed
said check -- making the last two commands succeed:

  (gdb) p/d $2
  $8 = {0 <repeats 100 times>}
  (gdb) p/d $2 [99]
  $9 = 0
  (gdb)

Expand the testsuite accordingly, covering both value history handling
and the use of `value_copy' by `make_cv_value', used by Python code.
2023-02-10 23:49:19 +00:00
71bb560755 gdb/testsuite: fix gdb.gdb/selftest.exp for native-extended-gdbserver
Following commit 4e2a80ba606 ("gdb/testsuite: expect SIGSEGV from top
GDB spawn id"), the next failure I get in gdb.gdb/selftest.exp, using
the native-extended-gdbserver, is:

    (gdb) PASS: gdb.gdb/selftest.exp: send ^C to child process
    signal SIGINT
    Continuing with signal SIGINT.
    FAIL: gdb.gdb/selftest.exp: send SIGINT signal to child process (timeout)

The problem is that in this gdb_test_multiple:

    set description "send SIGINT signal to child process"
    gdb_test_multiple "signal SIGINT" "$description" {
	-re "^signal SIGINT\r\nContinuing with signal SIGINT.\r\nQuit\r\n.* $" {
	    pass "$description"
	}
    }

The "Continuing with signal SIGINT" portion is printed by the top GDB,
while the Quit portion is printed by the bottom GDB.  As the
gdb_test_multiple is written, it expects both the the top GDB's spawn
id.

Fix this by splitting the gdb_test_multiple in two.  The first one
expects the "Continuing with signal SIGINT" from the top GDB.  The
second one expect "Quit"  and the "(xgdb)" prompt from
$inferior_spawn_id.  When debugging natively, this spawn id will be the
same as the top GDB's spawn id, but it's different when debugging with
GDBserver.

Change-Id: I689bd369a041b48f4dc9858d38bf977d09600da2
2023-02-10 13:55:45 -05:00
632652850d [gdb/testsuite] Fix linespec ambiguity in gdb.base/longjmp.exp
PR testsuite/30103 reports the following failure on aarch64-linux
(ubuntu 22.04):
...
(gdb) PASS: gdb.base/longjmp.exp: with_probes=0: pattern 1: next to longjmp
next
warning: Breakpoint address adjusted from 0x83dc305fef755015 to \
  0xffdc305fef755015.
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0xffdc305fef755015

__libc_siglongjmp (env=0xaaaaaaab1018 <env>, val=1) at ./setjmp/longjmp.c:30
30	}
(gdb) KFAIL: gdb.base/longjmp.exp: with_probes=0: pattern 1: gdb/26967 \
  (PRMS: next over longjmp)
delete breakpoints
Delete all breakpoints? (y or n) y
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) break 63
No line 63 in the current file.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) FAIL: gdb.base/longjmp.exp: with_probes=0: pattern 2: setup: breakpoint \
  at pattern start (got interactive prompt)
...

The test-case intends to set the breakpoint on line number 63 in
gdb.base/longjmp.c.

It tries to do so by specifying "break 63", which specifies a line in the
"current source file".

Due to the KFAIL PR, gdb stopped in __libc_siglongjmp, and because of presence
of debug info, the "current source file" becomes glibc's ./setjmp/longjmp.c.

Consequently, setting the breakpoint fails.

Fix this by adding a $subdir/$srcfile: prefix to the breakpoint linespecs.

I've managed to reproduce the FAIL on x86_64/-m32, by installing the
glibc-32bit-debuginfo package.  This allowed me to confirm the "current source
file" that is used:
...
(gdb) KFAIL: gdb.base/longjmp.exp: with_probes=0: pattern 1: gdb/26967 \
  (PRMS: next over longjmp)
info source^M
Current source file is ../setjmp/longjmp.c^M
...

Tested on x86_64-linux, target boards unix/{-m64,-m32}.

Reported-By: Luis Machado <luis.machado@arm.com>
Reviewed-By: Tom Tromey <tom@tromey.com>

PR testsuite/30103
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30103
2023-02-10 15:58:00 +01:00
8e77fff268 Fix comment in gdb.rust/fnfield.exp
gdb.rust/fnfield.exp has a comment that, I assume, I copied from some
other test.  This patch fixes it.
2023-02-09 12:23:08 -07:00
31cf28c784 gdb, testsuite: Remove unnecessary call of "set print pretty on"
The command has no effect for the loading of GDB pretty printers and is
removed by this patch to avoid confusion.

Documentation for "set print pretty"
"Cause GDB to print structures in an indented format with one member per line"
2023-02-09 19:38:52 +01:00
1775f8b380 Increase size of main_type::nfields
main_type::nfields is a 'short', and has been for many years.  PR
c++/29985 points out that 'short' is too narrow for an enum that
contains more than 2^15 constants.

This patch bumps the size of 'nfields'.  To verify that the field
isn't directly used, it is also renamed.  Note that this does not
affect the size of main_type on x86-64 Fedora 36.  And, if it does
have a negative effect somewhere, it's worth considering that types
could be shrunk more drastically by using subclasses for the different
codes.

This is v2 of this patch, which has these changes:

* I changed nfields to 'unsigned', per Simon's request.  I looked at
  changing all the uses, but this quickly fans out into a very large
  patch.  (One additional tweak was needed, though.)

* I wrote a test case.  I discovered that GCC cannot compile a large
  enough C test case, so I resorted to using the DWARF assembler.
  This test doesn't reproduce the crash, but it does fail without the
  patch.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29985
2023-02-09 07:55:34 -07:00
cdeb7b7de2 Avoid FAILs in gdb.compile
Many gdb.compile C++ tests fail for me on Fedora 36.  I think these
are largely bugs in the plugin, though I didn't investigate too
deeply.  Once one failure is seen, this often cascades and sometimes
there are many timeouts.

For example, this can happen:

    (gdb) compile code var = a->get_var ()
    warning: Could not find symbol "_ZZ9_gdb_exprP10__gdb_regsE1a" for compiled module "/tmp/gdbobj-0xdI6U/out2.o".
    1 symbols were missing, cannot continue.

I think this is probably a plugin bug because, IIRC, in theory these
symbols should be exempt from a lookup via gdb.

This patch arranges to catch any catastrophic failure and then simply
exit the entire .exp file.
2023-02-08 10:12:22 -07:00
300fa060ab Don't let .gdb_history file cause failures
I had a .gdb_history file in my testsuite directory in the build tree,
and this provoked a failure in gdbhistsize-history.exp.  It seems
simple to prevent this file from causing a failure.
2023-02-08 10:12:22 -07:00
4e315cd4af [gdb/testsuite] Use maint ignore-probes in gdb.base/longjmp.exp
Test-case gdb.base/longjmp.exp handles both the case that there is a libc
longjmp probe, and the case that there isn't.

However, it only tests one of the two cases.

Use maint ignore-probes to test both cases, if possible.

Tested on x86_64-linux.
2023-02-08 13:46:17 +01:00
0ab9328277 [gdb/testsuite] Use maint ignore-probes in gdb.base/solib-corrupted.exp
Test-case gdb.base/solib-corrupted.exp only works for a glibc without probes
interface, otherwise we run into:
...
XFAIL: gdb.base/solib-corrupted.exp: info probes
UNTESTED: gdb.base/solib-corrupted.exp: GDB is using probes
...

Fix this by using maint ignore-probes to simulate the absence of the relevant
probes.

Also, it requires glibc debuginfo, and if not present, it produces an XFAIL:
...
XFAIL: gdb.base/solib-corrupted.exp: make solibs looping
UNTESTED: gdb.base/solib-corrupted.exp: no _r_debug symbol has been found
...
This is incorrect, because an XFAIL indicates a known problem in the
environment.  In this case, there is no problem: the environment is
functioning as expected when glibc debuginfo is not installed.

Fix this by using UNSUPPORTED instead, and make the message less cryptic:
...
UNSUPPORTED: gdb.base/solib-corrupted.exp: make solibs looping \
  (glibc debuginfo required)
...

Finally, with glibc debuginfo present, we run into:
...
(gdb) PASS: gdb.base/solib-corrupted.exp: make solibs looping
info sharedlibrary^M
warning: Corrupted shared library list: 0x7ffff7ffe750 != 0x0^M
From                To                  Syms Read   Shared Object Library^M
0x00007ffff7dd4170  0x00007ffff7df4090  Yes         /lib64/ld-linux-x86-64.so.2^M
(gdb) FAIL: gdb.base/solib-corrupted.exp: corrupted list \
  (shared library list corrupted)
...
due to commit 44288716537 ("gdb, testsuite: extend gdb_test_multiple checks").

Fix this by rewriting into gdb_test_multiple and using -early.

Tested on x86_64-linux, with and without glibc debuginfo installed.
2023-02-08 11:48:53 +01:00
944b1b1817 gdb: fix display of thread condition for multi-location breakpoints
This commit addresses the issue in PR gdb/30087.

If a breakpoint with multiple locations has a thread condition, then
the 'info breakpoints' output is a little messed up, here's an example
of the current output:

  (gdb) break foo thread 1
  Breakpoint 2 at 0x401114: foo. (3 locations)
  (gdb) break bar thread 1
  Breakpoint 3 at 0x40110a: file /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c, line 32.
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   <MULTIPLE>          thread 1
          stop only in thread 1
  2.1                         y   0x0000000000401114 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25
  2.2                         y   0x0000000000401146 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25
  2.3                         y   0x0000000000401168 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25
  3       breakpoint     keep y   0x000000000040110a in bar at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:32 thread 1
          stop only in thread 1

Notice that, at the end of the location for breakpoint 3, the 'thread
1' condition is printed, but this is then repeated on the next line
with 'stop only in thread 1'.

In contrast, for breakpoint 2, the 'thread 1' appears randomly, in the
"What" column, though slightly offset, non of the separate locations
have the 'thread 1' information.  Additionally for breakpoint 2 we
also get a 'stop only in thread 1' line.

There's two things going on here.  First the randomly placed 'thread
1' for breakpoint 2 is due to a bug in print_one_breakpoint_location,
where we check the variable part_of_multiple instead of
header_of_multiple.

If I fix this oversight, then the output is now:

  (gdb) break foo thread 1
  Breakpoint 2 at 0x401114: foo. (3 locations)
  (gdb) break bar thread 1
  Breakpoint 3 at 0x40110a: file /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c, line 32.
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   <MULTIPLE>
          stop only in thread 1
  2.1                         y   0x0000000000401114 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25 thread 1
  2.2                         y   0x0000000000401146 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25 thread 1
  2.3                         y   0x0000000000401168 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25 thread 1
  3       breakpoint     keep y   0x000000000040110a in bar at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:32 thread 1
          stop only in thread 1

The 'thread 1' condition is now displayed at the end of each location,
which makes the output the same for single location breakpoints and
multi-location breakpoints.

However, there's still some duplication here.  Both breakpoints 2 and
3 include a 'stop only in thread 1' line, and it feels like the
additional 'thread 1' is redundant.  In fact, there's a comment to
this very effect in the code:

  /* FIXME: This seems to be redundant and lost here; see the
     "stop only in" line a little further down.  */

So, lets fix this FIXME.  The new plan is to remove all the trailing
'thread 1' markers from the CLI output, we now get this:

  (gdb) break foo thread 1
  Breakpoint 2 at 0x401114: foo. (3 locations)
  (gdb) break bar thread 1
  Breakpoint 3 at 0x40110a: file /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c, line 32.
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   <MULTIPLE>
          stop only in thread 1
  2.1                         y   0x0000000000401114 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25
  2.2                         y   0x0000000000401146 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25
  2.3                         y   0x0000000000401168 in foo at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:25
  3       breakpoint     keep y   0x000000000040110a in bar at /tmp/src/gdb/testsuite/gdb.base/thread-bp-multi-loc.c:32
          stop only in thread 1

All of the above points are also true for the Ada 'task' breakpoint
condition, and the changes I've made also update how the task
information is printed, though in the case of the Ada task there was
no 'stop only in task XXX' line printed, so I've added one of those.

Obviously it can't be quite that easy.  For MI backwards compatibility
I've retained the existing code (but now only for MI like outputs),
which ensures we should generate backwards compatible output.

I've extended an Ada test to cover the new task related output, and
updated all the tests I could find that checked for the old output.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30087

Approved-By: Pedro Alves <pedro@palves.net>
2023-02-07 14:41:40 +00:00
ca2f51c696 [gdb/testsuite] Improve untested message in gdb.ada/finish-var-size.exp
I came across:
...
UNTESTED: gdb.ada/finish-var-size.exp: GCC too told for this test
...
The message only tells us that the compiler version too old, not what compiler
version is required.

Fix this by rewriting using required:
...
UNSUPPORTED: gdb.ada/finish-var-size.exp: require failed: \
  expr [gcc_major_version] >= 12
...

Tested on x86_64-linux.
2023-02-07 11:41:44 +01:00
9af467b824 [gdb/testsuite] Fix gdb.threads/schedlock.exp on fast cpu
Occasionally, I run into:
...
(gdb) PASS: gdb.threads/schedlock.exp: schedlock=on: cmd=continue: \
  set scheduler-locking on
continue^M
Continuing.^M
PASS: gdb.threads/schedlock.exp: schedlock=on: cmd=continue: \
  continue (with lock)
[Thread 0x7ffff746e700 (LWP 1339) exited]^M
No unwaited-for children left.^M
(gdb) Quit^M
(gdb) FAIL: gdb.threads/schedlock.exp: schedlock=on: cmd=continue: \
  stop all threads (with lock) (timeout)
...

What happens is that this loop which is supposed to run "just short of forever":
...
    /* Don't run forever.  Run just short of it :)  */
    while (*myp > 0)
      {
        /* schedlock.exp: main loop.  */
        MAYBE_CALL_SOME_FUNCTION(); (*myp) ++;
      }
...
finishes after 0x7fffffff iterations (when a signed wrap occurs), which on my
system takes only about 1.5 seconds.

Fix this by:
- changing the pointed-at type of myp from signed to unsigned, which makes the
  wrap defined behaviour (and which also make the loop run twice as long,
  which is already enough to make it impossible for me to reproduce the FAIL.
  But let's try to solve this more structurally).
- changing the pointed-at type of myp from int to long long, making the wrap
  unlikely.
- making sure the loop runs forever, by setting the loop condition to 1.
- making sure the loop still contains different lines (as far as debug info is
  concerned) by incrementing a volatile counter in the loop.
- making sure the program doesn't run forever in case of trouble, by adding an
  "alarm (30)".

Tested on x86_64-linux.

PR testsuite/30074
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30074
2023-02-06 12:52:50 +01:00
980dbf3622 gdb: error if 'thread' or 'task' keywords are overused
When creating a breakpoint or watchpoint, the 'thread' and 'task'
keywords can be used to create a thread or task specific breakpoint or
watchpoint.

Currently, a thread or task specific breakpoint can only apply for a
single thread or task, if multiple threads or tasks are specified when
creating the breakpoint (or watchpoint), then the last specified id
will be used.

The exception to the above is that when the 'thread' keyword is used
during the creation of a watchpoint, GDB will give an error if
'thread' is given more than once.

In this commit I propose making this behaviour consistent, if the
'thread' or 'task' keywords are used more than once when creating
either a breakpoint or watchpoint, then GDB will give an error.

I haven't updated the manual, we don't explicitly say that these
keywords can be repeated, and (to me), given the keyword takes a
single id, I don't think it makes much sense to repeat the keyword.
As such, I see this more as adding a missing error to GDB, rather than
making some big change.  However, I have added an entry to the NEWS
file as I guess it is possible that some people might hit this new
error with an existing (I claim, badly written) GDB script.

I've added some new tests to check for the new error.

Just one test needed updating, gdb.linespec/keywords.exp, this test
did use the 'thread' keyword twice, and expected the breakpoint to be
created.  Looking at what this test was for though, it was checking
the use of '-force-condition', and I don't think that being able to
repeat 'thread' was actually a critical part of this test.

As such, I've updated this test to expect the error when 'thread' is
repeated.
2023-02-06 11:02:48 +00:00
79436bfc5a gdb/testsuite: don't try to set non-stop mode on a running target
The test gdb.threads/thread-specific-bp.exp tries to set non-stop mode
on a running target, something which the manual makes clear is not
allowed.

This commit restructures the test a little, we now set the non-stop
mode as part of the GDBFLAGS, so the mode will be set before GDB
connects to the target.  As a consequence I'm able to move the
with_test_prefix out of the check_thread_specific_breakpoint proc.
The check_thread_specific_breakpoint proc is now called within a loop.

After this commit the gdb.threads/thread-specific-bp.exp test still
has some failures, this is because of an issue GDB currently has
printing "Thread ... exited" messages.  This problem should be
addressed by this patch:

  https://sourceware.org/pipermail/gdb-patches/2022-December/194694.html

when it is merged.
2023-02-04 16:15:38 +00:00
18b4d0736b gdb: initial support for ROCm platform (AMDGPU) debugging
This patch adds the foundation for GDB to be able to debug programs
offloaded to AMD GPUs using the AMD ROCm platform [1].  The latest
public release of the ROCm release at the time of writing is 5.4, so
this is what this patch targets.

The ROCm platform allows host programs to schedule bits of code for
execution on GPUs or similar accelerators.  The programs running on GPUs
are typically referred to as `kernels` (not related to operating system
kernels).

Programs offloaded with the AMD ROCm platform can be written in the HIP
language [2], OpenCL and OpenMP, but we're going to focus on HIP here.
The HIP language consists of a C++ Runtime API and kernel language.
Here's an example of a very simple HIP program:

    #include "hip/hip_runtime.h"
    #include <cassert>

    __global__ void
    do_an_addition (int a, int b, int *out)
    {
      *out = a + b;
    }

    int
    main ()
    {
      int *result_ptr, result;

      /* Allocate memory for the device to write the result to.  */
      hipError_t error = hipMalloc (&result_ptr, sizeof (int));
      assert (error == hipSuccess);

      /* Run `do_an_addition` on one workgroup containing one work item.  */
      do_an_addition<<<dim3(1), dim3(1), 0, 0>>> (1, 2, result_ptr);

      /* Copy result from device to host.  Note that this acts as a synchronization
         point, waiting for the kernel dispatch to complete.  */
      error = hipMemcpyDtoH (&result, result_ptr, sizeof (int));
      assert (error == hipSuccess);

      printf ("result is %d\n", result);
      assert (result == 3);

      return 0;
    }

This program can be compiled with:

    $ hipcc simple.cpp -g -O0 -o simple

... where `hipcc` is the HIP compiler, shipped with ROCm releases.  This
generates an ELF binary for the host architecture, containing another
ELF binary with the device code.  The ELF for the device can be
inspected with:

    $ roc-obj-ls simple
    1       host-x86_64-unknown-linux                                           file://simple#offset=8192&size=0
    1       hipv4-amdgcn-amd-amdhsa--gfx906                                     file://simple#offset=8192&size=34216
    $ roc-obj-extract 'file://simple#offset=8192&size=34216'
    $ file simple-offset8192-size34216.co
    simple-offset8192-size34216.co: ELF 64-bit LSB shared object, *unknown arch 0xe0* version 1, dynamically linked, with debug_info, not stripped
                                                                                 ^
                       amcgcn architecture that my `file` doesn't know about ----´

Running the program gives the very unimpressive result:

    $ ./simple
    result is 3

While running, this host program has copied the device program into the
GPU's memory and spawned an execution thread on it.  The goal of this
GDB port is to let the user debug host threads and these GPU threads
simultaneously.  Here's a sample session using a GDB with this patch
applied:

    $ ./gdb -q -nx --data-directory=data-directory ./simple
    Reading symbols from ./simple...
    (gdb) break do_an_addition
    Function "do_an_addition" not defined.
    Make breakpoint pending on future shared library load? (y or [n]) y
    Breakpoint 1 (do_an_addition) pending.
    (gdb) r
    Starting program: /home/smarchi/build/binutils-gdb-amdgpu/gdb/simple
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    [New Thread 0x7ffff5db7640 (LWP 1082911)]
    [New Thread 0x7ffef53ff640 (LWP 1082913)]
    [Thread 0x7ffef53ff640 (LWP 1082913) exited]
    [New Thread 0x7ffdecb53640 (LWP 1083185)]
    [New Thread 0x7ffff54bf640 (LWP 1083186)]
    [Thread 0x7ffdecb53640 (LWP 1083185) exited]
    [Switching to AMDGPU Wave 2:2:1:1 (0,0,0)/0]

    Thread 6 hit Breakpoint 1, do_an_addition (a=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>,
        b=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>,
        out=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>) at simple.cpp:24
    24        *out = a + b;
    (gdb) info inferiors
      Num  Description       Connection           Executable
    * 1    process 1082907   1 (native)           /home/smarchi/build/binutils-gdb-amdgpu/gdb/simple
    (gdb) info threads
      Id   Target Id                                    Frame
      1    Thread 0x7ffff5dc9240 (LWP 1082907) "simple" 0x00007ffff5e9410b in ?? () from /opt/rocm-5.4.0/lib/libhsa-runtime64.so.1
      2    Thread 0x7ffff5db7640 (LWP 1082911) "simple" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
      5    Thread 0x7ffff54bf640 (LWP 1083186) "simple" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
    * 6    AMDGPU Wave 2:2:1:1 (0,0,0)/0                do_an_addition (
        a=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>,
        b=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>,
        out=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>) at simple.cpp:24
    (gdb) bt
    Python Exception <class 'gdb.error'>: Unhandled dwarf expression opcode 0xe1
    #0  do_an_addition (a=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>,
        b=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>,
        out=<error reading variable: DWARF-2 expression error: `DW_OP_regx' operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>) at simple.cpp:24
    (gdb) continue
    Continuing.
    result is 3
    warning: Temporarily disabling breakpoints for unloaded shared library "file:///home/smarchi/build/binutils-gdb-amdgpu/gdb/simple#offset=8192&size=67208"
    [Thread 0x7ffff54bf640 (LWP 1083186) exited]
    [Thread 0x7ffff5db7640 (LWP 1082911) exited]
    [Inferior 1 (process 1082907) exited normally]

One thing to notice is the host and GPU threads appearing under
the same inferior.  This is a design goal for us, as programmers tend to
think of the threads running on the GPU as part of the same program as
the host threads, so showing them in the same inferior in GDB seems
natural.  Also, the host and GPU threads share a global memory space,
which fits the inferior model.

Another thing to notice is the error messages when trying to read
variables or printing a backtrace.  This is expected for the moment,
since the AMD GPU compiler produces some DWARF that uses some
non-standard extensions:

  https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html

There were already some patches posted by Zoran Zaric earlier to make
GDB support these extensions:

  https://inbox.sourceware.org/gdb-patches/20211105113849.118800-1-zoran.zaric@amd.com/

We think it's better to get the basic support for AMD GPU in first,
which will then give a better justification for GDB to support these
extensions.

GPU threads are named `AMDGPU Wave`: a wave is essentially a hardware
thread using the SIMT (single-instruction, multiple-threads) [3]
execution model.

GDB uses the amd-dbgapi library [4], included in the ROCm platform, for
a few things related to AMD GPU threads debugging.  Different components
talk to the library, as show on the following diagram:

    +---------------------------+     +-------------+     +------------------+
    | GDB   | amd-dbgapi target | <-> |     AMD     |     |    Linux kernel  |
    |       +-------------------+     |   Debugger  |     +--------+         |
    |       | amdgcn gdbarch    | <-> |     API     | <=> | AMDGPU |         |
    |       +-------------------+     |             |     | driver |         |
    |       | solib-rocm        | <-> | (dbgapi.so) |     +--------+---------+
    +---------------------------+     +-------------+

  - The amd-dbgapi target is a target_ops implementation used to control
    execution of GPU threads.  While the debugging of host threads works
    by using the ptrace / wait Linux kernel interface (as usual), control
    of GPU threads is done through a special interface (dubbed `kfd`)
    exposed by the `amdgpu` Linux kernel module.  GDB doesn't interact
    directly with `kfd`, but instead goes through the amd-dbgapi library
    (AMD Debugger API on the diagram).

    Since it provides execution control, the amd-dbgapi target should
    normally be a process_stratum_target, not just a target_ops.  More
    on that later.

  - The amdgcn gdbarch (describing the hardware architecture of the GPU
    execution units) offloads some requests to the amd-dbgapi library,
    so that knowledge about the various architectures doesn't need to be
    duplicated and baked in GDB.  This is for example for things like
    the list of registers.

  - The solib-rocm component is an solib provider that fetches the list of
    code objects loaded on the device from the amd-dbgapi library, and
    makes GDB read their symbols.  This is very similar to other solib
    providers that handle shared libraries, except that here the shared
    libraries are the pieces of code loaded on the device.

Given that Linux host threads are managed by the linux-nat target, and
the GPU threads are managed by the amd-dbgapi target, having all threads
appear in the same inferior requires the two targets to be in that
inferior's target stack.  However, there can only be one
process_stratum_target in a given target stack, since there can be only
one target per slot.  To achieve it, we therefore resort the hack^W
solution of placing the amd-dbgapi target in the arch_stratum slot of
the target stack, on top of the linux-nat target.  Doing so allows the
amd-dbgapi target to intercept target calls and handle them if they
concern GPU threads, and offload to beneath otherwise.  See
amd_dbgapi_target::fetch_registers for a simple example:

    void
    amd_dbgapi_target::fetch_registers (struct regcache *regcache, int regno)
    {
      if (!ptid_is_gpu (regcache->ptid ()))
        {
          beneath ()->fetch_registers (regcache, regno);
          return;
        }

      // handle it
    }

ptids of GPU threads are crafted with the following pattern:

  (pid, 1, wave id)

Where pid is the inferior's pid and "wave id" is the wave handle handed
to us by the amd-dbgapi library (in practice, a monotonically
incrementing integer).  The idea is that on Linux systems, the
combination (pid != 1, lwp == 1) is not possible.  lwp == 1 would always
belong to the init process, which would also have pid == 1 (and it's
improbable for the init process to offload work to the GPU and much less
for the user to debug it).  We can therefore differentiate GPU and
non-GPU ptids this way.  See ptid_is_gpu for more details.

Note that we believe that this scheme could break down in the context of
containers, where the initial process executed in a container has pid 1
(in its own pid namespace).  For instance, if you were to execute a ROCm
program in a container, then spawn a GDB in that container and attach to
the process, it will likely not work.  This is a known limitation.  A
workaround for this is to have a dummy process (like a shell) fork and
execute the program of interest.

The amd-dbgapi target watches native inferiors, and "attaches" to them
using amd_dbgapi_process_attach, which gives it a notifier fd that is
registered in the event loop (see enable_amd_dbgapi).  Note that this
isn't the same "attach" as in PTRACE_ATTACH, but being ptrace-attached
is a precondition for amd_dbgapi_process_attach to work.  When the
debugged process enables the ROCm runtime, the amd-dbgapi target gets
notified through that fd, and pushes itself on the target stack of the
inferior.  The amd-dbgapi target is then able to intercept target_ops
calls.  If the debugged process disables the ROCm runtime, the
amd-dbgapi target unpushes itself from the target stack.

This way, the amd-dbgapi target's footprint stays minimal when debugging
a process that doesn't use the AMD ROCm platform, it does not intercept
target calls.

The amd-dbgapi library is found using pkg-config.  Since enabling
support for the amdgpu architecture (amdgpu-tdep.c) depends on the
amd-dbgapi library being present, we have the following logic for
the interaction with --target and --enable-targets:

 - if the user explicitly asks for amdgcn support with
   --target=amdgcn-*-* or --enable-targets=amdgcn-*-*, we probe for
   the amd-dbgapi and fail if not found

 - if the user uses --enable-targets=all, we probe for amd-dbgapi,
   enable amdgcn support if found, disable amdgcn support if not found

 - if the user uses --enable-targets=all and --with-amd-dbgapi=yes,
   we probe for amd-dbgapi, enable amdgcn if found and fail if not found

 - if the user uses --enable-targets=all and --with-amd-dbgapi=no,
   we do not probe for amd-dbgapi, disable amdgcn support

 - otherwise, amd-dbgapi is not probed for and support for amdgcn is not
   enabled

Finally, a simple test is included.  It only tests hitting a breakpoint
in device code and resuming execution, pretty much like the example
shown above.

[1] https://docs.amd.com/category/ROCm_v5.4
[2] https://docs.amd.com/bundle/HIP-Programming-Guide-v5.4
[3] https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
[4] https://docs.amd.com/bundle/ROCDebugger-API-Guide-v5.4

Change-Id: I591edca98b8927b1e49e4b0abe4e304765fed9ee
Co-Authored-By: Zoran Zaric <zoran.zaric@amd.com>
Co-Authored-By: Laurent Morichetti <laurent.morichetti@amd.com>
Co-Authored-By: Tony Tye <Tony.Tye@amd.com>
Co-Authored-By: Lancelot SIX <lancelot.six@amd.com>
Co-Authored-By: Pedro Alves <pedro@palves.net>
2023-02-02 10:02:34 -05:00
cded17bfca gdb/testsuite: fix fetch_src_and_symbols.exp with native-gdbserver board
I noticed that the gdb.debuginfod/fetch_src_and_symbols.exp script
doesn't work with the native-gdbserver board, I see this error:

  ERROR: tcl error sourcing /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.debuginfod/fetch_src_and_symbols.exp.
  ERROR: gdbserver does not support run without extended-remote
      while executing
  "error "gdbserver does not support $command without extended-remote""
      (procedure "gdb_test_multiple" line 51)
      invoked from within

This was introduced with this commit:

  commit 7dd38e31d67c2548b52bea313ab18e40824c05da
  Date:   Fri Jan 6 18:45:27 2023 -0500

      gdb/linespec.c: Fix missing source file during breakpoint re-set

The problem is that the above commit introduces a direct use of the
"run" command, which doesn't work with 'target remote' targets, as
exercised by the native-gdbserver board.

To avoid this, in this commit I switch to using runto_main.  However,
calling runto_main will, by default, delete all the currently set
breakpoints.  As the point of the above commit was to check that a
breakpoint set before stating an inferior would be correctly re-set,
we need to avoid this breakpoint deleting behaviour.

To do this I make use of with_override, and override the
delete_breakpoints proc with a dummy proc which does nothing.

By reverting the GDB changes in commit 7dd38e31d67c I have confirmed
that even after my changes in this commit, the test still fails.  But
with the fixes in commit 7dd38e31d67c, this test now passed using the
unix, native-gdbserver, and native-extended-gdbserver boards.
2023-02-01 17:32:16 +00:00
6647f05df0 gdb: defer warnings when loading separate debug files
Currently, when GDB loads debug information from a separate debug
file, there are a couple of warnings that could be produced if things
go wrong.

In find_separate_debug_file_by_buildid (build-id.c) GDB can give a
warning if the separate debug file doesn't include any actual debug
information, and in separate_debug_file_exists (symfile.c) we can warn
if the CRC checksum in the separate debug file doesn't match the
checksum in the original executable.

The problem here is that, when looking up debug information, GDB will
try several different approaches, lookup by build-id, lookup by
debug-link, and then a lookup from debuginfod.  GDB can potentially
give a warning from an earlier attempt, and then succeed with a later
attempt.  In the cases I have run into this is primarily a warning
about some out of date debug information on my machine, but then GDB
finds the correct information using debuginfod.  This can be confusing
to a user, they will see warnings from GDB when really everything is
working just fine.

For example:

  warning: the debug information found in "/usr/lib/debug//lib64/ld-2.32.so.debug" \
      does not match "/lib64/ld-linux-x86-64.so.2" (CRC mismatch).

This diagnostic was printed on Fedora 33 even when the correct
debuginfo was downloaded.

In this patch I propose that we defer any warnings related to looking
up debug information from a separate debug file.  If any of the
approaches are successful then GDB will not print any of the warnings.
As far as the user is concerned, everything "just worked".  Only if
GDB completely fails to find any suitable debug information will the
warnings be printed.

The crc_mismatch test compiles two executables: crc_mismatch and
crc_mismatch-2 and then strips them of debuginfo creating separate
debug files. The test then replaces crc_mismatch-2.debug with
crc_mismatch.debug to trigger "CRC mismatch" warning. A local
debuginfod server is setup to supply the correct debug file, now when
GDB looks up the debug info no warning is given.

The build-id-no-debug-warning.exp is similar to the previous test. It
triggers the "separate debug info file has no debug info" warning by
replacing the build-id based .debug file with the stripped binary and
then loading it to GDB.  It then also sets up local debuginfod server
with the correct debug file to download to make sure no warnings are
emitted.
2023-02-01 11:12:35 +00:00
95cbab2beb gdb/testsuite: adjust ensure_gdb_index to cooked_index_functions::dump changes
Following 7d82b08e9e0a ("gdb/dwarf: dump cooked index contents in
cooked_index_functions::dump"), I see some failures like:

    (gdb) mt print objfiles with-mf^M
    ^M
    Object file /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/with-mf/with-mf:  Objfile at 0x614000005040, bfd at 0x6120000e08c0, 18 minsyms    ^M
    ^M
    Cooked index in use:^M
    ^M
    ...
    (gdb) FAIL: gdb.base/with-mf.exp: check if index present

This is because the format of the "Cooked index in use" line changed
slightly.  Adjust ensure_gdb_index to expect the trailing colon.

Change-Id: If0a87575c02d8a0bc0d4b8ead540c234c62760f8
2023-01-31 11:43:38 -05:00
d8e88d10d8 gdb/testsuite: fix xfail in gdb.ada/ptype_tagged_param.exp
I see:

    ERROR: wrong # args: should be "xfail message"
        while executing
    "xfail "no debug info" $gdb_test_name"
        ("uplevel" body line 3)
        invoked from within
    "uplevel {
            if {!$has_runtime_debug_info} {
                xfail "no debug info" $gdb_test_name
            } else {
                fail $gdb_test_name
            }
        }"

This is because the xfail takes only one argument, fix that.

Change-Id: I2e304d4fd3aa61067c04b5dac2be2ed34dab3190
2023-01-31 11:35:43 -05:00
8d31d08fe6 Use xfail in ptype_tagged_param.exp
Pedro pointed out that ptype_tagged_param.exp used a kfail, but an
xfail would be more appropriate as the problem appears to be in gcc,
not gdb.
2023-01-30 08:03:33 -07:00