When interrupting a program in non-stop, the program gets interrupted
correctly, but GDB busy loops (the event loop is always woken up).
Here is how to reproduce it:
1. Start GDB: ./gdb -nx --data-directory=data-directory -ex "set non-stop 1" --args /bin/sleep 60
2. Run the program with "run"
3. Interrupt with ^C.
4. Look into htop, see GDB taking 100% CPU
Debugging `handle_file_event`, we see that the event source that wakes
up the event loop is the linux-nat one:
(top-gdb) p file_ptr.proc
$5 = (handler_func *) 0xb9cccd <handle_target_event(int, gdb_client_data)>
^^^^^^^^^^^^^^^^^^^
|
\-- the linux-nat callback
Debugging fetch_inferior_event and do_target_wait, we see that we
don't actually call `wait` on the linux-nat target, because
inferior_matches returns false:
auto inferior_matches = [&wait_ptid] (inferior *inf)
{
return (inf->process_target () != NULL
&& (threads_are_executing (inf->process_target ())
|| threads_are_resumed_pending_p (inf))
&& ptid_t (inf->pid).matches (wait_ptid));
};
because `threads_are_executing` is false.
What happens is:
1. User types ctrl-c, that writes in the linux-nat pipe, waking up
the event source.
2. linux-nat's wait gets called, the SIGINT event is returned, but
before returning, it marks the pipe again, in order for wait to
get called again:
/* If we requested any event, and something came out, assume there
may be more. If we requested a specific lwp or process, also
assume there may be more. */
if (target_is_async_p ()
&& ((ourstatus->kind != TARGET_WAITKIND_IGNORE
&& ourstatus->kind != TARGET_WAITKIND_NO_RESUMED)
|| ptid != minus_one_ptid))
async_file_mark ();
3. The SIGINT event is handled, the program is stopped, the stop
notification is printed.
4. The event loop is woken up again because of the `async_file_mark`
of step 2.
5. Because `inferior_matches` returns false, we never call
linux-nat's wait, so the pipe stays readable.
6. Goto 4.
Pedro says:
This commit fixes it by letting do_target_wait call target_wait even
if threads_are_executing is false. This will normally result in the
target returning TARGET_WAITKIND_NO_RESUMED, and _not_ marking its
event source again. This results in infrun only calling into the
target only once (i.e., breaking the busy loop).
Note that the busy loop bug didn't trigger in all-stop mode because
all-stop handles this by unregistering the target from the event loop
as soon as it was all stopped -- see
inf-loop.c:inferior_event_handler's INF_EXEC_COMPLETE handling. If we
remove that non-stop check from inferior_event_handler, and replace
the target_has_execution check for threads_are_executing instead, it
also fixes the issue for non-stop. I considered that as the final
solution, but decided that the solution proposed here instead is just
simpler and more future-proof design. With the
TARGET_WAITKIND_NO_RESUMED handling fixes done in the previous
patches, I think it should be possible to always keep the target
registered in the event loop, meaning we could eliminate the
target_async(0) call from inferior_event_handler as well as most of
the target_async(1) calls in the target backends. That would allow in
the future e.g., the remote target reporting asynchronous
notifications even if all threads are stopped. I haven't attempted
that, though.
gdb/ChangeLog:
yyyy-mm-dd Simon Marchi <simon.marchi@polymtl.ca>
Pedro Alves <pedro@palves.net>
PR gdb/26199
* infrun.c (threads_are_resumed_pending_p): Delete.
(do_target_wait): Remove threads_are_executing and
threads_are_resumed_pending_p checks from the inferior_matches
lambda. Update comments.
This adds a testcase that covers the scenarios described in the
previous two commits.
gdb/testsuite/ChangeLog:
PR gdb/26199
* gdb.multi/multi-target.c (exit_thread): New.
(thread_start): Break loop if EXIT_THREAD.
* gdb.multi/multi-target.exp (test_no_unwaited_for): New proc.
(top level) Call test_no_resumed.
Let's consider the same use case as in the previous commit:
Say you have two inferiors 1 and 2, each connected to a different
target, A and B.
Now say you set inferior 2 running, with "continue &".
Now you select a thread of inferior 1, say thread 1.2, and continue in
the foreground. All other threads of inferior 1 are left stopped.
Thread 1.2 exits, and thus target A has no other resumed thread, so it
reports TARGET_WAITKIND_NO_RESUMED.
At this point, because the threads of inferior 2 are still executing
the TARGET_WAITKIND_NO_RESUMED event is ignored.
Now, the user types Ctrl-C. Because GDB had previously put inferior 1
in the foreground, the kernel sends the SIGINT to that inferior.
However, no thread in that inferior is executing right now, so ptrace
never intercepts the SIGINT -- it is never dequeued by any thread.
The result is that GDB's CLI is stuck. There's no way to get back the
prompt (unless inferior 2 happens to report some event).
The fix in this commit is to make handle_no_resumed give the terminal
to some other inferior that still has threads executing so that a
subsequent Ctrl-C reaches that target first (and then GDB intercepts
the SIGINT). This is a bit hacky, but seems like the best we can do
with the current design.
I think that putting all native inferiors in their own session would
help fixing this in a clean way, since with that a Ctrl-C on GDB's
terminal will _always_ reach GDB first, and then GDB can decide how to
pause the inferior. But that's a much larger change.
The testcase added by the following patch needs this fix.
gdb/ChangeLog:
PR gdb/26199
* infrun.c (handle_no_resumed): Transfer terminal to inferior with
executing threads.
handle_no_resumed is currently not considering multiple targets.
Say you have two inferiors 1 and 2, each connected to a different
target, A and B.
Now say you set inferior 2 running, with "continue &".
Now you select a thread of inferior 1, say thread 1.2, and continue in
the foreground. All other threads of inferior 1 are left stopped.
Thread 1.2 exits, and thus target A has no other resumed thread, so it
reports TARGET_WAITKIND_NO_RESUMED.
At this point, if both inferiors were running in the same target,
handle_no_resumed would realize that threads of inferior 2 are still
executing, so the TARGET_WAITKIND_NO_RESUMED event should be ignored.
But because handle_no_resumed only walks the threads of the current
target, it misses noticing that threads of inferior 2 are still
executing. The fix is just to walk over all threads of all targets.
A testcase covering the use case above will be added in a following
patch. It can't be added yet because it depends on yet another fix to
handle_no_resumed not included here.
gdb/ChangeLog:
PR gdb/26199
* infrun.c (handle_no_resumed): Handle multiple targets.
If we hit the synchronous execution command case described by
handle_no_resumed, and handle_no_resumed determines that the event
should be ignored, because it found a thread that is executing, we end
up in prepare_to_wait.
There, if the current target is not registered in the event loop right
now, we call mark_infrun_async_event_handler. With that event handler
marked, the event loop calls again into fetch_inferior_event, which
calls target_wait, which returns TARGET_WAITKIND_NO_RESUMED, and we
end up in handle_no_resumed, again ignoring the event and marking
infrun_async_event_handler. The result is that GDB is now always
keeping the CPU 100% busy in this loop, even though it continues to be
able to react to input and to real target events, because we still go
through the event-loop.
The problem is that marking of the infrun_async_event_handler in
prepare_to_wait. That is there to handle targets that don't support
asynchronous execution. So the correct predicate is whether async
execution is supported, not whether the target is async right now.
gdb/ChangeLog:
PR gdb/26199
* infrun.c (prepare_to_wait): Check target_can_async_p instead of
target_is_async_p.
We were checking the thr->executing of an exited thread.
gdb/ChangeLog:
PR gdb/26199
* target.c (target_pass_ctrlc): Look at the inferior's non-exited
threads, not all threads.
In non-stop mode, remote targets mark an async event source whose
callback is supposed to result in calling remote_target::wait_ns to
either process the event queue, or acknowledge an incoming %Stop
notification.
The callback in question is remote_async_inferior_event_handler, where
we call inferior_event_handler, to end up in fetch_inferior_event ->
target_wait -> remote_target::wait -> remote_target::wait_ns.
A problem here however is that when debugging multiple targets,
fetch_inferior_event can pull events out of any target picked at
random, for event fairness. This means that when
remote_async_inferior_event_handler returns, remote_target::wait may
have not been called at all, and thus pending notifications may have
not been acked. Because async event sources auto-clear, when
remote_async_inferior_event_handler returns the async event handler is
no longer marked, so the event loop won't automatically call
remote_async_inferior_event_handler again to try to process the
pending remote notifications/queue. The result is that stop events
may end up not processed, e.g., "interrupt -a" seemingly not managing
to stop all threads.
Fix this by making remote_async_inferior_event_handler mark the event
handler again before returning, if necessary.
Maybe a better fix would be to make async event handlers not
auto-clear themselves, make that the responsibility of the callback,
so that the event loop would keep calling the callback automatically.
Or, we could try making so that fetch_inferior_event would optionally
handle events only for the target that it got passed down via
parameter. However, I don't think now just before branching is the
time to try to do any such change.
gdb/ChangeLog:
PR gdb/26199
* remote.c (remote_target::open_1): Pass remote target pointer as
data to create_async_event_handler.
(remote_async_inferior_event_handler): Mark async event handler
before returning if the remote target still has either pending
events or unacknowledged notifications.
Extract extended states from operand types in instruction template. Set
xstate_zmm for master register move.
* config/tc-i386.c (_i386_insn): Remove has_regmmx, has_regxmm,
has_regymm, has_regzmm and has_regtmm. Add xstate.
(md_assemble): Set i.xstate from operand types in instruction
template.
(build_modrm_byte): Updated.
(output_insn): Check i.xstate.
* testsuite/gas/i386/i386.exp: Run property-6 and
x86-64-property-6.
* testsuite/gas/i386/property-6.d: New file.
* testsuite/gas/i386/property-6.s: Updated.
* testsuite/gas/i386/x86-64-property-6.d: Likewise.
Needed for libraries that use ifuncs or other means to support
cpu-optimized versions of functions, some power10, some not, and those
functions make calls using linkage stubs.
bfd/
* elf64-ppc.h (struct ppc64_elf_params): Add power10_stubs.
* elf64-ppc.c (struct ppc_link_hash_table): Delete
power10_stubs.
(ppc64_elf_check_relocs): Adjust setting of power10_stubs.
(plt_stub_size, ppc_build_one_stub, ppc_size_one_stub): Adjust
uses of power10_stubs.
ld/
* emultempl/ppc64elf.em (params): Init new field.
(enum ppc64_opt): Add OPTION_POWER10_STUBS and OPTION_NO_POWER10_STUBS.
(PARSE_AND_LIST_LONGOPTS): Support --power10-stubs and
--no-power10-stubs.
(PARSE_AND_LIST_OPTIONS, PARSE_AND_LIST_ARGS_CASES): Likewise.
* testsuite/ld-powerpc/callstub-3.d: New test.
* testsuite/ld-powerpc/powerpc.exp: Run it.
'inf_ptrace::wait' needs to discard termination events reported by
detached child processes. Previously it compared the returned pid
against the pid in inferior_ptid to determine if a termination event
should be discarded or reported. The multi-target changes cleared
inferior_ptid to null_ptid in 'wait' target methods, so this was
always failing and never reporting exit events. Instead, report
termination events whose pid matches any inferior belonging to the
current target.
Several tests started failing on FreeBSD after the multi-target
changes and pass again after this change.
gdb/ChangeLog:
* inf-ptrace.c (inf_ptrace_target::wait): Don't compare against
inferior_ptid.
Since VEX/EVEX vector instructions will always update the full YMM/ZMM
registers, set YMM/ZMM features for VEX/EVEX vector instructions.
* config/tc-i386.c (output_insn): Set YMM/ZMM features for
VEX/EVEX vector instructions.
* testsuite/gas/i386/property-4.d: New file.
* testsuite/gas/i386/property-4.s: Likewise.
* testsuite/gas/i386/property-5.d: Likewise.
* testsuite/gas/i386/property-5.s: Likewise.
* testsuite/gas/i386/x86-64-property-4.d: Likewise.
* testsuite/gas/i386/x86-64-property-5.d: Likewise.
FreeBSD's kernel recently added several ELF auxiliary vector entries
to describe the arguments passed to new executable images during
exec(). The AT_FREEBSD_ARGC and AT_FREEBSD_ARGV entries give the
length and address of the process argument array. AT_FREEBSD_ENVC and
AT_FREEBSD_ENVV entries give the length and address of the initial
process environment. AT_FREEBSD_PS_STRINGS gives the address of the
'struct ps_strings' object.
include/ChangeLog:
* elf/common.h (AT_FREEBSD_ARGC, AT_FREEBSD_ARGV, AT_FREEBSD_ENVC)
(AT_FREEBSD_ENVV, AT_FREEBSD_PS_STRINGS): Define.
gdb/ChangeLog:
* fbsd-tdep.c (fbsd_print_auxv_entry): Handle AT_FREEBSD_ARGC,
AT_FREEBSD_ARGV, AT_FREEBSD_ENVC, AT_FREEBSD_ENVV,
AT_FREEBSD_PS_STRINGS.
ld's garbage collection test on powerpc64 catered for old compilers
(pre -mcmodel=medium support), setting options that caused the test to
fail. Which meant the test wasn't really testing anything. Get rid
of that old compiler support, and avoid -fPIE fails on ppc32.
* testsuite/ld-gc/gc.exp: Don't set -mminimal-toc for powerpc64,
and remove powerpc64 xfail. Use -fno-PIE for ppc32.
The PR18841 test does cross-module calls from within an ifunc
resolver, which is nasty, and not supported in general since the
called function may not be relocated. In this case the called
function (zoo) is just a stub so doesn't need relocating, but on ppc64
the function descriptor for zoo in the executable won't be relocated
at the time the shared library ifunc resolver runs. That means the
test will fail if your compiler generates PIEs by default.
PR 18841
* testsuite/ld-ifunc/ifunc.exp: Run pr18841 tests non-pie.
This one isn't just a weird corner case requiring multiple
.PARISC.unwind sections in an object file to trigger the buffer
overflow, it's also a simple bug that would prevent relocations being
applied in the normal case of a single .PARISC.unwind section.
* readelf (slurp_hppa_unwind_table): Set table_len before use
in relocation sanity checks.
Fixes this testsuite fail on Windows:
FAIL: gdb.base/auto-load.exp: print $script_loaded
Converts the debugfile path from c:/dir/file to /c/dir/file, so it can be
appended to the auto-load path.
gdb/ChangeLog:
2020-07-08 Hannes Domani <ssbssa@yahoo.de>
* auto-load.c (auto_load_objfile_script_1): Convert drive part
of debugfile path on Windows.
gdb/doc/ChangeLog:
2020-07-08 Hannes Domani <ssbssa@yahoo.de>
* gdb.texinfo: Document Windows drive conversion of
'set auto-load scripts-directory'.
The argument is passed as a generic cookie value to the supplied
callback and is not necessarily a pointer to a bfd.
gdb/ChangeLog:
* fbsd-nat.c (fbsd_nat_target::find_memory_regions): Rename 'obfd'
argument to 'data'.
Testing using the internal AdaCore test suite showed a regression from
the target string reading changes. In particular, now
ada_exception_message_1 can get the wrong answer in some cases. In
particular, when an Ada exception catchpoint is hit, sometimes the
exception name will be incorrect. The case I was seeing changed from
the correct:
Catchpoint 2, CONSTRAINT_ERROR (catch C_E) at [...]
to:
Catchpoint 2, CONSTRAINT_ERROR (catch C_EE) at [...]
I was not able to reproduce this failure with the Fedora gnat.
Perhaps it is related to some local change to gnat; I do not know.
Meanwhile, because ada_exception_message_1 knows the length of the
string to read, we can use read_memory here. This fixes the bug.
I've updated the test suite to at least exercise this code path.
However, as mentioned above, the new test does not actually provoke
the failure.
gdb/ChangeLog
2020-07-08 Tom Tromey <tromey@adacore.com>
* ada-lang.c (ada_exception_message_1): Use read_memory.
gdb/testsuite/ChangeLog
2020-07-08 Tom Tromey <tromey@adacore.com>
* gdb.ada/catch_ex/foo.adb: Pass string to raise.
* gdb.ada/catch_ex.exp: Examine catchpoint text.
* testsuite/script_test_7.sh: Adjust expected address of the .bss
section.
* testsuite/script_test_9.sh: Do not expect the .init section to
immediately follow the .text section in the mapping of sections to
segments.
While some insns support both XOP.W based operand swapping and 256-bit
operation (XOP.L=1), many others don't support one or both.
For {L,S}LWPCB also fix the so far not decoded ModRM.mod == 3
restriction.
Take the opportunity and replace the custom OP_LWP_E() and OP_LWPCB_E()
routines by suitable other, non-custom operanbd specifiers.
Just like other VEX-encoded scalar insns do.
Besides a testcase for this behavior also introduce one to verify that
XOP scalar insns don't honor -mavxscalar=256, as they don't ignore
XOP.L.
There's no need for custom operand handling here, except for the VEX.W
controlled operand swapping and the printing of the remaining 4-bit
immediate. VEX.W can be handled just like 4-operand insns.
Also take the opportunity and drop the stray indirection through
vex_w_table[].
There's no need for custom operand handling here, except for the VEX.W
controlled operand swapping. The latter can be easily integrated into
OP_REG_VexI4().
git commit 7193487fa8 took h8300 out of the notarget list, resulting in
h8300-elf +FAIL: ld-scripts/section-match-1
h8300-linux +FAIL: ld-scripts/section-match-1
* testsuite/ld-scripts/section-match-1.d: xfail h8300.
--image-base 0 is not just for x86_64 mingw. This patch fixes that,
and a case where a changed LDFLAGS leaked out of one script to the next.
* testsuite/ld-scripts/align.exp: Use is_pecoff_format.
* testsuite/ld-scripts/defined.exp: Likewise.
* testsuite/ld-scripts/provide.exp: Likewise.
* testsuite/ld-scripts/weak.exp: Likewise.
* testsuite/ld-scripts/empty-address.exp: Likewise. Reset LDFLAGS
on exit.
* testsuite/ld-scripts/expr.exp: Set LDFLAGS earlier, and with
--image-base for PE.
* testsuite/ld-scripts/include.exp: Set LDFLAGS for PE.
* testsuite/ld-scripts/script.exp: Use is_pecoff_format, and
set LDFLAGS as well as flags.
and restrict some other tests using is_*_format.
* testsuite/binutils-all/ar.exp: Use is_xcoff_format.
* testsuite/binutils-all/nm.exp: Likewise.
* testsuite/binutils-all/copy-2.d: Run only for elf and pe targets.
* testsuite/binutils-all/copy-3.d: Run only for elf targets.
* testsuite/binutils-all/set-section-alignment.d: Likewise.
* testsuite/binutils-all/copy-4.d: Don't run for xcoff.
Avoid an UNRESOLVED test due to "Error: the XCOFF file format does not
support arbitrary sections".
* testsuite/lib/binutils-common.exp (is_xcoff_format): New.
* testsuite/binutils-all/objcopy.exp (pr25662): Exclude xcoff.
The binutils XCOFF support doesn't handle random linker scripts very
well at all. These tweaks to final_link fix segfaults when some
linker created sections are discarded due to "/DISCARD/ : { *(.*) }"
in scripts. The xcoff_mark change is necessary to not segfault on
symbols defined in scripts, which may be bfd_link_hash_defined yet
have u.def.section set to bfd_und_section_ptr. (Which might seem odd,
but occurs during early stages of linking before input sections are
mapped.)
* xcofflink.c (xcoff_mark): Don't mark const sections.
(bfd_xcoff_record_link_assignment): Add FIXME.
(_bfd_xcoff_bfd_final_link): Don't segfault on assorted magic
sections being discarded by linker script.