gdb/python: implement the print_insn extension language hook

This commit extends the Python API to include disassembler support.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to the libopcodes disassemble_info structure.  It has
  read-only properties: address, architecture, and progspace, and
  methods: __init__, read_memory, and is_valid.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user-written disassembler function.  By
  reading the properties and calling the methods (and other support
  functions in the gdb.disassembler module), the user can perform and
  return the disassembly.

  Disassembler

      This is a base class from which user-written disassemblers
  should inherit.  It provides base implementations of __init__ and
  __call__, which the user-written disassembler should override.

  DisassemblerResult

      This class can be used to hold the result of a call to the
  disassembler.  It is really just a wrapper around a string (the
  text of the disassembled instruction) and a length (in bytes).  The
  user can return an instance of this class from
  Disassembler.__call__ to represent the newly disassembled
  instruction.

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler output.
  The user code can call this function to have GDB disassemble the
  instruction in the normal way.  The user gets back a
  DisassemblerResult object, which they can then read in order to
  augment the disassembler output in any way they wish.

      This function also provides a mechanism to intercept the
  disassembler's reads of memory, so the user can adjust what GDB
  sees when it is disassembling.

The included documentation provides a more detailed description of the
API.

There is also a new CLI command added:

  maint info python-disassemblers

This command is defined in the Python gdb.disassembler module, and
can be used to list the currently registered Python disassemblers.
This commit is contained in:
Andrew Burgess
2021-09-17 18:12:34 +01:00
committed by Andrew Burgess
parent e4ae302562
commit 15e15b2d9c
12 changed files with 2648 additions and 1 deletions


@@ -222,6 +222,7 @@ optional arguments while skipping others. Example:
* Registers In Python:: Python representation of registers.
* Connections In Python:: Python representation of connections.
* TUI Windows In Python:: Implementing new TUI windows.
* Disassembly In Python:: Instruction Disassembly In Python.
@end menu
@node Basic Python
@@ -599,6 +600,7 @@ such as those used by readline for command input, and annotation
related prompts are prohibited from being changed.
@end defun
@anchor{gdb_architecture_names}
@defun gdb.architecture_names ()
Return a list containing all of the architecture names that the
current build of @value{GDBN} supports. Each architecture name is a
@@ -3287,6 +3289,7 @@ single address space, so this may not match the architecture of a
particular frame (@pxref{Frames In Python}).
@end defun
@anchor{gdbpy_inferior_read_memory}
@findex Inferior.read_memory
@defun Inferior.read_memory (address, length)
Read @var{length} addressable memory units from the inferior, starting at
@@ -6575,6 +6578,331 @@ corner), and @var{button} specifies which mouse button was used, whose
values can be 1 (left), 2 (middle), or 3 (right).
@end defun
@node Disassembly In Python
@subsubsection Instruction Disassembly In Python
@cindex python instruction disassembly
@value{GDBN}'s builtin disassembler can be extended, or even replaced,
using the Python API. The disassembler related features are contained
within the @code{gdb.disassembler} module:
@deftp {class} gdb.disassembler.DisassembleInfo
Disassembly is driven by instances of this class. Each time
@value{GDBN} needs to disassemble an instruction, an instance of this
class is created and passed to a registered disassembler. The
disassembler is then responsible for disassembling an instruction and
returning a result.
Instances of this type are usually created within @value{GDBN}.
However, it is possible to create a copy of an instance of this type;
see the description of @code{__init__} for more details.
This class has the following properties and methods:
@defvar DisassembleInfo.address
A read-only integer containing the address at which @value{GDBN}
wishes to disassemble a single instruction.
@end defvar
@defvar DisassembleInfo.architecture
The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
which @value{GDBN} is currently disassembling. This property is
read-only.
@end defvar
@defvar DisassembleInfo.progspace
The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
In Python}) for which @value{GDBN} is currently disassembling. This
property is read-only.
@end defvar
@defun DisassembleInfo.is_valid ()
Returns @code{True} if the @code{DisassembleInfo} object is valid,
@code{False} if not. A @code{DisassembleInfo} object will become
invalid once the disassembly call for which the @code{DisassembleInfo}
was created has returned. Calling other @code{DisassembleInfo}
methods, or accessing @code{DisassembleInfo} properties, will raise a
@code{RuntimeError} exception if the object is invalid.
@end defun
@defun DisassembleInfo.__init__ (info)
This can be used to create a new @code{DisassembleInfo} object that is
a copy of @var{info}. The copy will have the same @code{address},
@code{architecture}, and @code{progspace} values as @var{info}, and
will become invalid at the same time as @var{info}.
This method exists so that sub-classes of @code{DisassembleInfo} can
be created. These sub-classes must be initialized as copies of an
existing @code{DisassembleInfo} object, but they might choose
to override the @code{read_memory} method, and so control what
@value{GDBN} sees when reading from memory
(@pxref{builtin_disassemble}).
@end defun
@defun DisassembleInfo.read_memory (length, offset)
This method allows the disassembler to read the bytes of the
instruction to be disassembled. The method reads @var{length} bytes,
starting at @var{offset} from
@code{DisassembleInfo.address}.
It is important that the disassembler read the instruction bytes using
this method, rather than reading inferior memory directly, as in some
cases @value{GDBN} disassembles from an internal buffer rather than
directly from inferior memory. Calling this method handles this
detail.
Returns a buffer object, which behaves much like an array or a string,
just as @code{Inferior.read_memory} does
(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}). The
length of the returned buffer will always be exactly @var{length}.
If @value{GDBN} is unable to read the required memory then a
@code{gdb.MemoryError} exception is raised (@pxref{Exception
Handling}).
This method can be overridden by a sub-class in order to control what
@value{GDBN} sees when reading from memory
(@pxref{builtin_disassemble}). When overriding this method it is
important to understand how @code{builtin_disassemble} makes use of
this method.
While disassembling a single instruction there could be multiple calls
to this method, and the same bytes might be read multiple times. Any
single call might only read a subset of the total instruction bytes.
If an implementation of @code{read_memory} is unable to read the
requested memory contents, for example, if there's a request to read
from an invalid memory address, then a @code{gdb.MemoryError} should
be raised.
Raising a @code{MemoryError} inside @code{read_memory} does not
automatically mean a @code{MemoryError} will be raised by
@code{builtin_disassemble}. It is possible that @value{GDBN}'s builtin
disassembler is probing to see how many bytes are available. When
@code{read_memory} raises the @code{MemoryError}, the builtin
disassembler might be able to perform a complete disassembly with the
bytes it has available; in this case @code{builtin_disassemble} will
not itself raise a @code{MemoryError}.
Any other exception type raised in @code{read_memory} will propagate
back and be re-raised by @code{builtin_disassemble}.
@end defun
@end deftp
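As a rough sketch of how @code{read_memory} might be used, the helper
below formats the raw bytes of an instruction as hex so that they could
be appended to the disassembly text. The names
@code{format_instruction_bytes} and @code{HexBytesDisassembler} are
hypothetical, and the @code{try}/@code{except} guard is an assumption
made only so the helper can also be exercised outside @value{GDBN},
where the @code{gdb} module is not available.

```python
try:
    import gdb.disassembler as gdbdis  # Only importable inside GDB.
except ImportError:
    gdbdis = None


def format_instruction_bytes(info, length):
    """Return the first LENGTH instruction bytes as space-separated hex.

    INFO is expected to behave like a DisassembleInfo: the bytes are
    fetched through its read_memory method, at offset 0 from its address.
    """
    buf = info.read_memory(length, 0)
    return " ".join("%02x" % b for b in bytes(buf))


if gdbdis is not None:

    class HexBytesDisassembler(gdbdis.Disassembler):
        """Hypothetical disassembler that appends the raw bytes."""

        def __init__(self):
            super().__init__("HexBytesDisassembler")

        def __call__(self, info):
            # Let GDB disassemble first so the instruction length is known.
            result = gdbdis.builtin_disassemble(info)
            raw = format_instruction_bytes(info, result.length)
            text = result.string + "\t## " + raw
            return gdbdis.DisassemblerResult(result.length, text)
```

The sketch does not register the disassembler; inside @value{GDBN} that
would be done with @code{register_disassembler}.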
@deftp {class} Disassembler
This is a base class from which all user implemented disassemblers
must inherit.
@defun Disassembler.__init__ (name)
The constructor takes @var{name}, a string, which should be a short
name for this disassembler.
@end defun
@defun Disassembler.__call__ (info)
The @code{__call__} method must be overridden by sub-classes to
perform disassembly. Calling @code{__call__} on this base class will
raise a @code{NotImplementedError} exception.
The @var{info} argument is an instance of @code{DisassembleInfo}, and
describes the instruction that @value{GDBN} wants disassembling.
If this function returns @code{None}, this indicates to @value{GDBN}
that this sub-class doesn't wish to disassemble the requested
instruction. @value{GDBN} will then use its builtin disassembler to
perform the disassembly.
Alternatively, this function can return a @code{DisassemblerResult}
that represents the disassembled instruction. This type is described
in more detail below.
The @code{__call__} method can raise a @code{gdb.MemoryError}
exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
that there was a problem accessing the required memory. The error will
then be displayed by @value{GDBN} within the disassembler output.
Ideally, the only three outcomes from invoking @code{__call__} would
be a return of @code{None}, a successful disassembly returned in a
@code{DisassemblerResult}, or a @code{MemoryError} indicating that
there was a problem reading memory.
However, an implementation of @code{__call__} could fail for
other reasons, e.g.@: some external resource required to perform
disassembly is temporarily unavailable. In such cases, if
@code{__call__} raises a @code{GdbError}, the exception will be
converted to a string and printed at the end of the disassembly
output, and the disassembly request will then stop.
Any other exception type raised by the @code{__call__} method is
considered an error in the user code. The exception will be printed to
the error stream according to the @kbd{set python print-stack} setting
(@pxref{set_python_print_stack,,@kbd{set python print-stack}}).
@end defun
@end deftp
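The following is a minimal sketch of a @code{__call__} implementation
that declines most instructions by returning @code{None}, and only
annotates instructions at addresses it knows about. The @code{NOTES}
table, the @code{annotate} helper, and the @code{NoteDisassembler}
class are all hypothetical names chosen for illustration, and the
@code{try}/@code{except} guard is an assumption that lets the helper
run outside @value{GDBN}.

```python
try:
    import gdb.disassembler as gdbdis  # Only importable inside GDB.
except ImportError:
    gdbdis = None

# Hypothetical table mapping instruction addresses to notes.
NOTES = {0x401000: "entry point"}


def annotate(address, text, notes):
    """Return TEXT with the note for ADDRESS appended, or None when
    NOTES has no entry for ADDRESS."""
    note = notes.get(address)
    if note is None:
        return None
    return text + "\t## " + note


if gdbdis is not None:

    class NoteDisassembler(gdbdis.Disassembler):
        def __init__(self):
            super().__init__("NoteDisassembler")

        def __call__(self, info):
            if info.address not in NOTES:
                # Decline; GDB falls back to its builtin disassembler.
                return None
            result = gdbdis.builtin_disassemble(info)
            text = annotate(info.address, result.string, NOTES)
            return gdbdis.DisassemblerResult(result.length, text)
```

Checking the address before calling @code{builtin_disassemble} avoids
disassembling the instruction twice when no note applies.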
@deftp {class} DisassemblerResult
This class is used to hold the result of calling
@w{@code{Disassembler.__call__}}, and represents a single disassembled
instruction. This class has the following properties and methods:
@defun DisassemblerResult.__init__ (@var{length}, @var{string})
Initialize an instance of this class. @var{length} is the length of
the disassembled instruction in bytes, which must be greater than
zero, and @var{string} is a non-empty string that represents the
disassembled instruction.
@end defun
@defvar DisassemblerResult.length
A read-only property containing the length of the disassembled
instruction in bytes. This will always be greater than zero.
@end defvar
@defvar DisassemblerResult.string
A read-only property containing a non-empty string representing the
disassembled instruction.
@end defvar
@end deftp
The following functions are also contained in the
@code{gdb.disassembler} module:
@defun register_disassembler (disassembler, architecture)
The @var{disassembler} must be an instance of a sub-class of
@code{gdb.disassembler.Disassembler}, or @code{None}.
The optional @var{architecture} is either a string, or the value
@code{None}. If it is a string, then it should be the name of an
architecture known to @value{GDBN}, as returned either from
@code{gdb.Architecture.name}
(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
@code{gdb.architecture_names}
(@pxref{gdb_architecture_names,,gdb.architecture_names}).
The @var{disassembler} will be installed for the architecture named by
@var{architecture}, or if @var{architecture} is @code{None}, then
@var{disassembler} will be installed as a global disassembler for use
by all architectures.
@cindex disassembler in Python, global vs.@: specific
@cindex search order for disassembler in Python
@cindex look up of disassembler in Python
@value{GDBN} only records a single disassembler for each architecture,
and a single global disassembler. Calling
@code{register_disassembler} for an architecture, or for the global
disassembler, will replace any existing disassembler registered for
that @var{architecture} value. The previous disassembler is returned.
If @var{disassembler} is @code{None} then any disassembler currently
registered for @var{architecture} is deregistered and returned.
When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
first looks for an architecture specific disassembler. If none has
been registered then @value{GDBN} looks for a global disassembler (one
registered with @var{architecture} set to @code{None}). Only one
disassembler is called to perform disassembly, so, if there is both an
architecture specific disassembler, and a global disassembler
registered, it is the architecture specific disassembler that will be
used.
@value{GDBN} tracks the architecture specific, and global
disassemblers separately, so it doesn't matter in which order
disassemblers are created or registered; an architecture specific
disassembler, if present, will always be used in preference to a
global disassembler.
You can use the @kbd{maint info python-disassemblers} command
(@pxref{maint info python-disassemblers}) to see which disassemblers
have been registered.
@end defun
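The registration and lookup rules described above can be modelled in a
few lines of plain Python. This is only an illustrative model under
stated assumptions, not @value{GDBN}'s implementation: the names
@code{_registry}, @code{register}, and @code{lookup} are hypothetical,
and the disassemblers are represented by arbitrary objects.

```python
# Maps an architecture name, or None for the global slot, to the single
# disassembler registered for it.
_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE and return the previous one.

    Passing None as DISASSEMBLER removes the current registration.
    """
    previous = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return previous


def lookup(architecture):
    """Return the disassembler that would be used for ARCHITECTURE.

    An architecture specific entry wins; otherwise the global entry
    (if any) is used.
    """
    specific = _registry.get(architecture)
    if specific is not None:
        return specific
    return _registry.get(None)
```

In this model the order of registration does not matter, because the
architecture specific slot is always consulted before the global one.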
@anchor{builtin_disassemble}
@defun builtin_disassemble (info)
This function calls back into @value{GDBN}'s builtin disassembler to
disassemble the instruction identified by @var{info}, an instance, or
sub-class, of @code{DisassembleInfo}.
When the builtin disassembler needs to read memory the
@code{read_memory} method on @var{info} will be called. By
sub-classing @code{DisassembleInfo} and overriding the
@code{read_memory} method, it is possible to intercept calls to
@code{read_memory} from the builtin disassembler, and to modify the
values returned.
It is important to understand that, even when
@code{DisassembleInfo.read_memory} raises a @code{gdb.MemoryError}, it
is the internal disassembler itself that reports the memory error to
@value{GDBN}. The reason for this is that the disassembler might
probe memory to see if a byte is readable or not; if the byte can't be
read then the disassembler may choose not to report an error, but
instead to disassemble the bytes that it does have available.
If the builtin disassembler is successful then an instance of
@code{DisassemblerResult} is returned from @code{builtin_disassemble}.
Alternatively, if something goes wrong, an exception will be raised.
A @code{MemoryError} will be raised if @code{builtin_disassemble} is
unable to read some memory that is required in order to perform
disassembly correctly.
Any exception that is not a @code{MemoryError}, that is raised in a
call to @code{read_memory}, will pass through
@code{builtin_disassemble}, and be visible to the caller.
Finally, there are a few cases where @value{GDBN}'s builtin
disassembler can fail for reasons that are not covered by
@code{MemoryError}. In these cases, a @code{GdbError} will be raised.
The contents of the exception will be a string describing the problem
the disassembler encountered.
@end defun
Here is an example that registers a global disassembler. The new
disassembler invokes the builtin disassembler, and then adds a
comment, @code{## Comment}, to each line of disassembly output:
@smallexample
import gdb.disassembler

class ExampleDisassembler(gdb.disassembler.Disassembler):
    def __init__(self):
        super().__init__("ExampleDisassembler")

    def __call__(self, info):
        # Disassemble with GDB's builtin disassembler, then append a comment.
        result = gdb.disassembler.builtin_disassemble(info)
        length = result.length
        text = result.string + "\t## Comment"
        return gdb.disassembler.DisassemblerResult(length, text)

gdb.disassembler.register_disassembler(ExampleDisassembler())
@end smallexample
The following example creates a sub-class of @code{DisassembleInfo} in
order to intercept the @code{read_memory} calls. Within
@code{read_memory}, any bytes read from memory have their two 4-bit
nibbles swapped around. This isn't a very useful adjustment, but
serves as an example.
@smallexample
import gdb.disassembler

class MyInfo(gdb.disassembler.DisassembleInfo):
    def __init__(self, info):
        super().__init__(info)

    def read_memory(self, length, offset):
        buffer = super().read_memory(length, offset)
        result = bytearray()
        for b in buffer:
            # Swap the high and low 4-bit nibbles of each byte.
            v = int.from_bytes(b, 'little')
            v = (v << 4) & 0xf0 | (v >> 4)
            result.append(v)
        return memoryview(result)

class NibbleSwapDisassembler(gdb.disassembler.Disassembler):
    def __init__(self):
        super().__init__("NibbleSwapDisassembler")

    def __call__(self, info):
        # Wrap INFO so the builtin disassembler reads the swapped bytes.
        info = MyInfo(info)
        return gdb.disassembler.builtin_disassemble(info)

gdb.disassembler.register_disassembler(NibbleSwapDisassembler())
@end smallexample
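The nibble swap arithmetic used by @code{MyInfo.read_memory} can be
checked in isolation. The sketch below applies the same expression to
a plain Python @code{bytes} object rather than to a buffer returned by
@value{GDBN}; the names @code{swap_nibbles} and @code{swap_buffer} are
hypothetical.

```python
def swap_nibbles(value):
    """Swap the high and low 4-bit nibbles of VALUE, an int in 0..255."""
    return ((value << 4) & 0xF0) | (value >> 4)


def swap_buffer(data):
    """Return a bytes object with the nibbles of every byte in DATA swapped."""
    return bytes(swap_nibbles(b) for b in data)


print(hex(swap_nibbles(0x1F)))      # 0xf1
print(swap_buffer(b"\x12\xab"))     # b'!\xba'
```

Swapping twice restores the original bytes, so the transformation is
its own inverse.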
@node Python Auto-loading
@subsection Python Auto-loading
@cindex Python auto-loading