
Source Code Walkthrough

Toshi Piazza edited this page Jan 16, 2022 · 3 revisions

Typically, Ghidra processor modules are described completely by a Sleigh slaspec and other architecture files (pspec, cspec, opinion, ldefs, etc.). However, hexagon.slaspec is auto-generated, and its instruction fields and tokens carry no semantic information. Furthermore, hexagon.slaspec requires many context registers to be set appropriately by the accompanying HexagonPacketAnalyzer analyzer plugin.

The auto-generation script for hexagon.slaspec is loosely adapted from binja-hexagon, which in turn manipulates auto-generated artifacts from qemu-hexagon.

Resources

First-time Disassembly Flow

First, all instructions are disassembled without any context set. This differs from the final disassembly in a few notable ways:

  1. All duplex subinstructions (consecutive two-byte instructions that appear at the end of a Hexagon packet) decode as one 4-byte DUPLEX instruction
  2. All pc-relative immediates are incorrect
  3. Because the pc-relative immediates are incorrect, instruction flows are typically incorrect as well
  4. All new-value operands are scalars because they cannot be resolved to registers
  5. Hardware endloops are not identified
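To illustrate why pc-relative immediates come out wrong on the first pass, here is a minimal sketch. It assumes, per the Hexagon architecture, that pc-relative offsets are taken relative to the start address of the enclosing packet rather than the address of the instruction itself. The addresses, the offset, and the `branch_target` helper are hypothetical and are not taken from the plugin.

```python
ADDR_MASK = 0xFFFFFFFF  # Hexagon uses 32-bit addresses


def branch_target(anchor: int, offset: int) -> int:
    """Resolve a pc-relative offset against the given anchor address."""
    return (anchor + offset) & ADDR_MASK


# Hypothetical packet starting at 0x1000 with a jump in its third word.
pkt_start = 0x1000
insn_addr = 0x1008
offset = 0x40

# With no packet context, a decoder can only anchor to the instruction itself.
without_context = branch_target(insn_addr, offset)
# Once pkt_start is known, the offset is anchored to the packet start.
with_context = branch_target(pkt_start, offset)

print(hex(without_context), hex(with_context))
```

The two results differ by the distance between the instruction and the start of its packet, which is why only the first instruction of a packet would resolve correctly without context.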

When a set of straight-line instructions is disassembled, HexagonPacketAnalyzer groups the instructions into packets, then sets the corresponding context registers:

  • pkt_start
  • pkt_next
  • subinsn: values 1-5 indicating A, L1, L2, S1, or S2 duplex subinstruction, respectively
  • hasnew: whether the instruction has a new-value operand
  • dotnew: the register number corresponding to R0-R31 that the dot-new operand resolves to
  • endloop: values 1-3 indicating endloop0, endloop1, or endloop01, respectively
  • duplex_next: if an immext instruction precedes a pair of duplex instructions, specifies the address of the second duplex instruction, which is the one that receives the extension
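The grouping step can be sketched roughly as follows. This is an illustrative model, not the analyzer's code. It assumes the parse-bit encoding from the Hexagon architecture documentation: bits 15:14 of each 32-bit word are `11` or `00` (duplex) on the last word of a packet, and a value of `10` in the first or second word marks the end of hardware loop 0 or loop 1. The `Packet` type and function names are hypothetical, and reading `pkt_start` and `pkt_next` as "address of the first word" and "address just past the packet" is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List

INSN_SIZE = 4  # every packet slot is one 32-bit word


def parse_bits(word: int) -> int:
    """Extract the parse field, bits 15:14 of an instruction word."""
    return (word >> 14) & 0x3


def ends_packet(word: int) -> bool:
    """0b11 marks the last word of a packet; 0b00 marks a duplex, which is also last."""
    return parse_bits(word) in (0b11, 0b00)


def endloop_value(words: List[int]) -> int:
    """Return 0 (none), 1 (endloop0), 2 (endloop1), or 3 (endloop01)."""
    if len(words) < 2:
        return 0
    first = parse_bits(words[0]) == 0b10
    second = parse_bits(words[1]) == 0b10
    if first and second:
        return 3
    if second:
        return 2
    if first:
        return 1
    return 0


@dataclass
class Packet:
    pkt_start: int  # assumed: address of the first word in the packet
    pkt_next: int   # assumed: address immediately after the packet
    endloop: int


def group_packets(base: int, words: List[int]) -> List[Packet]:
    """Split a straight-line run of words into packets; a trailing partial packet is dropped."""
    packets = []
    start = 0
    for i, word in enumerate(words):
        if not ends_packet(word):
            continue
        packets.append(Packet(
            pkt_start=base + start * INSN_SIZE,
            pkt_next=base + (i + 1) * INSN_SIZE,
            endloop=endloop_value(words[start:i + 1]),
        ))
        start = i + 1
    return packets


# Synthetic words carrying only parse bits: a 3-word endloop0 packet, then a 1-word packet.
words = [0b10 << 14, 0b01 << 14, 0b11 << 14, 0b11 << 14]
packets = group_packets(0x1000, words)
```

In this model the first packet spans 0x1000 to 0x100c with `endloop` set to 1, and the second is a single-word packet at 0x100c.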

The HexagonPacketAnalyzer plugin sets the context appropriately, then re-disassembles all instructions. At this point, all new-value operands are identified as registers instead of scalars, and all single DUPLEX placeholder instructions are split up into two two-byte duplex subinstructions.
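As a rough illustration of how a new-value operand might be resolved to a register once packet boundaries are known, consider the simplified model below. It assumes that the operand field stores, in bits 2:1, the distance back to the producing instruction within the same packet, and that constant extenders are not counted. Both points are assumptions drawn from the Hexagon architecture documentation rather than from the analyzer's source, and the `Insn` type and `resolve_dotnew` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    dest: Optional[int] = None  # general register number written, if any
    is_immext: bool = False     # constant extender word


def resolve_dotnew(packet: List[Insn], consumer_idx: int, n_field: int) -> Optional[int]:
    """Walk backwards from the consumer to find the register its new-value operand names."""
    distance = (n_field >> 1) & 0x3  # assumed: bits 2:1 hold the producer distance
    if distance == 0:
        return None  # assumed reserved encoding
    idx = consumer_idx
    while distance > 0:
        idx -= 1
        if idx < 0:
            return None  # field points outside the packet
        if packet[idx].is_immext:
            continue  # assumed: extenders do not count toward the distance
        distance -= 1
    return packet[idx].dest


# Hypothetical packet: a producer writing R5, an extender, then the consumer.
packet = [Insn(dest=5), Insn(is_immext=True), Insn()]
dotnew = resolve_dotnew(packet, consumer_idx=2, n_field=0b010)
```

Here `dotnew` comes out as 5, the kind of value the `dotnew` context register is described as carrying. Without packet boundaries there is nothing to walk back through, which is consistent with the operand showing up as a scalar on the first pass.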

Finally, HexagonPacketAnalyzer sets fallthrough for all instructions in the packet besides the last one, and cleans up any bookmarks corresponding to disassembly errors.

ELF Relocation

As with all architectures, the Hexagon ABI specifies a handful of architecture-specific ELF relocations, which are implemented in Hexagon_ElfRelocationHandler.java.

For more information on the topic, consult the Qualcomm Hexagon Application Binary Interface documentation.

ParallelInstructionLanguageHelper

ParallelInstructionLanguageHelper is a pre-existing feature that can be specified in the pspec. Hexagon, however, requires several changes to how this feature works:

  • Added a new "Parallel Suffix" label for "}" endpacket marker
  • Added "Parallel ||" and "Parallel Suffix" labels to Function Graph by default
  • SimpleBlockModel and BasicBlockModel are enlightened to respect packet boundaries
  • Added a hook in DecompileCallbacks.getPcodePacked that calls into ParallelInstructionLanguageHelper to fix up the Pcode for an entire packet before it is sent to the decompiler

On that last point, HexagonParallelInstructionLanguageHelper needs to fix up the Pcode for all instructions in a packet in order to emulate the behavior of parallel execution. This is handled in HexagonPcodeEmitPacked.
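To see why a per-packet fixup is needed, the sketch below contrasts naive sequential evaluation with packet semantics, where every instruction reads the register values from before the packet and all writes commit together at the end. It uses a generic buffer-then-commit technique on a toy register-move model. It is not a description of how HexagonPcodeEmitPacked actually rewrites Pcode, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str]  # (destination register, source register)


def run_sequential(regs: Dict[str, int], packet: List[Move]) -> Dict[str, int]:
    """Apply each move in order, so later moves see earlier writes."""
    out = dict(regs)
    for dest, src in packet:
        out[dest] = out[src]
    return out


def run_packed(regs: Dict[str, int], packet: List[Move]) -> Dict[str, int]:
    """Read from the pre-packet state, buffer the writes, and commit them at the end."""
    pending = {}
    for dest, src in packet:
        pending[dest] = regs[src]  # always reads the old value
    out = dict(regs)
    out.update(pending)
    return out


# Toy packet equivalent to { r0 = r1; r1 = r0 }, a register swap.
regs = {"r0": 1, "r1": 2}
swap = [("r0", "r1"), ("r1", "r0")]

sequential = run_sequential(regs, swap)  # {'r0': 2, 'r1': 2}
packed = run_packed(regs, swap)          # {'r0': 2, 'r1': 1}
```

Emitting each instruction's Pcode back to back corresponds to `run_sequential` and loses the swap. Redirecting writes to temporaries and committing them at the end of the packet, as in `run_packed`, preserves it.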