So far, here's a list of the major changes that have happened.
To be continued in the next post some time in the future...
-
So, originally I was planning to merge the "persistence-refactor" branch into mainline this year and include the recently introduced decompiler utilities too. At least, that was the goal... However, I learned pretty recently that the entire structure/enumeration/frame API was going to be culled in favor of the local type library api that is based on
Addresses, Identifiers, and Type information
In IDA, an identifier or address is generally treated as a first-class citizen. Anything with an identifier/address can have a name, can be referenced with an xref, and can contain other information stored for it within a netnode. The structure and enumeration API actually uses this identifier internally for storing all of its information. Other than some minor additions and the member offset being shifted, the frame api is essentially the same as the structure api. Structure-related operations such as operand structure paths, structures being applied to an address, etc. accomplish this by associating the structure id with the address.
Type information, however, is somewhat "optional". Not all addresses/identifiers will have a type associated with them, and the type information can even be removed/disassociated from an address/identifier. Due to this original design, this plugin has always treated type information as a secondary source of information for an address. That's not to say that the type information is ignored, but rather that it is deprioritized when determining the boundaries of the type applied to an address/identifier. Essentially, the ida-minsc plugin relies on the disassembler/decompiler to manage the type information for an address/identifier implicitly and always assumes that the "flags" for an address/identifier are the actual "truth" of the type.
Support @ Hex-Rays
So, I questioned support@ about how they plan to progressively deprecate this API.
Specifically, since members don't have IDs anymore and they are now coupled to the
The following list of questions is from memory, so it probably doesn't cover all of the concerns I asked about. (I'll correct this eventually, but it's not like anybody reads any of the shit I post here anyways).
I also asked about how to accomplish some of these things with the current 8.3 API in order to ease transitioning to the new API. Unfortunately, I didn't get an answer other than "Here's what works for me in 8.4."
What this (might) affect
Now... There is a LOT of code in this plugin that is based on the structure api in its older form. All of the interval arithmetic, structure and frame members, and structure encoding and decoding utilise the soon-to-be-deprecated structure API. Type references and even serialization/deserialization are also based on this API. Essentially, when this API is removed by Hex-Rays, a lot of the things that I (personally) find useful about this plugin will end up ceasing to work. Here is a non-exhaustive list:
Conclusion
Because of the uncertainty associated with the changes that are going to happen in the future, I expect the "persistence-refactor" branch will be delayed until a couple minor versions after the next major version of IDA. It is also pretty likely that, during the transition to the new local type library api, plugin compatibility with certain versions of IDA will not be able to be maintained until the local type library becomes as mature as the old structure/union/frame API. The way I plan to approach this transition is by creating an entirely separate implementation of the
So in the end, if I don't end up putting a gun to my head (and pulling the trigger), this will result in there being two versions of the
-
Announcement about refactoring
Currently, active development is being done in an off-branch labeled "persistence-refactor" (yea, I get this is a super shitty name). With the introduction of wrappers around the Hex-Rays API, I've come to the conclusion that there are a number of things that need to maintain state, and so this branch reorganizes the entire repository in order to facilitate this.
What
The major change you'll notice is that the directory structure of the plugin has been completely reorganized. This puts all "internal" modules inside the "misc" directory, all application-specific modules inside the "application" directory, and general tools under the "tools" directory. The original directory structure was in all actuality just a piece of history since this plugin grew somewhat organically. Now it's organized in a way that actually makes sense.
Some of the additions include moving state that used to be kept within the base modules into their own "temporal" modules. Examples of this include the instruction decoding, which used to be part of the "instruction" module. Things like this are now temporal in that their namespace will switch depending on the current state of the disassembler. So with regards to the "instruction" module, the operand decoders and other stuff would change depending on the processor that's detected.
Specifically, the contents of the "instruction" module will remain the same, but most of its internals are now in a module that you may import called "architecture". This "architecture" module uses the contents of the "procs" directory in order to determine an architecture's register state and how an instruction's operands are to be decoded, and if you want to add your own minsc-like decoder for an architecture it's simply a matter of dropping code in a module and registering it with minsc.
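To make the "temporal" idea a bit more concrete, here is a minimal sketch of a module whose namespace switches depending on which processor is active. This is not the plugin's actual implementation: `TemporalModule`, `register`, `activate`, and the `pointer_size`/`decode` names are all hypothetical and only illustrate the general technique of swapping a module's contents at runtime.

```python
import types


class TemporalModule(types.ModuleType):
    """A module whose public names come from the currently active backend.

    Hypothetical illustration only; this is not how minsc registers decoders.
    """

    def __init__(self, name):
        super().__init__(name)
        self._backends = {}   # processor name -> namespace dictionary
        self._active = None   # processor name that is currently selected

    def register(self, processor, **namespace):
        """Register the names that should be exposed for a processor."""
        self._backends[processor] = dict(namespace)

    def activate(self, processor):
        """Switch the exposed namespace, e.g. after a database is opened."""
        if processor not in self._backends:
            raise KeyError("no backend registered for %r" % (processor,))
        self._active = processor

    def __getattr__(self, name):
        # Only reached when the regular module attribute lookup fails.
        backends = self.__dict__.get('_backends', {})
        backend = backends.get(self.__dict__.get('_active'), {})
        if name in backend:
            return backend[name]
        raise AttributeError(name)


# The processor names and decoders below are placeholders for illustration.
architecture = TemporalModule('architecture')
architecture.register('metapc', pointer_size=8, decode=lambda operand: ('x86', operand))
architecture.register('arm', pointer_size=4, decode=lambda operand: ('arm', operand))

architecture.activate('metapc')
x86_result = architecture.decode(0)

architecture.activate('arm')
arm_result = architecture.decode(0)
```

The caller keeps a single `architecture` object for the whole session, and only the backend behind it changes when a different processor is detected.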
What (references)
I'm also re-working the way that references work. A user reported that the way the `function.down` api worked was slow, and he was 100% right because it was actually doing more than what the user actually wanted. Both `function.up` and `function.down` are actually used for references (rather than function calls), but since their output is always an integer... it's hard to distinguish that. Really, the best way to determine the callables for a function is actually to look at the basic blocks from the flowchart, but that's not actually how people think about things. So, as a result of this conversation I had... I've decided to make references first-class integers so that you can still do arithmetic with them, but also so you don't need to unpack their address to use them with a function that takes an address. This way you can use them wherever, but you can also check if a reference has metadata which you can then use to infer more information about it.
References have always had metadata attached (in their "access" attribute, which used to be "reftype") so that you can determine if a reference is being read from (load), written to (store), or executed. This was actually done via an immutable interface. But now, these will be flags that you can independently check and modify. Most importantly, you can now merge references together (which will be done implicitly in certain cases). This way you can merge (union) data-references with code-references and get the exact access type that you'd expect (that being a "load" and an "execute", or just a simple "execute"). From these flags you can distinguish whether a reference requires you to dereference an address, or use it directly. So, if you combine it with the pythonic type of an operand/address (or the regular type information) you can easily determine the operations you need to perform in order to get the value for a specific reversing artifact in whatever language you use.
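As a rough sketch of what "first-class integers with access flags" could look like, the following subclasses `int` and attaches a set of flags that can be checked, modified, and merged. The names `Access`, `Reference`, `merge`, and `needs_dereference` are assumptions made for illustration and are not the plugin's real interface.

```python
import enum


class Access(enum.Flag):
    """Hypothetical access flags for a reference."""
    LOAD = enum.auto()      # the target is read from
    STORE = enum.auto()     # the target is written to
    EXECUTE = enum.auto()   # the target is executed


class Reference(int):
    """An address that behaves like an integer but carries access flags."""

    def __new__(cls, address, access=Access(0)):
        self = super().__new__(cls, address)
        self.access = access
        return self

    def merge(self, other):
        """Union the access flags of two references to the same address."""
        if int(self) != int(other):
            raise ValueError("cannot merge references to different addresses")
        return Reference(int(self), self.access | getattr(other, 'access', Access(0)))

    @property
    def needs_dereference(self):
        """True if the target is read or written rather than only executed."""
        return bool(self.access & (Access.LOAD | Access.STORE))


# The address below is arbitrary and only serves the example.
data = Reference(0x401000, Access.LOAD)
code = Reference(0x401000, Access.EXECUTE)
both = data.merge(code)

next_address = both + 4   # arithmetic still works and yields a plain integer
```

Because `Reference` is an `int`, it can be passed directly to anything that expects an address, while code that cares about the metadata can still inspect `both.access`.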
Why
The primary reason for this refactor is due to the Hex-Rays microregister set being separated from the disassembler's set of registers, yet still being somewhat related. The Hex-Rays decompiler also exposes modifications to these registers as intervals, so if you're trying to interact with the Hex-Rays microcode (for some reason), you'll need to go through a process to figure out which part of the register is actually being modified. The part that resolves this to the actual register (so you can promote to the full register or demote to a partial register) is already complete, and if you're using the internal debugger (as opposed to the external one) you should be able to use it to evaluate expressions to get the location being referenced as determined by Hex-Rays. This way when I integrate the "tree-sitter" parser for Hex-Rays, you can interact with the operations in mostly Python's basic types.
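The promote/demote step can be pictured as an interval lookup. The sketch below assumes a register file where every named register is a byte interval (offset and size), and resolves a modified interval to the smallest named register that contains it. The layout in `REGISTERS` is invented for the example and does not reflect the real Hex-Rays microregister numbering, and `resolve`, `promote`, and `demote` are hypothetical helper names.

```python
from typing import NamedTuple, Optional


class Register(NamedTuple):
    name: str
    offset: int   # byte offset inside the (made-up) register file
    size: int     # size in bytes

    @property
    def end(self):
        return self.offset + self.size

    def contains(self, offset, size):
        """True if the interval [offset, offset + size) lies inside this register."""
        return self.offset <= offset and offset + size <= self.end


# Invented layout: two 8-byte registers along with their partial registers.
REGISTERS = [
    Register('rax', 0, 8), Register('eax', 0, 4), Register('ax', 0, 2),
    Register('al', 0, 1), Register('ah', 1, 1),
    Register('rcx', 8, 8), Register('ecx', 8, 4), Register('cx', 8, 2),
    Register('cl', 8, 1), Register('ch', 9, 1),
]


def resolve(offset, size) -> Optional[Register]:
    """Return the smallest named register covering the interval, if any."""
    candidates = [reg for reg in REGISTERS if reg.contains(offset, size)]
    return min(candidates, key=lambda reg: reg.size) if candidates else None


def promote(register) -> Register:
    """Return the largest register that contains the given register."""
    candidates = [reg for reg in REGISTERS if reg.contains(register.offset, register.size)]
    return max(candidates, key=lambda reg: reg.size)


def demote(register, size) -> Optional[Register]:
    """Return the partial register of the given size at the same offset, if any."""
    for reg in REGISTERS:
        if reg.offset == register.offset and reg.size == size:
            return reg
    return None


partial = resolve(1, 1)        # one byte modified at offset 1
full = promote(partial)        # the full register it belongs to
word = demote(full, 2)         # the 2-byte view of that register
straddling = resolve(6, 4)     # an interval spanning two registers
```

An interval that straddles two registers resolves to nothing in this sketch. A real implementation would need to decide how to report that case.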
The other reason for refactoring the codebase like this is to completely avoid the whole `vdui_t` garbage from Hex-Rays, which requires the user to be navigated to the desired function with the pseudocode view being open in order to use it. The tools inside `vdui_t` are super-powerful, and are a pretty big part of what you'd want to interact with. Unfortunately, it requires interactivity and is super-flakey if you're trying to interact with a microcode pass that is not currently being viewed. So to deal with this as well as being able to map an `mblock_t` back to the actual `mba_t`, `cfunc_t`, or `cfuncptr_t`, the loader was tweaked a bit to allow support for temporal modules which can maintain state while currently inside the database.
Other than `architecture`, the other major temporal module is the `hooks` module. This used to be exposed via the `ui.hook` namespace (and still is), but now you can just `import hook` and then interact with whatever hooks you need to. This allows you to set up any kind of hook at any point where IDAPython is actually loaded. Attached to this `hooks` module are different properties which allow you to attach a Python callable to any of IDA's available hooks and which also include proper support for Hex-Rays' notifications.
When
You can find this inside the "persistence-refactor" branch, which I'm doing all development work in. I'm keeping it in a separate branch because the changes in this branch aren't really bugfixes or incremental changes. Rather, they're still experiments. As an example, I've been experimenting with changing the data structure used for pattern matching so that it's a little more accurate and doesn't result in accidentally determining the correct callable for a constraint by doing a complete type check. Another example is that I'm considering refactoring the tag cache so that it's significantly faster on large databases, possibly with the option of using native code instead of pure-Python.
Compatibility
Pretty much everything should be the same and work exactly the same if you're using the public interface. The one thing that will likely change is that you won't be able to pickle your entire namespace anymore with things like `dill` if you're trying to save your state between open databases. This is due to the whole temporal modules thing and is related to some of the strange Python trickery that I'm performing to hide objects and other state from the user. This'll probably be fixed eventually (especially if people demand it), but it's not a priority in my opinion.
Conclusion
After this, I'm not sure if I'm going to keep maintaining the html documentation for each release, as it's going to take some time for it to catch up to these changes. Its maintenance is a ton of work and it really serves as more of an advertisement (rather than a reference), since the idea of this plugin is to take advantage of Python's auto-completion and provide simple/useful documentation natively via Python's `help()` or the `?` expression that IDAPython provides. If you feel otherwise, let me know here... or through the usual channels.