Proof of concept for autogenerating the C API for expressions #6291

Draft: wants to merge 3 commits into main

kripken (Member) commented Feb 8, 2024

This just handles child expressions and vectors of them, but hopefully shows what can be done here.

For example:

#define DELEGATE_FIELD_CHILD_VECTOR(id, field)                                 \
  std::cout << "BINARYEN_API BinaryenExpressionRef Binaryen" << #id << "Get"   \
            << unpluralize(capitalize(#field))                                 \
            << "At(BinaryenExpressionRef expr, BinaryenIndex index);\n";

Those few lines take the ID of a class and the name of one of its fields, where the field is a vector of children. An example is Call, whose operands field lists the values passed in the call. For that field, the macro generates this:

BINARYEN_API BinaryenExpressionRef BinaryenCallGetOperandAt(BinaryenExpressionRef expr, BinaryenIndex index);
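
To make the expansion above concrete, here is a minimal self-contained sketch. The helper bodies for capitalize and unpluralize are assumptions (the comment only shows their names), and EMIT_CHILD_VECTOR_GETTER is a hypothetical variant of the macro that writes to a caller-supplied stream rather than std::cout so that the result can be inspected.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Assumed helper: upper-case the first character ("operands" -> "Operands").
std::string capitalize(std::string s) {
  if (!s.empty()) {
    s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
  }
  return s;
}

// Assumed helper: naive singular form that only strips one trailing 's'.
std::string unpluralize(std::string s) {
  if (!s.empty() && s.back() == 's') {
    s.pop_back();
  }
  return s;
}

// Hypothetical variant of the macro from the comment. #id and #field turn the
// macro arguments into string literals at preprocessing time.
#define EMIT_CHILD_VECTOR_GETTER(out, id, field)                               \
  (out) << "BINARYEN_API BinaryenExpressionRef Binaryen" << #id << "Get"       \
        << unpluralize(capitalize(#field))                                     \
        << "At(BinaryenExpressionRef expr, BinaryenIndex index);\n"

// Builds the declaration for the operands field of Call.
std::string declareCallOperandGetter() {
  std::ostringstream out;
  EMIT_CHILD_VECTOR_GETTER(out, Call, operands);
  return out.str();
}
```

With a naive unpluralize like this one, a field named catchBodies comes out as CatchBodie, which would explain the BinaryenTryGetCatchBodieAt name in the sample output.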

The same macro also generates declarations for every other vector field like that. Here is a sample (which also includes non-vector fields):

// Block
BINARYEN_API BinaryenExpressionRef BinaryenBlockGetListAt(BinaryenExpressionRef expr, BinaryenIndex index);

// Call
BINARYEN_API BinaryenExpressionRef BinaryenCallGetOperandAt(BinaryenExpressionRef expr, BinaryenIndex index);

// CallIndirect
BINARYEN_API BinaryenExpressionRef BinaryenCallIndirectGetTarget(BinaryenExpressionRef expr);
BINARYEN_API BinaryenExpressionRef BinaryenCallIndirectGetOperandAt(BinaryenExpressionRef expr, BinaryenIndex index);

// Try
BINARYEN_API BinaryenExpressionRef BinaryenTryGetCatchBodieAt(BinaryenExpressionRef expr, BinaryenIndex index);
BINARYEN_API BinaryenExpressionRef BinaryenTryGetBody(BinaryenExpressionRef expr);

// CallRef
BINARYEN_API BinaryenExpressionRef BinaryenCallRefGetTarget(BinaryenExpressionRef expr);
BINARYEN_API BinaryenExpressionRef BinaryenCallRefGetOperandAt(BinaryenExpressionRef expr, BinaryenIndex index);

All of that is autogenerated. This proof of concept, which generates API calls for children and vectors of children, emits 139 bindings so far.

The benefit is that once we have written something like the 4 lines at the top of this comment, we can autogenerate all the relevant bindings with no manual work per class or per field. And whenever we add new classes, we can add support for them just by rerunning the tool.
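
As a rough illustration of why no per-class work is needed, the sketch below drives two emitter macros from a single field list. FOR_EACH_CHILD_FIELD is a hypothetical inline stand-in for Binaryen's shared field list, and the helper bodies and emitter macro names are assumptions, not the PR's actual code.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Assumed helpers, as in the macro shown in the comment.
std::string capitalize(std::string s) {
  if (!s.empty()) {
    s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
  }
  return s;
}

std::string unpluralize(std::string s) {
  if (!s.empty() && s.back() == 's') {
    s.pop_back();
  }
  return s;
}

// Hypothetical stand-in for the shared field list. Supporting a new class
// means listing its fields here once; every generator then picks them up.
#define FOR_EACH_CHILD_FIELD(CHILD, CHILD_VECTOR)                              \
  CHILD_VECTOR(Block, list)                                                    \
  CHILD(CallIndirect, target)                                                  \
  CHILD_VECTOR(CallIndirect, operands)                                         \
  CHILD(Try, body)

// Emits one getter declaration per listed field.
std::string generateGetterDeclarations() {
  std::ostringstream out;
  const std::string ref = "BinaryenExpressionRef";
#define EMIT_CHILD(id, field)                                                  \
  out << "BINARYEN_API " << ref << " Binaryen" << #id << "Get"                 \
      << capitalize(#field) << "(" << ref << " expr);\n";
#define EMIT_CHILD_VECTOR(id, field)                                           \
  out << "BINARYEN_API " << ref << " Binaryen" << #id << "Get"                 \
      << unpluralize(capitalize(#field)) << "At(" << ref                       \
      << " expr, BinaryenIndex index);\n";
  FOR_EACH_CHILD_FIELD(EMIT_CHILD, EMIT_CHILD_VECTOR)
#undef EMIT_CHILD
#undef EMIT_CHILD_VECTOR
  return out.str();
}
```

Adding, say, a CHILD_VECTOR(CallRef, operands) line to the list would yield one more declaration without touching the emitters.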

We could perhaps autogenerate other bindings in the same way, such as the JS API.

tlively (Member) commented Feb 8, 2024

BinaryenTryGetCatchBodieAt 😆

Do you think this will scale to everything we will need, or do you think we will need to fall back to something like parsing wasm-delegations-fields.def with a Python script that then generates the APIs?

kripken (Member, Author) commented Feb 8, 2024

Heh, you can see I fixed up capitalization and simple plurals, but not the interesting ones yet... 😄 Fixing those would add a little code, but the list of such English oddities is pretty short for our use case here.
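
For a sense of how little code that might be, here is a hypothetical unpluralize that handles the "-ies" case before the plain "-s" case. It is only a sketch of one possible fix, not what the PR does, and it still covers just a few English patterns.

```cpp
#include <string>

// Hypothetical replacement for the naive helper: maps "-ies" to "-y", strips
// a single trailing 's' otherwise, and leaves words ending in "ss" alone.
std::string unpluralize(std::string s) {
  auto endsWith = [&](const std::string& suffix) {
    return s.size() > suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
  };
  if (endsWith("ies")) {
    s.replace(s.size() - 3, 3, "y"); // "CatchBodies" -> "CatchBody"
  } else if (endsWith("s") && !endsWith("ss")) {
    s.pop_back(); // "Operands" -> "Operand"
  }
  return s;
}
```

Plurals ending in "-es" (for example "Catches") would still come out wrong here, so a short table of explicit exceptions might be the simpler choice if more oddities turn up.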

I don't think a Python script could do anything more than this. It's all really very simple stuff. It would also be simple to parse the file from Python, though, if we preferred that. I think the last time we discussed this topic, the preference was for less Python and more C++, but I don't feel strongly either way.

tlively (Member) commented Feb 9, 2024

I just had a thought: It might improve readability if we first use the macros to collect information about each expression and its contents in some simple data structure, then separately emit the generated code based on that data structure. The main benefit would be that the macros and the core code generation logic could be understood and read separately.
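
A minimal sketch of that two-phase idea follows. The FieldInfo struct, the RECORD_* macros, and the two hard-coded fields are all hypothetical; in practice the records would presumably come from the shared field list.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical record describing one child field of an expression class.
struct FieldInfo {
  std::string className;
  std::string fieldName;
  bool isVector;
};

// Phase 1: the macros only record facts and emit nothing.
std::vector<FieldInfo> collectFields() {
  std::vector<FieldInfo> fields;
#define RECORD_CHILD(id, field) fields.push_back({#id, #field, false});
#define RECORD_CHILD_VECTOR(id, field) fields.push_back({#id, #field, true});
  RECORD_CHILD(Try, body)
  RECORD_CHILD_VECTOR(Call, operands)
#undef RECORD_CHILD
#undef RECORD_CHILD_VECTOR
  return fields;
}

// Phase 2: ordinary C++ with no macros turns one record into a declaration.
std::string emitGetter(const FieldInfo& f) {
  std::string name = f.fieldName;
  if (!name.empty()) {
    name[0] =
      static_cast<char>(std::toupper(static_cast<unsigned char>(name[0])));
  }
  std::string decl =
    "BINARYEN_API BinaryenExpressionRef Binaryen" + f.className + "Get";
  if (f.isVector) {
    // Naive singular form: strip one trailing 's'.
    if (!name.empty() && name.back() == 's') {
      name.pop_back();
    }
    return decl + name +
           "At(BinaryenExpressionRef expr, BinaryenIndex index);\n";
  }
  return decl + name + "(BinaryenExpressionRef expr);\n";
}
```

The same records could then feed other emitters, such as one for setters or for a different binding language, without any further macro code.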

ericvergnaud (Contributor) commented

I suspect this approach might be a good example of the Pareto principle, i.e. 80% of the entry points are simple enough to be generated at 20% of the cost. But for complex entry points (such as TypeBuilderSetStructType or TypeBuilderBuildAndDispose), it might take more work to adjust the generation process than to manually create the simple entry points...
That said, I'd love to be proven wrong :-)

kripken (Member, Author) commented Feb 13, 2024

@ericvergnaud This would not help with TypeBuilderSetStructType as TypeBuilder is not an Expression. We would keep writing those APIs manually. This approach only helps with Expressions, but Expressions are the large majority of the API.

I do think that all Expressions are automatable (is that a word? 🤔 ). They are really just repeated boilerplate, and a lot of it.
