Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: generate enums as interface + enum + class for unknown values #288

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Mar 25, 2024

This PR implements suggestion from kaitai-io/kaitai_struct#778 which is also enforced by enum_invalid.ksy test.

Related issues:

The new generated enum is represented as an interface with two nested classes:

  • enum class Known with the list of known enumeration values
  • class Unknown which represents any not-known value

The enum enum would be generated as:

interface Enum extends IKaitaiEnum {
    public static class Unknown extends IKaitaiEnum.Unknown implements Enum {
        private Unknown(long id) { super(id); }
    }
    public enum Known implements Enum {
        Variant1(1),
        Variant2(2),
        Variant(3);

        private final long id;

        static HashMap<Long, Enum> variants = new HashMap<>(3);
        static {
            for (final Known e : Known.values()) {
                variants.put(e.id, e);
            }
        }

        private Known(long id) { this.id = id; }
        @Override
        public long id() { return this.id; }
    }

    public static Enum byId(final long id) {
        return Known.variants.computeIfAbsent(id, _id -> new Unknown(id));
    }
}

Example

Unfortunately, it is not possible to generate enum what will implement nested interface due to cyclic reference. If that would be possible, we would be protected against name clashes. In the current implementation the names Known and Unknown can clash with enum name itself.

Current implementation uses Java 8 features (Map.computeIfAbsent and lambdas). I think, that this should be OK, but if not, it is possible to rewrite it in a Java 7 compatible code.

Probably not all cases handled, at least I did not test switch generation. If corresponding runtime changes will be merged, then CI would help with that.

Fixes the following https://ci.kaitai.io tests:

  • EnumToIInvalid

@generalmimon generalmimon changed the title Generate enums as interface + enum + class for unknown values Java: generate enums as interface + enum + class for unknown values Aug 17, 2024
Copy link
Member

@generalmimon generalmimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mingun Thanks for suggesting a possible implementation. I don't like the fact that the Known and Unknown classes are nested in the interface, though. The main problem with this that which bothers me a lot is that it makes the access to concrete enum values (which is something user applications working with generated Java parsers will be probably doing most of the time when working with enums) very cumbersome and awkward because .Known always needs to be added (for example, Enum0.Animal.Known.CAT instead of Enum0.Animal.CAT), as is very visible in https://github.com/kaitai-io/kaitai_struct_tests/pull/125/files. And even users that don't care about unknown enum values at all would have to pay this "cost" of having .Known everywhere, which means they would be paying for something they don't use.

The structure suggested by @dgelessus in kaitai-io/kaitai_struct#778 (comment) makes more sense to me overall. It doesn't change the way known enum values are accessed, which is good. And prefixing the enum name with I to name the interface that can represent both known and unknown case is elegant within reason, even though it's supposedly not idiomatic to start the interface names with I in Java (see https://www.baeldung.com/java-interface-naming-conventions#examples-of-incorrect-interface-naming) - it is only a convention in C#.

It unfortunately allows for a name collision with another type or enum whose name matches i_{enum_name} where {enum_name} is the name of an existing enum (related: kaitai-io/kaitai_struct#86), but that's not really that much of a problem. We should just "teach" the compiler that defining an enum called {enum_name} reserves not only {enum_name} itself (you cannot define a type that is also called {enum_name}, although even this is currently unchecked by the compiler, as noted in kaitai-io/kaitai_struct#1019 (comment)) but also i_{enum_name}. So if there is another type or enum called i_{enum_name}, the compiler should reject the .ksy spec with an error that this name is already reserved by the {enum_name} enum and another name must be used.

Another possible collision is that a user-specified unknown_{enum_name} type might collide with the auto-generated Unknown{EnumName} class, but as @dgelessus also pointed out, that class doesn't need to be public: if it isn't, the compiler could actually rename the class arbitrarily to avoid the name collision.

(Just to be clear, these strategies to combat name collision are just me thinking out loud, I definitely don't want any of this in this PR.)

But as you mentioned, your solution also doesn't have the advantage of avoiding name clashes:

Unfortunately, it is not possible to generate enum what will implement nested interface due to cyclic reference. If that would be possible, we would be protected against name clashes. In the current implementation the names Known and Unknown can clash with enum name itself.

So I don't really see the benefit of the approach with nested classes. I'd really prefer the design proposed by @dgelessus in kaitai-io/kaitai_struct#778 (comment).

But I like the code reuse mechanism that you used for Unknown (i.e. having the implementation in the runtime library so that it doesn't have to be repeated over and over again in the generated code).

@Mingun
Copy link
Contributor Author

Mingun commented Aug 19, 2024

Ok, I changed representation to that code, which is protected from any clashes in names (proof). Enum named enum would be generated as:

// interface to abstract known and unknown values
interface IEnum extends IKaitaiEnum {
    // Storage for unknown values
    public static class Unknown extends IKaitaiEnum.Unknown implements IEnum {
        Unknown(long id) { super(id); }

        @Override
        public String toString() { return "Enum(" + this.id + ")"; }

        @Override
        public int hashCode() {
            final int result = 31 + "Enum".hashCode();
            return 31 * result + Long.hashCode(this.id);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Unknown && this.id == ((Unknown)other).id;
        }
    }
}
// Storage for known values
class Enum implements IEnum {
    VARIANT_1(1),
    VARIANT_2(2),
    VARIANT_3(3);

    private final long id;

    private static HashMap<Long, IEnum> variants = new HashMap<>(3);
    static {
        for (final Enum e : values()) {
            variants.put(e.id, e);
        }
    }

    public static IEnum byId(final long id) {
        return variants.computeIfAbsent(id, _id -> new IEnum.Unknown(id));
    }

    private Enum(long id) { this.id = id; }
    @Override
    public long id() { return this.id; }
}

EDIT: seems to all working now. I checked generated tests in tests/compiled subdirectory and everything looks fine. I decided to generate hashCode in the generated part of enum to include enum name in hash calculations. this should protect if you will try to put different enums with the same ids into the HashMap.

Mingun and others added 2 commits September 8, 2024 08:48
The new generated enum is represented as an interface with two nested classes:
- enum class Known with the list of known enumeration values
- class Unknown which represents any not-known value

The enum `enum` would be generated as:

```java
// interface to abstract known and unknown values
interface IEnum extends IKaitaiEnum {
    // Storage for unknown values
    public static class Unknown extends IKaitaiEnum.Unknown implements IEnum {
        Unknown(long id) { super(id); }
        @OverRide
        public String toString() { return "Enum(" + this.id + ")"; }
        @OverRide
        public int hashCode() {
            // Analogues to this code, but without boxing
            // return Objects.hash("Enum", this.id);
            final int result = 31 + "Enum".hashCode();
            return 31 * result + Long.hashCode(this.id);
        }
        @OverRide
        public boolean equals(Object other) { return other instanceof Enum && this.id == ((Enum)other).id; }
    }
}
// Storage for known values
class Enum implements IEnum {
    VARIANT_1(1),
    VARIANT_2(2),
    VARIANT_3(3);

    private final long id;

    private static HashMap<Long, IEnum> variants = new HashMap<>(3);
    static {
        for (final Enum e : values()) {
            variants.put(e.id, e);
        }
    }

    public static IEnum byId(final long id) {
        return variants.computeIfAbsent(id, _id -> new IEnum.Unknown(id));
    }

    private Enum(long id) { this.id = id; }
    @OverRide
    public long id() { return this.id; }
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve support for debugging by throwing on missing ENUMs.
2 participants