Standardize dict, list, and array inputs #617

nwlandry · 2024-11-12T12:49:30Z

In uniform_hypergraph_configuration_model, it only accepts dict inputs. I propose that we create a function called to_array or something like this where it will standardize the inputs so that functions which accept a sequence can accept several types of input.

def to_array(x, return_indices=False):
    if isinstance(x, dict):
        a = np.array(list(x.values()))
        idx = np.array(list(x))
    elif isinstance(x, (list, np.ndarray)):
        a = np.array(x)
        idx = np.arange(a.shape[0], dtype=int)
    else:
        raise Exception("Invalid sequence type!")

    if return_indices:
        return a, idx
    else:
        return a

The text was updated successfully, but these errors were encountered:

tlarock · 2024-11-12T13:49:58Z

The purpose of the dict input is to support arbitrary node identifiers, right? So adding the list would allow for a default behavior (probably the most common use case tbh) of just having node identifiers 0,...,n-1 inferred from the length of the list? I support adding that functionality and I don't see an alternative to the solution you propose.

This would not change the code interface to a function like uniform_hypergraph_configuration_model, correct? Only the documentation to specify that either a dict or list may be passed and the implications of each (in terms of node identifiers). Then the internals modified to deal with the return of the to_array function rather than what they do now.

We should raise a more informative error in the real implementation. imo it would also be nice from a user perspective to have the error raise from the function I called if possible, rather than an internal one that I've never seen.

Unrelated (I can open a separate issue if needed), but can we please not use id as a variable name in uniform_hypergraph_configuration_model as that is a built-in function in python.

nwlandry · 2024-11-12T15:45:23Z

The purpose of the dict input is to support arbitrary node identifiers, right? So adding the list would allow for a default behavior (probably the most common use case tbh) of just having node identifiers 0,...,n-1 inferred from the length of the list? I support adding that functionality and I don't see an alternative to the solution you propose.

Great, I will work on implementing this.

This would not change the code interface to a function like uniform_hypergraph_configuration_model, correct? Only the documentation to specify that either a dict or list may be passed and the implications of each (in terms of node identifiers). Then the internals modified to deal with the return of the to_array function rather than what they do now.

Correct. It should be backwards-compatible, but allow more diverse input.

We should raise a more informative error in the real implementation. imo it would also be nice from a user perspective to have the error raise from the function I called if possible, rather than an internal one that I've never seen.

Yes. This is a great suggestion.

Unrelated (I can open a separate issue if needed), but can we please not use id as a variable name in uniform_hypergraph_configuration_model as that is a built-in function in python.

Yeah, I agree, we really shouldn't do this 😬. Unfortunately, it is in other modules so if you could open an issue to address this library-wide, that would be amazing.

leotrs · 2024-11-12T20:00:12Z

NetworkX solves this problem with the concept of an nbunch. As I'm sure yall know, most functions accept an nbunch, so they have abstracted the requirement of "one, many, or all nodes" away into a single kind of object that is used throughout their entire interface. I find that pretty elegant, and not unlike the solution @nwlandry is suggesting.

However, I don't know how it is implemented. I don't know if it is a Protocol and they use duck typing or if they decorate every single function that accepts an nbunch (like they do for functions that are random) or if it's hard-coded as an input validation step throughout their codebase.

FWIF, I think the first of these (using Protocol or some other version of structural typing) is the most elegant. More info here.

tlarock mentioned this issue Nov 12, 2024

Remove instances of variable name id across the codebase #619

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardize dict, list, and array inputs #617

Standardize dict, list, and array inputs #617

nwlandry commented Nov 12, 2024

tlarock commented Nov 12, 2024

nwlandry commented Nov 12, 2024

leotrs commented Nov 12, 2024

Standardize dict, list, and array inputs #617

Standardize dict, list, and array inputs #617

Comments

nwlandry commented Nov 12, 2024

tlarock commented Nov 12, 2024

nwlandry commented Nov 12, 2024

leotrs commented Nov 12, 2024