Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize dict, list, and array inputs #617

Open
nwlandry opened this issue Nov 12, 2024 · 3 comments
Open

Standardize dict, list, and array inputs #617

nwlandry opened this issue Nov 12, 2024 · 3 comments

Comments

@nwlandry
Copy link
Collaborator

In uniform_hypergraph_configuration_model, it only accepts dict inputs. I propose that we create a function called to_array or something like this where it will standardize the inputs so that functions which accept a sequence can accept several types of input.

def to_array(x, return_indices=False):
    if isinstance(x, dict):
        a = np.array(list(x.values()))
        idx = np.array(list(x))
    elif isinstance(x, (list, np.ndarray)):
        a = np.array(x)
        idx = np.arange(a.shape[0], dtype=int)
    else:
        raise Exception("Invalid sequence type!")

    if return_indices:
        return a, idx
    else:
        return a
@tlarock
Copy link
Collaborator

tlarock commented Nov 12, 2024

The purpose of the dict input is to support arbitrary node identifiers, right? So adding the list would allow for a default behavior (probably the most common use case tbh) of just having node identifiers 0,...,n-1 inferred from the length of the list? I support adding that functionality and I don't see an alternative to the solution you propose.

This would not change the code interface to a function like uniform_hypergraph_configuration_model, correct? Only the documentation to specify that either a dict or list may be passed and the implications of each (in terms of node identifiers). Then the internals modified to deal with the return of the to_array function rather than what they do now.

We should raise a more informative error in the real implementation. imo it would also be nice from a user perspective to have the error raise from the function I called if possible, rather than an internal one that I've never seen.

Unrelated (I can open a separate issue if needed), but can we please not use id as a variable name in uniform_hypergraph_configuration_model as that is a built-in function in python.

@nwlandry
Copy link
Collaborator Author

The purpose of the dict input is to support arbitrary node identifiers, right? So adding the list would allow for a default behavior (probably the most common use case tbh) of just having node identifiers 0,...,n-1 inferred from the length of the list? I support adding that functionality and I don't see an alternative to the solution you propose.

Great, I will work on implementing this.

This would not change the code interface to a function like uniform_hypergraph_configuration_model, correct? Only the documentation to specify that either a dict or list may be passed and the implications of each (in terms of node identifiers). Then the internals modified to deal with the return of the to_array function rather than what they do now.

Correct. It should be backwards-compatible, but allow more diverse input.

We should raise a more informative error in the real implementation. imo it would also be nice from a user perspective to have the error raise from the function I called if possible, rather than an internal one that I've never seen.

Yes. This is a great suggestion.

Unrelated (I can open a separate issue if needed), but can we please not use id as a variable name in uniform_hypergraph_configuration_model as that is a built-in function in python.

Yeah, I agree, we really shouldn't do this 😬. Unfortunately, it is in other modules so if you could open an issue to address this library-wide, that would be amazing.

@leotrs
Copy link
Collaborator

leotrs commented Nov 12, 2024

NetworkX solves this problem with the concept of an nbunch. As I'm sure yall know, most functions accept an nbunch, so they have abstracted the requirement of "one, many, or all nodes" away into a single kind of object that is used throughout their entire interface. I find that pretty elegant, and not unlike the solution @nwlandry is suggesting.

However, I don't know how it is implemented. I don't know if it is a Protocol and they use duck typing or if they decorate every single function that accepts an nbunch (like they do for functions that are random) or if it's hard-coded as an input validation step throughout their codebase.

FWIF, I think the first of these (using Protocol or some other version of structural typing) is the most elegant. More info here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants