Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RF: Rename rnd to rng and use instead of seed if applicable. #828

Merged
merged 6 commits into from
Oct 9, 2024

Conversation

Debilski
Copy link
Member

@Debilski Debilski commented Oct 4, 2024

Renamed rnd to rng (which seems to be the numpy accepted name) and changed get_random_layout to use the rng object instead of the seed.

Open question (need to think about it a bit more): How to deal with the seed argument in setup_game and run_background_game.

The big idea is to get away from global random state (either through the random module or with cheating via RNG = random.Random()). There should be one rng generator that should be passed around to all functions that rely on it.

Closes #827

@otizonaizit
Copy link
Member

I think that one decision we should take is if we want to keep on using the stdlib random module or switch to np.random. As far as I can tell, np.random guarantees reproducible random numbers for ever, i.e. even for future versions of numpy. Not sure what the stdlib random module guarantees.

@Debilski
Copy link
Member Author

Debilski commented Oct 4, 2024

I don’t see that guarantee as a thing we should care about. What would be the long-term use case?

@otizonaizit
Copy link
Member

otizonaizit commented Oct 4, 2024 via email

@Debilski
Copy link
Member Author

Debilski commented Oct 4, 2024

I don’t see that guarantee as a thing we should care about. What would be the long-term use case?
fix a bug in a bot 10 years after the facts maybe or replay that game at the tournament in Heraklion to your grand-children ;)

A reasonable argument for sure, but you’d also need the 10-year old Pelita for that (chances are that we ourselves have changed the order of the calls to rng – this PR already changes it for example) and likely also some old-enough Python to be able to run it.

I’m open for changes in case you find the numpy version nicer. Maybe you prefer rng.integers to rng.randint or you think that it is useful for the students to get an array of random numbers quickly?

@otizonaizit
Copy link
Member

otizonaizit commented Oct 4, 2024 via email

@Debilski
Copy link
Member Author

Debilski commented Oct 4, 2024

Yeah, if you want, you can now easily plug your own stable RNG. ;)

class StableRNG(random.Random):
    def random(self):
        return 0.42

run_background_game(..., rng=StableRNG())

@Debilski
Copy link
Member Author

Debilski commented Oct 4, 2024

What I like about the numpy style is that it allows one to pass a seed directly to rng and the handling is all done in one line. To do it properly, we have to write

def blah(..., rng=None):
    if not isinstance(rng, random.Random):
        rng = random.Random(rng)

which looks a bit unbalanced.

@Debilski Debilski force-pushed the feature/use_rng_instead_of_seed branch 6 times, most recently from 806bd4f to 7fef122 Compare October 4, 2024 21:15
@Debilski Debilski force-pushed the feature/use_rng_instead_of_seed branch from 7fef122 to c37f331 Compare October 4, 2024 21:19
@Debilski Debilski marked this pull request as ready for review October 7, 2024 13:07
@Debilski
Copy link
Member Author

Debilski commented Oct 7, 2024

I think we can merge this? @otizonaizit

The tournament needs some further handling (#833) but this PR gets us half-way there at least.

I have kept the seed=... arguments for all functions that are meant to be used by users (and when this reflects an existing command line argument) and where it is only possible to pass a seed.

In most other cases rng=... can take a seed, an instance of random.Random or None, which is then normalised by calling

    if not isinstance(rng, random.Random):
        rng = random.Random(rng)

We could make this a library function and do it all in a one line assignment (just like numpy does it) but we don’t have a good place where to put it. (pelita/utils.py feels more like utils for external users of Pelita not internal stuff; maybe I should add pelita/internal.py?).

There are some inner functions (mostly the underscore functions in maze_generator.py) where rng does not have the default argument of None. In this case only instances of random.Random are accepted (no seeds). This is on purpose as I think there is no reason here to make this optional for convenience reasons.

I am inclined to add type annotations Random | int | None to all arguments but I don’t want to start a flame war. :)

@otizonaizit
Copy link
Member

I think we can merge this? @otizonaizit

This is an impressive amount of work, thanks! Two comments/questions before merging:

The tournament needs some further handling (#833) but this PR gets us half-way there at least.

what does that mean? I think you mean #833 ? I was under the impression that the tournament was already reproducible (in this context reproducible and predictable is the same, or do you mean something different?). What is missing?

I have kept the seed=... arguments for all functions that are meant to be used by users (and when this reflects an existing command line argument) and where it is only possible to pass a seed.

That is very useful, yes!

In most other cases rng=... can take a seed, an instance of random.Random or None, which is then normalised by calling

    if not isinstance(rng, random.Random):
        rng = random.Random(rng)

We could make this a library function and do it all in a one line assignment (just like numpy does it) but we don’t have a good place where to put it. (pelita/utils.py feels more like utils for external users of Pelita not internal stuff; maybe I should add pelita/internal.py?).

Why not just use everywhere the numpy RNG? It seems thread-safe, well thought, and allows for a one-liner everywhere, which seems nicer?

There are some inner functions (mostly the underscore functions in maze_generator.py) where rng does not have the default argument of None. In this case only instances of random.Random are accepted (no seeds). This is on purpose as I think there is no reason here to make this optional for convenience reasons.

OK, agreed!

I am inclined to add type annotations Random | int | None to all arguments but I don’t want to start a flame war. :)

Very good. We should keep flame wars for in-person meetings ;-)

@Debilski
Copy link
Member Author

Debilski commented Oct 7, 2024

I think we can merge this? @otizonaizit

This is an impressive amount of work, thanks! Two comments/questions before merging:

The tournament needs some further handling (#833) but this PR gets us half-way there at least.

what does that mean? I think you mean #833 ? I was under the impression that the tournament was already reproducible (in this context reproducible and predictable is the same, or do you mean something different?). What is missing?

Yeah, sorry, that wasn’t super exact. The tournament is (and was) reproducible and now with the rng it is even reproducible in a traceable manner and not just based on the right order of imports.
However, when the tournament (or individual games) is restarted and restored from the state.yaml file, the rng will give different results since it will be called a different number of times. This is what #833 is trying to solve: Making even the randomness predictable based on the match id or whatever.

In most other cases rng=... can take a seed, an instance of random.Random or None, which is then normalised by calling

    if not isinstance(rng, random.Random):
        rng = random.Random(rng)

We could make this a library function and do it all in a one line assignment (just like numpy does it) but we don’t have a good place where to put it. (pelita/utils.py feels more like utils for external users of Pelita not internal stuff; maybe I should add pelita/internal.py?).

Why not just use everywhere the numpy RNG? It seems thread-safe, well thought, and allows for a one-liner everywhere, which seems nicer?

Yeah, in the end I didn’t want to import numpy just for that function, which is basically just a function of three lines ( https://github.com/numpy/numpy/blob/18507bb9f8f50bac415999f832bcbed0766b1f69/numpy/random/_generator.pyx#L5067-L5075 )

But we can try it. Unfortunately it uses a different algorithm and returns slightly different types.

    import random
    import numpy as np

    random.Random(1).randint(0, 10)
    > 2

    np.random.default_rng(1).integers(0, 10)
    > np.int64(4)

    random.Random(1).random()
    > 0.13436424411240122

    np.random.default_rng(1).random()
    > 0.5118216247002567

I’m 60:40 to keeping plain Python in the backend but if you prefer the other I’ll change it (maybe in a separate issue: We would need to discuss whether we also need to change bot.random to the numpy version then).

@otizonaizit
Copy link
Member

Yeah, sorry, that wasn’t super exact. The tournament is (and was) reproducible and now with the rng it is even reproducible in a traceable manner and not just based on the right order of imports. However, when the tournament (or individual games) is restarted and restored from the state.yaml file, the rng will give different results since it will be called a different number of times. This is what #833 is trying to solve: Making even the randomness predictable based on the match id or whatever.

Ah yes, now I understand. Cool, #833 isn't a blocker for this PR and it would be a new feature anyway, so all is fine.

Why not just use everywhere the numpy RNG? It seems thread-safe, well thought, and allows for a one-liner everywhere, which seems nicer?

Yeah, in the end I didn’t want to import numpy just for that function, which is basically just a function of three lines ( https://github.com/numpy/numpy/blob/18507bb9f8f50bac415999f832bcbed0766b1f69/numpy/random/_generator.pyx#L5067-L5075 )

But we can try it. Unfortunately it uses a different algorithm and returns slightly different types.

    import random
    import numpy as np

    random.Random(1).randint(0, 10)
    > 2

    np.random.default_rng(1).integers(0, 10)
    > np.int64(4)

    random.Random(1).random()
    > 0.13436424411240122

    np.random.default_rng(1).random()
    > 0.5118216247002567

I’m 60:40 to keeping plain Python in the backend but if you prefer the other I’ll change it (maybe in a separate issue: We would need to discuss whether we also need to change bot.random to the numpy version then).

I didn't think about the fact the numpy would return different types... hmm, it seems to me that then it makes more sense to keep the Python random module: we don't want numpy scalars to appear instead of Python integers...

But then just plug the wrapper into pelita/utils.py. That function can be useful for users too, no? And it would hide the ugly isinstance thing that is hurting my eyes ;-) We can bikeshed the name, but I guess default_rng sounds pretty much the right name already?

@Debilski
Copy link
Member Author

Debilski commented Oct 7, 2024

(I missed a few files apparently. Working on it)

@Debilski Debilski force-pushed the feature/use_rng_instead_of_seed branch from bd75074 to bd349cd Compare October 7, 2024 16:03
@Debilski
Copy link
Member Author

Debilski commented Oct 7, 2024

But then just plug the wrapper into pelita/utils.py. That function can be useful for users too, no? And it would hide the ugly isinstance thing that is hurting my eyes ;-) We can bikeshed the name, but I guess default_rng sounds pretty much the right name already?

Fine with default_rng. We are getting circular imports now however with utils.py (utils.py imports game.py imports utils.py). All the pelita-internal stuff that all methods want to use needs to live somewhere else; we need to do a bit of rethinking where to put global variables and global functions …

@otizonaizit
Copy link
Member

But then just plug the wrapper into pelita/utils.py. That function can be useful for users too, no? And it would hide the ugly isinstance thing that is hurting my eyes ;-) We can bikeshed the name, but I guess default_rng sounds pretty much the right name already?

Fine with default_rng. We are getting circular imports now however with utils.py (utils.py imports game.py imports utils.py). All the pelita-internal stuff that all methods want to use needs to live somewhere else; we need to do a bit of rethinking where to put global variables and global functions …

Do you plan to go for default_rng in this PR? If utils.py is no good for now, just create private_tools.py or something like that. The big refactoring can be postponed in my opinion.

@Debilski
Copy link
Member Author

Debilski commented Oct 9, 2024

Should be good to merge then

@otizonaizit otizonaizit merged commit 7cf29c1 into ASPP:main Oct 9, 2024
27 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 9, 2024
RF: Rename rnd to rng and use instead of seed if applicable. 7cf29c1
@Debilski Debilski deleted the feature/use_rng_instead_of_seed branch October 9, 2024 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adopt SciPy SPEC 7 on Seeding Pseudo-Random Number Generation
2 participants