Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V1.2 proposal #61

Merged

Conversation

idontgetoutmuch
Copy link
Member

@idontgetoutmuch idontgetoutmuch commented May 13, 2020

Context

Following @lehins' performance analysis of Haskell pseudo-random number libraries and the ensuing discussion, @lehins, @idontgetoutmuch and @curiousleo with help from @Shimuuar set out to improve random as both an interface for and implementation of a pseudo-random number generator for Haskell.

Our goals were to fix #25 (filed in 2015) and #51 (filed in 2018), see "Quality" and "Performance" below.

In the process of tackling these two issues, we addressed a number of other issues too (see "Other issues addressed" below) and added a monadic interface to the library so monadic pseudo-random number generators can be used interchangeably with random, see "API changes" below.

This PR is the result of that effort. The changes are considerable. To signal this, we propose to release this as version 1.2 (the previous released version is 1.1, from 2014).

However, the API changes are generally backwards-compatible, see "Compatibility" below.

Quality (#25)

We created an environment for running statistical pseudo-random number generator tests, tested random v1.1 and splitmix using dieharder, TestU01, PractRand and other test suites and recorded the results.

The results clearly show that the split operation in random v1.1 produces pseudo-random number generators which are correlated, corroborating #25. The split operation in splitmix showed no weakness in our tests.

As a result, we replaced the pseudo-random number generator implementation in random by the one provided by splitmix.

Performance (#51)

@lehins' performance analysis has the data for random v1.1. It is slow, and using faster pseudo-random number generators via random v1.1 makes them slow.

By switching to splitmix and improving the API, this PR speeds up pseudo-random number generation with random by one to three orders of magnitude, depending on the number type. See Benchmarks for details.

API changes

MonadRandom

The major API addition in this PR is the definition of a new class MonadRandom:

-- | 'MonadRandom' is an interface to monadic pseudo-random number generators.
class Monad m => MonadRandom g s m | g m -> s where
  {-# MINIMAL freezeGen,thawGen,(uniformWord32|uniformWord64) #-}

  type Frozen g = (f :: Type) | f -> g
  freezeGen :: g s -> m (Frozen g)
  thawGen :: Frozen g -> m (g s)

  uniformWord32 :: g s -> m Word32 -- default implementation in terms of uniformWord64
  uniformWord64 :: g s -> m Word64 -- default implementation in terms of uniformWord32
  -- plus methods for other word sizes and for byte strings
  -- all have default implementations so the MINIMAL pragma holds

Conceptually, in MonadRandom g s m, g s is the type of the generator, s is the state type, and m the underlying monad. Via the functional dependency g m -> s, the state type is determined by the generator and monad.

Frozen is the type of the generator's state "at rest". It is defined as an injective type family via f -> g, so there is no ambiguity as to which g any Frozen g belongs to.

This definition is generic enough to accomodate, for example, the Gen type from mwc-random, which itself abstracts over the underlying primitive monad and state token. This is the full instance declaration (provided here as an example - this instance is not part of random as random does not depend on mwc-random):

instance (s ~ PrimState m, PrimMonad m) => MonadRandom MWC.Gen s m where
  type Frozen MWC.Gen = MWC.Seed
  freezeGen = MWC.save
  thawGen = MWC.restore

  uniformWord8 = MWC.uniform
  uniformWord16 = MWC.uniform
  uniformWord32 = MWC.uniform
  uniformWord64 = MWC.uniform
  uniformShortByteString n g = unsafeSTToPrim (genShortByteStringST n (MWC.uniform g))

Four MonadRandom instances ("monadic adapters") are provided for pure generators to enable their use in monadic code. The documentation describes them in detail.

Uniform and UniformRange

The Random typeclass has conceptually been split into Uniform and UniformRange. The Random typeclass is still included for backwards compatibility. Uniform is for types where it is possible to sample from the type's entire domain; UniformRange is for types where one can sample from a specified range.

Changes left out

There were changes we considered and decided against including in this PR.

Some pseudo-random number generators are splittable, others are not. A good way of communicating this is to have a separate typeclass, Splittable, say, which only splittable generators implement. After long discussions (see this issue and this PR), we decided against adding Splittable: the interface changes would either have been backwards-incompatible or too complex. For now, split stays part of the RandomGen typeclass. The new documentation suggests that split should call error if the generator is not splittable.

Due to floating point rounding, generating a floating point number in a range can yield surprising results. There are techniques to generate floating point numbers in a range with actual guarantees, but they are more complex and likely slower than the naive methods, so we decided to postpone this particular issue.

Ranges on the real number line can be inclusive or exclusive in the lower and upper bound. We considered API designs that would allow users to communicate precisely what kind of range they wanted to generate. This is particularly relevant for floating point numbers. However, we found that such an API would make more sense in conjunction with an improved method for generating floating point numbers, so we postponed this too.

Compatibility

We strove to make changes backwards compatible where possible and desirable.

The following changes may break existing packages:

  • import clashes, e.g. with the new functions uniform and uniformR
  • randomIO and randomRIO where extracted outside of Random class as separate functions, which means some packages need to adjust how they are imported
  • StdGen is no longer an instance of Read
  • requires base >= 4.10 (GHC-8.2)

In addition, genRange and next have been deprecated.

We have built all of Stackage against the code in this PR, and confirmed that no other build breakages occurred.

For more details, see this comment and the "Compatibility" section in the docs.

Other issues addressed

This PR also addresses #26, #44, #53, #55, #58 and #59, see Issues Addressed for details.

lehins and others added 30 commits February 22, 2020 20:59
…ypes

Introduce PrimMonad interface. Add range generation used in splitmix

Export all class contents

Initial stab at MonadRandom

Working MonadRandom

Rename next* to gen*
Add functionality for generating a ByteArray
…uniform-range

Implement Uniform and UniformRange
* StdGen = SMGen

* Remove dependency on "time"
This reverts commit 67952a4.
@cartazio cartazio closed this May 20, 2020
@lehins
Copy link
Contributor

lehins commented May 20, 2020

@cartazio I thought you are going to merge this PR? Why did you close it again?

@cartazio
Copy link
Contributor

shit, id din't see you repointed it, my bad :)

@cartazio cartazio reopened this May 20, 2020
@cartazio
Copy link
Contributor

i totally respect your persectives here (we're both right overall), and you're right about scope. the improvements for the APIS will all land, and i'm thinking for now we table the monadic apis from the actual release.

even if i'm sometimes a tad "grrr", that work is outstanding, and shifting out the default RNG and not letting other things block it is on me.

i will do some other RNG shuffling, but i think you'll be happy overall (though perhaps rightly frustrated with me :) ).

@cartazio cartazio merged commit 1b8d98f into haskell:wip/carter/1.1-base-forv1.2-collab May 20, 2020
@curiousleo
Copy link
Contributor

Hi @cartazio, re squashing: I also think that the commit history of this PR is valuable. A lot of thinking went into the decisions that were made, and the commit history points to the PRs where the relevant discussions took place.

There are definitely commits in the history that by themselves don't add much value, so there is an opportunity for careful squashing.

I volunteer to squash the history of this branch into one commit per PR. But only if the entire history does not get squashed into one big commit in the end. Can you confirm?

@curiousleo
Copy link
Contributor

#61 (comment) ah I guess that's a moot point now :)

@cartazio
Copy link
Contributor

@curiousleo i'm going to persist this entire diff in a design-artifact/v1.2-proposal, so dont worry :)

i'm also going to probably see about re-applpying the diff set ontop of a compacted/filtered history with v1.1 as the historical base for the new-new repo (and i"ll see about making sure all urls also point to internet archive backups if needed)

i'm actually frustrated with myself for not seeing the applicative+alternative design path YEARS ago, but c'est lie-vie

I hope that in aggregate you all find yourself more happy than annoyed with how i use this work, but i'm in a good headspace to engage on this stuff and focus (despite the plague times), and zooming out, we ALL care about good randomness :)

i'm thinking for near term i'll have a child / satellite lib for workshopping out the fun fancy stuff

@cartazio
Copy link
Contributor

design_artifacts/theLeo_and_Lehins_etal_v1.2_proposal is the stable in repo snapshot for now

@curiousleo
Copy link
Contributor

@curiousleo i'm going to persist this entire diff in a design-artifact/v1.2-proposal, so dont worry :)

That is appreciated, but beside the point: I think that a lot of the value in a good Git history lies in being able to answer the question "why is this code the way it is?" using git blame. This requires the master branch to have that history, since that is what people generally check out and run git blame on.

@lehins
Copy link
Contributor

lehins commented May 20, 2020

i'm thinking for now we table the monadic apis from the actual release.

@cartazio monadic part of the API is critical to all of the work in this PR, cutting it out of the release will be a huge mistake

@cartazio
Copy link
Contributor

cartazio commented May 20, 2020 via email

@lehins
Copy link
Contributor

lehins commented May 20, 2020

@cartazio We have a concrete design that already works. Instead of saying what is possible could you please explain what you don't like about the current design and why do you think it doesn't work?

I am very well aware of all other approaches taken in all other libraries that provide alternative monadic interface. I know for a fact that none of them have the desired property of being suitable for both pure and stateful generators at the same time, including Brent's MonadRandom package.

We've iterated on the design quite a bit and weighed all pros and cons. @Shimuuar who is the maintainer for the most popular satetful RNG library mwc-random has also participated in the design process and is onboard with this monadic interface.

Plus making sure that that the base class has a liftSampling prim operation
with the type “g->(g,a)-> m a ”

Why do you need this extra function, this is just the state :: (s -> (a, s)) -> m a function from MonadState, so what's the point of reinventing stuff here. Current design already works with any monad that implements MonadState.

Current design is the only one that provides such great performance and usability across all available non-crypto RNGs in Haskell, it works with all transformers without any need to create crap load of instances, there is literally no more work needs to be done, so could you please shed the light for me why are you so reluctant about it?

@cartazio
Copy link
Contributor

@lehins
Copy link
Contributor

lehins commented May 20, 2020

I know, and I am saying that I know about it, I studied it amongst other approaches and it does not work with mwc-random and alike

@idontgetoutmuch
Copy link
Member Author

@cartazio you will no doubt be aware that many years ago it was proposed to make MWC compatible with MonadRandom.

byorgey/MonadRandom#26

It's not clear from the discussion if this is feasible without changing MonadRandom in a breaking way.

I think you are looking at this the wrong way up. With this proposal, there is no need for MonadRandom.

@cartazio
Copy link
Contributor

... umm, @lehins ... mwc doesnt have a monadic interface for threading the generator.

@cartazio
Copy link
Contributor

(i guess that lives in random-fu)

@lehins
Copy link
Contributor

lehins commented May 20, 2020

Current PR has MonadRandom class that can be used with mwc-random, see the haddock in this PR it has examples on how it works

@idontgetoutmuch
Copy link
Member Author

@Shimuuar
Copy link
Contributor

... umm, @lehins ... mwc doesnt have a monadic interface for threading the generator.

What is PrimMonad m that appears everywhere then :)? Instead of passing state explicitly it passes state token around and modifies generator in place.

Although it's possible to have some vaguely MonadRandomish interfaces for mwc-random for monads like ReaderT (Gen (PrimState m)) m it does require breaking changes.

@cartazio
Copy link
Contributor

cartazio commented May 20, 2020 via email

@cartazio
Copy link
Contributor

cartazio commented May 20, 2020 via email

@curiousleo
Copy link
Contributor

Absolutely agree mwc needs to be sane to have a monad wrapper for.

With this PR, it does. Literal copy from the docs section "How to implement MonadRandom" (see https://htmlpreview.github.io/?https://raw.githubusercontent.com/idontgetoutmuch/random/haddock-preview/doc/System-Random-Monad.html#g:12), as @idontgetoutmuch pointed out:


Here is an example instance for the monadic pseudo-random number generator from the mwc-random package:

instance (s ~ PrimState m, PrimMonad m) => MonadRandom MWC.Gen s m where
  type Frozen MWC.Gen = MWC.Seed
  thawGen = MWC.restore
  freezeGen = MWC.save
  uniformWord8 = MWC.uniform
  uniformWord16 = MWC.uniform
  uniformWord32 = MWC.uniform
  uniformWord64 = MWC.uniform
  uniformShortByteString n g = unsafeSTToPrim (genShortByteStringST n (MWC.uniform g))

@cartazio
Copy link
Contributor

cartazio commented May 21, 2020 via email

@lehins
Copy link
Contributor

lehins commented May 21, 2020

It doesn’t need the state token in the monad random though.

@cartazio please elaborate, best with example.

If I am inferring correctly what you are saying, then it does not need s for PrimMonad becasue it works for all s ~ PrimState m. Loosing s though will prevent it from working in with things like gen stored in IORef and monad being StateT. In short, It DOES need the state token in the monad random!

@idontgetoutmuch
Copy link
Member Author

Another data point: random-fu is about x3 faster using the proposed new version:

random-fu-perf

This was referenced May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The seeds generated by split are not independent
6 participants