-
Hi, I'm trying to understand how RNGs should be passed when there are multiple dropouts. Below is one example:

import jax
import jax.numpy as jnp
import optax
from typing import Optional
from flax import linen as nn
from flax.training import train_state
from jax import random
from jax import lax
from flax.linen.module import merge_param
from typing import Sequence

class TrainState(train_state.TrainState):
  key: jax.random.KeyArray

class MyModelMultiple(nn.Module):
  num_neurons: int

  @nn.compact
  def __call__(self, x, training: bool):
    x = nn.Dense(self.num_neurons)(x)
    x = nn.Dropout(rate=0.5, deterministic=not training)(x)
    x = nn.Dropout(rate=0.5, deterministic=not training)(x)
    x = nn.Dropout(rate=0.5, deterministic=not training)(x)
    return x

@jax.jit
def train_step(state: TrainState, xs, ys, dropout_key):
  # Fold the step number into the key so each step gets fresh dropout masks.
  dropout_train_key = jax.random.fold_in(
      key=dropout_key, data=state.step)

  def loss_fn(params):
    yhats = state.apply_fn(
        {'params': params}, xs, training=True,
        rngs={'dropout': dropout_train_key})
    loss = jnp.mean((ys - yhats) ** 2)
    return loss

  grad_fn = jax.value_and_grad(loss_fn)
  loss, grads = grad_fn(state.params)
  state = state.apply_gradients(grads=grads)
  return state, loss

model = MyModelMultiple(num_neurons=3)
# xs, ys, params_key, and dropout_key are assumed to be defined elsewhere.
rng1, rng2, rng3 = jax.random.split(dropout_key, 3)

print("* Init")
variables = model.init(params_key, xs, training=False)
params = variables['params']
state = TrainState.create(
    apply_fn=model.apply,
    params=params,
    key=dropout_key,
    tx=optax.adam(1e-3),
)

print("* Training")
for i in range(1001):
  state, loss = train_step(state, xs, ys, dropout_key)
  if i % 100 == 0:
    print(f'Iteration {i}: {loss}')

The more I look into this, the more it seems that those three Dropout layers will end up using the same rng, which would make all three behave identically. So I changed the code to:
import jax
import jax.numpy as jnp
import optax
from typing import Optional, Sequence
from flax import linen as nn
from flax.linen.stochastic import KeyArray
from flax.training import train_state
from jax import random
from jax import lax
from flax.linen.module import merge_param

class TrainState(train_state.TrainState):
  key: jax.random.KeyArray

class MyModelMultiple(nn.Module):
  num_neurons: int

  @nn.compact
  def __call__(self, x, training: bool):
    x = nn.Dense(self.num_neurons)(x)
    # Each Dropout now draws from its own rng collection.
    x = nn.Dropout(rate=0.5, deterministic=not training, rng_collection='drop1')(x)
    x = nn.Dropout(rate=0.5, deterministic=not training, rng_collection='drop2')(x)
    x = nn.Dropout(rate=0.5, deterministic=not training, rng_collection='drop3')(x)
    return x

@jax.jit
def train_step(state: TrainState, xs, ys, dropout_key):
  dropout_train_key = jax.random.fold_in(
      key=dropout_key, data=state.step)
  # Split the per-step key into one key per dropout collection.
  dkey1, dkey2, dkey3 = jax.random.split(dropout_train_key, 3)

  def loss_fn(params):
    yhats = state.apply_fn(
        {'params': params}, xs, training=True,
        rngs={'drop1': dkey1, 'drop2': dkey2, 'drop3': dkey3})
    loss = jnp.mean((ys - yhats) ** 2)
    return loss

  grad_fn = jax.value_and_grad(loss_fn)
  loss, grads = grad_fn(state.params)
  state = state.apply_gradients(grads=grads)
  return state, loss

model = MyModelMultiple(num_neurons=3)
rng1, rng2, rng3 = jax.random.split(dropout_key, 3)

print("* Init")
variables = model.init(params_key, xs, training=False)
params = variables['params']
state = TrainState.create(
    apply_fn=model.apply,
    params=params,
    key=dropout_key,
    tx=optax.adam(1e-3),
)

print("* Training")
for i in range(1001):
  state, loss = train_step(state, xs, ys, dropout_key)
  if i % 100 == 0:
    print(f'Iteration {i}: {loss}')

Note that I'm now using dkey1, dkey2, and dkey3, derived like this:

dropout_train_key = jax.random.fold_in(
    key=dropout_key, data=state.step)
dkey1, dkey2, dkey3 = jax.random.split(dropout_train_key, 3)
...
yhats = state.apply_fn(
    {'params': params}, xs, training=True,
    rngs={'drop1': dkey1, 'drop2': dkey2, 'drop3': dkey3})

How does this sound? Is this the standard approach for using multiple dropouts? I couldn't easily find a relevant code example demonstrating multiple dropout layers.
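For reference, the snippets above assume data and PRNG keys set up roughly as follows; the shapes and targets are just placeholders:

import jax
import jax.numpy as jnp

root_key = jax.random.PRNGKey(0)
params_key, dropout_key, data_key = jax.random.split(root_key, 3)
xs = jax.random.uniform(data_key, (16, 3))  # toy inputs
ys = 2.0 * xs                               # toy targets with the same shape as the model output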
-
Hi @minkooseo,
I think the default behaviour will result in a unique rng for each submodule. If you pass in the rng as part of the 'dropout' rng collection, each Dropout submodule should derive its own key from it. As a result, I don't believe you need to be concerned about passing a single rng for "dropout" when you call the apply method, as it should handle the generation of unique rngs for the dropout submodules automatically. I may have missed something, so take my answer with a grain of salt.
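A minimal sketch to check this, assuming a parameter-free toy module with two Dropout layers that share the single 'dropout' collection:

import jax
import jax.numpy as jnp
from flax import linen as nn

class TwoDropouts(nn.Module):
  @nn.compact
  def __call__(self, x):
    a = nn.Dropout(rate=0.5, deterministic=False)(x)
    b = nn.Dropout(rate=0.5, deterministic=False)(x)
    return a, b

x = jnp.ones((1, 16))
# This module has no params, so an empty variables dict is enough.
a, b = TwoDropouts().apply({}, x, rngs={'dropout': jax.random.PRNGKey(0)})
print(a)  # if each Dropout derives its own rng, the zero patterns of a and b differ
print(b)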
-
Thank you for the discussion. I think the 'scope' is what prevents the Dropout layers from using the same 'dropout' rng key and producing the same random numbers (see the Dropout implementation, lines 637 to 652 at commit fe54d39). Test code:

import jax
import jax.numpy as jnp
from flax import linen as nn
from flax.linen.module import merge_param
from typing import Sequence, Optional
from flax.core.scope import Scope
# Copy paste of Dropout, but added debug message of rng key
class MyDropout(nn.Module):
  rate: float
  broadcast_dims: Sequence[int] = ()
  deterministic: Optional[bool] = None
  rng_collection: str = 'dropout'

  @nn.compact
  def __call__(self, inputs, deterministic: Optional[bool] = None):
    deterministic = merge_param(
        'deterministic', self.deterministic, deterministic)
    if (self.rate == 0.) or deterministic:
      return inputs
    # Prevent gradient NaNs in 1.0 edge-case.
    if self.rate == 1.0:
      return jnp.zeros_like(inputs)
    keep_prob = 1. - self.rate
    rng = self.make_rng(self.rng_collection)
    jax.debug.print(f'{self.parent=}, {self.name=}, {self.scope.name=}, {self.scope.rngs.keys()}, {self.scope.rngs.values()}, {self.scope.rng_counters.keys()=}, {self.scope.rng_counters.values()=}, {self.rng_collection=}, {str(rng)=}')
    broadcast_shape = list(inputs.shape)
    for dim in self.broadcast_dims:
      broadcast_shape[dim] = 1
    mask = jax.random.bernoulli(rng, p=keep_prob, shape=broadcast_shape)
    mask = jnp.broadcast_to(mask, inputs.shape)
    return jax.lax.select(mask, inputs / keep_prob, jnp.zeros_like(inputs))

class MyModel(nn.Module):
  dropout_rate: float

  @nn.compact
  def __call__(self, x, is_training: bool):
    x = nn.Dense(10)(x)
    x = MyDropout(rate=0.5, deterministic=not is_training)(x)
    x = MyDropout(rate=0.5, deterministic=not is_training)(x)
    return x
root_key = jax.random.PRNGKey(0)
params_key, dropout_key, data_key = jax.random.split(root_key, 3)
model = MyModel(0.5)
x = jax.random.uniform(data_key, (10, 10))
params = model.init(params_key, x, is_training=False)['params']
y = model.apply({'params': params}, x, is_training=True, rngs={'dropout': dropout_key})

Output: (debug prints showing the scope path and the derived rng key for each MyDropout)

So, each MyDropout instance gets a suffix, 'MyDropout_0' and 'MyDropout_1'. They're given the same seed (dropout_key), but the scope derives a different rng key for each of them. Thus, it suffices to give only the 'dropout' rng in the code.
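On a related note, here is a minimal sketch (using a bare nn.Dropout, not code from this thread) of why the train_step above also folds state.step into the root dropout key: the same root key with a different step yields a different derived key, and therefore a different mask at each training step.

import jax
import jax.numpy as jnp
from flax import linen as nn

drop = nn.Dropout(rate=0.5, deterministic=False)
x = jnp.ones((1, 8))
root = jax.random.PRNGKey(0)

for step in range(3):
  # Same root key, different step -> different derived key -> different mask.
  step_key = jax.random.fold_in(root, step)
  print(step, drop.apply({}, x, rngs={'dropout': step_key}))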
-
We have long been pending a "Randomness Guide" explaining how make_rng works and its interaction with lifted transforms. For now, here is the basic idea (BTW this is pseudocode; internal names are different). You have a path: tuple[str, ...] which is built by the Module system, and a count: int that keeps track of how many times make_rng has been called. The trick is to hash the tuple (*self.path, self.count) using hashlib and turn it into a uint32; in the example below this is done in the _stable_hash method. That integer is the fold_data you pass to jax.random.fold_in to produce a unique derived key from a root key.

def make_rng(self) -> jax.Array:
  fold_data = self._stable_hash((*self.path, self.count))
  self.count += 1
  return random.fold_in(self.root, fold_data)

@staticmethod
def _stable_hash(data: tuple[int | str, ...]) -> int:
  hash_str = " ".join(str(x) for x in data)
  _hash = hashlib.blake2s(hash_str.encode())
  hash_bytes = _hash.digest()
  # uint32 is represented as 4 bytes in big endian
  return int.from_bytes(hash_bytes[:4], byteorder="big")
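To make the idea concrete, here is a small self-contained sketch (the module paths below are hypothetical examples, not Flax internals) showing how two dropout call sites end up with different derived keys even though they share one root key:

import hashlib
from jax import random

def stable_hash(data):
  # Hash the path tuple into a uint32, mirroring the pseudocode above.
  hash_str = " ".join(str(x) for x in data)
  hash_bytes = hashlib.blake2s(hash_str.encode()).digest()
  return int.from_bytes(hash_bytes[:4], byteorder="big")

root = random.PRNGKey(0)
# Two hypothetical call sites: same root key, different module paths.
key_a = random.fold_in(root, stable_hash(('MyModelMultiple_0', 'Dropout_0', 0)))
key_b = random.fold_in(root, stable_hash(('MyModelMultiple_0', 'Dropout_1', 0)))
print(key_a)
print(key_b)  # distinct keys, hence distinct dropout masks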