Hi everyone, hope you are doing well! I'm working on a research project with the DeepMind JAX ecosystem (Haiku, Optax), but for some reason, I find that when I train over a dataset, the training loss doesn't go down, as shown in this screenshot.
I'm trying to do something pretty simple: train Random Network Distillation (https://arxiv.org/abs/1810.12894, https://github.com/deepmind/acme/tree/master/acme/agents/jax/rnd) on an offline dataset of D4RL MuJoCo data. I tried a few sanity checks, including training on one random data point for some number of iterations. That loss also doesn't go down: it basically stays at 0.005 for 1000 straight epochs (shown in the screenshots below).
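Just to be explicit about the objective I mean: RND trains a predictor network to match the output of a fixed, randomly initialised target network, and the prediction error is both the training loss and the novelty signal. Schematically, with toy linear maps standing in for the real networks and made-up shapes:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k_target, k_pred, k_obs = jax.random.split(key, 3)

# Toy stand-ins for the two networks: a frozen random "target" and a trainable "predictor".
W_target = jax.random.normal(k_target, (17, 8))  # never updated
W_pred = jax.random.normal(k_pred, (17, 8))      # trained to match the target
obs = jax.random.normal(k_obs, (32, 17))         # batch of observations

def rnd_loss(W_pred, W_target, obs):
    # The predictor tries to reproduce the frozen target's output; the squared
    # error is the training loss (and, per observation, the novelty bonus).
    return jnp.mean(jnp.square(obs @ W_pred - obs @ W_target))

grads = jax.grad(rnd_loss)(W_pred, W_target, obs)  # gradient only w.r.t. the predictor
```

Since the target never changes, the loss on a fixed batch (let alone a single datapoint) should be able to get very close to zero.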
Here are some snippets:
RND neural network + trainer code:
```python
import functools
from typing import NamedTuple

import haiku as hk
import jax
import jax.numpy as jnp
import optax


class RNDTrainState(NamedTuple):
    params: hk.Params
    target_params: hk.Params
    opt_state: optax.OptState


class MLPRNDModel(hk.Module):

    def __init__(self, cfg):
        super().__init__()
        self.encoder = hk.nets.MLP(
            [cfg.hidden_dim, cfg.hidden_dim],
            activation=jax.nn.swish
        )
        self.predictor = RNDPredictor(cfg)

    def __call__(self, obs):
        reprs = self.encoder(obs)
        return self.predictor(reprs)


class RNDModelTrainer:
    '''RND model trainer.'''

    def __init__(self, cfg):
        self.cfg = cfg
        if cfg.task in MUJOCO_ENVS:
            rnd_fn = lambda o: MLPRNDModel(cfg.d4rl)(o)
        else:
            rnd_fn = lambda o: ConvRNDModel(cfg.vd4rl)(o)
        self.rnd = hk.without_apply_rng(hk.transform(rnd_fn))

        # params
        key = jax.random.PRNGKey(cfg.seed)
        k1, k2 = jax.random.split(key)
        rnd_params = self.rnd.init(k1, jnp.zeros((1,) + tuple(cfg.obs_shape)))
        target_params = self.rnd.init(k2, jnp.zeros((1,) + tuple(cfg.obs_shape)))

        # optimizer
        self.rnd_opt = optax.adam(cfg.lr)
        rnd_opt_state = self.rnd_opt.init(rnd_params)

        self.train_state = RNDTrainState(
            params=rnd_params,
            target_params=target_params,
            opt_state=rnd_opt_state
        )

    @functools.partial(jax.jit, static_argnames=('self',))
    def rnd_loss_fn(self, params, target_params, obs):
        output = self.rnd.apply(params, obs)
        target_output = self.rnd.apply(target_params, obs)
        # no need to do jax.lax.stop_gradient, as gradient is only taken w.r.t. first param
        return jnp.mean(jnp.square(target_output - output))

    @functools.partial(jax.jit, static_argnames=('self',))
    def update(self, obs, step):
        del step
        loss_grad_fn = jax.value_and_grad(self.rnd_loss_fn)
        loss, grads = loss_grad_fn(self.train_state.params, self.train_state.target_params, obs)
        update, new_opt_state = self.rnd_opt.update(grads, self.train_state.opt_state)
        new_params = optax.apply_updates(self.train_state.params, update)
        metrics = {'rnd_loss': loss}
        new_train_state = RNDTrainState(
            params=new_params,
            target_params=self.train_state.target_params,
            opt_state=new_opt_state
        )
        return new_train_state, metrics
```
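One extra check that might be worth running on top of this, sketched below with placeholder names: `trainer` stands for an instantiated `RNDModelTrainer` and `obs_batch` for one observation batch from the dataloader. It just compares the parameters before and after a single `update` call:

```python
import jax
import optax

# `trainer` and `obs_batch` are placeholders for an instantiated RNDModelTrainer
# and one observation batch of shape (batch, *obs_shape).
params_before = trainer.train_state.params
new_state, metrics = trainer.update(obs_batch, 0)

# Global norm of the parameter change; if this is ~0, the Adam step isn't moving the predictor.
delta = jax.tree_util.tree_map(lambda a, b: b - a, params_before, new_state.params)
print('rnd_loss:', metrics['rnd_loss'])
print('param delta norm:', optax.global_norm(delta))

trainer.train_state = new_state  # same reassignment the training loop does
```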
Training loop code:
```python
def train_rnd(self):
    '''Train RND model offline.'''
    for epoch in trange(1, self.cfg.model_train_epochs + 1):
        epoch_metrics = defaultdict(AverageMeter)
        for batch in self.rnd_dataloader:
            obs, _, _, _, _ = batch
            self.rnd_trainer.train_state, batch_metrics = self.rnd_trainer.update(obs, self.global_step)
            for k, v in batch_metrics.items():
                epoch_metrics[k].update(v, obs.shape[0])
        if self.cfg.wandb:
            log_dump = {k: v.value() for k, v in epoch_metrics.items()}
            wandb.log(log_dump)
        if self.cfg.save_model and epoch % self.cfg.model_save_every == 0:
            model_path = self.pretrained_rnd_dir / f'rnd_{epoch}.pkl'
            self.rnd_trainer.save(model_path)

def train_one_datapoint(self):
    '''Train on one datapoint for sanity checking. Loss SHOULD converge to 0.'''
    self.rng, subkey = jax.random.split(self.rng)
    rand_datapoint = jax.random.normal(key=subkey, shape=(1,) + tuple(self.cfg.obs_shape), dtype=jnp.float32)
    for epoch in trange(1, self.cfg.model_train_epochs + 1):
        self.rnd_trainer.train_state, metrics = self.rnd_trainer.update(rand_datapoint, self.global_step)
        print(f'metrics for epoch {epoch}: {metrics["rnd_loss"]}')
        if self.cfg.wandb:
            wandb.log(metrics)
```
Here, `self` refers to a workspace object with an experiment config `cfg`, where I train and save everything of interest.
As shown, I use the `optax.adam` optimizer with learning rate `1e-3`. I think this is standard (maybe a bit large, but I've swept a few learning rates both larger and smaller and get the same results).
I'm wondering where I'm going wrong in this training approach. I think the setup is correct, but I'm clearly missing something. Any help would be greatly appreciated! If you have additional questions, I'm happy to follow up here or over a video chat. Also, let me know if the Optax repo is the right place for this message; I don't think it's an issue yet (more on me than on the package), so I'm posting it in the Discussions tab.
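In case it makes this easier to poke at, here's a condensed, self-contained sketch of the same single-datapoint sanity check with the class machinery stripped out. The observation dimension, layer sizes, and names here are placeholders rather than my actual config:

```python
import jax
import jax.numpy as jnp
import haiku as hk
import optax

OBS_DIM = 17  # placeholder; my real obs_shape comes from the config

def rnd_fn(obs):
    return hk.nets.MLP([256, 256, 32], activation=jax.nn.swish)(obs)

rnd = hk.without_apply_rng(hk.transform(rnd_fn))

key = jax.random.PRNGKey(0)
k_pred, k_target, k_obs = jax.random.split(key, 3)
dummy = jnp.zeros((1, OBS_DIM))

params = rnd.init(k_pred, dummy)
target_params = rnd.init(k_target, dummy)  # frozen target
opt = optax.adam(1e-3)
opt_state = opt.init(params)

obs = jax.random.normal(k_obs, (1, OBS_DIM))  # the single random datapoint

def loss_fn(params, obs):
    return jnp.mean(jnp.square(rnd.apply(target_params, obs) - rnd.apply(params, obs)))

@jax.jit
def update(params, opt_state, obs):
    loss, grads = jax.value_and_grad(loss_fn)(params, obs)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

for step in range(1000):
    params, opt_state, loss = update(params, opt_state, obs)
    if step % 100 == 0:
        print(step, float(loss))
```

It mirrors the trainer's update logic above, just written as a free function with the parameters and optimizer state passed in explicitly.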
Regarding package versions, I am using Haiku 0.0.7, Optax 0.1.3, JAX 0.3.16 on CUDA for these experiments. I love the framework by the way!