Best practice for a line-search sweep using TrainState functions #3467

shyams2 · 2023-11-07T05:26:29Z

I'm working on some code where I want to see whether a manual line search over the step size helps. At each training step I try a small list of candidate step sizes and keep the one that gives the lowest loss. This is how I currently have it implemented:

            # Try each candidate step size, always starting from the same state
            temp_losses = []
            for step_size in step_sizes:
                # Copy the state with a new optimizer; params and opt_state are reused
                temp_state = model.state.replace(tx=optax.adamw(step_size))
                temp_state = model.step(temp_state, batch)
                loss = model.eval_loss(temp_state.params, batch).mean().item()
                temp_losses.append(loss)

            # Take the real step with the step size that gave the lowest loss
            idx = int(onp.argmin(onp.array(temp_losses)))
            model.state = model.state.replace(tx=optax.adamw(step_sizes[idx]))
            model.state = model.step(model.state, batch)

Is there a better way to do this? I thought about using vmap across the step sizes, but that increases memory use a fair bit.

chiamp (Collaborator) · 2023-11-15T22:33:33Z

Are you trying to find the best learning rate for each iteration of the training loop? Not sure if there's a better way to do it than the code example you provided. I wonder if there's a way to write a custom optax scheduler that factors in the resulting loss depending on which learning rate you use?

1 reply

shyams2 (Author) · Nov 15, 2023

Yes, I'm trying to figure out the ideal step size for each step. The idea is to try a bunch of learning rates for a step and use the best one. I considered looking into a scheduler, but I think a schedule depends only on the iteration count.

I think it's possible to remove some redundant work by computing the optimizer updates once and then rescaling them for each candidate step size (assuming the optimizer is built with a constant step size of 1):

    def get_updates(self, state, batch):
        # Compute gradients once and average them across devices
        grads = jax.grad(self.loss)(state.params, batch)
        grads = jax.lax.pmean(grads, "num_devices")
        # Optimizer updates for a unit step size; rescaled per candidate later
        updates, opt_state = state.tx.update(grads, state.opt_state, state.params)
        return updates, opt_state

    def get_updated_loss(self, state, updates, batch, step_size):
        # Loss after a trial step with the updates scaled by step_size
        updates = jax.tree_util.tree_map(lambda x: x * step_size, updates)
        params = optax.apply_updates(state.params, updates)
        loss = self.loss(params, batch)
        return loss

Then the main training loop would include the following steps:

            # One gradient and optimizer evaluation, then pick the best step size
            updates, opt_state = model.get_updates(model.state, batch)
            idx = model.get_idx_minloss(model.state, updates, batch, step_sizes)
            updates = jax.tree_util.tree_map(lambda x: x * step_sizes[idx], updates)
            model.state = model.get_updated_state(model.state, updates, opt_state)

I'm not sure whether any further refinements are possible.

cgarciae (Maintainer) · 2023-11-21T13:16:08Z

You can probably jit this if you treat temp_losses and step_sizes as stacked jax Arrays; I'm not sure how much this helps, though.
