-
I'm running into the same problem. Do you think you can at least make batches of equally shaped inputs? If so, this seems easier to answer. The easy answer might be "make batches out of data of the same size and alternate them during training" (see the sketch below), but pay attention to your loss function: it also needs to be compatible with all shapes, i.e. its results should be of the same order of magnitude. As far as I know, …
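For what it's worth, a minimal sketch of the bucketing idea, with jit retracing once per distinct shape; the toy model, loss, shapes and learning rate are all placeholders, not code from this thread:

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # toy loss; in practice keep its magnitude comparable across shapes
    preds = jnp.tanh(batch) @ params[: batch.shape[1]]
    return jnp.mean(preds ** 2)

@jax.jit  # retraces once per distinct batch shape, then reuses the cached version
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    return params - 1e-3 * grads

params = jnp.ones(64)
buckets = {                        # one bucket per input shape
    (32, 16): jnp.ones((32, 16)),
    (32, 48): jnp.ones((32, 48)),
}
for epoch in range(10):
    for shape, batch in buckets.items():   # alternate equally shaped batches
        params = train_step(params, batch)
```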
-
I have a loss function that depends on the outputs of, say, hundreds of different networks of different (relatively small) sizes. Each network operates on a varying number of samples. Up until now I have concatenated all my inputs and used vmap on all my functions so that they operate on the largest possible regularly shaped matrix. However, since the loss term depends on the outputs of all the networks, and the networks are of different sizes, I don't know of any other way to parallelize or vectorize the loss, and jitting the entire loss loop as below is just too slow:
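Roughly, the structure of that loop looks like this; apply_net, the parameter layout and the final reduction are simplified placeholders for my real code:

```python
import jax
import jax.numpy as jnp

def apply_net(params, x):
    # stand-in for one small network; each network has its own widths
    return jnp.tanh(x @ params["w"]) @ params["v"]

@jax.jit
def total_loss(nets_params, nets_inputs):
    outputs = []
    # plain Python loop: unrolled at trace time into one sub-graph per network,
    # so compile time and runtime grow with the number of networks
    for params, x in zip(nets_params, nets_inputs):
        outputs.append(apply_net(params, x))
    # the loss term couples the outputs of all networks (placeholder reduction)
    return sum(jnp.sum(o ** 2) for o in outputs)

# hundreds of networks, each with its own size and sample count
sizes = [(3, 8, 10), (5, 4, 20), (7, 16, 7)]
nets_params = [{"w": jnp.ones((d, h)), "v": jnp.ones((h, 1))} for d, h, _ in sizes]
nets_inputs = [jnp.ones((n, d)) for d, _, n in sizes]
print(total_loss(nets_params, nets_inputs))
```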
Ultimately, I would like to vmap or pmap over all the networks instead of looping over them one by one as above.
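Concretely, what I would like to be able to write is something like the sketch below, which (as far as I can tell) only works when every network has identical parameter shapes, which is exactly what I don't have; the stacked layout and shapes are hypothetical:

```python
import jax
import jax.numpy as jnp

def apply_net(params, x):
    return jnp.tanh(x @ params["w"]) @ params["v"]

# hypothetical layout: 100 identically shaped networks stacked on a leading axis
stacked_params = {"w": jnp.ones((100, 3, 8)), "v": jnp.ones((100, 8, 1))}
stacked_x = jnp.ones((100, 32, 3))

outputs = jax.vmap(apply_net)(stacked_params, stacked_x)  # shape (100, 32, 1)
loss = jnp.sum(outputs ** 2)
```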
I have read, or at least skimmed, every documentation page on vmap, pmap, jax.lax.fori_loop, jax.lax.while_loop, and so on, but I don't think there is any way to achieve this. So I'm posting here in the hope that somebody can point out that I'm wrong.
Is my best hope of speeding this up to fall back on traditional multiprocessing in Python and parallelize the loop over networks across multiple CPU cores, since the networks are too small to benefit from a GPU or TPU anyway?
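In case it helps to be concrete, the kind of multiprocessing fallback I have in mind is roughly the following; per_net_loss, the shapes and the final sum are placeholder simplifications, and it only shows the forward evaluation, not gradients:

```python
import jax.numpy as jnp
from multiprocessing import get_context

def per_net_loss(args):
    params, x = args
    out = jnp.tanh(x @ params["w"]) @ params["v"]
    return float(jnp.sum(out ** 2))

if __name__ == "__main__":
    # many small networks, evaluated on CPU worker processes
    nets = [({"w": jnp.ones((3, 8)), "v": jnp.ones((8, 1))}, jnp.ones((16, 3)))
            for _ in range(100)]
    # "spawn" avoids forking a process that has already initialized JAX
    with get_context("spawn").Pool(processes=4) as pool:
        per_net = pool.map(per_net_loss, nets)
    total = sum(per_net)  # the cross-network coupling happens back in the parent
    print(total)
```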
Any thoughts are appreciated. Thank you.