Replies: 1 comment
-
Through some trial and error, I may have found a solution. My new Flax code:

    import jax
    import jax.numpy as jnp
    from flax import linen as nn
    from einops import rearrange


    def make_initializer(out_channels, in_channels, kernel_size, groups):
        # Matches PyTorch's default Conv2d init: U(-sqrt(k), sqrt(k)) with
        # k = groups / (in_channels * prod(kernel_size)).
        # https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
        k = groups / (in_channels * jnp.prod(jnp.array(kernel_size)))
        scale = jnp.sqrt(k)

        def init_fn(key, shape, dtype):
            return jax.random.uniform(key, shape, minval=-scale, maxval=scale, dtype=dtype)

        return init_fn


    # nn.Conv subclass that swaps in the PyTorch-style uniform init for both kernel and bias.
    class CustomConv1d(nn.Conv):
        @nn.compact
        def __call__(self, x):
            # note: we just ignore whatever self.kernel_init is
            kernel_init = make_initializer(
                self.features, x.shape[-1], self.kernel_size, self.feature_group_count
            )
            if self.use_bias:
                # note: we just ignore whatever self.bias_init is
                bias_init = make_initializer(
                    self.features, x.shape[-1], self.kernel_size, self.feature_group_count
                )
            else:
                bias_init = None
            return nn.Conv(
                features=self.features,
                kernel_size=self.kernel_size,
                strides=self.strides,
                padding=self.padding,
                input_dilation=self.input_dilation,
                kernel_dilation=self.kernel_dilation,
                feature_group_count=self.feature_group_count,
                use_bias=self.use_bias,
                mask=self.mask,
                dtype=self.dtype,
                param_dtype=self.param_dtype,
                precision=self.precision,
                kernel_init=kernel_init,
                bias_init=bias_init,
            )(x)


    class LeakyReLU(nn.Module):
        negative_slope: float = 0.01

        @nn.compact
        def __call__(self, x):
            return nn.leaky_relu(x, negative_slope=self.negative_slope)


    def WNConv2d(scale_init, *args, **kwargs):
        conv = nn.WeightNorm(CustomConv1d(*args, **kwargs), scale_init=scale_init)
        return conv


    # Discriminator block: folds the time axis by `period` and applies a stack of
    # weight-normalized strided convs, returning the intermediate feature maps.
    class MPD(nn.Module):
        period: int

        def pad_to_period(self, x):
            t = x.shape[-1]
            x = jnp.pad(x, pad_width=((0, 0), (0, 0), (0, self.period - t % self.period)), mode='reflect')
            return x

        @nn.compact
        def __call__(self, x):
            convs = [
                WNConv2d(nn.initializers.constant(1 / jnp.sqrt(3)), features=32, kernel_size=(5, 1), strides=(3, 1), padding=((2, 2), (0, 0))),
                WNConv2d(nn.initializers.constant(1 / jnp.sqrt(3)), features=128, kernel_size=(5, 1), strides=(3, 1), padding=((2, 2), (0, 0))),
                WNConv2d(nn.initializers.constant(1 / jnp.sqrt(3)), features=512, kernel_size=(5, 1), strides=(3, 1), padding=((2, 2), (0, 0))),
                WNConv2d(nn.initializers.constant(1 / jnp.sqrt(3)), features=1024, kernel_size=(5, 1), strides=(3, 1), padding=((2, 2), (0, 0))),
                WNConv2d(nn.initializers.constant(1 / jnp.sqrt(3)), features=1024, kernel_size=(5, 1), strides=(1, 1), padding=((2, 2), (0, 0))),
                WNConv2d(nn.initializers.constant(1 / jnp.sqrt(3)), features=1, kernel_size=(3, 1), strides=(1, 1), padding=((1, 1), (0, 0))),
            ]
            fmap = []
            x = self.pad_to_period(x)
            x = rearrange(x, "b c (l p) -> b l p c", p=self.period)
            for i, layer in enumerate(convs):
                x = layer(x)
                if i != (len(convs) - 1):
                    x = LeakyReLU(negative_slope=0.1)(x)
                fmap.append(x)
            return fmap


    def summary_stats(name, x):
        print(f'Stats for {name}:')
        print('shape:', list(x.shape))
        print(f'mean: {jnp.mean(x):,.5f} min: {jnp.min(x):,.5f} max: {jnp.max(x):,.5f} std: {jnp.std(x):,.5f}')


    key = jax.random.PRNGKey(1)
    B, C, T = 1, 1, 44100
    x = jnp.zeros((B, C, T))
    period = 2
    model = MPD(period)
    fmaps, variables = model.init_with_output({"params": key}, x)

    # Print summary stats for each feature map
    for i, fmap in enumerate(fmaps):
        summary_stats(f"fmap {i}", fmap)
        print()

    # Spot-check that the expected parameter tree exists
    params = variables["params"]
    for i in range(6):
        params[f"WeightNorm_{i}"][f"CustomConv1d_{i}/Conv_0/kernel/scale"]
        params[f"CustomConv1d_{i}"]["Conv_0"]["bias"]
        params[f"CustomConv1d_{i}"]["Conv_0"]["kernel"]

    print(model.tabulate({"params": key}, x, console_kwargs={"width": 400}))

New output:
And another randomly sampled PyTorch output:
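As a rough sanity check (not part of the solution above; the layer sizes are arbitrary and it assumes make_initializer from the snippet is in scope), one could compare the uniform bound of PyTorch's default Conv2d init against what make_initializer produces:

    import numpy as np
    import torch
    import jax
    import jax.numpy as jnp

    # Illustrative only: mirror the first MPD layer (1 input channel, 32 output
    # channels, a (5, 1) kernel, groups=1).
    in_ch, out_ch, ksize, groups = 1, 32, (5, 1), 1

    # PyTorch default: weights ~ U(-sqrt(k), sqrt(k)) with k = groups / (in_ch * prod(ksize))
    conv = torch.nn.Conv2d(in_ch, out_ch, ksize, groups=groups)
    print("torch weight min/max:", conv.weight.min().item(), conv.weight.max().item())

    # Flax side: sample from the custom initializer with the matching fan-in
    init_fn = make_initializer(out_ch, in_ch, ksize, groups)
    w = init_fn(jax.random.PRNGKey(0), (ksize[0], ksize[1], in_ch, out_ch), jnp.float32)
    print("flax kernel min/max: ", float(w.min()), float(w.max()))

    print("expected bound:      ", np.sqrt(groups / (in_ch * np.prod(ksize))))

Both min/max should sit just inside the expected bound of roughly 0.447 for this layer.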
-
I'm trying to port some PyTorch code to Flax. The model consists of Conv2d layers wrapped with weight_norm, with LeakyReLU activations after every layer except the last. I've confirmed that the parameter counts and input/output shapes match between PyTorch and Flax, yet the mean/min/max/std of the outputs seem off. Can someone help me identify what went wrong in the port? I think the issue is related to weight initialization (see #4091).
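For reference, PyTorch's Conv2d draws both its weights and biases from U(-sqrt(k), sqrt(k)) with k = groups / (in_channels * prod(kernel_size)), whereas Flax's nn.Conv defaults to a lecun_normal kernel and a zero bias. A minimal sketch of overriding the Flax kernel default (not the actual code from this post; the layer sizes are arbitrary, and the bias would need similar treatment) looks like:

    from flax import linen as nn
    from jax.nn.initializers import variance_scaling

    # variance_scaling(1/3, "fan_in", "uniform") samples U(-limit, limit) with
    # limit = sqrt(3 * scale / fan_in) = sqrt(1 / fan_in), i.e. PyTorch's default
    # Conv weight distribution for groups=1.
    torch_like_kernel_init = variance_scaling(1.0 / 3.0, "fan_in", "uniform")

    conv = nn.Conv(features=32, kernel_size=(5, 1), kernel_init=torch_like_kernel_init)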
Here's the PyTorch code:
and PyTorch output:
Here's the Flax code:
and the Flax output:
To me, the most glaring differences in the outputs are the max: values, even when changing JAX seeds. Again, here's the PyTorch output:

and the Flax output:
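For completeness, a per-layer summary on the PyTorch side that mirrors the Flax summary_stats above could look roughly like this (a sketch, not the exact script used here; fmaps stands for whatever list of feature maps the PyTorch model returns):

    import torch

    def torch_summary_stats(name, x):
        # Same statistics as the Flax summary_stats; note that torch.Tensor.std()
        # applies Bessel's correction by default, unlike jnp.std.
        print(f"Stats for {name}:")
        print("shape:", list(x.shape))
        print(f"mean: {x.mean().item():,.5f} min: {x.min().item():,.5f} "
              f"max: {x.max().item():,.5f} std: {x.std().item():,.5f}")

    # fmaps is assumed to be the list of feature maps returned by the PyTorch model:
    # for i, fmap in enumerate(fmaps):
    #     torch_summary_stats(f"fmap {i}", fmap)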