Good question. I have been experimenting quite a bit with this repo in an attempt to replicate the results from the original RQ-VAE paper, but current performance still lags behind the paper's numbers. The transformer architecture is not exactly the same as the one in the paper (the paper uses an encoder-decoder, while I have been using a decoder-only model with roughly the same number of parameters), but I suspect the encoding also plays a part.
As far as the encoding model is concerned, I do see an advantage in reduced storage: instead of a full item-ID embedding table, this approach only stores the shared codebooks plus a short code tuple per item (rough back-of-the-envelope comparison below).
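To make that concrete, here is a quick sketch of the parameter counts under some hypothetical catalog sizes (all the numbers below are illustrative assumptions, not measurements from this repo):

```python
# Rough storage comparison (hypothetical sizes): a full item-ID embedding
# table vs. shared RQ-VAE codebooks plus a per-item semantic-ID tuple.
num_items = 1_000_000            # hypothetical catalog size
dim = 128                        # embedding dimension
levels, codebook_size = 3, 256   # RQ depth and codes per level

id_table_params = num_items * dim         # 128,000,000 floats
rq_params = levels * codebook_size * dim  # 98,304 floats (shared codebooks)
per_item_codes = num_items * levels       # 3,000,000 small ints (one tuple per item)

print(f"ID table:  {id_table_params:,} floats")
print(f"RQ-VAE:    {rq_params:,} floats + {per_item_codes:,} int8 codes")
```

The codebooks are shared across all items, so the float storage no longer scales with catalog size; only the (much cheaper) integer code tuples do.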
However, I have found the RQ-VAE very hard to train stably. I had to experiment with a number of tricks from the literature to fight codebook collapse before I could get good codebook utilization (> 80%); one such trick, restarting dead codes, is sketched below.
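For reference, a minimal sketch of a dead-code restart, a common collapse mitigation from the VQ literature. This assumes a PyTorch-style codebook with EMA usage counts; the function and argument names are illustrative, not this repo's actual API:

```python
import torch

def restart_dead_codes(codebook, ema_usage, batch_latents, threshold=1.0):
    """Reinitialize rarely used codes with encoder outputs from the batch.

    codebook:      (K, D) tensor of code vectors
    ema_usage:     (K,) EMA count of how often each code was selected
    batch_latents: (B, D) encoder outputs (residuals) from the current batch
    """
    dead = ema_usage < threshold          # codes whose EMA usage dropped too low
    n_dead = int(dead.sum())
    if n_dead == 0:
        return
    # Replace each dead code with a randomly sampled latent from the batch,
    # pulling it back into the region the encoder actually occupies.
    idx = torch.randint(0, batch_latents.shape[0], (n_dead,))
    codebook.data[dead] = batch_latents[idx].detach()
    ema_usage[dead] = threshold           # reset usage so restarted codes get a chance
```

Other tricks in the same family include k-means initialization of the codebooks and EMA codebook updates in place of the codebook loss term.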
Just curious: based on your experiments, does the residual coding mechanism truly perform better than traditional MLP-based embedding methods?