Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Wide & Deep migration to PyTorch #2168

Draft
wants to merge 11 commits into
base: staging
Choose a base branch
from

Conversation

daviddavo
Copy link
Collaborator

@daviddavo daviddavo commented Sep 18, 2024

Description

Migrating Wide & Deep out of tensorflow.

Note: Previously I have only used models from high level libraries like Keras. I'm doing this to learn PyTorch, so feel free to give me any pointers or even scrap everything if it is not useful.

Related Issues

References

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • I have signed the commits, e.g. git commit -s -m "your commit message".
  • This PR is being made to staging branch AND NOT TO main branch.

WIP Tasks

  • Implementing the model as PyTorch's nn.Module
    • Allowing additional embeddings
    • Allowing additional continuous features
    • Allowing cross-features (sorry, I don't understand it yet)
  • Creating a wrapper with .fit() and .recommend_k_items() methods
    • Allowing additional embs in wrapper's .fit()
    • Allowing additional cont. feat. in wrapper's .fit()
    • Caching the "ranking pool"
    • Save model every save_checkpoints_steps iterations in model_dir
    • Eval test loss every log_every_n_iter iterations instead of every iter
  • Creating a torch.data.Dataset to pass to the wrapper class
  • Creating tests
  • Updating Jupyter Notebooks

Signed-off-by: David Davó <david@ddavo.me>
Signed-off-by: David Davó <david@ddavo.me>
Signed-off-by: David Davó <david@ddavo.me>
@miguelgfierro miguelgfierro marked this pull request as ready for review September 21, 2024 07:39
@miguelgfierro
Copy link
Collaborator

miguelgfierro commented Sep 21, 2024

Sorry @daviddavo I pressed the wrong action. I changed the PR to ready for review because we are having some problems with the tests. Hopefully, we can fix it by next week.

@daviddavo daviddavo marked this pull request as draft September 21, 2024 15:09
@daviddavo
Copy link
Collaborator Author

NP, I still have quite some work to do

@miguelgfierro
Copy link
Collaborator

FYI @daviddavo the tests should be working now after #2169

@daviddavo
Copy link
Collaborator Author

The tensorflow estimators approach currently used by recommenders uses a binary regressor (default value of n_classes). To get the recommendations, all user-item pairs are used as input (created using the user_item_pairs method).

On the PyTorch notebook the output is not binary, instead there is a class for each movie. The aim of this notebook is to predict the next movie to watch, and not to make a top-k recommendations, a different problem.

The question is, what does the original paper do? Does it output a single scalar? Or does it output a vector with a value corresponding to each item? As I understand it, it should be a single value ( $P\left(Y=1|X\right)=\sigma\left(...\right)$ ), but that notebook made me doubt.

Nevertheless, I think I will modify the current model so the "head" returns a scalar and to get the top-k recommendations we pass all the possible user-item pairs, as the recommenders' tensorflow implementation does.

Edit: NVIDIA's deep learning examples also output a single value. My doubts have been resolved but I'll keep this post as some kind of documentation. I'll finish the model and do the training soon.

Signed-off-by: David Davó <david@ddavo.me>
@daviddavo
Copy link
Collaborator Author

Loss function decreases over time in my jupyter notebook. the only thing remaining is the "software engineering" part

Signed-off-by: David Davó <david@ddavo.me>
Signed-off-by: David Davó <david@ddavo.me>
Signed-off-by: David Davó <david@ddavo.me>
Signed-off-by: David Davó <david@ddavo.me>
Signed-off-by: David Davó <david@ddavo.me>
Signed-off-by: David Davó <david@ddavo.me>
@daviddavo
Copy link
Collaborator Author

Now that I'm testing it with the full 100k dataset, I realized its very slow. I will profile it next weekend, but I have a hunch that the problem is the DataLoader, which uses a lot of slow .locs

@miguelgfierro
Copy link
Collaborator

@daviddavo how are you doing with this PR? Let me know if you need any help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants