Fine-tuning #17

Open
jucamohedano opened this issue Apr 23, 2022 · 1 comment

Comments

@jucamohedano

jucamohedano commented Apr 23, 2022

Hi,

I have successfully picked up objects with a gripper other than the Panda gripper using the weights from scene_test_2048_bs3_hor_sigma_001, but I would like to fine-tune those pretrained weights for the gripper I'm using.

I have also tried training from scratch as explained in the readme, but the only GPU I could use was a 16GB V100, which has less memory than you recommend. The training result is not good: the grasp confidences don't reach the thresholds (high=0.25 and low=0.2), and if I visualize the grasps on the test data without applying the thresholds, they are all over the place, with only a few looking okay.

I believe fine-tuning will require far fewer computational resources. I think I could do it by setting the optimizer to apply gradients only to the layers I want to update, which are the last fully connected layers, but please let me know if there is a better way.

When I run train.py with ckpt_dir='checkpoints/scene_test_2048_bs3_hor_sigma_001', I get the following error:

  File "contact_graspnet/train.py", line 225, in <module>
    train(global_config, ckpt_dir)
  File "contact_graspnet/train.py", line 78, in train
    loss_ops = load_labels_and_losses(grasp_estimator, contact_infos, global_config)
  File "/home/juancm/contact_graspnet/contact_graspnet/tf_train_ops.py", line 86, in load_labels_and_losses
    tf_pos_finger_diffs, tf_scene_idcs = load_contact_grasps(contact_infos, global_config['DATA'])
  File "/home/juancm/contact_graspnet/contact_graspnet/tf_train_ops.py", line 254, in load_contact_grasps
    tf_pos_contact_points = tf.constant(np.array(pos_contact_points), tf.float32)
  File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 161, in constant_v1
    allow_broadcast=False)
  File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py", line 522, in make_tensor_proto
    "Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.

Do you think this is due to my hardware (RTX 3060 and 32GB of RAM)? I wasn't planning to train on my laptop; I just wanted to check that training works before running it on a server.

Thank you! :)
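The fine-tuning idea described above (applying gradients only to the last fully connected layers) comes down to picking a subset of the trainable variables by name. Below is a minimal sketch of that selection step. The variable names and the `head/` prefix are hypothetical and not taken from this repository, and `Var` is only a stand-in for a TensorFlow variable, which also exposes a `.name` attribute.

```python
from collections import namedtuple

# Stand-in for a TensorFlow variable (assumption: only `.name` is needed here).
Var = namedtuple("Var", ["name"])


def select_trainable(variables, trainable_prefixes):
    """Return the variables whose name starts with one of the given prefixes."""
    prefixes = tuple(trainable_prefixes)
    return [v for v in variables if v.name.startswith(prefixes)]


# Hypothetical variable names; inspect the real graph to find the actual ones.
all_vars = [
    Var("backbone/conv1/weights:0"),
    Var("backbone/conv2/weights:0"),
    Var("head/fc1/weights:0"),
    Var("head/fc2/biases:0"),
]

head_vars = select_trainable(all_vars, ["head/"])
print([v.name for v in head_vars])

# In TensorFlow 1.x-style training code, the filtered list would typically be
# passed to the optimizer so that only these variables are updated, e.g.
#   train_op = optimizer.minimize(loss, var_list=head_vars)
```

Freezing the earlier layers this way reduces the memory needed for gradients and optimizer state, but it does not change how the contact data is loaded, so it would not by itself avoid the 2GB error shown above.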

@MartinSmeyer
Contributor

Hi @jucamohedano,

Sorry for the late answer. I used a rather simple but efficient approach: all contact points are loaded into GPU memory as a single tf.constant. However, TF limits the size of a single constant tensor to 2GB. I guess the problem is that your fine-tuning dataset is larger than mine and contains more contact points, so this 2GB limit is exceeded. You could try casting the data to tf.float16, or loading the contact data in chunks instead of all at once.
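Both suggestions can be checked on the NumPy side before anything is handed to TensorFlow. The sketch below is only an illustration: the array shape and the helper names `nbytes_as` and `split_rows` are assumptions, not code from this repository. It estimates how large an array would be in a given dtype and splits it along the first axis into pieces that each stay under a byte limit.

```python
import numpy as np

# The error is raised for content of 2GB or more, so stay strictly below it.
PROTO_LIMIT = 2**31


def nbytes_as(arr, dtype):
    """Size in bytes the array would occupy if stored with the given dtype."""
    return arr.size * np.dtype(dtype).itemsize


def split_rows(arr, max_bytes=PROTO_LIMIT - 1):
    """Split along axis 0 into chunks of at most max_bytes each.

    A single row larger than max_bytes is still returned as its own chunk.
    """
    if arr.shape[0] == 0:
        return [arr]
    bytes_per_row = max(1, arr.nbytes // arr.shape[0])
    rows = max(1, max_bytes // bytes_per_row)
    return [arr[i:i + rows] for i in range(0, arr.shape[0], rows)]


# Tiny stand-in for the contact point array (hypothetical shape).
points = np.arange(120, dtype=np.float32).reshape(10, 4, 3)

print("float32 bytes:", nbytes_as(points, np.float32))
print("float16 bytes:", nbytes_as(points, np.float16))

# A small limit is used here so the splitting is visible on toy data.
chunks = split_rows(points, max_bytes=200)
print("rows per chunk:", [len(c) for c in chunks])
```

Casting to float16 halves the size but keeps only about three significant decimal digits, so it is worth checking that the contact point coordinates tolerate that precision. Each chunk could then be turned into its own constant, under the assumption that the rest of the training code is adapted to index across several tensors.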
