Hi,

I have successfully picked up objects with a gripper other than the Panda's, using the weights from scene_test_2048_bs3_hor_sigma_001, but I would like to fine-tune those pretrained weights for the gripper I'm using.

I have also tried training from scratch as explained in the readme, but I could only do it on a 16GB V100 GPU, which has less memory than you recommend. The result is not good: the grasp confidences do not reach the configured thresholds (high=0.25 and low=0.2), and if I visualize the grasps on the test data without the threshold, they are all over the place and only a few look okay.

I believe fine-tuning will require far fewer computational resources. I think I could do it by setting the optimizer to apply gradients only to the layers I want, which are the last fully connected layers, but please let me know if there is a better way. However, when I run train.py with ckpt_dir='checkpoints/scene_test_2048_bs3_hor_sigma_001', I get the following error:
File "contact_graspnet/train.py", line 225, in <module>
train(global_config, ckpt_dir)
File "contact_graspnet/train.py", line 78, in train
loss_ops = load_labels_and_losses(grasp_estimator, contact_infos, global_config)
File "/home/juancm/contact_graspnet/contact_graspnet/tf_train_ops.py", line 86, in load_labels_and_losses
tf_pos_finger_diffs, tf_scene_idcs = load_contact_grasps(contact_infos, global_config['DATA'])
File "/home/juancm/contact_graspnet/contact_graspnet/tf_train_ops.py", line 254, in load_contact_grasps
tf_pos_contact_points = tf.constant(np.array(pos_contact_points), tf.float32)
File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 161, in constant_v1
allow_broadcast=False)
File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_impl
allow_broadcast=allow_broadcast))
File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py", line 522, in make_tensor_proto
"Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
Do you think this is due to my hardware (RTX 3060 and 32GB of RAM)? I wasn't planning to train on my laptop, but I wanted to at least check that training works before running it on a server.
Thank you! :)
Sorry for the late answer. I used a rather simple but efficient approach: all contact points are loaded into GPU memory as a single tf.constant. However, TF limits the size of a single constant tensor to 2GB. My guess is that your fine-tuning dataset contains more contact points than mine, so this 2GB limit is exceeded. You could try casting the data to tf.float16, or loading the contact data in chunks rather than all at once.
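To make the two suggestions concrete, here is a minimal sketch that only does the size arithmetic. It checks whether an array of a given shape would fit under the limit as float32 or float16, and it plans row ranges that each stay below the limit. The dataset shape and the helper names `tensor_bytes` and `chunk_ranges` are hypothetical and not part of the repository; the real shape of `pos_contact_points` depends on your data.

```python
# TensorFlow rejects a single tensor proto of 2 GB (2**31 bytes) or more.
TF_PROTO_LIMIT_BYTES = 2 * 1024**3


def tensor_bytes(shape, bytes_per_element):
    """Return the raw size in bytes of a dense tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


def chunk_ranges(num_rows, row_bytes, max_bytes=TF_PROTO_LIMIT_BYTES):
    """Yield (start, stop) row ranges whose payload stays strictly below max_bytes."""
    rows_per_chunk = (max_bytes - 1) // row_bytes
    if rows_per_chunk < 1:
        raise ValueError("a single row exceeds the size limit")
    for start in range(0, num_rows, rows_per_chunk):
        yield start, min(start + rows_per_chunk, num_rows)


# Hypothetical dataset: 10,000 scenes x 20,000 contact points x 3 coordinates.
shape = (10_000, 20_000, 3)

size_f32 = tensor_bytes(shape, 4)  # float32: 4 bytes per element
size_f16 = tensor_bytes(shape, 2)  # float16: 2 bytes per element
print("float32 fits:", size_f32 < TF_PROTO_LIMIT_BYTES)
print("float16 fits:", size_f16 < TF_PROTO_LIMIT_BYTES)

# Plan per-scene chunks for the float32 case.
row_bytes_f32 = tensor_bytes(shape[1:], 4)
ranges = list(chunk_ranges(shape[0], row_bytes_f32))
print("chunks:", ranges)
```

Each `(start, stop)` pair can then be used to slice the NumPy array before converting it to a tensor. Two caveats: float16 halves the memory but reduces coordinate precision, and in graph mode several constants embedded in the same graph may still hit a limit on the total serialized graph size, so the chunks might need to be fed through placeholders or an input pipeline instead.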