Training Loss NAN #17

Open
hychiang-git opened this issue Sep 14, 2021 · 0 comments

Hi, I tried to reproduce your experiment with CIFAR10, but the training loss becomes NaN. I am running on a machine with four GPUs and tensorflow-gpu 1.12.

(screenshot of the training log showing the loss going to NaN)
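Not part of the original report, but a generic first debugging step for this kind of failure is to find the exact step at which the loss first becomes NaN (in TF 1.x, `tf.check_numerics(loss, "loss")` can also be inserted into the graph to fail fast). A minimal sketch for scanning logged loss values; the helper name `first_nan_step` is hypothetical:

```python
import math

def first_nan_step(losses):
    """Return the index of the first NaN in a sequence of logged losses, or None."""
    for step, loss in enumerate(losses):
        if math.isnan(loss):
            return step
    return None

# Example: loss diverges at step 2
print(first_nan_step([0.91, 0.74, float("nan"), float("nan")]))  # 2
```

Knowing whether the NaN appears at step 0 (bad initialization/scale) or after many steps (divergence, e.g. learning rate or gradient quantization overflow) usually narrows the cause considerably.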

Here is the Option.py I used; I only modified saveModel:

import time
import tensorflow as tf

debug = False
Time = time.strftime('%Y-%m-%d', time.localtime())
Notes = 'vgg7_2888'
# Notes = 'temp'

GPU = [0]
batchSize = 128

dataSet = 'CIFAR10'

loadModel = None
# loadModel = '../model/' + '2017-12-06' + '(' + 'vgg7 2888' + ')' + '.tf'
# saveModel = None
saveModel = '../model/' + Time + '_' + Notes + '.tf'

bitsW = 2  # bit width of weights
bitsA = 8  # bit width of activations
bitsG = 8  # bit width of gradients
bitsE = 8  # bit width of errors

bitsR = 16  # bit width of randomizer

lr = tf.Variable(initial_value=0., trainable=False, name='lr', dtype=tf.float32)
lr_schedule = [0, 8, 200, 1, 250, 1./8, 300, 0]

L2 = 0

lossFunc = 'SSE'
# lossFunc = tf.losses.softmax_cross_entropy
optimizer = tf.train.GradientDescentOptimizer(1)  # lr is controlled in Quantize.G
# optimizer = tf.train.MomentumOptimizer(lr, 0.9, use_nesterov=True)

# shared variables, defined by other files
seed = None
sess = None
W_scale = []
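For reference, the flat lr_schedule list above appears to alternate epoch boundaries and learning-rate values. A minimal sketch of one plausible reading, assuming (start_epoch, lr) pairs — the helper name lr_at_epoch is hypothetical, not part of the WAGE code:

```python
def lr_at_epoch(schedule, epoch):
    """Return the lr in effect at `epoch`, reading `schedule` as
    alternating (start_epoch, lr) pairs, e.g. [0, 8, 200, 1, ...]."""
    lr = schedule[1]
    for start, value in zip(schedule[0::2], schedule[1::2]):
        if epoch >= start:
            lr = value
    return lr

sched = [0, 8, 200, 1, 250, 1. / 8, 300, 0]
print(lr_at_epoch(sched, 0))    # 8
print(lr_at_epoch(sched, 220))  # 1
print(lr_at_epoch(sched, 320))  # 0
```

Under this reading, the effective rate starts at 8 (large because lr is folded into Quantize.G rather than the optimizer, whose rate is fixed at 1) and drops to 0 at epoch 300.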

WAGE folder structure:

.
|-- README.md
|-- dataSet
|   |-- CIFAR10.npz
|   |-- CIFAR10.py
|   |-- cifar-10-batches-py
|   |   |-- batches.meta
|   |   |-- data_batch_1
|   |   |-- data_batch_2
|   |   |-- data_batch_3
|   |   |-- data_batch_4
|   |   |-- data_batch_5
|   |   |-- readme.html
|   |   `-- test_batch
|   `-- cifar-10-python.tar.gz
|-- log
|   |-- 2018-01-30(vgg7\ 2888).txt
|   |-- 2021-09-14(temp).txt
|   `-- 2021-09-14(vgg7_2888).txt
|-- model
`-- source
    |-- Log.py
    |-- Log.pyc
    |-- NN.py
    |-- NN.pyc
    |-- Option.py
    |-- Option.pyc
    |-- Quantize.py
    |-- Quantize.pyc
    |-- Top.py
    |-- getData.py
    |-- getData.pyc
    |-- myInitializer.py
    `-- myInitializer.pyc