What is the description of the custom LSTM cell used in the code? #12

Open
ritwikmishra opened this issue Dec 21, 2022 · 0 comments

I see that the code uses a custom LSTM cell defined by:

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with tf.variable_scope(scope or type(self).__name__):  # "CustomLSTMCell"
      c, h = state
      h *= self._dropout_mask
      concat = projection(tf.concat([inputs, h], 1), 3 * self.output_size, initializer=self._initializer)
      i, j, o = tf.split(concat, num_or_size_splits=3, axis=1)
      i = tf.sigmoid(i)
      new_c = (1 - i) * c  + i * tf.tanh(j)
      new_h = tf.tanh(new_c) * tf.sigmoid(o)
      new_state = tf.contrib.rnn.LSTMStateTuple(new_c, new_h)
      return new_h, new_state

The standard LSTM equations are:

i = sigmoid(x_t * U^i + h_{t-1} * W^i ) # input_gate_t
f = sigmoid(x_t * U^f + h_{t-1} * W^f ) # forget_gate_t
o = sigmoid(x_t * U^o + h_{t-1} * W^o ) # output_gate_t

new_c = f * c + i * tanh(x_t * U^g + h_{t-1} * W^g)
new_h = o * tanh(new_c)
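
For a concrete reference point, the equations above can be written as a single step in NumPy. This is only a minimal sketch: the function name `lstm_step`, the dictionary layout of `U` and `W`, and the shapes are assumptions for illustration, and biases are left out because the equations above do not include them.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, U, W):
    """One step of the standard LSTM equations, without biases.

    U and W are dicts keyed by 'i', 'f', 'o', 'g' (a hypothetical layout):
    U[k] has shape (input_size, hidden_size),
    W[k] has shape (hidden_size, hidden_size).
    """
    i = sigmoid(x_t @ U["i"] + h_prev @ W["i"])  # input gate
    f = sigmoid(x_t @ U["f"] + h_prev @ W["f"])  # forget gate
    o = sigmoid(x_t @ U["o"] + h_prev @ W["o"])  # output gate
    g = np.tanh(x_t @ U["g"] + h_prev @ W["g"])  # candidate cell value

    new_c = f * c_prev + i * g
    new_h = o * np.tanh(new_c)
    return new_h, new_c
```

Here the forget gate `f` has its own weights `U["f"]` and `W["f"]`, so the cell needs four projections per step.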

Then why is new_c different in the code?

Why does f (forget_gate) equal 1 - i (1 minus input_gate)?

Why does x_t * U^g + h_{t-1} * W^g equal forget_gate?
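
One possible reading, offered tentatively: the line `new_c = (1 - i) * c + i * tf.tanh(j)` has the same form as the standard update with the forget gate tied to `1 - i`, and with `j` in the position of the candidate pre-activation `x_t * U^g + h_{t-1} * W^g`. This tying is commonly described as a coupled input-forget gate. The NumPy sketch below checks that reading numerically. It is an illustration only: `custom_step` and the single weight matrix `P` are assumed stand-ins for the repository's `projection` helper (whose internals, including any bias, are not shown in the excerpt), and the dropout mask is omitted.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def custom_step(x_t, h_prev, c_prev, P):
    """NumPy mirror of the quoted cell, minus dropout.

    P has shape (input_size + hidden_size, 3 * hidden_size) and is a
    hypothetical stand-in for the projection(...) call in the excerpt.
    """
    concat = np.concatenate([x_t, h_prev], axis=1) @ P
    i, j, o = np.split(concat, 3, axis=1)
    i = sigmoid(i)
    new_c = (1 - i) * c_prev + i * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, new_c


def standard_update(c_prev, i, f, g):
    """Cell-state update from the standard equations: f * c + i * g."""
    return f * c_prev + i * g


rng = np.random.default_rng(0)
x_t = rng.normal(size=(2, 3))     # batch of 2, input_size 3
h_prev = rng.normal(size=(2, 4))  # hidden_size 4
c_prev = rng.normal(size=(2, 4))
P = rng.normal(size=(3 + 4, 3 * 4))

new_h, new_c = custom_step(x_t, h_prev, c_prev, P)

# Recompute the gate pre-activations, then use the standard update
# with the forget gate set to 1 - i and tanh(j) as the candidate.
pre_i, pre_j, _ = np.split(np.concatenate([x_t, h_prev], axis=1) @ P, 3, axis=1)
i = sigmoid(pre_i)
coupled_c = standard_update(c_prev, i, f=1 - i, g=np.tanh(pre_j))

print(np.allclose(new_c, coupled_c))
```

Under this reading the cell needs only three projections instead of four, which matches the `3 * self.output_size` in the excerpt. Whether that is the intent behind the repository's code is still the open question here.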
