A modular implementation of the Neural Turing Machine introduced by Alex Graves et al.
Currently, two tasks are implemented, the Copy Task and the Associative Recall Task, each as a `tf.keras.Model` wrapper available in `NTM_Model.py`.
Use them as shown in the Training Notebooks.
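As a rough sketch of the Copy Task input format (random binary vectors bracketed by Start Of File and End Of File delimiter rows on two extra channels), here is a minimal NumPy example; the function name and the vector width are illustrative, not the repo's actual API:

```python
import numpy as np

def make_copy_task_example(seq_len, vec_dim=8):
    """Build one Copy Task example: random binary vectors framed by
    SOF/EOF delimiter rows carried on two extra input channels."""
    seq = np.random.randint(0, 2, size=(seq_len, vec_dim)).astype(np.float32)
    inputs = np.zeros((seq_len + 2, vec_dim + 2), dtype=np.float32)
    inputs[0, vec_dim] = 1.0               # Start Of File delimiter channel
    inputs[1:seq_len + 1, :vec_dim] = seq  # the payload to be copied
    inputs[seq_len + 1, vec_dim + 1] = 1.0 # End Of File delimiter channel
    return inputs, seq                     # target is the payload itself

x, y = make_copy_task_example(seq_len=9)
print(x.shape)  # (11, 10): 9 payload vectors plus the two delimiter rows
```

This is why the sequence lengths reported below count the two delimiters in addition to the payload.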
Since the paper only provides the mathematical operations for generating and using the heads' weightings, not the full architecture, the complete architecture becomes an open-ended problem; I have used the following architecture:
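For reference, the content-based addressing step of those head-weighting equations (a softmax over key-strength-scaled cosine similarity between the emitted key and each memory row, as in the Graves et al. paper) can be sketched in NumPy as follows; the function and variable names are illustrative only:

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Content-based head weighting: softmax over the beta-scaled cosine
    similarity between the key k_t and every memory row M_t(i)."""
    # Cosine similarity between the key and each memory row.
    num = memory @ key
    denom = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    sim = num / denom
    # Sharpen with the key strength beta, then normalise with a softmax.
    scaled = beta * sim
    e = np.exp(scaled - np.max(scaled))
    return e / e.sum()

M = np.random.randn(128, 20)          # N = 128 memory rows of width 20
w = content_addressing(M, M[3], beta=50.0)
print(w.argmax())  # → 3: a large beta focuses the weighting on the matching row
```

The paper then interpolates this weighting with the previous one, applies a shift convolution, and sharpens it; those steps (and how the controller produces `key` and `beta`) are the open-ended part the architecture above fills in.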
Training the above NTM on randomized sequence lengths between 1 and 20 yields the following results.
Sequence Length = 9, including the Start Of File and End Of File delimiters.
Sequence Length = 33, including the Start Of File and End Of File delimiters.
Note that this is longer than any sequence the above NTM was trained on.
Sequence Length = 73, including the Start Of File and End Of File delimiters.
Sequence Length = 90, including the Start Of File and End Of File delimiters.
Training the Associative Recall Model for 158,000 episodes on randomized item counts between 2 and 6 yields the following results:
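A minimal sketch of what one Associative Recall episode looks like (a list of random binary items, a query drawn from all but the last item, and the item that followed the query as the target); the function name, item length, and vector width below are illustrative assumptions, not the repo's actual generator:

```python
import numpy as np

def make_recall_episode(n_items, item_len=3, vec_dim=6):
    """One Associative Recall episode: random binary items, a query item,
    and the item that immediately followed the query as the target."""
    items = np.random.randint(0, 2, size=(n_items, item_len, vec_dim)).astype(np.float32)
    q = np.random.randint(0, n_items - 1)  # query any item except the last
    return items, items[q], items[q + 1]   # (episode, query, target)

items, query, target = make_recall_episode(n_items=4)
print(items.shape, query.shape)  # (4, 3, 6) (3, 6)
```

In the full task the items and the query are additionally separated by delimiter channels, analogous to the Copy Task above.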
- Wed, Jan 15:-
- Completed the NTMCell Implementation along with various Vector Generation Tasks.
- Also tested its results with dynamic_RNN; observed some NaN values in the output, which were fixed by initializing the states with a small constant value (0.5 in this case).
- Sun, Jan 19:-
- Added a sigmoid layer on Heads_w_t, which produced much better results on single time-step passes (not during training).
- Random initialization now works well too.
- In the process of finalizing the training schedule.
- Fri, Jan 31:-
- First complete version; added the Inputs Generator for the Copy Task and some minor bug fixes.
- The model still needs to be trained, though; some problems may surface during training that will need to be solved.
- Sun, Feb 2:-
- Training with the Cross Entropy loss function proved difficult, as the loss seemed to be stuck between 0.4 and 0.55.
- Using the Huber loss function produced much better results: the loss decreased roughly linearly from 1.2 to 0.6 on the maximum sequence length in about 10,000 epochs, after one injection of randomized initial states while preserving the weights.
- Wed, Feb 6:-
- More careful analysis uncovered some subtler bugs that were holding back the model's generalization; removing them improves generalization considerably, now with the Cross Entropy loss.
- Sun, Feb 16:-
- Added Associative Recall Task.