Deliverable contribution.txt
In embodied computation (or morphological computation), part of the complexity of motor control is offloaded to the body dynamics. UGent investigates how embodied computation in low-level motor control can be achieved using reward-modulated Hebbian learning. In our previous work, gait motor patterns for a compliant tensegrity robot were first learned using evolutionary optimisation (CMA-ES). A direct proprioceptive feedback controller was then trained to mimic the desired actuator signals using noise-based supervised Hebbian learning. In that setting, the reward signal directly expressed the similarity to the desired actuator signals, which is not biologically plausible.
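For illustration, a minimal sketch of one such noise-based supervised Hebbian update is given below (NumPy pseudocode under our own assumptions: a linear readout W @ x of the proprioceptive input, Gaussian exploration noise, and Euclidean similarity; the names and signatures are hypothetical, not the implementation used in the deliverable). The weights are nudged along the correlation between the injected noise and the resulting change in similarity to the desired actuator signal:

    import numpy as np

    def supervised_hebbian_step(W, x, target, noise_std=0.05, lr=1e-3):
        # Linear readout of the proprioceptive input x (assumed model).
        a = W @ x
        # Inject exploration noise into the actuator signal.
        xi = np.random.normal(0.0, noise_std, size=a.shape)
        # The "reward" is the similarity (negative distance) to the desired
        # actuator signal -- this direct access to the target is exactly
        # what makes the variant biologically implausible.
        r_noisy = -np.linalg.norm(a + xi - target)
        r_clean = -np.linalg.norm(a - target)
        # Hebbian update: correlate the noise with the reward improvement.
        return W + lr * (r_noisy - r_clean) * np.outer(xi, x)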
UGent has now used reward-modulated Hebbian learning in a more biologically plausible setting to achieve end-effector control for the tensegrity robot. For each targeted trajectory, the actuators are driven by a sequence of approximate actuator signals that could have been generated from a self-organised static predictive body model. As these control signals do not take the body dynamics into account, the resulting end-effector trajectories bear little resemblance to the desired ones. Again using reward-modulated Hebbian learning, a linear feedback controller (using analog neurons as an approximation for rate codes) was trained to correct the actuator signals and achieve the desired trajectory. Between the proprioceptive signals and the feedback neurons, a decorrelation layer was added. The reward, representing the average deviation from the desired trajectory, was presented after each completed run of a trajectory. Our algorithm, which is analogous to existing work in the context of neural networks, adds exploration noise to the actuator signals. By correlating the effects of the noise with the resulting change in reward, the learning rule attempts to improve the average reward, as sketched below.
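The sketch below illustrates this trajectory-level rule (again NumPy pseudocode under our own assumptions: a linear feedback controller W acting on decorrelated proprioceptive inputs, Gaussian exploration noise, and a running-average reward baseline; body.step and the function names are hypothetical stand-ins for the simulator interface, not the deliverable's actual API). During a run, the rule accumulates a Hebbian eligibility trace of noise-input correlations; only after the run is the trace scaled by the deviation of the obtained reward from its running average:

    import numpy as np

    def run_trajectory(W, inputs, u_approx, body, noise_std=0.05):
        # Eligibility trace: accumulated correlation between the injected
        # noise and the decorrelated proprioceptive inputs.
        trace = np.zeros_like(W)
        for x, u in zip(inputs, u_approx):
            a = W @ x                    # linear feedback correction (rate neurons)
            xi = np.random.normal(0.0, noise_std, size=a.shape)
            body.step(u + a + xi)        # corrected, noisy actuator signal
            trace += np.outer(xi, x)
        return trace

    def reward_modulated_update(W, trace, reward, reward_avg, lr=1e-3):
        # The reward (e.g. the negative average deviation from the desired
        # trajectory) is only available once the run has completed, so the
        # eligibility trace bridges the delay between action and reward.
        return W + lr * (reward - reward_avg) * trace

Repeating runs while tracking a running average of the reward lets the rule climb towards corrections that raise the average reward, without ever observing the desired actuator signals directly.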
The results of this work were submitted for a special issue of Frontiers in Neurorobotics (a draft version is available with the source code). As our robotics experiments were performed using our in-house robotics simulator, the code submitted for D4.6.3 consists of an illustration of the learning rule in which a neural network is trained to drive a second, untrained neural network that serves as a stand-in for the robot body. The script and library (Python code) can be used to reproduce one of the experiments described in the submitted paper.