Dissertation update – October 2015

Dissertation, Updates
Quick review: I have a working version of a relational auto-encoder and have used it to learn a transfer function between two reinforcement learning tasks. As in other research, I've used state-action-state triplets as the training data. My hypothesis is that a relational auto-encoder will build a common feature space for the transition dynamics of the two different reinforcement learning domains.

There are two problems that I'm trying to solve, both concerning the learning algorithm for the interdomain mapping function. The first is how the data enters training; the second is the characteristics of the data once it has been run through the trained model.

Dealing with uncorrelated data

Currently, the relational auto-encoder learns on pairs of triplets presented together, using standard back-propagation. This approach could have…
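To make the "standard back-propagation on reconstruction error" idea concrete, here is a toy sketch with a plain (non-gated) autoencoder and tied weights. The sizes are hypothetical (a 16-feature paired input, 8 hidden units), and this is illustrative only, not the actual relational model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: one paired (s, a, s') input vector of 16 features,
# 8 hidden units, tied encoder/decoder weights.
n_in, n_h, lr = 16, 8, 0.05
W = rng.standard_normal((n_in, n_h)) * 0.1
x = rng.standard_normal(n_in)          # one paired training sample

for _ in range(500):
    h = sigmoid(x @ W)                 # encode
    x_hat = h @ W.T                    # decode (linear output for real features)
    err = x_hat - x
    # Back-propagate squared reconstruction error through both paths
    # of the tied weight matrix (encoder path + decoder path).
    grad_W = np.outer(x, (err @ W) * h * (1 - h)) + np.outer(err, h)
    W -= lr * grad_W

print(round(float(np.mean(err ** 2)), 4))
```

The training signal is purely the reconstruction error of the presented pair, which is what makes the correlated-vs-random pairing question below matter.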
Random or Not?

Updates
One of the basic questions about using the autoencoder architecture to learn a mapping function between two domains is a question of randomness: what model is the autoencoder actually learning? Do I have to pair correlated SARS samples together as input, or can I introduce pairs randomly, as with a probabilistic model (see Ammar's TrRBM)?
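The two presentation schemes can be sketched as follows (hypothetical flattened sample arrays; the feature counts are placeholders):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical flattened SARS samples from the two domains.
mc = rng.standard_normal((5000, 7))   # Mountain Car
ip = rng.standard_normal((5000, 9))   # Inverted Pendulum

# Correlated presentation: the i-th sample of one domain is always
# shown together with the i-th sample of the other.
correlated = list(zip(mc, ip))

# Random presentation: permute one side so that any correspondence
# between the paired samples is destroyed.
randomized = list(zip(mc, ip[rng.permutation(len(ip))]))

print(len(correlated), len(randomized))  # 5000 5000
```

The question is whether the model learned under the second scheme still captures a usable common feature space.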
Training a gated autoencoder

Updates
I trained a gated auto-encoder using the code mentioned in "Gradient-based learning of higher-order image features" (see: http://www.cs.toronto.edu/~rfm/code/rae/index.html). There are two input layers. I used 5000 random quadruples <s,a,s',r> from a Mountain Car task and 5000 from an Inverted Pendulum. Discrete actions were converted into separate features, one per action, through binarization. Samples were paired randomly by sorting. All input features were normalized. The training results are shown in the cost vs. epoch graph here. I want to check these results next.
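The two preprocessing steps (binarizing discrete actions, normalizing real features) look roughly like this; the action coding and sample values are illustrative, not the actual data:

```python
import numpy as np

# Hypothetical Mountain Car actions: 0 = backward, 1 = coast, 2 = forward.
actions = np.array([0, 2, 1, 2, 0])

# Binarize: one separate feature per discrete action (one-hot encoding).
onehot = np.eye(3)[actions]
print(onehot[1])  # [0. 0. 1.]

# Normalize real-valued state features to zero mean, unit variance.
states = np.array([[-0.5, 0.01],
                   [ 0.2, -0.03],
                   [ 0.4,  0.02],
                   [-1.0,  0.00],
                   [ 0.1,  0.04]])
norm = (states - states.mean(axis=0)) / states.std(axis=0)
print(np.allclose(norm.mean(axis=0), 0.0))  # True
```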

Preparing data for training an autoencoder

Dissertation, Research, Updates
I'm training a relational autoencoder using data from two Markov Decision Process tasks: the Mountain Car task and the Inverted Pendulum (or Cart-Pole) task. To do this, I need to map the sample data to nodes in the input layer (and output layer) of the relational autoencoder. I see this as follows (feature - input type / output unit):

Mountain Car
- Position - real / real
- Velocity - real / real
- forward action - boolean {0,1} / sigmoidal
- backward action - boolean {0,1} / sigmoidal
- coast action - boolean {0,1} / sigmoidal

Cart Pole
- θ - real / real
- ω - real / real
- x - real / real
- v - real / real
- force action - real / real

My chair and I discussed treating the actions as discrete, but most implementations use a variable force as the action, so we're going to address it as such. This gives a different action space than…
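The node mapping above can be sketched as two small vector-builders (the function names and the action labels are mine, for illustration):

```python
import numpy as np

def mountain_car_vector(position, velocity, action):
    """Map a Mountain Car sample to input nodes:
    2 real state features + 3 boolean action features."""
    onehot = np.zeros(3)
    onehot[{"backward": 0, "coast": 1, "forward": 2}[action]] = 1.0
    return np.concatenate([[position, velocity], onehot])

def cart_pole_vector(theta, omega, x, v, force):
    """Map a Cart Pole sample to input nodes:
    4 real state features + 1 real (variable-force) action feature."""
    return np.array([theta, omega, x, v, force])

print(mountain_car_vector(-0.5, 0.01, "forward"))  # 2 state + forward one-hot
print(cart_pole_vector(0.1, -0.2, 0.0, 0.3, 5.0).shape)  # (5,)
```

Note the asymmetry: the Mountain Car side ends in sigmoidal output units while the Cart Pole side is entirely real/linear, which the output layer has to respect.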
Draft of Analysis of Relational Autoencoder

Dissertation, Updates
From code used in the paper "Gradient-based learning of higher-order image features" by Roland Memisevic, I've diagrammed the structure of the relational autoencoder. Note that the input consists of corrupted samples from two different sources (X and Y). These are mapped, via a third-order tensor (W, decomposed into wxf, wyf, and whf_in), to a hidden layer. On the other side of the hidden layer, the hidden activations are split according to the activations of the inputs and their weights. The actual output (corresponding to each input) is the dot product of the multiplicative activation of the hidden layer and the other input with the transpose of that input's weights. The reconstructed output depends on the type of output needed (i.e., binary or real).

[1] R. Memisevic, "Gradient-based learning of higher-order image features,"…
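As I read the diagram, the forward pass looks roughly like this in numpy. The weight names follow the decomposition above; the layer sizes are hypothetical, and I treat both reconstructions as real-valued (linear) outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
nx, ny, nf, nh = 7, 9, 16, 8           # input sizes, factors, hidden units

wxf    = rng.standard_normal((nx, nf)) * 0.1
wyf    = rng.standard_normal((ny, nf)) * 0.1
whf_in = rng.standard_normal((nh, nf)) * 0.1

x = rng.standard_normal(nx)            # (corrupted) sample from source X
y = rng.standard_normal(ny)            # (corrupted) sample from source Y

# Factored third-order interaction: the two inputs' factor responses
# are multiplied element-wise, then mapped to the hidden layer.
fx, fy = x @ wxf, y @ wyf
h = sigmoid((fx * fy) @ whf_in.T)

# Each input is reconstructed from the hidden activations gated by the
# OTHER input's factor response, via that input's transposed weights.
fh = h @ whf_in
x_hat = (fh * fy) @ wxf.T              # real output: linear
y_hat = (fh * fx) @ wyf.T

print(h.shape, x_hat.shape, y_hat.shape)  # (8,) (7,) (9,)
```

A binary-output variant would pass x_hat or y_hat through a sigmoid instead of leaving it linear.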

Development Plan

Dissertation, Updates
This generally outlines my development plan. The plan is composed around two objectives:

1. TrAE - Develop a framework to learn an intertask mapping function, similar to the Reinforcement Transfer Learning framework [Ammar], based on building common feature subspaces between tasks/domains. Rather than using a three-way Restricted Boltzmann Machine, where training is based on an energy function, use a denoising autoencoder, where training is based on reconstruction error.
2. Extend the use of this framework toward cross-modal transfer using a custom set of tasks.

Towards Objective 1

Source Learning Agents and Basic Tasks - Set up the Mountain Car and Cart Pole tasks, and set up an agent that can learn them. This agent will provide training and transfer samples for the TrAE (Transfer AutoEncoder).

Intertask Mapping - TrAE - Transfer Agent - Set up the…

Thoughts on Modality in RL

Dissertation, Updates
I've been thinking about what modality means in terms of a reinforcement learning agent. I had initially thought about subsets of state features and action features, but this doesn't cover changes in kinematic structure or transition functions. Perhaps that's okay: if I consider this from a purely neurophysiological perspective, then the analogues are straightforward.
Status of code writing for Dissertation

Dissertation, Updates
Current Status

2015-06-24: I've got the Gated AutoEncoder (3-way) script currently running on its example data (http://www.cs.toronto.edu/~rfm/code/rae/index.html). I need to get familiar with this code.

2015-06-21, 22:49 hrs: I found a Java package from Brown called BURLAP that will work with my current setup (i.e., the RL-Glue arrangement). It has Fitted Value Iteration as well as Least Squares Policy Iteration. LSPI was used by Ammar in his research on learning an autonomous intertask mapping function using a three-way RBM. That paper is a foundational jumping-off point. My research will partly be in the use of a 3-way denoising autoencoder to build a common feature subspace between two tasks. This will play a major role as a shared feature node in a hierarchical structure. I also have found a 3-way denoising auto-encoder…