I trained a gated autoencoder using the code accompanying “Gradient-based learning of higher-order image features” (see http://www.cs.toronto.edu/~rfm/code/rae/index.html).
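For reference, here is a minimal sketch of the forward pass of a factored gated autoencoder of the kind described in that paper. This is not the rae code itself; the parameter names (U, V, W), dimensions, and the sigmoid mapping units are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the rae code)
n_x, n_y, n_factors, n_maps = 20, 20, 16, 8

# Factored parameters: U and V project the two input layers onto factors,
# W connects elementwise factor products to the mapping units.
U = rng.normal(0.0, 0.1, (n_x, n_factors))
V = rng.normal(0.0, 0.1, (n_y, n_factors))
W = rng.normal(0.0, 0.1, (n_factors, n_maps))

def reconstruct_y(x, y):
    """One forward pass: infer mapping units from (x, y),
    then reconstruct y from x and the inferred mapping."""
    f_x = x @ U                                  # factor responses for input x
    f_y = y @ V                                  # factor responses for input y
    m = 1.0 / (1.0 + np.exp(-(f_x * f_y) @ W))  # sigmoid mapping units
    return (f_x * (m @ W.T)) @ V.T               # linear reconstruction of y

# Toy batch standing in for <s, s'> pairs
x = rng.normal(size=(5, n_x))
y = rng.normal(size=(5, n_y))
cost = np.mean((reconstruct_y(x, y) - y) ** 2)   # squared reconstruction cost
```

Training then amounts to minimizing this reconstruction cost by gradient descent on U, V, and W, which is what the rae code does (with its own regularizers and corruption scheme).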
There are two input layers. I used 5000 random quadruples <s, a, s’, r> from a MountainCar task and 5000 from an inverted pendulum task. Discrete actions were converted into one binary feature per action (one-hot encoding). Samples were paired by random shuffling. All input features were normalized.
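The preprocessing above can be sketched as follows. The function names and the normalization choice (zero mean, unit variance per feature) are my assumptions; the post only says the features were normalized:

```python
import numpy as np

def one_hot(actions, n_actions):
    """Turn a vector of discrete action indices into one binary feature per action."""
    out = np.zeros((len(actions), n_actions))
    out[np.arange(len(actions)), actions] = 1.0
    return out

def normalize(features):
    """Zero-mean, unit-variance normalization per feature column (assumed scheme)."""
    mu = features.mean(axis=0)
    sd = features.std(axis=0) + 1e-8  # avoid division by zero for constant features
    return (features - mu) / sd

# Toy example: MountainCar has three discrete actions and a 2-D state (position, velocity)
actions = np.array([0, 2, 1, 2])
states = np.array([[0.1, -0.5], [0.3, 0.2], [-0.4, 0.0], [0.2, 0.1]])
inputs = np.hstack([normalize(states), one_hot(actions, 3)])
```

Each row of `inputs` then combines the normalized continuous features with the binarized action for one sample.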
The training results are shown in the cost-vs-epoch graph here. Next I want to validate these results.