Quick review: I have a working version of a relational auto-encoder and have used it to learn a transfer function between two reinforcement learning tasks. As has been done in other research, I’ve used state-action-state triplets as the training data. My hypothesis is that a relational auto-encoder will build a common feature space for the transition dynamics between the two different reinforcement learning domains.
There are two problems that I’m trying to solve. Both deal with the learning algorithm for the interdomain mapping function. The first addresses how the data enters training and the second is in the characteristics of the data once run through the trained model.
Dealing with uncorrelated data
Currently, the relational auto-encoder learns on pairs of triplets presented together by using standard back-propagation. This approach could have a problem in that pairs of triplets in the training data have been picked randomly and generally have no raw semantic correlation with each other.
It may be that a gated, relational auto-encoder will learn the transition dynamics in each separate layer and simultaneously build a common model between them. I would be fortunate in this case because we can stick with the existing learning algorithm. In fact, I’ve already got a model trained and resulting data from it. So, I want to test that first. In which case, I have to solve the second problem and run an experiment with the existing.
Resulting data is blown out of scale
This may be a symptom of a big problem. Once I ran random samples through my “trained” model. The results on the target side of the model were outside of the range defined by the target task. This could be a problem with my learning algorithm. The results in the learning algorithm’s performance were weird. Cost seems to be constant for a number of training epochs, then there’s a sharp drop to another rather steady value. My expectation is that the algorithm would learn smoothly and that cost would start dropping immediately and converge to a optimal value. This seems to be what most learning algorithms do. But, this is a different model so, it isn’t clear if this is proper learning behavior though.
In any case, the resulting data being blown out of scale raises a red flag. It makes me think that the current learning approach is flawed. There could be a problem with parameters which makes me want to fish around and try again.
Which is probably what I’m going to be doing once I get everything installed here on the laptop.