H. Ammar and D. Mocanu, “Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines,” Mach. Learn. …, 2013.
Existing reinforcement learning approaches are often hampered by learning tabula rasa. Transfer for reinforcement learning tackles this problem by enabling the reuse of previously learned results, but may require an inter-task mapping to encode how the previously learned task and the new task are related. This paper presents an autonomous framework for learning inter-task mappings based on an adaptation of restricted Boltzmann machines. Both a full model and a computationally efficient factored model are introduced and shown to be effective in multiple transfer learning scenarios.
Quotes & Notes
Re: Random or not
Unfortunately, learning in this model cannot be done with standard CD. The main reason is that if CD were used as-is, FTrRBM would learn to correlate random samples from the source task with random samples from the target. To tackle this problem, and to ensure computational efficiency, a modified version of CD is proposed. In Parallel Contrastive Divergence (PCD), the data sets are first split into batches of samples. Parallel Markov chains are run for a certain number of steps on each batch. At each step of a chain, the values of the derivatives are calculated and averaged to perform a learning step. This runs for a certain number of epochs. In each subsequent epoch the same procedure is followed, but with the samples randomly reassigned among the batches. Note that randomizing the batches is important to avoid fallacious matchings between source and target triplets.
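The batching-and-reshuffling scheme above can be sketched as a training loop. This is an illustrative skeleton only: `cd_gradients` is a hypothetical placeholder for the paper's CD derivative computation, and the parameter names are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cd_gradients(model, src_batch, tgt_batch):
    """Hypothetical placeholder for one CD step on a paired batch,
    returning averaged parameter derivatives (zeros here)."""
    return {name: np.zeros_like(p) for name, p in model.items()}

def parallel_cd(model, src_data, tgt_data, batch_size=64, epochs=10, lr=1e-3):
    """Sketch of Parallel Contrastive Divergence: split both data sets
    into batches, run a chain per batch, average the derivatives into a
    learning step, and re-randomize batch membership every epoch so that
    no fixed source/target pairing is learned."""
    n = min(len(src_data), len(tgt_data))
    for _ in range(epochs):
        # Independently shuffle each task's samples so the batch
        # pairings differ from epoch to epoch.
        src = src_data[rng.permutation(n)]
        tgt = tgt_data[rng.permutation(n)]
        for i in range(0, n, batch_size):
            grads = cd_gradients(model, src[i:i + batch_size],
                                 tgt[i:i + batch_size])
            for name in model:
                model[name] += lr * grads[name]
    return model
```

The point of the per-epoch shuffle is the one the quote makes: a fixed batch assignment would let the model latch onto accidental source-target correspondences.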
Re: End result is reconstruction
When FTrRBM learns, its weights and biases are tuned to ensure a low reconstruction error between the original samples and those predicted by the model.
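For intuition, the reconstruction error can be illustrated with a generic RBM-style up-down pass (this is a standard mean-field sketch, not the factored three-way model of the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(v, W, b_vis, b_hid):
    """Generic RBM reconstruction error: propagate the visible units up
    to the hidden layer, back down to the visible layer, and measure the
    mean squared difference between original and reconstruction."""
    h = sigmoid(v @ W + b_hid)        # visible -> hidden
    v_rec = sigmoid(h @ W.T + b_vis)  # hidden -> visible
    return np.mean((v - v_rec) ** 2)
```

Training drives this quantity down, so a well-fit model reproduces its inputs closely after one pass through the hidden layer.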
The source task is sampled greedily according to a (near-)optimal source task policy to acquire optimal state transitions.
The triplets are passed through the visible source layer of FTrRBM and are used to reconstruct initial target task samples at the visible target layer, effectively transferring samples from one task to another.
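The transfer step described above can be sketched with a simplified shared-hidden-layer model: source triplets are encoded to hidden activations and then decoded at the target visible layer. The full FTrRBM uses factored three-way weights; the two weight matrices here are a stand-in assumption for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def transfer_samples(src_triplets, W_src, W_tgt, b_hid, b_tgt):
    """Simplified sketch of sample transfer: (s, a, s') triplets enter
    at the source visible layer, activate the shared hidden layer, and
    are reconstructed at the target visible layer as initial target-task
    samples."""
    h = sigmoid(src_triplets @ W_src + b_hid)  # source visible -> hidden
    return sigmoid(h @ W_tgt.T + b_tgt)        # hidden -> target visible
```

The reconstructed target samples then serve to seed learning in the target task, which is the sense in which samples are "transferred" from one task to another.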