Notes from Memisevic – Learning to Relate Images



Memisevic, Roland. “Learning to Relate Images.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 8 (2013): 1829–1846. doi:10.1109/TPAMI.2013.53.


A fundamental operation in many vision tasks, including motion understanding, stereopsis, visual odometry, or invariant recognition, is establishing correspondences between images or between images and data from other modalities. Recently, there has been increasing interest in learning to infer correspondences from data using relational, spatiotemporal, and bilinear variants of deep learning methods. These methods use multiplicative interactions between pixels or between features to represent correlation patterns across multiple images. In this paper, we review the recent work on relational feature learning, and we provide an analysis of the role that multiplicative interactions play in learning to encode relations. We also discuss how square-pooling and complex cell models can be viewed as a way to represent multiplicative interactions and thereby as a way to encode relations.


  • This paper suggests that a relational auto-encoder, trained symmetrically on the sum of the two predictive objectives, can serve as an alternative to modeling the joint distribution of the inputs.
  • This is a clue that a denoising auto-encoder can be used in place of an RBM.
  • This paper cites “Gradient-based learning of higher-order image features” by the same author.

Quotes & Notes

Re: Overcompleteness of the hidden layer

It has become obvious recently that it is more useful in most applications to use an over-complete representation, that is, K > J, and to constrain the capacity of the latent variables instead by forcing the hidden unit activities to be sparse.

Re: sparsity

Alternatively, one can train auto-encoders, such that they denoise corrupted version of their inputs, which can be achieved by simply feeding in corrupted inputs during training (but measuring reconstruction error with respect to the original data). This turns auto-encoders into “denoising auto-encoders” [47], which show properties similar to other sparse coding methods, but inference, like in a standard auto-encoder, is a simple feed-forward mapping.
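
To make the quoted recipe concrete for myself, here is a minimal NumPy sketch of a denoising auto-encoder loss. The function name, the masking-noise corruption, the sigmoid hidden units, and the tied weights are all my own assumptions for illustration; the quote only fixes the idea of corrupting the input and scoring the reconstruction against the original data.

```python
import numpy as np

def denoising_loss(x, W, b_h, b_o, corruption=0.3, rng=None):
    """Mean squared reconstruction error of a tied-weight denoising auto-encoder.

    x: (N, J) clean inputs, W: (J, K) weights, b_h: (K,), b_o: (J,).
    All names and modeling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Masking noise (assumed): zero each input entry with probability `corruption`.
    mask = rng.random(x.shape) >= corruption
    x_corrupt = x * mask
    # Inference is a single feed-forward pass on the corrupted input.
    z = 1.0 / (1.0 + np.exp(-(x_corrupt @ W + b_h)))
    x_hat = z @ W.T + b_o
    # The error is measured against the original, uncorrupted data.
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))
```

With corruption=0.0 this reduces to a plain auto-encoder loss, which is a convenient sanity check.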

Re: RBMs define the joint probability distribution

A technique similar to the auto-encoder is the Restricted Boltzmann machine (RBM): RBMs define the joint probability distribution.

Re: turning auto-encoder into relational auto-encoder

As another example, we can turn an auto-encoder into a relational auto-encoder, by defining the encoder and decoder parameters A and W as linear functions of x ([28], [29]). Learning is then essentially the same as in a standard auto-encoder modeling y. In particular, the model is still a directed acyclic graph, so one can use simple back-propagation to train the model.
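
A rough sketch of what "parameters as linear functions of x" could look like, assuming a single three-way weight tensor w[i, j, k] shared by encoder and decoder (tied weights), sigmoid mapping units, and no bias terms. The function name and these choices are mine, not taken from the paper.

```python
import numpy as np

def relational_autoencode(x, y, w):
    """One forward pass of a (hypothetical) tied-weight relational auto-encoder.

    x: (I,) conditioning image, y: (J,) image to reconstruct, w: (I, J, K).
    Returns the mapping units z of shape (K,) and the reconstruction y_hat of shape (J,).
    """
    # The weights are a linear function of x: Wx[j, k] = sum_i w[i, j, k] * x[i].
    Wx = np.einsum("ijk,i->jk", w, x)
    # Encoder: mapping units computed from y under the x-dependent weights.
    z = 1.0 / (1.0 + np.exp(-(y @ Wx)))
    # Decoder: reconstruct y from z with the same x-dependent weights.
    y_hat = Wx @ z
    return z, y_hat
```

Once x is fixed, Wx is an ordinary weight matrix, so the forward pass is that of a standard auto-encoder on y and gradients with respect to w can be obtained by back-propagation.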

Re: training symmetrically on the sum of the two predictive objectives as an alternative to learning a joint probability distribution (on the inputs)

As an alternative to modeling a joint probability distribution, [29] show how one can instead use a relational auto-encoder trained symmetrically on the sum of the two predictive objectives

\sum_{j} (y_{j}^{\alpha} - \sum_{ik} w_{ijk} x_{i}^{\alpha} z_{k}^{\alpha})^2 + \sum_{i} (x_{i}^{\alpha} - \sum_{jk} w_{ijk} y_{j}^{\alpha} z_{k}^{\alpha})^2

This forces parameters to be able to transform in both directions, and it can give performance similar to symmetrically trained, fully probabilistic models [29]. Like an auto-encoder, this model can be trained with gradient based optimization.
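
The symmetric objective above can be written out in a few lines of NumPy. This is only a sketch: the function name is hypothetical, the mapping units z are passed in rather than inferred from (x, y), and I sum over the training cases α, which the quoted formula leaves implicit.

```python
import numpy as np

def symmetric_loss(x, y, z, w):
    """Sum of the two predictive squared errors, summed over training cases.

    x: (N, I), y: (N, J), z: (N, K) mapping units, w: (I, J, K).
    """
    # Predict y from x and z: y_hat[n, j] = sum_{i,k} w[i, j, k] * x[n, i] * z[n, k].
    y_hat = np.einsum("ijk,ni,nk->nj", w, x, z)
    # Predict x from y and z with the same tensor:
    # x_hat[n, i] = sum_{j,k} w[i, j, k] * y[n, j] * z[n, k].
    x_hat = np.einsum("ijk,nj,nk->ni", w, y, z)
    return np.sum((y - y_hat) ** 2) + np.sum((x - x_hat) ** 2)
```

Because both terms share the same w, minimizing the sum pushes the parameters to transform in both directions, as the quote describes.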
