Developing representational and predictive models for SDAR (structural descriptor-activity relationships) on small datasets is a persistent challenge for in-silico modeling of compound efficacy in drug discovery and design. While large sets of toxicity data are available, information about a compound's effect on a human activity endpoint (e.g. reduction of symptoms) comes mainly from clinical trial data and post-market reports. Efficacy data points are scarce compared to toxicity data, in part because relatively few drugs make it to market.
The limited number of examples makes it difficult to train robust machine learning models, especially with techniques that traditionally require many observations. Using such techniques is still desirable, however, because the relationships may be non-linear. We therefore need to look at methods for improving learning performance on these smaller datasets.
Transfer learning is an area of research with the potential to address this problem. For tasks that are different yet related, we can transfer knowledge (for example, learned parameters or representations) from one learner to another, with the aim of improving learning on the second task. We’ll start exploring what this means for SDAR modeling in the next few posts.
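To make the idea of transferring from one learner to another a little more concrete, here is a minimal sketch on synthetic data. It is not an SDAR method and not necessarily what later posts will use; it is one simple form of parameter transfer, chosen for illustration. A linear model is fit on a large "source" task, and its weights are then used as the point that a ridge penalty shrinks toward when fitting a small, related "target" task. The function names (`ridge_fit`, `demo`), the dataset sizes, and the assumption that the two tasks have similar true weights are all hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam, w_prior=None):
    """Closed-form minimiser of ||X w - y||^2 + lam * ||w - w_prior||^2."""
    d = X.shape[1]
    if w_prior is None:
        w_prior = np.zeros(d)  # ordinary ridge: shrink toward zero
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * w_prior
    return np.linalg.solve(A, b)


def demo(seed=0, d=30, n_source=2000, n_target=15, lam=1.0):
    """Compare a target model trained from scratch with one that reuses source weights."""
    rng = np.random.default_rng(seed)

    # Assumption: the target task's true weights are a small perturbation
    # of the source task's true weights ("different yet related").
    w_source = rng.normal(size=d)
    w_target = w_source + 0.1 * rng.normal(size=d)

    def sample(n, w):
        X = rng.normal(size=(n, d))
        y = X @ w + 0.1 * rng.normal(size=n)
        return X, y

    Xs, ys = sample(n_source, w_source)    # plentiful source data
    Xt, yt = sample(n_target, w_target)    # scarce target data (fewer rows than features)
    Xtest, ytest = sample(1000, w_target)  # held-out target data

    w_src_hat = ridge_fit(Xs, ys, lam)                      # source learner
    w_scratch = ridge_fit(Xt, yt, lam)                      # target, no transfer
    w_transfer = ridge_fit(Xt, yt, lam, w_prior=w_src_hat)  # target, with transfer

    def mse(w):
        return float(np.mean((Xtest @ w - ytest) ** 2))

    return mse(w_scratch), mse(w_transfer)


if __name__ == "__main__":
    scratch, transfer = demo()
    print(f"test MSE without transfer: {scratch:.3f}")
    print(f"test MSE with transfer:    {transfer:.3f}")
```

Under these assumptions the transferred model should generalise better, because the small target set only has to correct the difference between the two tasks instead of learning every weight from 15 examples. If the tasks were not actually related, the same prior could hurt performance, which is a large part of what makes the problem interesting.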
If you have been following along, you’ll note that my prior research, while specifically in the reinforcement learning domain, was largely about transfer learning. I hope to reorient my dissertation research in this direction.