Data Preparations

Downloaded several GB of data from  There are around 75 tables of data covering around 2.1M chemical compounds.  In bioassays for Homo sapiens alone, there are:

  •   45,941 – ADMET (A) – ADME and Tox data e.g. t1/2, oral bioavailability, LD50.
  • 122,533 – Binding (B) – Data measuring binding of a compound to a molecular target, e.g. Ki, IC50, Kd.
  • 172,353 – Functional (F) – Data measuring the biological effect of a compound, e.g. % cell death in a cell line, rat weight.
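
Counts like these can come out of a single GROUP BY query. Here is a minimal sketch using Python's built-in sqlite3 module against an in-memory toy table. The table name `assays`, the columns `assay_type` and `organism`, and the rows themselves are assumptions made up for illustration, not the actual schema or contents of the download.

```python
import sqlite3

# Labels for the one-letter codes used in the list above.
ASSAY_TYPE_LABELS = {"A": "ADMET", "B": "Binding", "F": "Functional"}


def build_toy_db():
    """Create an in-memory database with a hypothetical assays table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assays ("
        "assay_id INTEGER PRIMARY KEY, assay_type TEXT, organism TEXT)"
    )
    # Made-up toy rows, only here so the query has something to count.
    conn.executemany(
        "INSERT INTO assays (assay_type, organism) VALUES (?, ?)",
        [
            ("A", "Homo sapiens"),
            ("B", "Homo sapiens"),
            ("B", "Homo sapiens"),
            ("F", "Homo sapiens"),
            ("F", "Rattus norvegicus"),
        ],
    )
    return conn


def count_assays_by_type(conn, organism):
    """Return {assay_type: number_of_assays} for a single organism."""
    rows = conn.execute(
        "SELECT assay_type, COUNT(*) FROM assays "
        "WHERE organism = ? GROUP BY assay_type ORDER BY assay_type",
        (organism,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    counts = count_assays_by_type(build_toy_db(), "Homo sapiens")
    for code, n in counts.items():
        print(f"{n:>7,} - {ASSAY_TYPE_LABELS.get(code, code)} ({code})")
```

Against the real data, only the connection and the table and column names would need to change; the query shape stays the same.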

In the compounds table, there are 6,846 molecules related to the assay for inhibitors and substrates of cytochrome P450 3A4.

This is pretty cool.  While I have some of the data already, I’m going in to explore it and see what’s there.  I’m putting together some SQL queries to pull an initial data set, but I’ll need to go back and build a tool for fetching all the structure files too.
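
As a rough sketch of what one of those initial queries might look like, the snippet below pulls the distinct compounds linked to any assay whose description matches a keyword. The three tables (`compounds`, `assays`, `activities`), their columns, and the toy rows are all hypothetical stand-ins, since the real schema has around 75 tables and may link these differently.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compounds (compound_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE assays (assay_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE activities (compound_id INTEGER, assay_id INTEGER);

        -- Made-up toy rows for illustration only.
        INSERT INTO compounds VALUES
            (1, 'compound-1'), (2, 'compound-2'), (3, 'compound-3');
        INSERT INTO assays VALUES
            (10, 'Inhibitors and Substrates for the Cytochrome P450 3A4'),
            (11, 'Unrelated toy assay');
        INSERT INTO activities VALUES (1, 10), (2, 10), (2, 10), (3, 11);
        """
    )
    return conn


def compounds_for_assay_keyword(conn, keyword):
    """Return distinct (compound_id, name) rows for assays matching keyword."""
    return conn.execute(
        """
        SELECT DISTINCT c.compound_id, c.name
        FROM compounds AS c
        JOIN activities AS act ON act.compound_id = c.compound_id
        JOIN assays AS a ON a.assay_id = act.assay_id
        WHERE a.description LIKE ?
        ORDER BY c.compound_id
        """,
        (f"%{keyword}%",),  # bound parameter, so the keyword is never spliced into the SQL
    ).fetchall()


if __name__ == "__main__":
    for compound_id, name in compounds_for_assay_keyword(
        build_toy_db(), "Cytochrome P450 3A4"
    ):
        print(compound_id, name)
```

The DISTINCT matters because a compound can have several measurements in the same assay, and the list of compound IDs that comes back is what a later structure-file tool would iterate over.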

