Blog

Data Preparations

datascience, Machine Learning, Research, Work
I downloaded several GB of data from https://www.ebi.ac.uk/chembl/. There are around 75 tables of data on around 2.1M chemical compounds. In bioassays for Homo sapiens alone, there are:

45,941 - ADMET (A) - ADME and Tox data, e.g. t1/2, oral bioavailability, LD50.
122,533 - Binding (B) - Data measuring binding of a compound to a molecular target, e.g. Ki, IC50, Kd.
172,353 - Functional (F) - Data measuring the biological effect of a compound, e.g. % cell death in a cell line, rat weight.

In the compounds table, there are 6,846 molecules related to the assay for inhibitors and substrates of Cytochrome P450 3A4. This is pretty cool. While I have some of the data already, I'm going in and exploring the data to see what's there. I'm arranging some SQL queries…
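A per-type count like the one above could come from a grouped query. The sketch below is only an illustration: it runs against a tiny in-memory SQLite table with made-up rows, and the table and column names (`assays`, `assay_type`, `assay_organism`) are my reading of the ChEMBL schema, so treat them as assumptions rather than the queries actually used for the post.

```python
import sqlite3


def count_assays_by_type(conn, organism):
    """Return {assay_type: count} for the assays recorded against one organism."""
    rows = conn.execute(
        "SELECT assay_type, COUNT(*) FROM assays "
        "WHERE assay_organism = ? "
        "GROUP BY assay_type ORDER BY assay_type",
        (organism,),
    ).fetchall()
    return dict(rows)


# Toy stand-in for the real table; these rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assays ("
    "assay_id INTEGER PRIMARY KEY, assay_type TEXT, assay_organism TEXT)"
)
conn.executemany(
    "INSERT INTO assays (assay_type, assay_organism) VALUES (?, ?)",
    [
        ("A", "Homo sapiens"),
        ("B", "Homo sapiens"),
        ("B", "Homo sapiens"),
        ("F", "Homo sapiens"),
        ("F", "Rattus norvegicus"),
    ],
)

counts = count_assays_by_type(conn, "Homo sapiens")
print(counts)
```

Against a full ChEMBL download the same function should work unchanged once `conn` points at the real database, provided the column names match the release in use.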

Planning Chapters 2 and 3 of my Dissertation

Dissertation, Updates
Previously, I had set up the framework / outline for the dissertation. This weekend, I intend to work on Chapters 2 and 3. Chapter 1 covers the introduction and is typically tackled last. Over the next couple of weeks I need to work on Chapters 2 and 3: the former includes the background and notation sections, and the latter covers the related work. This will help me narrow my focus and develop the motivation section. A key component of Chapters 2 and 3 is to cover the general area of transfer learning and narrow it down to my area of interest. At this time, that may be a framework for boosting that works across the techniques of transfer: instance-based, parameter-based, etc. A quick…

Today’s activity

General
Today, I started getting the structure of my dissertation together. Over the past few days, I've started to focus on my research and tried to pick out the most interesting and relevant areas to approach. Transfer Learning will become a very important part of Machine Learning, and there is a lot of room for work in this area. I looked at the dissertations of two of my supervising professor's previous students to get an understanding of the structure. I also started the Background and Notation section.

Dissertation pivot

Dissertation, Research
To align my dissertation efforts with the strategic and tactical needs of my employer, Spektron Systems, which is incredibly supportive, I must pivot my efforts. It is fortunate that I am working with a company that directly uses machine learning and has readily available problems that my research addresses. I should only have to make a small pivot that narrows my research to something relevant.

Narrowing my research

The direction of my research has been the relationship between the ideas of computational creativity and transfer learning. In particular, I was looking at transfer learning as the mechanism for computational creativity. This is a vast problem and unlikely to be useful in the short term for Spektron. Computational creativity, as a concept, may fit the strategic activities of the company, i.e.,…

Brains are good, but not that good.

Machine Learning, Research
What’s wrong with the idea that the brain is an excellent learning machine? In machine learning and artificial intelligence research literature, we almost invariably see the argument that we need better algorithms that can learn from fewer training examples or that are one-shot learners. Along with that argument, typically, comes the assertion that humans learn well from few (or single) training examples. The problem is that people forget the neurophysiology of the brain. Notably, within the brain, neural firing is not a constant. At the cellular level and the macro/network-level of neurons, neuronal activity is continuously oscillating. If we’re looking at connectivity between neurons, we may count each oscillation as a training example. Naturally, the phrase “what fires together, wires together” comes to mind. Say we get a “single five-second…

My top favorite podcasts

General, inspirations
These may change from time to time, but the podcasts that I currently listen to the most are:

For science-based human interest stories
Hidden Brain (hosted by Shankar Vedantam)
Radiolab (hosted by Jad Abumrad and Robert Krulwich)
99% Invisible (hosted by Roman Mars)

For news about the world
The Daily (from the New York Times)
Up First (NPR)

Transfer Learning for SDAR models from small datasets

Dissertation, Machine Learning, Research
Developing representational and predictive models for SDAR (structural descriptor - activity relationships) on small datasets is a problem for in-silico modeling of compound efficacies in drug discovery and design. While there are large sets of toxicity data available, the information about the effect of a compound when related to a human activity endpoint (e.g. reduction of symptoms) comes from clinical trials data and reports in the market. The relative number of data points for efficacy is low compared to toxicity due, in part, to the relatively small number of drugs making it to market. The limited number of examples makes it difficult to train robust machine learning models, especially with techniques that traditionally require many observations. Using such techniques, however, is desirable because of potential non-linearities in the relationships. Therefore,…

A Test for Developer Candidates

javascript, mean, nodejs, UXUI, Work
This was developed for testing candidates for a junior developer position requiring MEAN-stack-like skills.

Introduction

Much of what we do is take lots of information and display it in some meaningful way. Sometimes that information will be from different sources and needs to be filtered, transformed, combined, and so on. For this exercise, we’ll look at your ability to consume information from web services, process it, and display it to a user.

The Domain

The NIH and the National Library of Medicine provide a web service called PUG that we can use for free to pull information about millions of different chemical compounds and substances. The tutorial for it can be found at the following URL: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST_Tutorial.html. In short, you can construct a URL to…
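To give a feel for what constructing such a URL looks like, here is a minimal sketch. It is written in Python rather than the JavaScript the exercise targets, the helper name `build_pug_url` is hypothetical, and it only builds the address without calling the service. The path layout (`compound/name/<name>/property/<list>/JSON`) follows my understanding of the PUG REST tutorial, so check it against the tutorial before relying on it.

```python
from urllib.parse import quote

# Base address of the PUG REST service as described in the PubChem tutorial.
PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_url(compound_name, properties):
    """Build a URL asking for the given properties of a compound, looked up by name.

    `build_pug_url` is a hypothetical helper for illustration only.
    """
    # Percent-encode the name so spaces and slashes cannot break the path.
    name_part = quote(compound_name, safe="")
    property_part = ",".join(properties)
    return f"{PUG_BASE}/compound/name/{name_part}/property/{property_part}/JSON"


url = build_pug_url("aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
```

Fetching the resulting address with any HTTP client should return a JSON document, which is the kind of payload a candidate would then filter and display.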

Yet Another BBQ Place in Conway

General, Restaurant Reviews
I like BBQ. I like a sharp smoke flavor. You can slice it, shred it, make it so tender that it falls off the bone, and I'll eat it up. Dry-rubbed, sauced-up, spicy, tangy, sweet. You name it - I like it. If you do it right, present it right, and give me a good atmosphere to eat it in - I'll be sure to sing your praises. If you don't, then you're likely to get one of these, as is the case with Fat Daddy's, the new BBQ restaurant that has opened in downtown Conway, Arkansas. Fat Daddy's comes to us from Russellville, where they've enjoyed some success. It was on the advice of an extended family member, and resident of that town, that I found myself trying it out.…

Categorizing Job Orders with a Naive Bayes Classifier

datascience, Machine Learning
Meridian Staffing has about 12,000 job orders from 2010 to present, and each is assigned zero or more categories such as "Application Developer", "Project Manager", "Network Engineer", etc. We regularly extract this job information from our Applicant Tracking System (Bullhorn) and load it into our Posse Analytics server for data analysis and reporting. Unfortunately, nearly 50% of these jobs are either not categorized or categorized as "Other Area(s)". As MSS moves towards being a data-driven organization, categorization will inform activities like capacity and candidate pipeline planning. As such, having good, clean data becomes more and more important, and we need to mitigate this issue. Naturally, the first line of approach is to address the source of the data. But while we may fix import processes and train people to correctly assign categories when entering…
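As a rough illustration of the technique named in the title, the sketch below trains a small multinomial Naive Bayes model with add-one smoothing on job titles and predicts a category for an unlabeled one. The training titles are invented for the example, and nothing here reflects the features or code actually used on the Bullhorn data; only the three category names come from the post.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def train(examples):
    """Collect per-category document counts, word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the category with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing keeps unseen pairs from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical job titles; only the category names appear in the post.
examples = [
    ("senior java developer", "Application Developer"),
    ("web application developer", "Application Developer"),
    ("it project manager", "Project Manager"),
    ("technical project manager", "Project Manager"),
    ("network engineer cisco", "Network Engineer"),
    ("senior network engineer", "Network Engineer"),
]

model = train(examples)
print(predict(model, "java web developer"))
```

In practice the uncategorized and "Other Area(s)" jobs would be the prediction set, with the already categorized half serving as training data.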