AI: Machine learning - the when’s and the how’s of machine learning

In our recent post on the use of artificial intelligence in bioinformatics, it was highlighted that simply putting “AI” in the title of a paper, a proposal, or even the name of a company can attract significant traction and, importantly, funding.

However, that does not necessarily mean that machine learning, or artificial intelligence more broadly, is the mystical cure-all for every computational problem’s woes. As discussed in our previous machine learning blog post, a machine learning algorithm can only really learn whatever it is that we program it to learn, and the results of the algorithm vary dramatically depending on both the quality and the quantity of the training data available. In that post we highlighted three questions, the second of which, “Should I use machine learning?”, is explored here.

When should I use Machine Learning?

Two areas in particular which can benefit from implementing machine learning algorithms are the fields of pattern recognition and multivariable causal analysis. Machine learning has been applied in these areas in novel and exciting ways across a remarkably broad range of applications, from pattern recognition that improves the responsiveness of traffic management systems to multivariable causal analysis of nuclear fusion reactions in experimental reactors. Almost all successful implementations of machine learning involve datasets that share some key characteristics, and these characteristics indicate when you should consider using machine learning algorithms in your data analysis:

  1. The datasets are huge
    Some datasets are simply so big that it is impractical for human analysts to analyse the data. In these instances, machine learning is a valuable tool to speed up the analysis process with relatively few labour-hours required to get the algorithm up and running. This means that you can get results whilst expending less time, and less money!
  2. The causal link or underlying pattern is very complex and/or unknown 
    If you knew what the underlying pattern was or understood how all the variables of your problem linked together to deliver a solution, you would just write a straightforward batch of code to churn through the numbers and give you some answers. However, if the problem you have is stochastic, variable, and/or highly sensitive to initial conditions, then it can feel nigh on impossible to tease out useful information. In these cases, machine learning can be a really useful tool for solving problems that were previously thought to be completely intractable.
  3. Good training/testing data is readily available or obtainable
    As discussed previously, it is the provision of adequate training data, in terms of both quality and quantity, that is vital for the success of a machine learning algorithm. You need to have vast swathes of data for which you already know the result, so that you can train, test, and retrain your machine learning algorithm. If you give the algorithm an insufficient amount of training data, it might tumble down a rabbit hole and latch onto noise rather than the real pattern, so make sure you have enough high-quality training data to constrain the problem to the one you want to solve!
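
To make the third point concrete, here is a minimal sketch of holding back some labelled data for testing. Everything in it is an illustrative assumption rather than anything from a real project: the data are synthetic (a noisy straight line), the "algorithm" is a plain least-squares line fit standing in for a machine learning model, and the function names are made up for this example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle labelled rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, rows):
    """Average absolute difference between predicted and known outputs."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)

# Synthetic labelled data (an assumption for illustration): y = 3x + 2 plus noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 3 * x + 2 + rng.gauss(0, 1)))

# Train on one portion, then check the result on data the model has never seen.
train, test = train_test_split(data)
model = fit_line(train)
print("train error:", mean_abs_error(model, train))
print("test error: ", mean_abs_error(model, test))
```

If the test error is much worse than the training error, that is usually a hint that the algorithm has learned quirks of the training data rather than the underlying relationship.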

How should I use Machine Learning?

So, you’ve decided to use a machine learning algorithm to solve your problem. The big question now is: how do you harness the power of AI to get meaningful and accurate results?

  1. Decide if you’re going to use a supervised or unsupervised algorithm
    Supervised learning typically has fixed start and end points, i.e. you know the input variables, and you want to see how changing the values of the inputs affects one or more known output variables. For this reason, supervised learning is often a good candidate for multivariable causal analysis, where it’s the relation between the inputs and outputs that you want to determine. Supervised learning is also commonly employed in classification and regression schemes.
    In contrast, unsupervised learning has no predetermined output variables and generally is used to infer what naturally occurring links are present between a series of data points. For this reason, unsupervised learning is often used for pattern recognition and clustering similar data points together.
  2. Encode for as many input variables as you think are likely to have a material impact on the result
    It should go without saying, but if you omit an input variable that has a key causal link with the output, then the machine learning algorithm will simply ignore the existence (let alone the contribution) of that input variable and your output results could be wildly unrealistic. For this reason, when implementing machine learning, it is not enough to have a coding expert constructing the algorithm; you also need someone who really understands the intricacies of the problem you’re trying to solve.
  3. Train, Test, Re-Train, Re-Test, Re-Train, Re…….
    It cannot be emphasised enough that a machine learning algorithm stands or falls on the effectiveness and regularity of its training, testing, and re-training. Artificial intelligence programs have a tendency to quite cheerfully tumble down the wrong rabbit hole and generate spurious results. Keeping watch on your machine learning script and testing it regularly is possibly the most important part of implementing the algorithm successfully.
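
The difference between supervised and unsupervised learning in step 1 can be sketched on a single toy dataset. This is only an illustration under stated assumptions: the two groups of points are synthetic, a nearest-centroid classifier stands in for a supervised algorithm, a bare-bones k-means with a simple farthest-point starting guess stands in for an unsupervised one, and all function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(points):
    """Mean position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest(point, centres):
    """Index of the centre closest to the given point."""
    return min(range(len(centres)), key=lambda i: sq_dist(point, centres[i]))

def fit_nearest_centroid(points, labels):
    """Supervised: the labels are known, so compute one centroid per label."""
    classes = sorted(set(labels))
    centres = [centroid([p for p, lab in zip(points, labels) if lab == c])
               for c in classes]
    return classes, centres

def k_means(points, k, iterations=10):
    """Unsupervised: no labels, so group points purely by proximity."""
    # Starting guess: the first point, then repeatedly the point farthest
    # from the centres chosen so far (a simplification for this sketch).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Keep the previous centre if a group ends up empty.
        centres = [centroid(g) if g else c for g, c in zip(groups, centres)]
    return centres

# Synthetic data (an assumption for illustration): two groups of 50 points.
rng = random.Random(1)
points, labels = [], []
for label, (cx, cy) in (("a", (0.0, 0.0)), ("b", (5.0, 5.0))):
    for _ in range(50):
        points.append((rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)))
        labels.append(label)

# Supervised: learn from labelled points, then predict a label for a new one.
classes, class_centres = fit_nearest_centroid(points, labels)
prediction = classes[nearest((4.5, 5.2), class_centres)]
print("supervised prediction:", prediction)

# Unsupervised: the labels are never shown to k_means.
cluster_centres = sorted(k_means(points, k=2))
print("unsupervised cluster centres:", cluster_centres)
```

The supervised version answers "which known category does this new point belong to?", whereas the unsupervised version only reports that the points fall naturally into two groups; deciding what those groups mean is still left to someone who understands the problem.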

So, in response to the two key questions of this post, “when and how should I use machine learning?”, the answer is hopefully clear. You should consider machine learning, and AI more broadly, as a possible solution when you’re confronted with datasets or problems that won’t easily submit to the more conventional methods of data analysis. When you decide to implement machine learning, the key is to run your algorithms with as much technical insight as can be mustered, and to test and retrain them responsibly to ensure you get high-fidelity, high-accuracy results.


This blog was originally written by Alexander Savin.