In this article we introduce two new terms, Machine Learning (ML) and Deep Learning (DL), and relate them back to what we covered in the last article on Artificial Intelligence (AI).
As we learned in the last article: AI is the “study of Intelligent Agents: any device that perceives its environment and takes actions that maximise its chance of successfully achieving its goals”.
It’s useful to think about an AI agent as having a “brain” that determines how it behaves. In the world of AI, the brain of an intelligent agent is known as a model.
How the brain (model) is trained, and the brain’s internal structure, provide a guide to how the field of AI is subdivided.
As we can see from the diagram below, ML and DL are subfields of AI, each a more specific field than the last. This of course raises the question: “If ML isn’t the only field of AI, what are the others and are they relevant to me?”. ML has a number of peers, with Expert Systems being the next most relevant today to an Enterprise audience. As its name suggests, the brain of an agent based on an Expert System is created by a domain expert and consists of a series of rules and actions instructing the agent to behave in a very specific, predetermined way. Expert Systems, while effective, do have some shortcomings. Training the brain requires working with a domain expert to painstakingly define the rules and subsequently maintain them. Expert Systems are also not good at generalising, so they handle problems they haven’t seen before poorly.
ML addresses many of the downsides of Expert Systems by not requiring the brain to be explicitly built, rule by rule. It’s also able to generalise and provide answers to questions it has never seen before. This may seem like magic: computers building brains without a programmer or expert explicitly telling them what to do? It is not magic, but it does require the application of often sophisticated algorithms to large amounts of good quality training data. Let’s explore how it is done so we can appreciate that building a brain isn’t a completely “hands off” automated process.
Understanding how to create a brain using ML can be quite challenging. However, I feel a few basic examples will help most readers understand the subject better and relate to it more readily. To do this, let’s consider something that’s hopefully familiar to most readers: a flow diagram. Let’s imagine that the flow diagram represents whether a transaction should be treated as fraudulent or not. At the start of the flow diagram we have information (features) on the transaction, such as the amount, location, payee, etc. We also have information on the person the transaction is associated with, such as their purchase history and physical location. By following a series of IF/THEN questions related to the features, the flow diagram tells us (classifies) whether the transaction is fraudulent or not.
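To make the flow diagram concrete, here is how such a hand-written brain might look as code. The features, thresholds, and rules below are invented purely for illustration; a real system would use many more of each:

```python
# A hand-written flow-diagram "brain": each IF/THEN question inspects
# a feature of the transaction. All features and thresholds here are
# hypothetical, chosen only to mirror the flow diagram in the text.

def is_fraudulent(amount, is_overseas, matches_purchase_history):
    if is_overseas:
        if amount > 500:
            return True   # large overseas payment: flag it
        # small overseas payment: suspicious only if it doesn't
        # fit the customer's usual purchase history
        return not matches_purchase_history
    if amount > 5000:
        return True       # unusually large domestic payment
    return False

print(is_fraudulent(900, True, False))   # True: large overseas payment
print(is_fraudulent(30, False, True))    # False: ordinary purchase
```

Notice that every rule had to be written, and must be maintained, by hand. This is exactly the Expert Systems approach described above, and exactly the burden ML removes.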
In ML, it’s possible to build (model) a brain that resembles a process flow diagram. We do this using a modelling algorithm called a Decision Tree. The structure of the brain resembles that of a tree, with nodes, branches and leaves. The question is, “how do we automatically decide what questions to ask at each node of the tree (decision point of the flow diagram)?”. The highly simplified answer is we build the tree starting at the root and make our way down to each of the leaves [You might be thinking, don’t trees grow upwards? In computer science, the convention is that trees are upside down]. At each node, a clever algorithm determines which feature (e.g. transaction location) should be used in the decision box and the value we test it against. It does this by looking at lots of previous transaction data and whether those transactions were really fraudulent or not. So while the brain is constructed automatically, it’s highly dependent upon good quality labelled data (fraudulent/not fraudulent). For example, the algorithm could determine that the question “Is the transaction location overseas?” helps to separate large numbers of fraudulent transactions from non-fraudulent ones. It would therefore appear towards the top (root) of the tree. By itself, this information isn’t particularly useful, but as part of a large, complex series of questions (process flow decision points) it can prove quite effective.
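To give a flavour of that “clever algorithm”, here is a toy sketch of how a decision-tree learner might pick the question for a single node: try candidate feature/threshold splits and keep the one that best separates the labelled examples. The transactions are invented, and the use of Gini impurity as the separation measure is one common choice, not the only one:

```python
# Toy illustration (not a production algorithm) of choosing the
# question at one node of a decision tree from labelled data.

def gini(labels):
    """Gini impurity: 0 means the group is purely one class."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction labelled fraudulent (1)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return the (feature, threshold) that best separates the labels."""
    best = (None, None, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [labels[i] for i, r in enumerate(rows) if r[f] <= threshold]
            right = [labels[i] for i, r in enumerate(rows) if r[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, threshold, score)
    return best[0], best[1]

# Features per transaction: [amount, is_overseas (0 or 1)].
# Labels: 1 = fraudulent, 0 = legitimate -- this is the "good quality
# labelled data" the algorithm depends on.
rows = [[25, 0], [40, 0], [900, 1], [15, 0], [1200, 1], [700, 1], [30, 1]]
labels = [0, 0, 1, 0, 1, 1, 1]

# Here "is_overseas" separates the classes perfectly, so the learner
# would put that question at the root of the tree.
print(best_split(rows, labels))  # (1, 0): split on feature 1, is_overseas
```

Real libraries repeat this search recursively for every node, but the core idea is the same: the questions are chosen by the data, not by a domain expert.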
Modelling the brain as a tree is just one of a number of possible options. It should come as no surprise that while exploring new ways to model a brain, scientists turned to how biological brains are structured. This category of modelling algorithm is unsurprisingly referred to as an Artificial Neural Network (NN). The human brain has an estimated 86 billion neurons arranged in complex layers. A neuron "fires" if the stimulus from neurons connected to it exceeds a certain threshold. In doing so, information is communicated between layers.
If we consider a simple NN, it may consist of just three layers: an input layer (corresponding to the input features, e.g. transaction location), a middle layer (combining the inputs) and an output layer (the brain’s answer to the problem). These middle layers are known as hidden layers, and in practice NNs typically require large numbers of them to solve complex problems such as Computer Vision (CV) or Natural Language Processing (NLP).
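A minimal sketch of such a three-layer network might look like the following. The weights here are hand-picked rather than learned, purely to show the mechanics of information flowing from inputs, through a hidden layer, to an output; the features and numbers are invented:

```python
import math

def neuron(inputs, weights, bias):
    """A neuron 'fires' more strongly as its weighted input grows.
    The sigmoid squashes the result to a value between 0 and 1."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

def forward(features):
    # Input layer: the features themselves, e.g. hypothetical 0/1
    # signals [is_overseas, amount_is_unusually_large].
    # Hidden (middle) layer: two neurons combining the inputs
    # in different ways.
    h1 = neuron(features, [4.0, 4.0], -2.0)
    h2 = neuron(features, [-4.0, 4.0], -2.0)
    # Output layer: one neuron giving the brain's answer, read
    # as a score for how likely the transaction is fraudulent.
    return neuron([h1, h2], [8.0, 8.0], -6.0)

print(forward([1, 1]))  # suspicious transaction: high score
print(forward([0, 0]))  # ordinary transaction: low score
```

Real networks differ only in scale: more features, more hidden layers, far more neurons, and weights learned from data rather than typed in.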
So far, we have a brain that’s represented by a mesh of sorts, with a clearly defined input, some complex combinations of results in the middle layers and finally an output layer that presents the answer. But as we saw in the process flow/decision tree example, a brain is more than just a complex web of interconnections. A brain needs to encode what it has learned. In the decision tree example, we encoded knowledge through the questions at each node of the tree. In a NN, instead of questions, we encode learning by setting the firing threshold of each neuron and the strength (weight) of each connection. Some neurons will need to fire aggressively (with few inputs) and others will require many inputs firing together for the brain to give the right answer. How exactly we determine these weights would take us down an unnecessarily deep technical rabbit hole, and one that business leaders do not need to understand. For those of you who are interested, you may want to read about Back Propagation.
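For readers who do want a small taste without the full rabbit hole, the move at the heart of Back Propagation is simple to state: nudge each weight in the direction that reduces the error between the network’s answer and the correct label, and repeat. The toy below does this for a single neuron; the data, learning rate and number of steps are all invented for illustration:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

weight, bias = 0.1, 0.0     # the "knowledge" to be learned
feature, label = 1.0, 1.0   # one labelled training example
learning_rate = 1.0

for _ in range(200):
    output = sigmoid(weight * feature + bias)
    error = output - label
    # Slope of the squared error with respect to each parameter
    # (the chain rule); backprop computes this layer by layer.
    grad = error * output * (1 - output)
    weight -= learning_rate * grad * feature
    bias -= learning_rate * grad

# After repeated nudges, the neuron's output approaches the label 1.0.
print(sigmoid(weight * feature + bias))
```

Back Propagation applies this same nudge to every weight in every layer at once, which is what makes training large networks practical.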
All that’s left to do is explain how Deep Learning (DL) relates to NNs. If a NN has two or more hidden layers, it’s called a Deep Neural Network (DNN) and, as you might have guessed by now, DNNs are the focus of the field of DL.