Ethical Artificial Intelligence (AI)

The English word ethics is derived from the Greek word êthos meaning “character or moral nature”. The study of ethics or moral philosophy involves systematising, defending and recommending concepts of right and wrong behaviour.

While some academics and philosophers may argue that ethics can be extended to the realm of animals, ethics is generally considered a human concern. As the systems we develop become increasingly sophisticated, and in some cases autonomous, we remain ethically responsible for those systems. This includes systems based on AI and Machine Learning (ML).

Ethical AI is a multi-disciplinary effort to design and build AI systems that are fair and improve our lives.

The Importance of Ethical AI

Ethical AI systems should be designed with careful consideration of their fairness, accountability, transparency and impact on people and the world.

Advances in AI have moved us from building systems that make decisions based on human-defined rules to building systems trained on data. When a system follows rules defined by humans, the ethical implications of each rule tend to be more transparent, and each rule is a conscious decision made by at least the designer and, one would hope, the developer. This usually makes it easier to trace an unethical outcome back to the rule that caused it.

With the introduction of ML and Deep Learning (DL), it is now possible to build AI systems that have no ethical considerations built in at all. An unconstrained AI system will simply be optimised for whatever objective it is given. For example, a system designed to approve loans may unfairly penalise particular demographics that are underrepresented in the training data. This clearly has a negative impact on members of those demographics and potentially on the provider of the service. It may also place the provider in violation of organisational or industry guidelines, or in some cases even the law.
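
One common way to check for the kind of outcome described above is to compare approval rates across groups. The sketch below is a minimal illustration, assuming decisions are available as hypothetical `(group, approved)` pairs; the function names are our own, and the 0.8 cut-off is the "four-fifths" heuristic from US employment guidance, used here only as an example threshold rather than a legal test.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Approval rate per group, from (group, approved) pairs."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        if approved:
            approvals[group] += 1
    return {group: approvals[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions):
    """Lowest group approval rate divided by the highest (1.0 = parity)."""
    rates = approval_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody approved in any group, so the rates are equal
    return min(rates.values()) / highest


# Hypothetical decisions: group A is approved 60% of the time, group B 30%.
decisions = ([("A", True)] * 60 + [("A", False)] * 40
             + [("B", True)] * 30 + [("B", False)] * 70)

rates = approval_rates(decisions)
ratio = disparate_impact_ratio(decisions)
print(rates)  # {'A': 0.6, 'B': 0.3}
print(ratio)  # 0.5
if ratio < 0.8:  # the "four-fifths" heuristic, used here as an example cut-off
    print("Warning: group approval rates differ substantially")
```

A low ratio does not prove a model is unfair, and a high one does not prove it is fair: equal approval rates (demographic parity) is only one of several fairness definitions, and which one applies depends on the context.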

Ethical AI in the News

AI regularly features positively in the news, from its use in driver-assisted vehicles to screening for cancer in radiology images and advances in protein folding. However, AI has also received its fair share of negative press, either due to inflated expectations or as a result of unethical outcomes. We consider three examples below:

Unfair Dismissal

In April 2021, six drivers in the Netherlands were reportedly unfairly dismissed by “algorithmic means”. The ensuing legal challenge, supported by the App Drivers & Couriers Union (ADCU) and Worker Info Exchange (WIE), was brought under Article 22 of the European Union’s General Data Protection Regulation (GDPR). The article is designed to protect individuals against purely automated decisions that have a legal or similarly significant effect on them.

The challenge focused on two main concerns: first, that individuals had apparently been dismissed without the decision being reviewed by a human; and second, the use of facial recognition in the ride-sharing company’s real-time ID system. Earlier in the year, the ADCU had challenged the use of facial recognition technology over concerns about its accuracy, citing a 2018 MIT study showing that facial recognition systems had been prone to error rates as high as 20% for people of colour and performed less well on women of all ethnicities.

Insurance Fraud

In May 2021, a US insurance company retracted a statement on its corporate Twitter account about how it was using AI to scan customers’ faces for hints of fraud. The post referred to using "non-verbal cues that traditional insurers can't, since they don't use a digital claims process". Some Twitter users drew parallels with phrenology to illustrate the absurdity and unfairness of using a physical characteristic to determine behaviour. Similar concerns have also been raised about an EU-funded project designed to speed up immigration checks using an AI lie detector based on facial recognition.

Credit

When Apple introduced the Apple Card, users noticed that women were offered lower credit limits than men, apparently as a result of bias in the system. An independent third party later confirmed that Apple’s card-issuing partner had not used gender in its models, but the author of the article went on to state that:

machine learning systems can often develop biases even when a protected class variable is absent.

Like any technology or tool, AI can provide great value but, as we have seen, it can sometimes produce unethical results. So why is it so hard to build ethical systems?

Challenges in Building Ethical AI Systems

In 2019, the Gradient Institute published a white paper outlining the practical challenges for Ethical AI. They identified four main categories: capturing intent, system design, human judgement & oversight and regulations. We briefly summarise each challenge below.

Capturing Intent

An AI system trained on data has no context outside of that data. There is no moral compass and no frame of reference for what is fair unless we define one. Designers therefore need to explicitly and carefully construct a representation of the intent motivating the design of the system. This involves identifying, quantifying and being able to measure ethical considerations while balancing these with performance objectives.

System Design

Systems should be designed with bias, causality and uncertainty in mind.

Bias should be identified and either reduced or eliminated from data sets where possible. As we saw in the earlier Credit example, leaving out “protected features” such as gender does not by itself guarantee a fair system; if these features are not treated correctly, they can actually make a system more biased. The Gradient Institute’s white paper shares a powerful example of how omitting gender when screening candidates for roles may unfairly assess a female applicant who has taken time off to raise a family. Even if protected features are removed, they can often be inferred from proxy features. For example, the education data used to train an interview screening model often contains gender information.
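
A rough way to spot proxy features is to ask how well each feature on its own predicts the protected attribute. The sketch below assumes a small, hypothetical data set with made-up `school` and `career_gap` fields and uses a simple majority-vote guess per feature value, measured on the same rows; in practice a trained classifier evaluated on held-out data would give a more reliable estimate.

```python
from collections import Counter, defaultdict


def proxy_strength(rows, feature, protected):
    """Accuracy of guessing `protected` from `feature` alone.

    For each value of `feature`, guess the most common protected value seen
    with it, then measure how often that guess is right across all rows.
    """
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[protected]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)


def baseline(rows, protected):
    """Accuracy of always guessing the most common protected value."""
    counts = Counter(row[protected] for row in rows)
    return counts.most_common(1)[0][1] / len(rows)


# Hypothetical applicant records; `gender` is the protected feature.
rows = (
    [{"school": "girls", "career_gap": True, "gender": "F"}] * 3
    + [{"school": "mixed", "career_gap": False, "gender": "F"}] * 2
    + [{"school": "mixed", "career_gap": False, "gender": "M"}] * 3
    + [{"school": "mixed", "career_gap": True, "gender": "M"}] * 1
    + [{"school": "boys", "career_gap": False, "gender": "M"}] * 1
)

print("baseline", baseline(rows, "gender"))  # 0.5
for feature in ("school", "career_gap"):
    print(feature, proxy_strength(rows, feature, "gender"))
```

Here both features predict gender well above the 50% baseline, so dropping the `gender` column alone would not stop a model from picking up on it.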

Bias, however, is not just a data problem. As discussed in this article, model design can also be a source of bias. Even something as simple as the choice of loss function can change the bias of a trained model.
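
To see how the loss function alone can shift a model, consider the simplest possible model: one that predicts the same value for everyone. Under squared error the best constant is the mean of the targets; under absolute error it is a median. The hypothetical data below has a large majority group and a small minority group, and the sketch compares how each choice distributes error between them.

```python
import statistics

# Hypothetical targets: a majority group of 8 and a minority group of 2.
majority = [10.0] * 8
minority = [30.0] * 2
targets = majority + minority

# For a model that predicts a single constant for everyone, squared error is
# minimised by the mean and absolute error is minimised by a median.
squared_loss_fit = statistics.mean(targets)
absolute_loss_fit = statistics.median(targets)


def mean_abs_error(prediction, values):
    """Average absolute error of one constant prediction over some values."""
    return sum(abs(v - prediction) for v in values) / len(values)


for name, fit in (("squared loss", squared_loss_fit),
                  ("absolute loss", absolute_loss_fit)):
    print(f"{name}: predicts {fit}, "
          f"majority error {mean_abs_error(fit, majority)}, "
          f"minority error {mean_abs_error(fit, minority)}")
```

With absolute loss the fitted value matches the majority exactly and the minority bears all of the error; squared loss spreads the error more evenly. Neither is "correct": the point is that the choice is a design decision with fairness consequences.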

Distinguishing causation from correlation is another context-sensitive problem. The cause-and-effect relationships around a system need to be modelled to ensure it has no adverse effects on adjacent systems. For example, consider an AI system used to prioritise patients admitted to hospital. If doctors have historically prioritised asthma sufferers, and those patients therefore received more intensive care, the historical data may make them look lower risk than they really are. A model that does not account for the causal effect of that doctor’s judgement, and then replaces it, can incorrectly predict the risk profile of such patients.

Uncertainty is a measure of our confidence in the predictions made by a system. We need to understand it, and apply the greatest human oversight to the systems with the greatest levels of uncertainty.
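
For a classifier that outputs class probabilities, one simple measure of uncertainty is the entropy of those probabilities: close to zero when the model is confident, and at its maximum when all classes are equally likely. The sketch below assumes hypothetical softmax outputs and ranks cases so that the most uncertain can be reviewed first. Entropy is only one option, and it is only as trustworthy as the model's probability calibration.

```python
import math


def predictive_entropy(probabilities):
    """Shannon entropy, in bits, of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def normalised_entropy(probabilities):
    """Entropy scaled to [0, 1] by dividing by its maximum, log2(k)."""
    k = len(probabilities)
    return predictive_entropy(probabilities) / math.log2(k) if k > 1 else 0.0


# Hypothetical softmax outputs from a three-class classifier.
predictions = {
    "case-1": [0.98, 0.01, 0.01],
    "case-2": [0.40, 0.35, 0.25],
    "case-3": [0.70, 0.20, 0.10],
}

# Rank cases from most to least uncertain so reviewers see these first.
ranked = sorted(predictions,
                key=lambda case: normalised_entropy(predictions[case]),
                reverse=True)
for case in ranked:
    print(case, round(normalised_entropy(predictions[case]), 3))
```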

Human Judgement & Oversight

AI systems can make decisions consistently and reliably when trained on good quality data. They are not constrained by many of the limitations that we humans have: they do not get tired or distracted by their environment, and they can scale to volumes and complexity of data far beyond what we can handle. However, as impressive as AI systems are, they lack the emotional intelligence of even a newborn child and cannot deal with exceptional circumstances. The most effective systems intelligently combine human judgement with AI.

Model Drift

There are a number of metrics that can be used to measure the performance of a system, such as accuracy, precision and F-score. Which ones we choose depends on the nature of the problem. Tracking key metrics and the statistical distributions of data over time, and alerting humans when either drifts significantly, can help ensure that systems remain performant and fair.
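
One widely used statistic for detecting drift in a single variable is the Population Stability Index (PSI), which compares the distribution of a recent sample against a reference sample over fixed bins. The sketch below is a minimal version assuming hypothetical applicant ages and bin edges; the 0.1 and 0.25 thresholds are a commonly quoted rule of thumb rather than a standard, and the small `eps` value stops empty bins from producing a division by zero or the log of zero.

```python
import bisect
import math


def bin_proportions(values, edges):
    """Fraction of values in each bin; `edges` are sorted inner bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference, current, edges, eps=1e-6):
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    psi = 0.0
    for r, c in zip(bin_proportions(reference, edges),
                    bin_proportions(current, edges)):
        r, c = max(r, eps), max(c, eps)  # avoid log(0) for empty bins
        psi += (c - r) * math.log(c / r)
    return psi


def interpret_psi(psi):
    """Commonly quoted rule of thumb; thresholds should be tuned per use case."""
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate shift: monitor"
    return "significant shift: alert a human"


# Hypothetical applicant ages at training time and in recent traffic.
reference = [25, 32, 38, 45, 51, 29, 36, 42, 58, 63]
current = [55, 61, 67, 48, 72, 59, 66, 70, 44, 52]
edges = [30, 45, 60]  # bins: <30, 30-44, 45-59, 60+

psi = population_stability_index(reference, current, edges)
print(round(psi, 3), interpret_psi(psi))
```

The same check can be run on model outputs and on performance metrics, which is what allows humans to be alerted before a drifting model starts producing unfair results.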

Confidence Intervals and Impact

AI systems are being used for an increasingly wide range of applications, a few of which we have already covered in this article. Some, such as deciding whether to dismiss an employee, are clearly important enough that they are now regulated. Others, such as e-book recommendations, clearly less so.

In addition to impact, we need to consider the level of confidence in predictions. Predictions with low confidence levels and of high impact should have the greatest levels of human oversight. The ability to track and alert based on such scenarios and efficiently bring a human into the loop is a valuable capability.
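
The combination of impact and confidence can be turned into a simple routing rule. In the sketch below, each hypothetical use case is tagged with an impact level, and a prediction is only acted on automatically if the model's confidence clears the threshold for that level; everything else is queued for a human, with high-impact, low-confidence cases first. The `Prediction` fields and threshold values are illustrative assumptions, not recommended settings.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    case_id: str
    label: str
    confidence: float  # model's probability for its predicted label, 0..1
    impact: str        # "high" or "low", assigned per use case


# Hypothetical minimum confidence for acting without human review.
CONFIDENCE_THRESHOLDS = {"high": 0.99, "low": 0.6}


def route(prediction):
    """Return "automate" or "human_review" for a single prediction."""
    threshold = CONFIDENCE_THRESHOLDS[prediction.impact]
    return "automate" if prediction.confidence >= threshold else "human_review"


def review_queue(predictions):
    """Predictions needing review: high impact first, then lowest confidence."""
    pending = [p for p in predictions if route(p) == "human_review"]
    # False sorts before True, so high-impact cases come first.
    return sorted(pending, key=lambda p: (p.impact != "high", p.confidence))


predictions = [
    Prediction("loan-1", "approve", 0.995, "high"),
    Prediction("loan-2", "decline", 0.70, "high"),
    Prediction("ebook-1", "recommend", 0.65, "low"),
    Prediction("ebook-2", "recommend", 0.40, "low"),
]
for p in predictions:
    print(p.case_id, route(p))
print([p.case_id for p in review_queue(predictions)])  # ['loan-2', 'ebook-2']
```

For decisions that regulation requires a human to review, such as dismissals, the simplest option is to bypass the confidence check entirely and always route them to a person.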

Governance

Where and how data scientists and engineers fit into an organisational structure may vary. Some organisations favour a centralised model, some a distributed model with those skills being part of cross-functional teams. In either case, there is significant value and reduced risk in developing centralised governance to ensure best practices are being followed. This includes guidance on algorithms, testing, quality control and reusable artefacts. Another function of a centralised governance capability is to perform quality control spot-checks and assess model performance and suitability based on prior data and problems. This often requires strong data governance, management and lineage controls together with mature ML operational practices.

Regulations

We saw in the earlier example how Article 22 of the GDPR restricts certain decisions from being fully automated. Consequently, organisations should be able to explain and reliably reproduce outcomes or recommendations based on historical data, and should maintain strong controls over data management.
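
Reproducing a decision later requires knowing exactly which inputs and which model configuration produced it. A minimal sketch of one approach, assuming a hypothetical deterministic `predict` function and JSON-serialisable inputs, is to store a hash of both alongside each decision and check them when replaying it. The field names, model version and scoring rule are made up for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(obj):
    """SHA-256 hex digest of a canonical JSON encoding of obj."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def predict(features, config):
    """Hypothetical deterministic scoring rule, used only for illustration."""
    score = (config["weight_income"] * features["income"]
             - config["weight_debt"] * features["debt"])
    return "approve" if score >= config["threshold"] else "decline"


def audit_record(model_version, config, features, decision):
    """Everything needed to check this decision again later."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "config_hash": fingerprint(config),
        "features": features,
        "input_hash": fingerprint(features),
        "decision": decision,
    }


def replay(record, config):
    """Re-run the model on the stored inputs; True if the outcome matches."""
    if fingerprint(config) != record["config_hash"]:
        raise ValueError("configuration differs from the one recorded")
    if fingerprint(record["features"]) != record["input_hash"]:
        raise ValueError("stored inputs do not match their recorded hash")
    return predict(record["features"], config) == record["decision"]


config = {"weight_income": 0.5, "weight_debt": 1.0, "threshold": 10000}
features = {"income": 50000, "debt": 12000}
record = audit_record("v1.3", config, features, predict(features, config))
print(record["decision"], replay(record, config))  # approve True
```

In a real system the model version would typically point to a stored model artefact and a snapshot of its training data, which is where data lineage controls come in.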

Organisations can, of course, wait for regulations to be imposed on them or, better still, take a proactive approach by working in cross-functional teams with regulators to develop new standards.

The union of organisational, industry, and national or regional regulations will form the basis of governance efforts across the entire data lifecycle. This covers everything from what data is collected, to how it is transformed, who uses it and for what purpose, through to when it is finally purged.

Developing a clear understanding of regulation and working with informed technology business partners will help ensure that organisations can both influence and quickly respond to regulatory change.