Is causal reasoning essential for truly intelligent AI?


One of the best things to happen in artificial intelligence and machine learning this year must be Judea Pearl’s intervention in the field. With the publication of The Book of Why, causality in general and Pearl’s work in particular have become recognised as central to machine learning and artificial intelligence.

The Book of Why and Pearl’s more technical work on the same topic contain useful ideas for researchers in science, policy and business. In particular, if you need to conduct causal inference using observational data, the graphical framework that Pearl has pioneered can be useful. Practical implementations of Pearl’s ideas have emerged recently and his approach is also making inroads into traditionally reluctant fields like economics.

But while the practical applications of Pearl’s work are important and worth studying, one of the main reasons The Book of Why has made such a splash is its critique of present-day artificial intelligence. Pearl has emerged as one of the leading skeptics of AI as it’s currently practiced. In particular, he argues that current AI systems are seriously limited because they aren’t able to reason causally - a key element of human intelligence. Below, I’ll examine the exact problems that Pearl thinks follow from AI’s inability to represent causes and effects.

No knock-down arguments

Academics sometimes talk about the elusive “knock-down argument” which settles a scientific debate once and for all. I was secretly hoping I’d find one of these in The Book of Why’s Chapter 10, which is where Pearl discusses his problems with present-day AI. Instead of a knock-down argument, the chapter presents a series of reasonable remarks about the limitations of current approaches and some broad-brush guidance on how to move forward. Taken together, Pearl’s arguments make a reasonable case for the idea that something is amiss, but the chapter is short on detail when it comes to figuring out exactly how to fix the problems.

Argument 1: Transparency

Pearl’s first argument against present-day AI is based on the lack of transparency in deep learning algorithms. Famously, AI researchers often can’t themselves explain how exactly these models work or why they made a particular prediction. This could become a serious problem if we want to ask an AI to explain its actions.

Pearl’s particular worry is that present-day AI isn’t able to evaluate counterfactuals. Therefore, it can’t assess questions such as: “Why did you do X rather than Y?” For us humans, this is easy: “Had I done Y, the consequence would have been Z. I don’t like Z, so I did X.” AI’s inability to think counterfactually is part of the reason why we have a hard time understanding its decisions.

The solution? As you might’ve guessed, Pearl thinks we should embed graphical causal models of the world into AI systems. Together with the algorithms he and his students have developed, such models would enable AI to answer counterfactual questions. This way, computers would be able to tell us that they did X rather than Y after assessing the counterfactual Z, just like humans do.
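To make the idea concrete, here is a minimal sketch of Pearl’s three-step counterfactual algorithm (abduction, action, prediction) on a toy structural causal model. The model itself - Y = 2·X + U, with U an unobserved noise term - and all the numbers are my own illustration, not an example from the book.

```python
# Toy structural causal model (assumed for illustration): Y = 2*X + U.
# Pearl's counterfactual algorithm has three steps:
#   1. Abduction:  infer the latent noise U from the observation.
#   2. Action:     set X to the counterfactual value (an intervention).
#   3. Prediction: recompute Y under the new X with the same U.

def abduction(x_obs, y_obs):
    """Step 1: solve for the noise U consistent with what we observed."""
    return y_obs - 2 * x_obs

def action_and_prediction(x_counterfactual, u):
    """Steps 2-3: intervene on X and recompute Y with the inferred U."""
    return 2 * x_counterfactual + u

u = abduction(x_obs=1, y_obs=3)        # we saw X=1, Y=3, so U must be 1
y_cf = action_and_prediction(0, u)     # "Had X been 0, Y would have been 1"
print(y_cf)  # 1
```

This is exactly the kind of query - “had I done Y instead, what would have happened?” - that Pearl says present-day AI can’t answer.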

Argument 2: Generalisability

The most striking successes of deep learning algorithms have occurred in contexts that are governed by a limited set of rules, such as in games like Go. The rules of such games, Pearl suggests, constitute the causal laws of the relevant universe. It’s a different story when an algorithm is applied in a new context whose rules or “causal laws” it doesn’t know yet. Generalising from one context to another may prove difficult. Presumably, the solution for this problem is to embed some basic causal model of the world - devised by humans - into AI. If the algorithm doesn’t need to learn the causal rules of the world from scratch every time, this could make it more robust when the context of the task changes.

Argument 3: Counterfactuals

Humans have a powerful decision-making tool that works as follows:

  1. Observe an intent
  2. Compute what would happen if the intent were satisfied
  3. Decide if the intent should be satisfied based on the counterfactual
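The loop above can be sketched in a few lines. The world model, the candidate actions and their utilities below are hypothetical stand-ins of my own; the point is only the shape of the procedure: enumerate intents, compute the counterfactual outcome of each, and act on the best one.

```python
# Sketch of the intent -> counterfactual -> decision loop described above.
# The consequences and utilities are made-up illustrations.

def world_model(action):
    """What would happen if this intent were satisfied?"""
    consequences = {"do_x": "outcome_z", "do_y": "outcome_w"}
    return consequences[action]

def utility(outcome):
    """How much do we like each outcome?"""
    return {"outcome_z": -1.0, "outcome_w": 2.0}[outcome]

def decide(intents):
    # Satisfy the intent whose counterfactual outcome scores best.
    return max(intents, key=lambda a: utility(world_model(a)))

print(decide(["do_x", "do_y"]))  # do_y
```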

The ability to easily compute these types of counterfactuals is a key factor that sets human intelligence apart from that of other animals. There is also an evolutionary element to this argument, which Pearl doesn’t discuss in The Book of Why but which he’s talked about elsewhere. It goes something like this: if counterfactual reasoning emerged as one of the key elements of human-like intelligence over the course of evolution, then there must be something to it. AI researchers would do well to learn from intelligent animals in a similar way as, say, aircraft engineers have learned from birds and bats.

I think this evolutionary argument is the strongest one for the importance of causal reasoning for AI. It makes sense that agents who need to achieve their goals should think causally. If you don’t engage in causal reasoning, then you can’t distinguish between the following two situations:

  1. X is associated with Y
  2. X causes Y

However, if you want to achieve your goals, then it’s important to make the above distinction. For example, you might learn that yellow fingers are associated with lung cancer, but it would not be a good idea to try to prevent lung cancer by asking smokers to wear gloves. To achieve goals effectively, causation is key. So it’s not surprising that evolution hit upon causal reasoning as a “program” in the brains of humans and other intelligent animals like crows.
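The yellow-fingers example can be simulated directly. In the toy model below - all the probabilities are numbers I’ve made up for illustration - smoking causes both yellow fingers and lung cancer, so the two are strongly associated; but intervening on the fingers (handing out gloves) leaves the cancer rate untouched.

```python
import random

random.seed(0)

def sample(intervene_gloves=False):
    """One person in a toy world where smoking causes both
    yellow fingers and lung cancer (probabilities are assumed)."""
    smoker = random.random() < 0.3
    yellow = smoker and not intervene_gloves  # gloves remove the stain
    cancer = random.random() < (0.2 if smoker else 0.01)
    return yellow, cancer

n = 100_000

# Observational world: conditioning on yellow fingers picks out smokers,
# so P(cancer | yellow fingers) is high.
obs = [sample() for _ in range(n)]
p_cancer_given_yellow = (
    sum(c for y, c in obs if y) / max(1, sum(y for y, c in obs))
)

# Interventional world, do(yellow = False): gloves change the fingers
# but not the smoking, so the cancer rate stays at its baseline.
do = [sample(intervene_gloves=True) for _ in range(n)]
p_cancer_after_gloves = sum(c for _, c in do) / n

print(round(p_cancer_given_yellow, 2), round(p_cancer_after_gloves, 3))
```

Conditioning and intervening give very different numbers here, which is exactly the distinction an association-only learner cannot make.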

It would be a mistake, however, to try to mimic human causal reasoning in its entirety in AI systems. Our evolutionary roots show in the systematic errors we make in causal judgments. My favourite example is that we tend to attribute an effect to causes that resemble it. That’s why, for example, people used to believe that wearing yellow clothes was connected to jaundice (a condition that causes skin pigments to turn yellow). This correspondence heuristic has probably proved more helpful than harmful in the course of evolutionary history, but I think it’s easy to agree that we wouldn’t want it in our AI systems.

Causal reasoning seems to have been a good “program” to survive and thrive in the world, but its human form is by no means faultless. If causal reasoning becomes a big field of research in artificial intelligence, it’s going to be fascinating to see if and how we can eventually improve upon the causal inference rules and heuristics cobbled together by evolutionary processes.

Argument 4: Free will

Related to the evolutionary considerations above, Pearl also speculates that AI will probably need to sustain some kind of illusion of free will. The argument here seems to be that the idea of free will emerged in humans during our evolutionary history because it enables us to communicate better with each other. Pearl therefore surmises that something like free will might well be required for computers to communicate effectively with each other.

How is this related to causal reasoning? Well, Pearl suggests that counterfactual thinking could be crucial for the concept of free will. When asking an AI why it chose to do something, we’d expect it to answer along the following lines: “I considered doing X. But then I realised doing X would lead to Y. So I chose to do Z.”

Argument 5: Ethics

Pearl’s fifth and final argument for the need for causal reasoning in AI is based on AI ethics. For us to be able to assess the ethicality of AI, we need to understand why it did what it did rather than something else. To achieve this, AI needs to be able to declare its intents, evaluate the counterfactual that follows from each of them, and prioritise the intents after evaluating the goodness or badness of the counterfactuals. Again, this type of reasoning requires the ability to compute counterfactuals and consequently, Pearl argues, graphical causal models of the world.


As I mentioned above, Pearl’s five arguments for the importance of causality for AI make a compelling overall case that he’s onto something, although none of them is by any means flawless. What I found particularly convincing is the idea that causal reasoning has emerged under evolutionary pressures as a robust “program” that agents use to achieve their goals.

One thing that’s not clear is how close we’d get to truly intelligent AI even if all of Pearl’s ideas were immediately adopted - an interesting counterfactual to think about. For example, while assessing counterfactuals is an important part of human intelligence, a lot of our behaviour doesn’t involve such reasoning. Rather, it’s based on rules and models we’re apparently born with. To achieve human-level performance, we might also need to program these types of innate properties into AI. It may well be that causal reasoning is necessary but not sufficient for truly intelligent AI.


Why do data scientists propose complex solutions to simple problems?

I’ve recently seen a lot of criticism of data science of the following two types:

  • A data scientist once said something silly, therefore data science is broken
  • A hypothetical caricature of a data scientist does this or that, therefore the whole field is all hype and no substance

These types of arguments - and their popularity - are no doubt a normal reaction to one occupation suddenly receiving so much publicity. But having met and worked with hundreds of data scientists in the past few years, I can reveal that they are among the most level-headed people you’re likely to meet.

That’s not to say there aren’t legitimate criticisms. For example, this morning I saw the following tweet:

This reminded me of something I’ve seen in a few projects: when data scientists are working on a problem, their starting point is often some kind of ensemble method or support vector machine. They simply do not consider simpler approaches like the generalized linear model. This not only goes against basic scientific principles like Occam’s razor but is also wasteful in terms of time and computing resources.

Why do data scientists sometimes prefer complex approaches over simple ones? I think it’s partly a matter of conflicting interests. On the one hand, for an individual data scientist it makes a lot of sense to be knowledgeable about all the latest tools. First, it is intellectually satisfying to learn new things, and most data scientists are researchy types. Second, it’s better for your job prospects to say you built an AI application with Tensorflow than to say you ran a logistic regression and it was sufficient for your purposes.

For organisations, on the other hand, it doesn’t really matter what the method is. What matters is that the objective of the project is achieved, preferably using as few resources as possible. Sometimes achieving the objective requires the latest deep learning algorithm with all the bells and whistles. Sometimes a linear regression model or something even simpler works fine. Hence the occasional conflict of interest between individual data scientists and organisations.

So what can be done? Organisations would do well to mandate the use of simple models as baselines for prediction and classification tasks. The improved accuracy (if any) from the fancy model should then be weighed against the time and resources it takes to train that model. What about individual data scientists - what should they do? I don’t know what the correct answer is here, but here’s what I’d like it to be: data scientists should focus on whatever tools they need to get the job done, because getting the job done is how they will be assessed as data scientists. Here’s hoping the field moves towards this type of thinking in the years to come.
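The baseline discipline suggested above can be as simple as a few lines of code run before any fancy model is shipped. In this sketch the labels, the baseline, and the “fancy model” accuracy are all made-up illustrations of mine; the point is the comparison, not the numbers.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of the simplest possible classifier:
    always predict the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

labels = [0, 0, 0, 1, 0, 1, 0, 0]   # toy test-set labels (assumed)
baseline = majority_baseline_accuracy(labels)   # 0.75
fancy_model_accuracy = 0.78                     # hypothetical result

# Is a three-point gain worth the extra training cost? That is the
# question the organisation - not the method - should answer.
print(baseline, fancy_model_accuracy - baseline)
```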


Hello world

Psychology is a strange thing. I’ve had this idea of setting up a minimalist GitHub-hosted blog for a few years. But the whole thing apparently only became salient enough after Microsoft acquired GitHub.

Jolted to action in this way, I soon discovered that setting up a basic blog takes just a few minutes using the handy Jekyll Now repo or one of the similar solutions out there.

While waiting for my posts related to experimentation, causal inference, machine learning and data science, feel free to check out these papers I’ve curated for your reading pleasure.
