Using machine learning to predict the weather

Preface: this isn't really a post about how we use machine learning to predict the weather, since we don't predict weather in a way that most people would call “machine learning.” I use predicting the weather as an example of how I think machine learning “should be done”, and as a result, some of the things I say about weather prediction are not cited and may be inaccurate. The point is to show that the statistical models we call “machine learning” are often used improperly, in a way that doesn't further our understanding of what they're modelling, and that this leads to a load of unintended consequences and biases.

Part 1: Making an AI

“Machine learning” and “artificial intelligence,” often abbreviated as ML and AI, refer to several different statistical models used to analyse data. Generally, ML involves a feedback loop: data is fed into the model, the model is asked to predict something, and then the model is either “rewarded” for making a good prediction or “punished” for making a bad one. While this kind of description definitely lends itself to feeling like some kind of “learning” is taking place, it's important to remember that this is just a statistical model, not actual learning.
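To make that loop concrete, here's a minimal sketch in Python. The one-parameter model and the data are both made up for illustration; the point is just that “reward” and “punishment” boil down to numerically nudging parameters.

```python
# A minimal sketch of the feedback loop: a made-up one-parameter model
# predicts y = w * x, and each bad prediction "punishes" w with a correction.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, observed outcome) pairs
w = 0.0              # the model's single learned parameter
learning_rate = 0.05

for _ in range(1000):                    # run the feedback loop many times
    for x, y in data:
        prediction = w * x               # the model is asked to predict
        error = prediction - y           # how bad was the prediction?
        w -= learning_rate * error * x   # the "punishment": nudge w to do better

print(f"learned w = {w:.2f}")  # ends up near 2, the slope hidden in the data
```

No understanding happens here; the loop just minimises a number.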

Let's talk about a very simple model for the sake of example. Let's say that we're in a room with an unlabelled thermostat. The temperature is controlled with a small slider that goes between “cold” and “hot,” but we don't actually know how moving the slider affects the room's temperature. We'd like to find out, and we want to make an “AI” that changes the temperature for us. Let's say that we also have a precise thermometer and clock to measure the temperature and the current time.

Already, it's clear that there are a lot of different ways to approach this problem. Knowing how thermostats work, you can already make a few guesses as to how the slider affects temperature. We can also assume there's hidden information affecting the system that we can't measure, like the humidity of the air, what kind of machines are controlling the temperature, and so on. Some of these gaps could be filled with additional measurements, but for now, we only have a thermometer and a clock, and can't measure anything else.

If we think about this from a machine learning perspective, there are three questions to answer:

  1. What model should we use?

  2. How does our AI interact with the model?

  3. How do we know if our AI is doing a good job?

Because it's not super important for the example, we'll pretend that we came up with a good answer to the first question. After we pat ourselves on the back for choosing the right model, we need to think about how our AI should interact with it. In other words: what variables should be provided to the model as inputs, and what should the model give back as outputs?

The output of the model is relatively straightforward: after thinking it over, our AI should move the slider on the thermostat to the appropriate position. The inputs are a bit less straightforward: at minimum, we need to know the desired and current temperatures, but there are a few other variables that we'll find useful in the next part.
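To make the interface concrete, here's one hypothetical shape it could take. The function name, signature, and placeholder behaviour are all invented for illustration:

```python
def choose_slider_position(desired_temp: float, current_temp: float) -> float:
    """Hypothetical AI interface: map what we can measure to a slider
    position between 0.0 ("cold") and 1.0 ("hot")."""
    # A trained model's internals would live here; as a placeholder,
    # nudge the slider in proportion to how far off the temperature is.
    return max(0.0, min(1.0, 0.5 + 0.1 * (desired_temp - current_temp)))

print(choose_slider_position(desired_temp=22.0, current_temp=18.0))  # 0.9
```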

If we just track the two temperatures and the slider, we won't easily be able to judge whether the AI is actually changing the temperature correctly. Since any change will take some time to take effect, we only want to punish the AI if the temperature isn't right and it's been a while since we asked for the change. Tweaking how we reward and punish the AI based on elapsed time will probably require a bit of trial and error, but we can do it.

Another useful value to track is the rate at which the temperature is changing, so that we can reward the AI when the temperature is moving toward the desired one. Depending on how we set up our model, we could either provide the elapsed time and the rate of change directly as input variables, or keep them external to the model and use them only during training. Again, choosing between these options will probably take a bit of trial and error, but which one we pick isn't too important.
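One of many ways to encode those two training signals is a reward function like this sketch; every threshold and weight in it is made up:

```python
def reward(desired_temp: float, current_temp: float,
           temp_rate: float, seconds_since_request: float) -> float:
    """Hypothetical reward: punish being far from the target only once the
    change has had time to take effect, and reward movement in the right
    direction. The ten-minute window and the weights are invented."""
    score = 0.0
    error = desired_temp - current_temp
    if seconds_since_request > 600 and abs(error) > 0.5:
        score -= abs(error)      # punish: ten minutes on and still off-target
    if error * temp_rate > 0:
        score += abs(temp_rate)  # reward: temperature is heading the right way
    return score

print(reward(desired_temp=22.0, current_temp=19.0,
             temp_rate=0.2, seconds_since_request=120.0))  # 0.2: warming up
```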

Part 2: Making an algorithm

You may have noticed that making our AI, while relatively straightforward and effective, did not actually teach us anything about how the thermostat works. While machine learning models can ultimately capture complicated relationships between variables, it's completely unclear what those relationships mean, or whether we could do better. All of the hidden factors we mentioned, like humidity, could be the reasons our model looks the way it does, but in creating the model, we never figured out whether that was the case. Since we can't know the exact relationship between the slider and the temperature, there's no way of quantifying whether our model is the best one. We just know that it works well enough to pass the criteria we set, like moving the temperature in the right direction and not taking too long to do it.

For the simple thermostat case, this is an extremely reasonable application of a machine learning model. We've defined what “good enough” is, and it's very unlikely that we'd gain much benefit from studying exactly how our thermostat works. I suspect that in some cases, “smart thermostats” will employ a strategy very similar to this one. However, this is not usually what we use AI for.

You've probably heard the term Algorithm (with a capital A) used with respect to sites like YouTube, but you may or may not actually know what an algorithm is. In just about any introductory Computer Science course, you'll get this definition: an algorithm is an ordered list of instructions defined to solve a problem or perform a computation. You might have seen some amusingly obtuse examples of people being asked how to make a sandwich (i.e. design an algorithm to do so) and having their instructions intentionally misinterpreted due to some lack of detail.
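For contrast, here's what a toy algorithm in the textbook sense looks like, keeping with the sandwich theme; every step is explicit and ordered:

```python
def make_sandwich(fillings):
    """A toy algorithm: an ordered list of instructions, nothing hidden."""
    sandwich = ["bread slice"]                # 1. lay down the first slice
    for filling in fillings:
        sandwich.append(f"spread {filling}")  # 2. spread each filling in turn
    sandwich.append("bread slice")            # 3. top with the second slice
    return sandwich

print(make_sandwich(["peanut butter", "jam"]))
```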

While machine learning does meet the criteria for an algorithm, I'd argue that it falls closer to making a sandwich by placing an entire jar of peanut butter between two slices of bread. It satisfies the definition, but fails in spirit.

Going back to the example of the YouTube Algorithm: although YouTube intentionally keeps a lot of details hidden, here's essentially what it does:

  1. It takes in all of the details about what a user is doing on the website, e.g. what videos they watch.

  2. It gives a list of videos to recommend to users.

And while we can't know all of the details, YouTube is trying to optimise the amount of money it makes from advertisements on the website. Pretty much everything about how it sets up its model is designed to work toward this goal, and I can imagine that it constantly adds and removes variables on both sides of the model to try to achieve it.

The problem with this is, of course, that the relationship between video suggestions and advertising revenue is not only hard to quantify, but fundamentally unknowable. There are loads of unknown variables affecting what videos people watch, and suggesting videos in order to increase ad revenue can have unintended consequences, like spreading conspiracy theories and hateful content.
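To see how that plays out, here's a deliberately crude caricature. This is not YouTube's actual system; every name, weight, and number is invented. The point is that when you can only optimise a measurable proxy for revenue, the proxy wins:

```python
# A caricature of proxy optimisation, with invented data: the true goal
# (ad revenue) can't be computed per video, so a measurable stand-in is
# maximised instead, regardless of what the videos actually contain.
videos = [
    {"title": "calm gardening tips",   "predicted_watch_min": 4.0, "predicted_clicks": 0.10},
    {"title": "outrageous conspiracy", "predicted_watch_min": 9.0, "predicted_clicks": 0.30},
]

def proxy_score(video):
    # Stand-in for "will this make money?": more watch time and more
    # clicks mean more ads served.
    return 0.7 * video["predicted_watch_min"] + 0.3 * video["predicted_clicks"]

recommendations = sorted(videos, key=proxy_score, reverse=True)
print([v["title"] for v in recommendations])  # the conspiracy video ranks first
```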

You may have heard of contrived examples of how improperly rewarding an AI has unintended consequences, like how asking an AI to produce stamps at all costs may result in it levelling the world's forests for the sake of creating more stamps. Arguably, this kind of AI is much less likely than ones we've already created, like how Twitter's image-cropping algorithm (a saliency model trained on eye-tracking data) was found to focus crops on women's chests and white men's faces.

Ultimately, although machine learning is a useful tool for understanding statistical modelling, in so many cases it fails to show the path between cause and effect, where “the journey is more important than the destination.” Now that computers are powerful enough to crunch hundreds or thousands of variables rather quickly, it's very easy to create machine learning models that get you to your goal without ever questioning how or why. Which is why it's important to talk about how we did “machine learning” back when “computer” was still a job title.

Part 3: Predicting the weather

We can study the causes and effects of statistical models using a rather niche field of maths called control theory, whose modern state-space form was pioneered by Hungarian-American mathematician Rudolf Kálmán around 1960. Before even separating variables into inputs and outputs, we can take all of the potential factors at play in our model and categorise them as observable and controllable.

Observability and controllability go hand-in-hand, and in maths, we call them duals of each other. For the sake of explanation, we won't describe exactly how they're related, but the bottom line is that they're very similar. Observable variables are ones that allow observation of the others: if you can measure the observable variables, you can accurately reconstruct the values of everything else. Controllable variables take this one step further and let you drive the values of all the other variables by tweaking them. This is a simplified explanation and there's a lot more nuance than I can easily cover here, but the upshot is that there are mathematical ways of determining which variables are good inputs (i.e. observable) and which are good outputs (i.e. controllable).
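For linear systems, those mathematical ways are concrete: the standard Kalman rank conditions. Here's a sketch; the system matrices are made up, but the two tests themselves are textbook:

```python
import numpy as np

# For a linear system x' = Ax + Bu, y = Cx, the Kalman rank conditions say:
# controllable iff [B, AB, ..., A^(n-1)B] has full rank, and
# observable  iff [C; CA; ...; CA^(n-1)] has full rank.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])  # made-up state dynamics
B = np.array([[0.0],
              [1.0]])         # how our control input pushes the state
C = np.array([[1.0, 0.0]])    # what we can actually measure

n = A.shape[0]
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

print("controllable:", np.linalg.matrix_rank(ctrb) == n)  # True for these matrices
print("observable:  ", np.linalg.matrix_rank(obsv) == n)  # True for these matrices
```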

In order to determine whether variables are observable or controllable, we first have to work out the relationship between them. And this is the first part of the control-theory feedback loop: coming up with a model. Although controllability isn't relevant for predicting the weather, we'll get back to it when we return to our earlier example.

Throughout history, humans have been trying to predict the weather. And as history progressed, we gained a better and better idea of how it actually worked, using the scientific method: make a guess, try it out, and see if it was a good guess. Eventually, we came up with pretty specific equations to describe the state of the weather.

But, as you might imagine, we can't actually measure the full state of the weather. Although meteorology has progressed a lot over the past several decades and new instruments allow us to measure more things more precisely, there are always going to be a few things we have to leave out. But as long as we know that the set of variables we have is observable, we can still predict the others.

This leads into the next step of our feedback loop. After the weather is predicted, it happens. If our prediction was wrong, it means one of two things: either our model was wrong, or we didn't accurately know some of the variables. Generally, the fix is to correct the inaccurate values of the variables we didn't measure, which makes the model more accurate for the next prediction.
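That predict-then-correct loop can be sketched in one dimension, in the spirit of a Kalman filter. All of the numbers below are invented:

```python
import random

true_temp = 15.0                     # reality, hidden from our model
estimate, uncertainty = 10.0, 4.0    # our belief: a value and its variance
process_noise, measurement_noise = 0.5, 1.0

for step in range(10):
    # Predict: assume the state persists, but admit we've grown less certain.
    uncertainty += process_noise

    # The weather "happens": we take a noisy measurement of reality.
    measurement = true_temp + random.gauss(0.0, measurement_noise ** 0.5)

    # Correct: weigh measurement against prediction by their uncertainties,
    # so the next prediction starts from a better estimate.
    gain = uncertainty / (uncertainty + measurement_noise)
    estimate += gain * (measurement - estimate)
    uncertainty *= 1.0 - gain

print(f"estimate after corrections: {estimate:.1f} (truth: {true_temp})")
```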

As the last part of our loop, we might find that our model is truly wrong and needs to be updated. And this is perhaps the most crucial step that machine learning misses: manual intervention. Because each step of the way gives you a way to mathematically quantify how accurate your values were, you can start to determine whether you have a good model. And if you don't, it means you need to try again.

Like I said, all of Kálmán's work was done right before the advent of modern weather prediction. Every time we predict the weather, we add more data to our models and determine the quality of our predictions, much like how machine learning works. But unlike most machine learning models, instead of mindlessly crunching hundreds of variables and getting an answer, we study the relationships between the variables and how they affect each other. Even the variables that we can't measure are given names and values, and we learn more about them.

Part 4: Control

While weather prediction is one application of control theory, it's also used a lot in, as you'd expect, controlling machines. Measurements from the real world are fuzzy, and control theory gives us a way to estimate unknown variables and affect them with the things we control.

Let's go back to the thermostat example. The relationship between the thermostat and the temperature forms an extremely simple system that can be studied using control theory. A good way of judging a model is whether the end result is a system that can be controlled using the thermostat slider and observed using the temperature and current time. Even if our equations have loads of unknown variables, we can simply guess their values when we start using the model and correct them over time.
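To make “guess and correct over time” concrete, here's a sketch with a made-up thermostat law and a single unknown gain; both the physics and the numbers are invented:

```python
# Pretend the room obeys d(temp)/dt = k * (slider - 0.5), with gain k
# unknown to us. We start from a guess and correct it as evidence comes in.
true_k = 3.0                     # reality, hidden from our controller
k_guess = 1.0                    # our initial guess for the unknown gain
temp, slider, dt = 18.0, 0.8, 1.0

for step in range(50):
    predicted_change = k_guess * (slider - 0.5) * dt
    actual_change = true_k * (slider - 0.5) * dt  # what the room really does
    temp += actual_change

    # Correct the unknown parameter in proportion to the prediction error.
    error = actual_change - predicted_change
    k_guess += 0.5 * error / ((slider - 0.5) * dt)

print(f"estimated gain: {k_guess:.2f} (true gain: {true_k})")
```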

While it may not seem like it, a lot of large machine learning models actually do try to quantify unmeasured variables that may affect them. For example, when serving targeted advertisements, companies try to guess a person's demographics and interests in order to make the advertisements more relevant. The main issue is that throughout the entire process, even though there are rigorous mathematical ways to determine whether these models are correct and whether additional variables are at play, most people don't take that extra step; they only modify the model if it doesn't work as well as they want.

Part 5: Conclusion

I don't think that YouTube should come up with a gigantic, thousand-variable system of differential equations to figure out how to recommend videos. Instead, I want to shift the idea of what “machine learning” means and point out how “learning” is a lot more nuanced than a simple statistical model. Really, all folks are doing is drawing lines on a graph, and they don't really know where all the points are or if the lines are in the right place.

Instead, I think that “artificial intelligence” needs a larger feedback loop that involves a more rigorous analysis of how systems are affected by which variables, rather than just coming up with a desired result and throwing more data at the problem until you find some path to it. There should be a lot more insight and transparency into how these systems work, and I highly recommend that YouTube, Twitter, and others invest in identifying the consequences of their models and coming up with new ones that minimise them.

And although it should be obvious, if we're going to turn metrics like “engagement” into cold, emotionless numbers, then we should find ways to quantify racial and ethnic bias, gender bias, conspiracy theories, etc. in a way that helps reduce their prevalence in the Era of the Algorithm, preferably by hiring people from the groups most negatively impacted by things “overlooked” in these systems.

There has always been a way to figure out these problems; you've just got to care.
