Using machine learning and simple features to predict climate – part 1

Earth’s climate is one of the fundamental boundary conditions on many Earth surface processes. For this reason, global climate models (GCMs) are often a critical part of Earth science research. However, they remain computationally expensive, and running one in a reasonable amount of time often requires access to a supercomputer. This motivates the question: is it possible to reasonably predict climate without an expensive GCM?

For my current research, this question is particularly motivated by the need to predict climate in a scenario in which the geography (i.e. the position, shape, size, and topography of land masses, among other things) is different from that of today (see the project on the Southeast Asian islands for further details).

Constraining the geography at some point in Earth’s history (i.e. the paleogeography) is not a straightforward task, and there are whole branches of Earth science research that are dedicated to addressing this problem. But for now, let’s assume that we have some paleogeographic map of the world. Given only this paleogeographic map, there are three very easy features that we can extract that we expect, in some fashion, to drive changes in climate:

  • the distance of any land pixel to the closest shoreline
  • the elevation of any land pixel
  • the latitude of any land pixel
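
As a rough illustration of how these three features could be pulled from a gridded map, here is a minimal pure-Python sketch. It is not the code from the repository: the function name `extract_features`, the convention that elevations at or below zero are ocean, and the use of 4-connected grid steps as the distance measure are all assumptions made for the example.

```python
from collections import deque


def extract_features(elevation, lat_top=90.0, lat_bottom=-90.0):
    """Return one (shore_distance, elevation, latitude) tuple per land pixel.

    `elevation` is a 2D list with rows ordered north to south. Pixels with
    elevation <= 0 are treated as ocean (an assumption of this sketch).
    Distance is counted in 4-connected grid steps to the nearest ocean pixel.
    """
    n_rows, n_cols = len(elevation), len(elevation[0])
    is_land = [[z > 0 for z in row] for row in elevation]

    # Multi-source breadth-first search seeded from every ocean pixel.
    dist = [[None] * n_cols for _ in range(n_rows)]
    queue = deque()
    for r in range(n_rows):
        for c in range(n_cols):
            if not is_land[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr = r + dr
            cc = (c + dc) % n_cols  # the map wraps around in longitude
            if 0 <= rr < n_rows and dist[rr][cc] is None:
                dist[rr][cc] = dist[r][c] + 1
                queue.append((rr, cc))

    # Latitude of each row centre, assuming equally spaced rows.
    step = (lat_top - lat_bottom) / n_rows
    features = []
    for r in range(n_rows):
        lat = lat_top - (r + 0.5) * step
        for c in range(n_cols):
            if is_land[r][c]:
                features.append((dist[r][c], elevation[r][c], lat))
    return features


# A toy 5x5 "island" surrounded by ocean (made-up elevations in metres).
grid = [
    [-10, -10, -10, -10, -10],
    [-10, 50, 120, 40, -10],
    [-10, 80, 900, 60, -10],
    [-10, 30, 100, 20, -10],
    [-10, -10, -10, -10, -10],
]
features = extract_features(grid)
```

Distances here are in grid steps, not kilometres. On a real latitude/longitude grid the physical width of a pixel shrinks toward the poles, so a proper implementation would need to account for the grid spacing.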

I note here that we do not use longitude as a feature for now, since the absolute longitudinal position of land masses should not, in itself, drive any changes in climatology. That said, some atmospheric phenomena are constrained to certain regions of the Earth (e.g. the monsoons over SE Asia). If we trained a model on the geography of the present day with longitude as a feature, the model would assume that these regionally-constrained phenomena are tied to the specific longitudes at which they occur today. This is clearly not generalizable to paleogeographies that are very different from that of today, but is perhaps acceptable if the paleogeography isn’t very different (i.e. in recent history)… more to come later.

So let’s take the geography of the present day, and extract these three features. For the targets, we will use present-day maps of temperature and runoff, since these are the climatic fields that we require for the project on the Southeast Asian islands. I then take a simple train/test split of these data, and evaluate model performance on the testing fold.
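
The split-and-evaluate workflow might look something like the sketch below, which uses scikit-learn. Everything in it is illustrative rather than taken from the repository: the data are synthetic stand-ins for the real maps, the toy temperature formula is invented so that there is something to fit, and the model settings and hyperparameter grid are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the three features (one row per land pixel).
n = 600
shore_distance = rng.uniform(0, 2000, n)  # hypothetical units: km
elevation = rng.uniform(0, 5000, n)       # hypothetical units: m
latitude = rng.uniform(-90, 90, n)        # degrees
X = np.column_stack([shore_distance, elevation, latitude])

# Invented toy target: warm at the equator, cooler with latitude and height.
temperature = (
    28.0
    - 0.4 * np.abs(latitude)
    - 0.0065 * elevation
    + rng.normal(0.0, 1.0, n)
)

# Simple random train/test split; the test fold is held out for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, temperature, test_size=0.25, random_state=0
)

# Tune a couple of hyperparameters by cross-validation on the training fold.
param_grid = {"max_depth": [5, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X_train, y_train)

# R^2 of the best model on the held-out test fold.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

The same pattern would apply to runoff by swapping in a different target array.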

You can find details on the implementation of the model in the GitHub repository if you’re interested. As a first attempt, I use a random forest regressor with some of its hyperparameters tuned. Here are the initial results:

As we can see, the model does surprisingly well at predicting temperature with just three features! However, it does considerably worse for runoff (note in particular that the scale of the runoff plot is logarithmic)… Let’s look at the residuals on a map:

Again, we can see that the model does pretty well with temperature, but doesn’t capture regional runoff phenomena (such as those related to the monsoons over SE Asia).
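
For readers who want to build this kind of residual map themselves, one possible approach is sketched below. The helper `residual_map` and its `log_scale` option are hypothetical names introduced for this example, and taking runoff residuals in log10 space is an assumption suggested by the logarithmic plot scale, not necessarily what was done here.

```python
import numpy as np


def residual_map(land_mask, observed, predicted, log_scale=False):
    """Place per-pixel residuals (predicted - observed) back onto the grid.

    `land_mask` is a 2D boolean array; `observed` and `predicted` hold one
    value per land pixel in row-major order. Ocean pixels are set to NaN.
    With log_scale=True the residual is taken in log10 space, which
    requires strictly positive values.
    """
    land_mask = np.asarray(land_mask, dtype=bool)
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if log_scale:
        observed = np.log10(observed)
        predicted = np.log10(predicted)
    out = np.full(land_mask.shape, np.nan)
    out[land_mask] = predicted - observed
    return out


# Made-up runoff values for three land pixels on a 2x3 grid.
mask = np.array([[False, True, True],
                 [False, True, False]])
obs = [10.0, 100.0, 1000.0]
pred = [100.0, 100.0, 100.0]
res = residual_map(mask, obs, pred, log_scale=True)
```

In log space a residual of 1 means the prediction is ten times too high, and a residual of -1 means it is ten times too low.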

This is a promising start, but we can do better. Stay tuned for further updates…