Everything You Wanted to Know about Agent Based Modeling (but Were Afraid to Ask)

I’ve done a lot of verbal explaining of agent based modeling and its power and potential over the last several months. And after a very interesting and productive meeting with Dr. Tanja Srebotnjak, formerly of Ecologic Institute and now the inaugural Hixon Professor for Sustainable Environmental Design at Harvey Mudd, I decided to put together a summary of my thinking on agent based modeling combined with a list of relevant sources. I’ve discovered a lot of good work, and my own thinking has substantially evolved since my first post on the subject in July of last year.

To start at the beginning, I’d like to briefly address modeling in science more generally, because the word is associated with very complex exercises, carried out by very smart and technical people, which spit out results that are all but incomprehensible to most of us without abundant interpretation. Climate models (which often have agent based components) are one example, as are macroeconomic and trade models, and simulations of biological and ecological systems.

We should not, however, get too intimidated. At the end of the day, a model airplane, be it made out of paper or plastic, is just as much a model as the others. It may be used as a toy, but if it is useful for answering scientific questions then it’s a scientific model. None of us has trouble understanding the basic thinking behind building a physical model and placing it in a wind tunnel for various tests. The model is not the real thing: it may be of smaller scale, be built out of different materials, or lack certain internal components, but as long as it captures the features of interest, it is perfectly adequate. In the case of an airplane or car in a wind tunnel, it is adequate for various scientific tests that inform an engineering process.

It should be noted that, all things being equal, as long as the model captures the features of interest, simple is better than complex. Complexity makes it more likely that we misunderstand which dynamics and processes drive our results, and far more likely that we make a mistake. This is why we certainly shouldn’t get into the game of trying to build a model that accurately details every aspect of the system or thing we are trying to represent: at a certain level of complexity you’re better off just observing the system or thing itself. In that case, get out of the lab and into the field.

An acceptance that all models are simplifications, and thus all in some sense wrong, is necessary before one gets too far into designing a model. That they are “wrong” is no cause for concern, provided that we exercise judgement and use our models to improve our understanding. Results require interpretation.

Sometimes the model is bigger than the real thing.

I suspect, however, that complexity isn’t so much a challenge for our understanding as the fact that models are often abstract. While a physical model replicates features of interest in a way that is literally tangible, a lot of scientific models are based on symbols, and substantial education and background are often required to understand how these symbols can be manipulated and interpreted. I have good news for anyone reading this though: you have already mastered the most important symbol system for modern scientific discourse, that is, the English language. It took a lot of work, congrats!

The nice thing about a symbol system is that I don’t have to find physical analogues to provide an adequate description of a “thing” I’m talking about. Saying, “The man was big!” communicates an incredible amount of information that would take a great deal of time and effort to relay with hand gestures involving available objects.

“The man was as big as a horse!” says even more in a literary way, but poses literal challenges, and here we confront the central challenge of natural language. While it can tell us so much (such a description may define this man better, and tell us more about him, than a scientific description), it has numerous ambiguities. Horses vary in size; some are very short, and we surely don’t mean them. And do we mean physical height? Or weight? And of course weight depends on gravity… Surely if we want a scientific description of the man’s dimensions, quantitative expressions rooted in readily observable physical phenomena, i.e. the metric system, will be far superior.

There is nothing inherently scientific about numbers, but what separates numerology from scientific work is that when numbers are rooted in a symbol system with explicit rules, such as mathematics, they suddenly become incredibly powerful tools for describing and theorizing about the world. And when you connect your describing and theorizing with scientific methodologies, well, there you have it. From Newton onward, mathematics, particularly calculus, has been used to powerful effect across the natural sciences. The role of quantification in the social sciences is more contentious. Statistics is of course invaluable to the social sciences, though somewhat limited because much of social life is made up of processes which are not readily observable. The area of heated dispute is the validity of the assumptions that make it possible for economic theory to be expressed in terms of calculus.

In the social sciences a lot of ink has been spilled on the relative merits of qualitative versus quantitative science. In fact, qualitative and quantitative methods should complement each other and are both perfectly legitimate means of conducting scientific research. Indeed qualitative simulation is now an area of interest for physical systems where information is incomplete and ease of interpretation of outputs is more important than specificity.

The evolution of simulation, quantitative and qualitative, from Troitzsch 1997

If we draw on Troitzsch 1997 (which I have been doing), we could interpret the quantitative versus qualitative fight as a dispute over preferences between two abstract symbol systems, natural language and mathematics, each with various merits and deficiencies. The economist likes the explicitness of mathematics while the political scientist prefers the nuance made possible by natural language. And to borrow further from Troitzsch, we now have a third symbol system available, one which allows us to manipulate computer environments. Computers allow us to represent a lot of things qualitatively, but in a manner which lends itself to eventual quantification.

(Now the clever will quickly point out that anything a digital computer can do can be expressed mathematically. True, but many a smart computer programmer will also tell you that all the math he or she learned in college was a waste of time. There are times when it is very important to know what exactly a “primitive” in a programming language means mathematically, but this can be learned on an ad hoc basis. What matters is that once we learn how to use a programming language, we can let the computer do the number crunching for us. This means that we can represent and model things in ways which would never be practical with pen and paper.)

To show the potential of computer simulation to complement and improve on mathematical approaches, Grimm and Railsback 2012 give a “motivating example” from the field of epidemiology, specifically the transmission of rabies in Europe and the effectiveness of methods for controlling its spread.

Rabies is bad: its spread poses public health risks as well as risks to livestock, and in Europe wild foxes are known to be the primary vector. Once an outbreak is detected, it can be controlled by deploying bait which immunizes the uninfected fox populations.

Rabid Fox!

As the baiting program is expensive, there is a real interest in deploying the minimum amount of bait that is still effective. So differential equations (calculus) were developed to express the spread of the disease, and bait was deployed in an amount sufficient to halt the predicted spread. This strategy was shown to be effective, i.e. the baiting program halted the spread of rabies in the wild fox populations.
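To make this concrete, here is a minimal sketch of a mean-field differential equation model of this kind, integrated with Euler’s method. This is my own illustration, not the actual European rabies model: the equation forms, the baiting term, and all parameter values are assumptions chosen to show the idea.

```python
# Minimal sketch of a mean-field disease model (illustrative assumptions only).
# S = fraction of foxes susceptible, I = fraction infected; the baiting term
# moves susceptibles directly into an immunized/removed state.

def simulate(beta=0.6, removal=0.3, baiting=0.0, steps=200, dt=0.1):
    """Euler-integrate dS/dt = -beta*S*I - baiting*S, dI/dt = beta*S*I - removal*I."""
    S, I = 0.99, 0.01  # initial fractions of the fox population
    for _ in range(steps):
        new_infections = beta * S * I
        S += dt * (-new_infections - baiting * S)
        I += dt * (new_infections - removal * I)
    return S, I

# Without baiting the infection takes off; with enough baiting it dies out.
_, i_no_bait = simulate(baiting=0.0)
_, i_bait = simulate(baiting=0.2)
print(i_no_bait > i_bait)  # → True
```

Note that a model like this is “mean-field”: every fox can in principle infect every other fox, which is exactly the assumption the agent based approach below relaxes.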

But what if we could do better, i.e. deploy even less bait and achieve the same rabies fighting effectiveness?

A flying fox, and predictions of herd immunity. (Thulke and Eisinger 2008)

Thulke and Eisinger 2008 took a shot using an agent based model, which had an advantage over the calculus based model: it could take into account spatial effects, i.e. foxes could now only transmit the disease to other foxes with whom they came in direct physical contact. The agent based model predicted that a program which deployed less bait could be just as effective, leading to cost savings. Of course, there would be a risk to decreasing how much bait was deployed: what if the agent based model was wrong? To deal with this problem the researchers ran their model with the spatial effects switched off, i.e. they allowed the foxes to “fly” and transmit rabies to non-local foxes. With the spatial effects switched off, the model predicted the same result as the earlier, calculus based model. Since this study a new, reduced baiting program has been deployed, leading to substantial cost savings.
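The local-versus-“flying” contrast can be sketched with a toy grid model. This is my own illustration, not the Thulke and Eisinger model: the grid, the four-neighbour contact rule, and the parameters are all assumptions. Setting `local=False` gives the “flying fox” variant where infection ignores space.

```python
import random

# Toy grid model: each cell holds one fox, 'S' susceptible, 'I' infected,
# 'R' immunized by bait. With local=True an infected fox can only infect its
# four grid neighbours; with local=False it contacts foxes anywhere ("flying").

def step(grid, p_infect=0.5, local=True):
    n = len(grid)
    new = [row[:] for row in grid]
    for x in range(n):
        for y in range(n):
            if grid[x][y] != 'I':
                continue
            if local:
                contacts = [((x + dx) % n, (y + dy) % n)
                            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
            else:
                contacts = [(random.randrange(n), random.randrange(n))
                            for _ in range(4)]
            for cx, cy in contacts:
                if grid[cx][cy] == 'S' and random.random() < p_infect:
                    new[cx][cy] = 'I'
    return new

random.seed(1)
n = 20
# Bait has immunized roughly 30% of foxes; one rabid fox starts in the centre.
grid = [['R' if random.random() < 0.3 else 'S' for _ in range(n)] for _ in range(n)]
grid[n // 2][n // 2] = 'I'
for _ in range(15):
    grid = step(grid, local=True)
print(sum(row.count('I') for row in grid))  # infected foxes after 15 local steps
```

Rerunning the loop with `local=False` plays the same role as the researchers’ “flying fox” check: it recovers the well-mixed assumption that the differential equations make implicitly.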

The salience of such local interactions is one reason to select an agent based modeling approach over a structural approach that relies on differential equations. Another reason to select an agent based modeling approach is expected diversity of characteristics or behaviors among the agents.

To move to a context with which I am far more comfortable than epidemiology, let’s move back to the social sciences. In confronting the problem of racial segregation, Nobel Laureate Thomas Schelling needed to give the agents in his model diverse characteristics: each agent had a race. The significant result of his famous early agent based model is that it showed that if agents sorted themselves based on a preference that just one or two of their neighbors share their race, relatively extreme patterns of racial segregation could emerge. If you want to play with his model, it comes standard with the models library included with NetLogo, a free and easy to use programming environment.
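A Schelling-style model is simple enough to sketch directly. This is an illustrative variant rather than Schelling’s exact formulation: the grid size, the happiness threshold, and the random-relocation rule are my assumptions.

```python
import random

# Schelling-style segregation sketch: agents of two "races" ('A' and 'B') live
# on a grid with some empty cells (None). An agent is unhappy if fewer than
# `threshold` of its eight neighbours share its race; unhappy agents move to a
# randomly chosen empty cell.

def neighbours(grid, x, y):
    n = len(grid)
    return [grid[(x + dx) % n][(y + dy) % n]
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def unhappy(grid, x, y, threshold=2):
    me = grid[x][y]
    if me is None:
        return False
    return sum(1 for nb in neighbours(grid, x, y) if nb == me) < threshold

def step(grid, threshold=2):
    n = len(grid)
    empties = [(x, y) for x in range(n) for y in range(n) if grid[x][y] is None]
    movers = [(x, y) for x in range(n) for y in range(n)
              if unhappy(grid, x, y, threshold)]
    for x, y in movers:
        if not empties:
            break
        ex, ey = empties.pop(random.randrange(len(empties)))
        grid[ex][ey], grid[x][y] = grid[x][y], None
        empties.append((x, y))

random.seed(0)
n = 20
grid = [[random.choice(['A', 'B', None]) for _ in range(n)] for _ in range(n)]
for _ in range(30):
    step(grid)
# Even though each agent only demanded a couple of same-race neighbours,
# clusters of same-race agents tend to emerge after repeated sorting.
```

The striking point, as in Schelling’s original, is the mismatch between the mild individual preference and the strong aggregate pattern.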

And with emergence, we have the last important consideration for choosing an agent based model. Agent based modeling really got started in the 1990s with ecologists looking at the behavioral traits and interactions of individual creatures that lead to flocking, a pattern which emerges not from the top down, but from the bottom up. And with that, I’d say we have it. If you’re modeling a system and concerned with:

  1. The effects of local interactions between individuals and their environment.
  2. The effects of varying characteristics and behaviors of these individuals.
  3. Patterns that emerge from the above.

Well, then you’ll want to consider agent based modeling. And I’d suggest you download NetLogo and start playing with the models library; it’s free and the included models are very visual and entertaining. Now if you want to start using NetLogo for scientific work it will take a bit more investment, but nothing unrealistic. Weeklong courses which are enough to get you started are offered fairly regularly in the U.S. and Europe, and if the timing for a course doesn’t work out, you can teach yourself both the programming and the state of the art methodology for doing scientific work with it from Railsback and Grimm 2012. I taught myself from the book, and certainly have no regrets about the time invested (it can be enjoyably worked through in a month).
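As a parting taste of the bottom-up flocking mentioned above, here is a minimal Vicsek-style alignment sketch. It is an illustrative stand-in, not NetLogo’s Flocking model: the neighbourhood radius, speed, and noise level are assumed parameters. Each bird steers toward the average heading of its nearby neighbours, and a shared direction emerges from purely local rules.

```python
import math
import random

# Vicsek-style flocking sketch on a unit torus: birds repeatedly adopt the
# average heading of neighbours within `radius`, plus a little random noise.

def step(birds, radius=0.2, speed=0.02, noise=0.05):
    new = []
    for x, y, theta in birds:
        nearby = [t for (bx, by, t) in birds
                  if (bx - x) ** 2 + (by - y) ** 2 < radius ** 2]
        # circular mean of the neighbours' headings (always includes self)
        avg = math.atan2(sum(math.sin(t) for t in nearby),
                         sum(math.cos(t) for t in nearby))
        theta = avg + random.uniform(-noise, noise)
        new.append(((x + speed * math.cos(theta)) % 1.0,
                    (y + speed * math.sin(theta)) % 1.0,
                    theta))
    return new

def alignment(birds):
    """Order parameter: near 1.0 when all headings agree, near 0 when random."""
    n = len(birds)
    return math.hypot(sum(math.cos(t) for _, _, t in birds) / n,
                      sum(math.sin(t) for _, _, t in birds) / n)

random.seed(0)
birds = [(random.random(), random.random(), random.uniform(-math.pi, math.pi))
         for _ in range(100)]
before = alignment(birds)
for _ in range(200):
    birds = step(birds)
print(round(before, 2), round(alignment(birds), 2))  # alignment increases
```

No bird is told which way the flock should go; the common heading is an emergent, bottom-up pattern of exactly the kind the three criteria above describe.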

I hope you enjoyed this introduction. Please comment below if anything was unclear or in need of correction.
