Maps and Models
One of the main messages of most psychological research into bias, and the hundreds of popular books that have followed, is that most people just aren’t intuitively good at thinking statistically. We make mistakes about the influence of sample size, and about representativeness and likelihood. We fail to understand regression to the mean, and often make mistakes about causation.
However, suggesting statistical competence as a universal cure can lead to new sets of problems. An emphasis on statistical knowledge and tests can introduce its own blind spots, some of which can be devastating.
The discipline of psychology is itself going through a “crisis” over reproducibility of results, as this Bloomberg View article from the other week discusses. One recent paper found that only 39 out of a sample of 100 psychological experiments could be replicated. That would be disastrous for the position of psychology as a science, as if results cannot be replicated by other teams their validity must be in doubt. The p-value test of statistical significance is overused as a marketing tool or way to get published. Naturally, there are some vigorous rebuttals in process.
It is, however, a problem for other disciplines as well, which suggests the issues are genuine, deeper and more pervasive. John Ioannidis has been arguing the same about medical research for some time.
He’s what’s known as a meta-researcher, and he’s become one of the world’s foremost experts on the credibility of medical research. He and his team have shown, again and again, and in many different ways, that much of what biomedical researchers conclude in published studies—conclusions that doctors keep in mind when they prescribe antibiotics or blood-pressure medication, or when they advise us to consume more fiber or less meat, or when they recommend surgery for heart disease or back pain—is misleading, exaggerated, and often flat-out wrong. He charges that as much as 90 percent of the published medical information that doctors rely on is flawed.
The same applies to economics, where many of the most prominent academics apparently do not understand some of the statistical measures they use. A paper (admittedly from the 1990s) found that 70% of the empirical papers in the American Economic Review, the most prestigious journal in the field, “did not distinguish statistical significance from economic, policy, or scientific significance.” The conclusion:
We would not assert that every economist misunderstands statistical significance, only that most do, and these some of the best economic scientists.
Of course, the problems and flaws in statistical models in the lead up to the great crash of 2008 are also multiple and by now famous. If bank management and traders do not understand the “black box” models they are using, and their limits, tears and pain are the usual result.
The takeaway is not to impugn statistics. It is that people are nonetheless very good at making a whole set of different mistakes when they tidy up one aspect of their approach. More statistical rigor can also mean more blind spots to other issues or considerations, and use of technique in isolation from common sense.
The more technically proficient and rigorous you believe you are, often the more vulnerable you become to wishful thinking or blind spots. Technicians often have a remarkable ability to miss the forest for the trees, or twigs on the trees.
It also means there are (even more) grounds for mild skepticism about the value of many academic studies to practitioners.
“From the New Yorker to FiveThirtyEight, outlets across the spectrum failed to grasp the Trump phenomenon.” – Politico
It’s the morning after Super Tuesday, when Trump “overwhelmed his GOP rivals”.
The most comprehensive losers (after Rubio) were media pundits and columnists, with their decades of experience and supposed ability to spot trends developing. And political reporters, with their primary sources and conversations with campaigns in late night bars. And statisticians with models predicting politics. And anyone in business or markets or diplomacy or politics who was naive enough to believe confident predictions from any of the experts.
Politico notes how the journalistic eminences at the New Yorker and the Atlantic got it wrong over the last year.
But so did the quantitative people.
Those two mandarins weren’t alone in dismissing Trump’s chances. Washington Post blogger Chris Cillizza wrote in July that “Donald Trump is not going to be the Republican presidential nominee in 2016.” And numbers guru Nate Silver told readers as recently as November to “stop freaking out” about Trump’s poll numbers.
Of course it’s all too easy to spot mistaken predictions after the fact. But the same pattern has been arising after just about every big event in recent years. People make overconfident predictions, based on expertise, or primary sources, or big data, and often wishful thinking about what they want to see happen. They project an insidery air of secret confidences or confessions from the campaigns. Or disinterested quantitative rigor.
Then they mostly go horribly wrong. Maybe one or two through sheer luck get it right – and then get it even more wrong the next time. Predictions may work temporarily so long as nothing unexpected happens or nothing changes in any substantive way. But that means the forecasts turn out to be worthless just when you need them most.
The point? You remember the old quote (allegedly from Einstein) defining insanity: repeating the same thing over and over and expecting a different result.
Markets and business and political systems are too complex to predict. That means a different strategy is needed. But instead there are immense pressures to keep doing the same things which don’t work in media, and markets, and business. Over and over and over again.
So recognize and understand the pressures. And get around them. Use them to your advantage. Don’t be naive.
One of the greatest minds in twentieth-century strategy was Thomas Schelling, who won the Nobel Prize for Economics in 2005 for his work on game theory back in the 1950s. Schelling was, however, extremely skeptical about treating strategy as “a branch of mathematics.” According to Lawrence Freedman, Schelling claimed he learned more about strategy from reading ancient Greek history and looking at salesmanship than studying game theory.
So, as often happens, one of the brilliant and creative founders of an abstract approach warned (slightly dimmer) followers against misusing or over-applying it.
“One cannot, without empirical evidence,” Schelling observed, “deduce whatever understandings can be perceived in a non-zero-sum game of maneuver any more than one can prove, by purely formal deduction, that a particular joke is bound to be funny.”
Mainstream economics, however, went in a different direction. Now think of what this means for all the economic papers on policy rules and credibility/communication in monetary policy that I referred to in the last post. Most of the problems of central bank communication come from trying to prove by deduction much the same thing as proving a joke is bound to be funny.
I’ve been talking about the difference between variable or random error, and systemic or constant errors. Another way to put this is the difference between precision and accuracy. As business measurement expert Douglas Hubbard explains in How to Measure Anything: Finding the Value of Intangibles in Business,
“Precision” refers to the reproducibility and conformity of measurements, while “accuracy” refers to how close a measurement is to its “true” value. … To put it another way, precision is low random error, regardless of the amount of systemic error. Accuracy is low systemic error, regardless of the amount of random error. … I find that, in business, people often choose precision with unknown systematic error over a highly imprecise measurement with random error.
Systemic error is also, he says, another way of saying “bias”, especially expectancy bias – another term for confirmation bias, seeing what we want to see – and selection bias – inadvertent non-randomness in samples.
Observers and subjects sometimes, consciously or not, see what they want. We are gullible and tend to be self-deluding.
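To make Hubbard's distinction concrete, here is a minimal Python sketch. The instruments and numbers are invented for illustration, not taken from his book: one simulated instrument is precise but biased, the other unbiased but noisy.

```python
import random

random.seed(0)
TRUE_VALUE = 100.0

def measure(bias, noise_sd, n=1000):
    """Simulate n measurements with a fixed bias (systemic error)
    and random noise of the given spread (variable error)."""
    return [TRUE_VALUE + bias + random.gauss(0, noise_sd) for _ in range(n)]

def summarize(samples):
    mean = sum(samples) / len(samples)
    spread = (sum((x - mean) ** 2 for x in samples) / len(samples)) ** 0.5
    return mean - TRUE_VALUE, spread  # (systemic error, random error)

# Instrument 1: precise but inaccurate (tight readings, unknown bias)
bias1, spread1 = summarize(measure(bias=8.0, noise_sd=0.5))
# Instrument 2: imprecise but accurate (noisy readings, no bias)
bias2, spread2 = summarize(measure(bias=0.0, noise_sd=5.0))

print(f"precise instrument:   bias {bias1:+.1f}, spread {spread1:.1f}")
print(f"imprecise instrument: bias {bias2:+.1f}, spread {spread2:.1f}")
```

The first instrument's readings cluster tightly around the wrong answer; averaging more of them will never fix it. The second looks messy, but the mess washes out.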
That brings us back to the problems on which Alucidate sets its sights. Algorithms can eliminate most random or variable error, and bring much more consistency. But systemic error is then the main source of problems or differential performance. And businesses are usually knee-deep in it, partly because the approaches which reduce variable error often increase systemic error in practice. There’s often a trade-off between dealing with the two kinds of error, and that trade-off may need to be set differently in different environments.
I like most of Hubbard’s book, which I’ll come back to another time. It falls into the practical, observational school of quantification rather than the math department approach, as Herbert Simon would put it.
But one thing he doesn’t focus on enough is learning ability and iteration – the ability to change your model over time. If you shoot at the target and observe you hit slightly off center, you can adjust the next time you fire. Sensitivity to evidence and the ability to learn is the most important thing to watch in macro and market decision-making. In fact, the most interesting thing in the recent enthusiasm about big data is not the size of datasets or finding correlations. It’s the improved ability of computer algorithms to test and adjust models – Bayesian inversion. But that has limits and pitfalls as well.
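As a toy illustration of that test-and-adjust loop (not of any particular big-data system), here is the simplest possible Bayesian update: a beta-binomial model revising its estimate of a hit rate as evidence arrives in batches. The batch numbers are invented.

```python
def update(alpha, beta, hits, misses):
    """Beta-binomial conjugate update: the posterior is again a Beta
    distribution, so adjusting the model is just adding the counts."""
    return alpha + hits, beta + misses

alpha, beta = 1, 1  # flat prior: no opinion yet about the hit rate
batches = [(3, 7), (2, 8), (4, 6)]  # (hits, misses) observed each period

for hits, misses in batches:
    alpha, beta = update(alpha, beta, hits, misses)
    estimate = alpha / (alpha + beta)  # posterior mean
    print(f"after {hits} hits / {misses} misses: estimated rate = {estimate:.2f}")
```

The point is the shape of the process, not the arithmetic: each round of evidence shifts the estimate, and a model that cannot be revised this way has no defense against its own systematic error.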
Markets increasingly rely on quantitative techniques. When does formal analysis help make better decisions? In the last post I was talking about the difference between the errors produced by “analytic” and “intuitive” thinking in the context of the collapse of formal planning in corporate America. Those terms can be misleading, however, because they imply it is somehow all a matter of rational technique versus “gut feel.”
Here’s another diagram, from near the beginning of James Reason’s classic analysis, Human Error (an extremely important book which I’ll come back to another time). Two marksmen aim at a target, but the pattern of errors is very different.
A is the winner, based on raw scores. He is roughly centered, but dispersed and sloppy. B is much more consistent but off target.
This shows the difference between variable error (A), on the one hand, and constant or systematic error (B) on the other. B is probably the better marksman even though he lost, says Reason, because his sights could be misaligned or there could be an additional factor throwing him off. B’s error is more predictable, and potentially more fixable. But fixing it depends on the extent to which the reasons for the error are understood.
What else could cause B to be off? (Reason doesn’t discuss this in this context.) In real life decisions (and real life war) the target is often moving, not static. That means errors like B’s are pervasive.
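Reason's two shot patterns can be decomposed numerically. In this sketch the shot coordinates are invented, not taken from his book; the point is that the same data split cleanly into a constant error (the mean offset from the bullseye) and a variable error (the spread around that mean).

```python
import math

def decompose(shots):
    """Split (x, y) offsets from the bullseye into constant error
    (distance of the mean point from center) and variable error
    (root-mean-square dispersion around that mean point)."""
    n = len(shots)
    mx = sum(x for x, _ in shots) / n
    my = sum(y for _, y in shots) / n
    constant = math.hypot(mx, my)  # systematic bias
    variable = math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2
                             for x, y in shots) / n)  # dispersion
    return constant, variable

shots_A = [(-3, 2), (4, -3), (-2, -4), (3, 4), (-1, 1)]  # centered, sloppy
shots_B = [(5.0, 4.8), (5.2, 5.1), (4.9, 5.2), (5.1, 4.9), (4.8, 5.0)]  # tight, off

const_A, var_A = decompose(shots_A)
const_B, var_B = decompose(shots_B)
print(f"A: constant error {const_A:.2f}, variable error {var_A:.2f}")
print(f"B: constant error {const_B:.2f}, variable error {var_B:.2f}")
# B's large constant error could in principle be removed by realigning
# the sights; A's variable error has no such single correction.
```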
Let’s relate this back to one of the central problems for making decisions or relying on advice or expertise. Simple linear models make far better predictions than people in a vast number of situations. This is the Meehl problem, which we’ve known about for fifty years. In most cases, reducing someone’s own expertise to a few numbers in a linear equation will predict outcomes much better than the person him- or herself. Yes, reducing all your years of experience to three or four numerical variables and sticking them in a spreadsheet will mostly outperform your own best judgement. (It’s called ‘bootstrapping.’)
In fact, the record of expert prediction in economics and politics – the areas markets care about – is little better than chimps throwing darts. This is the Tetlock problem, which is inescapable for any research firm or hedge fund since he published his book in 2005. Why pay big bucks to hire chimps?
But the use of algorithms in markets and formal planning in corporations has also produced catastrophe. It isn’t just the massive failure of many models during the global financial crisis. The most rigorously quant-based hedge funds are still trailing the indices, and it seems the advantage quant techniques afforded is actually becoming a vulnerability as so many firms use the same kinds of VaR models. So what’s the right answer?
Linear models perform better than people in many situations because they reduce or eliminate the variable error. Here’s how psychologist Jonathan Baron explains why simplistic models usually outperform the very person or judge they are based on, in a chapter on quantitative judgment in his classic text on decision-making, Thinking and Deciding:
Why does it happen? Basically, it happens because the judge cannot be consistent with his own policy … He is unreliable, in that he is likely to judge the same case differently on different occasions (unless he recognizes the case the second time). As Goldberg (1970) puts it … ‘If we could remove some of this human unreliability by eliminating the random error in his judgements, we should thereby increase the validity of the resulting predictions.’ (p. 406)
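Goldberg's point can be shown with a toy simulation; the cues, weights, and noise levels here are all invented. A judge applies a sensible linear policy to three cues but adds random inconsistency, and a model of the judge's own policy, stripped of that noise, tracks outcomes better than the judge does. For simplicity the sketch uses the judge's weights directly in place of a fitted regression, which with enough cases would recover nearly the same numbers.

```python
import random

random.seed(1)

N = 500
cases = [[random.gauss(0, 1) for _ in range(3)] for _ in range(N)]
WEIGHTS = [0.5, 0.3, 0.2]  # the judge's implicit linear policy

def policy(cues):
    return sum(w * c for w, c in zip(WEIGHTS, cues))

outcomes = [policy(c) + random.gauss(0, 0.3) for c in cases]   # world noise
judgments = [policy(c) + random.gauss(0, 0.8) for c in cases]  # judge's inconsistency
model_preds = [policy(c) for c in cases]  # the judge's policy, minus the noise

def corr(xs, ys):
    """Pearson correlation, from scratch to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(f"judge vs outcome: r = {corr(judgments, outcomes):.2f}")
print(f"model vs outcome: r = {corr(model_preds, outcomes):.2f}")
```

The model beats the judge it was built from, not because it knows more, but because it applies the same policy every single time.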
But algorithms and planning and formal methods find it much harder to deal with systematic error. This is why, traditionally, quantitative methods have been largely confined to lower and middle-management tasks like inventory or production scheduling or routine valuation. Dealing with systematic error requires the ability to recognize and learn.
Formal optimization does extremely well in static, well-understood, repetitive situations. But once the time period lengthens, so change becomes more likely, and the complexity of the situation increases, formal techniques can produce massive systematic errors. The kind that kill companies and careers.
What’s the upshot? It isn’t an argument against formal models. I’m not remotely opposed to quantitative techniques. But it is a very strong argument for looking at boundary conditions for the applicability of techniques to different problems and situations. It’s like the Tylenol test I proposed here: two pills cure a headache. Taking fifty pills at once will probably kill you.
It is also a very strong argument for looking carefully at perception, recognition and reaction to evidence as the core of any attempt to find errors and blind spots. It is essential to have a way to identify and control systematic errors as well as variable errors. Many companies try to have a human layer of judgment as a kind of check on the models for this reason. But that’s where all the deeper problems of decision-making like confirmation bias and culture rear their head. The only real way to deal with the problem is to have an outside firm which looks for those specific problems.
You can’t seek alpha or outperformance by eliminating variable error any more. That’s been done, as we speak, by a million computers running algorithms. Markets are very good at that. The only way to get extra value is to look for systematic errors.
How do we explain why rigorous, formal processes can be very successful in some cases, and disastrous in others? I was asking this in reference to Henry Mintzberg’s research on the disastrous performance of formal planning. Mintzberg cites earlier research on different kinds of errors in this chart (from Mintzberg, 1994, p327).
… the analytic approach to problem solving produced the precise answer more often, but its distribution of errors was quite large. Intuition, in contrast, was less frequently precise but more consistently close. In other words, informally, people get certain kinds of problems more or less right, while formally, their errors, however infrequent, can be bizarre.
This is important, because it lies underneath a similar distinction that can be found in many other places. And because the field of decision-making research is so fragmented, the similar answers usually stand alone and isolated.
Consider, for example, how this relates to Nassim Nicholas Taleb’s distinction between Fragile and Antifragile approaches and trading strategies. Think of exposure, he says, and the size and risk of the errors you may make.
A lot depends on whether you want to rigorously eliminate small errors, or watch out for really big errors.
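That trade-off can be caricatured in a few lines of Python (stylized payoffs, not a real trading strategy): a “fragile” book earns a small premium most days and takes a rare catastrophic loss, while an “antifragile” book pays the premium away and owns the rare large gain.

```python
import random

random.seed(2)

def fragile(p_shock=0.01):
    """Sell insurance: +1 on most days, -300 on a rare shock."""
    return -300.0 if random.random() < p_shock else 1.0

def antifragile(p_shock=0.01):
    """Buy insurance: -1 on most days, +300 on a rare shock."""
    return 300.0 if random.random() < p_shock else -1.0

days = 10_000
pnl_f = [fragile() for _ in range(days)]
pnl_a = [antifragile() for _ in range(days)]

print(f"fragile:     total {sum(pnl_f):+.0f}, worst day {min(pnl_f):+.0f}")
print(f"antifragile: total {sum(pnl_a):+.0f}, best day  {max(pnl_a):+.0f}")
```

A strategy judged only on its day-to-day error rate looks precise and successful right up until the shock arrives; which error profile you should want depends entirely on your exposure to the big one.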
How much should you trust “gut feel” or “market instincts” when it comes to making decisions or trades or investments? How much should you make decisions through a rigorous, formal process using hard, quantified data instead? What can move the needle on performance?
In financial markets more mathematical approaches have been in the ascendant for the last twenty years, with older “gut feel” styles of trading increasingly left aside. Algorithms and linear models are much better at optimizing in specific situations than the most credentialed people are (as we’ve seen.) Since the 1940s business leaders have been content to have operational researchers (later known as quants) make decisions on things like inventory control or scheduling, or other well-defined problems.
But rigorous large-scale planning to make major decisions has generally turned out to be a disaster whenever it has been tried. It has generally been about as successful in large corporations as planning also turned out to be in the Soviet Union (for many of the same reasons). As one example, General Electric originated one of the main formal planning processes in the 1960s. The stock price then languished for a decade. One of the very first things Jack Welch did was to slash the planning process and planning staff. Quantitative models (on the whole) performed extremely badly during the Great Financial Crisis. And hedge funds have increasing difficulty even matching market averages, let alone beating them.
What explains this? Why does careful modeling and rigor often work very well on the small scale, and catastrophically on large questions or longer runs of time? This obviously has massive application in financial markets as well, from understanding what “market instinct” is to seeing how central bank formal forecasting processes and risk management can fail.
Something has clearly been wrong with formalization. It may have worked wonders on the highly structured, repetitive tasks of the factory and clerical pool, but whatever that was got lost on its way to the executive suite.
I talked about Henry Mintzberg the other day. He pointed out that contrary to myth, most successful senior decision-makers are not rigorous or hyper-rational in planning. Quite the opposite. In the 1990s he wrote a book, The Rise and Fall of Strategic Planning, which tore into formal planning and strategic consulting (and where the quote above comes from).
There were three huge problems, he said. First, planners assumed that analysis can provide synthesis or insight or creativity. Second, that hard quantitative data alone ought to be the heart of the planning process. Third, that the context for plans is stable, or predictable. All of them were just wrong. For example,
For data to be “hard” means that they can be documented unambiguously, which usually means that they have already been quantified. That way planners and managers can sit in their offices and be informed. No need to go out and meet the troops, or the customers, to find out how the products get bought or the wars get fought or what connects those strategies to that stock price; all that just wastes time.
The difficulty, he says, is that hard information is often limited in scope, “lacking richness and often failing to encompass important noneconomic and nonquantitative factors.” Often hard information is too aggregated for effective use. It often arrives too late to be useful. And it is often surprisingly unreliable, concealing numerous biases and inaccuracies.
The hard data drive out the soft, while that holy ‘bottom line’ destroys people’s ability to think strategically. The Economist described this as “playing tennis by watching the scoreboard instead of the ball.” … Fed only abstractions, managers can construct nothing but hazy images, poorly focused snapshots that clarify nothing.
The performance of forecasting was also woeful, little better than the ancient Greek belief in the magic of the Delphic Oracle, and “done for superstitious reasons, and because of an obsession with control that becomes the illusion of control.”
Of course, to create a new vision requires more than just soft data and commitment: it requires a mental capacity for synthesis, with imagination. Some managers simply lack these qualities – in our experience, often the very ones most inclined to rely on planning, as if the formal process will somehow make up for their own inadequacies. … Strategies grow initially like weeds in a garden: they are not cultivated like tomatoes in a hothouse.
Highly analytical approaches often suffered from “premature closure.”
… the analyst tends to want to get on with the more structured step of evaluating alternatives and so tends to give scant attention to the less structured, more difficult, but generally more important step of diagnosing the issue and generating possible alternatives in the first place.
So what does strategy require?
We know that it must draw on all kinds of informational inputs, many of them non-quantifiable and accessible only to strategists who are connected to the details rather than detached from them. We know that the dynamics of the context have repeatedly defied any efforts to force the process into a predetermined schedule or onto a predetermined track. Strategies inevitably exhibit some emergent qualities, and even when largely deliberate, often appear less formally planned than informally visionary. And learning, in the form of fits and starts as well as discoveries based on serendipitous events and the recognition of unexpected patterns, inevitably plays a role, if not the key role in the development of all strategies that are novel. Accordingly, we know that the process requires insight, creativity and synthesis, the very thing that formalization discourages. [my bold]
If all this is true (and there is plenty of evidence to back it up), what does it mean for formal analytic processes? How can it be reconciled with the claims of Meehl and Kahneman that statistical models hugely outperform human experts? I’ll look at that next.
Have you noticed how much the business world increasingly talks about “insight”, but in vague, undefined and often murky ways? The term is becoming ever more common as the perceived value of raw information goes down. What does it mean?
Insight is that flash of recognition when you see something in a fresh way. What seemed murky becomes clear. What seemed confusing now has regularities, or at least patterns. It is about recognition and intuitive understanding of what action needs to be taken.
That’s why I’ve been talking about research into decision-making recently. It isn’t because of academic interest. Decisions are a very practical matter, about specific situations rather than generalities. But for twenty years I saw the brightest policymakers and leading market players make decisions that went terribly wrong. And I noticed when they got things very right. What made the difference between success and failure?
To answer that you need to recognize the patterns, and that means you need to look at grounded, empirical research. Of course, experience is essential. But you need to learn the lessons from experience, too.
All research involves taking an abstract step back and trying to find patterns. The trouble is most academic research in economics and finance is centered around models which seek to explain things in general terms.
But a model isn’t the only way to identify the important features in a situation (despite what many academic economists believe.) In fact, most problems confronting decision-makers are more like “how do I get from A to B” or “What are the major risks just ahead of me and how do I go round them?” For most real problems, a map is more useful than an abstract model. You can see the lie of the land and where you need to go. You recognize and name the features of the landscape. You know which direction to head next. That’s what you need if you want to go places.
How is thinking in terms of maps different? Maps retain more useful detail relating to particular purposes and tasks. They are specific about facts on the ground, but they have the right scale and representation of the problem. For example, you use a road map for driving from New York to Boston. It leaves out most of the detail of roads in urban subdivisions or farm tracks, but Interstate 95 is very clear. You use a nautical chart for taking a sailboat into Mystic Harbor.
Models offer generalized “explanation” based on a few easily quantified variables. But if you want to reach harbor safely you are better off with a chart showing the actual, very specific rocks in the channel, instead of a mathematical model of boats.
Maps can show the appropriate scale of detail for the task in hand. They can show shorter routes to your destination. They are less reliant on assumptions. They orient you on the landscape and let you know where you are, even when the outlook is foggy and unclear. They are traditionally drawn by triangulating from different viewpoints rather than a single perspective.
So I’m looking at research on this blog which helps map the territory, and find the blind spots – the cliffs, the marshes, the six-lane interstates to your destination. You need to see what people have already observed about the landscape. (The actual reports for clients don’t go into the research, just the results – the map itself, not the why. I just find it fascinating and love writing about it.)
Decision-makers are like explorers. You can wander off into the desert yourself. But it helps if you have a map. Insight means you’ve recognized how to get to where you want to go.
Paul Krugman was calling for “hard thinking” the other day. Discussion about decisions can sometimes break down into a shouting match between those who want “hard”, quantified, consistent, rigorous approaches, and those who want “soft” attention to context, social influence and limits. (You can also see it reflected in wider debates about big data, or the merits of journalism, or the role of science.)
Modern economics is nothing if not highly mathematical, and there are always broader demands for more “rigor” and “quantification.”
But there is a big difference between “rigorous” and “quantitative.” I was also just talking about Herbert Simon, the towering intellectual pioneer who won the Nobel Prize for Economics in 1978, and also founded much of modern cognitive psychology and software engineering. Simon was a brilliant mathematician, but he rejected the demands for “rigor.”
For me, mathematics had always been a language for thought. Mathematics – this sort of non-verbal thinking – is my language of discovery. It is the tool I use to arrive at new ideas. This kind of mathematics is relatively unrigorous, loose, heuristic. Solutions reached with its help have to be checked for correctness. It is physicist’s mathematics or engineer’s mathematics, rather than mathematician’s mathematics.
Economics as a discipline has largely adopted mathematicians’ mathematics, with its aesthetic belief in elegance and rigor and consistency. Simon argued the point repeatedly with some of the giants of mainstream economics, like Tjalling Koopmans and Kenneth Arrow.
For Tjalling Koopmans, it appeared, mathematics was a language of proof. It was a safeguard to guarantee that conclusions were correct, that they could be derived rigorously. Rigor was essential. (I heard the same views, in even more extreme form, expressed by Gerard Debreu, and Kenneth Arrow seems mainly to share them.) I could never persuade Tjalling that ideas are to be arrived at before their correctness can be guaranteed – that the logic of discovery is quite different to the logic of verification. It is his view, of course, that prevails in economics today, and to my mind it is a great pity for economics and the world that it does.
This brings up a deeper point about different mindsets. I argued here that “hedgehogs” are deductive by nature, in the same way Euclid was as far back as 290 BC. But modern science did not really begin until a more inductive “fox-like” experimental approach was adopted in the 17th century. Science is inductive. Formal mathematics is deductive.
Quantification straddles the divide between foxes and hedgehogs, on a different axis. You can be mathematical and quantitative – but interested in observation and approximation, rather than proof and consistency. You can adopt engineer’s math, and make the bridge stand up. Quantification does not necessarily entail “rigor” in the math department sense. Instead, you might want to go measure something in the first place.
When hedgehogs become policymakers, elegance and rigor become serious problems because they inhibit inductive learning.
It turns out that one of the major advertised achievements of big data, Google Flu Trends (GFT), doesn't work, according to new research. Google claimed it could detect flu outbreaks with unparalleled real-time accuracy by monitoring related search terms. In fact, the numbers were 50% or more off.
Just because companies like Google can amass an astounding amount of information about the world doesn’t mean they’re always capable of processing that information to produce an accurate picture of what’s going on—especially if it turns out they’re gathering the wrong information. Not only did the search terms picked by GFT often not reflect incidences of actual illness—thus repeatedly overestimating just how sick the American public was—it also completely missed unexpected events like the nonseasonal 2009 H1N1-A flu pandemic.
If you wanted to project current flu prevalence, you would have done much better basing your models on 3-week-old data on cases from the CDC than using GFT’s sophisticated big data methods.
One additional problem is the actual Google methods (including the search terms and the underlying algorithm) are opaque, proprietary and difficult to replicate. That also makes it much harder for outside scientists to work out what went wrong, or improve the techniques over time.
It doesn't mean there isn't serious value in big data. But it is often overhyped and overgeneralized for commercial reasons by big tech, like Google and IBM and Facebook.
Most of the value isn't the bigness or the statistical sophistication, but the fact it is often new data, observations that we did not have before. I was at a conference last year where some data researchers talked about how they had improved ambulance response times in New York. They had a look at GPS data on where ambulances waited, and could move some of them closer to likely cases.
It is marvellous. But the key element, of course, is the fact that GPS receivers have become so cheap and omnipresent that we can put them in ambulances and smartphones. In essence, it's the same value as Galileo turning his telescope to the skies for the first time and observing the moons of Jupiter more accurately. It's something we've been doing for four hundred years: using new instruments to get better observations.
You still need to sift and weigh the evidence, and you still have the usual problems and serious risks involved with that. You need to be aware of assumptions, and ask the right questions, and look in the right places, and avoid cherry-picking data which confirms what you already think.
Big data techniques work well for some kinds of problems. There is genuine innovative value in new Bayesian inference techniques in particular.
But they can also lead to some specific kinds of carelessness and blind spots, and misapplication to the wrong problems. Overemphasizing correlation is one of them. Financial market players have repeatedly had to find the limits of similar quantitative techniques the hard way. Just ask Long Term Capital or the mortgage finance industry.
And most interesting commercial and social problems are not very large aggregations of homogeneous data, but smaller dynamic systems, which need very different techniques. There are some very important issues here, and specific kinds of mistakes which I will return to.