
The Difference between Puzzles and Mysteries

Companies and investment firms go extinct when they fail to understand the key problems they face. And right now, the fundamental nature of the problem many corporations and investors and policymakers face has changed. But mindsets have not caught up.

Ironically, current enthusiasms like big data can compound the problem. Big data, as I’ve argued before, is highly valuable for tackling some kinds of problems, namely those where you have very large amounts of data on essentially similar, replicable events. Simple algorithms and linear models also beat almost every expert in many situations, largely because they are more consistent.

The trouble is that many of the most advanced problems are qualitatively different. Here’s Gregory Treverton, who argues there is a fundamental difference between ‘puzzles’ and ‘mysteries’:

There’s a reason millions of people try to solve crossword puzzles each day. Amid the well-ordered combat between a puzzler’s mind and the blank boxes waiting to be filled, there is satisfaction along with frustration. Even when you can’t find the right answer, you know it exists. Puzzles can be solved; they have answers.

But a mystery offers no such comfort. It poses a question that has no definitive answer because the answer is contingent; it depends on a future interaction of many factors, known and unknown. A mystery cannot be answered; it can only be framed, by identifying the critical factors and applying some sense of how they have interacted in the past and might interact in the future. A mystery is an attempt to define ambiguities.

Puzzles may be more satisfying, but the world increasingly offers us mysteries. Treating them as puzzles is like trying to solve the unsolvable — an impossible challenge. But approaching them as mysteries may make us more comfortable with the uncertainties of our age.

Here’s the interesting thing: Treverton is a former head of the Intelligence Policy Center at RAND, the fabled national-security think tank based in Santa Monica, CA, and before that was Vice Chair of the National Intelligence Council. RAND was also arguably the richly funded original home of the movement to inject mathematical and quantitative rigor into economics and social science, as well as one of the citadels of actual “rocket science” and operations research after WW2. RAND stands for hard-headed rigor, and equally hard-headed national security thinking.

So to find RAND arguing for the limits of “puzzle-solving” is a little like finding the Pope advocating Buddhism.

The intelligence community was focused on puzzles during the Cold War, Treverton says. But current challenges fall into the mystery category.

Puzzle-solving is frustrated by a lack of information. Given Washington’s need to find out how many warheads Moscow’s missiles carried, the United States spent billions of dollars on satellites and other data-collection systems. But puzzles are relatively stable. If a critical piece is missing one day, it usually remains valuable the next.

By contrast, mysteries often grow out of too much information. Until the 9/11 hijackers actually boarded their airplanes, their plan was a mystery, the clues to which were buried in too much “noise” — too many threat scenarios.

The same applies to financial market and business decisions. We have too much information. Attention and sensitivity to evidence are now the prime challenge facing many decision-makers. Indeed, that has always been the source of the biggest failures in national policy and intelligence.

It’s partly a consequence of the success of many analytical techniques and information-gathering exercises. The easy puzzles, the ones susceptible to more information and linear models and algorithms, have been solved, or at least automated. That means it’s the mysteries, and how you approach them, that move the needle on performance.


2017-05-11T17:32:42+00:00 October 10, 2014|Assumptions, Big Data, Decisions, Lens Model, Security, Uncategorized|

Two Kinds of Error (part 3)

I’ve been talking about the difference between variable (random) error and systemic (constant) error. Another way to put this is the difference between precision and accuracy. As business measurement expert Douglas Hubbard explains in How to Measure Anything: Finding the Value of Intangibles in Business,

“Precision” refers to the reproducibility and conformity of measurements, while “accuracy” refers to how close a measurement is to its “true” value. .. To put it another way, precision is low random error, regardless of the amount of systemic error. Accuracy is low systemic error, regardless of the amount of random error. … I find that, in business, people often choose precision with unknown systematic error over a highly imprecise measurement with random error.

Systemic error is also, he says, another way of saying “bias”, especially expectancy bias (another term for confirmation bias, seeing what we want to see) and selection bias (inadvertent non-randomness in samples).

Observers and subjects sometimes, consciously or not, see what they want. We are gullible and tend to be self-deluding.

That brings us back to the problems on which Alucidate sets its sights. Algorithms can eliminate most random or variable error, and bring much more consistency. But systemic error is then the main source of problems or differential performance. And businesses are usually knee-deep in it, partly because the approaches which reduce variable error often increase systemic error in practice. There’s often a trade-off between dealing with the two kinds of error, and that trade-off may need to be set differently in different environments.

I like most of Hubbard’s book, which I’ll come back to another time. It falls into the practical, observational school of quantification rather than the math department approach, as Herbert Simon would put it.

But one thing he doesn’t focus on enough is learning ability and iteration – the ability to change your model over time. If you shoot at the target and observe you hit slightly off center, you can adjust the next time you fire. Sensitivity to evidence and the ability to learn are the most important things to watch in macro and market decision-making. In fact, the most interesting thing in the recent enthusiasm about big data is not the size of datasets or finding correlations. It’s the improved ability of computer algorithms to test and adjust models – Bayesian inversion. But that has limits and pitfalls as well.
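The shoot, observe, adjust loop can be sketched in a few lines. This is only an illustration under invented assumptions: a one-dimensional target, a fixed sight bias of 2.0 units, Gaussian scatter, and a plain error-correction rule (shift the aim by the average miss of the last round). It is not the Bayesian machinery mentioned above, and all names and numbers are hypothetical.

```python
import random
import statistics

def fire(aim_offset, sight_bias, spread, rng):
    """One shot: landing point relative to the bullseye at 0 (one dimension)."""
    return aim_offset + sight_bias + rng.gauss(0.0, spread)

def learn_to_correct(sight_bias, spread, rounds, shots_per_round, rng):
    """After each round, shift the aim against the average miss just observed."""
    aim_offset = 0.0
    history = []  # average miss per round, to show the learning
    for _ in range(rounds):
        shots = [fire(aim_offset, sight_bias, spread, rng)
                 for _ in range(shots_per_round)]
        mean_miss = statistics.fmean(shots)
        history.append(mean_miss)
        aim_offset -= mean_miss  # adjust using the evidence
    return aim_offset, history

rng = random.Random(0)  # fixed seed so the sketch is repeatable
final_aim, history = learn_to_correct(
    sight_bias=2.0, spread=0.5, rounds=5, shots_per_round=20, rng=rng)

for round_number, miss in enumerate(history, start=1):
    print(f"round {round_number}: average miss {miss:+.2f}")
print(f"final aim correction: {final_aim:+.2f}")
```

The first round misses by roughly the full bias; later rounds hover near zero because the shooter responds to what was observed. The correction only works while the bias stays put, which is one of the limits: if the target or the bias moves between rounds, the adjustment always lags.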

Two Kinds of Error (part 2)

Markets increasingly rely on quantitative techniques. When does formal analysis help make better decisions? In the last post I was talking about the difference between the errors produced by “analytic” and “intuitive” thinking in the context of the collapse of formal planning in corporate America. Those terms can be misleading, however, because they imply it is somehow all a matter of rational technique versus “gut feel.”

Here’s another diagram, from near the beginning of James Reason’s classic analysis, Human Error (an extremely important book which I’ll come back to another time.) Two marksmen aim at a target, but the pattern of errors is very different.


[Figure: target patterns for marksmen A and B, from James Reason, Human Error]


A is the winner, based on raw scores. He is roughly centered, but dispersed and sloppy. B is much more consistent but off target.

This shows the difference between variable error (A), on the one hand, and constant or systematic error (B) on the other. B is probably the better marksman even though he lost, says Reason, because his sights could be misaligned or there could be an additional factor throwing him off. B’s error is more predictable, and potentially more fixable. But fixing it depends on the extent to which the reasons for the error are understood.
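A small simulation makes the two kinds of error concrete. The numbers are invented for illustration and are not taken from Reason’s diagram: A is assumed to be centered with a wide scatter, B tightly grouped but five units off to one side.

```python
import math
import random
import statistics

def shoot(bias_x, bias_y, spread, n, rng):
    """Simulate n shots; the bullseye is at (0, 0)."""
    return [(bias_x + rng.gauss(0, spread), bias_y + rng.gauss(0, spread))
            for _ in range(n)]

def group_centre(shots):
    xs = [x for x, _ in shots]
    ys = [y for _, y in shots]
    return statistics.fmean(xs), statistics.fmean(ys)

def constant_error(shots):
    """Systematic error: distance from the bullseye to the centre of the group."""
    cx, cy = group_centre(shots)
    return math.hypot(cx, cy)

def variable_error(shots):
    """Random error: RMS scatter of the shots around their own centre."""
    cx, cy = group_centre(shots)
    return math.sqrt(statistics.fmean(
        [(x - cx) ** 2 + (y - cy) ** 2 for x, y in shots]))

def mean_miss(shots):
    """Raw score: average distance from the bullseye (lower is better)."""
    return statistics.fmean([math.hypot(x, y) for x, y in shots])

def realign(shots):
    """Remove the group's constant error, as if the sights had been fixed."""
    cx, cy = group_centre(shots)
    return [(x - cx, y - cy) for x, y in shots]

rng = random.Random(1)
a = shoot(0.0, 0.0, 3.0, 500, rng)   # hypothetical A: centered but sloppy
b = shoot(5.0, 0.0, 0.5, 500, rng)   # hypothetical B: consistent but off target

for name, shots in (("A", a), ("B", b), ("B realigned", realign(b))):
    print(f"{name:12s} constant={constant_error(shots):.2f} "
          f"variable={variable_error(shots):.2f} miss={mean_miss(shots):.2f}")
```

On raw score A comes out ahead, but once B’s constant error is understood and removed, B is far better. The catch is that `realign` assumes the cause of the offset is known and stable.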

What else could cause B to be off? (Reason doesn’t discuss this in the context.) In real-life decisions (and real-life war) the target is often moving, not static. That means errors like B’s are pervasive.

Let’s relate this back to one of the central problems for making decisions or relying on advice or expertise. Simple linear models make far better predictions than people in a vast number of situations. This is the Meehl problem, which we’ve known about for fifty years. In most cases, reducing someone’s own expertise to a few numbers in a linear equation will predict outcomes much better than the person himself or herself. Yes, reducing all your years of experience to three or four numerical variables and sticking them in a spreadsheet will mostly outperform your own best judgement. (It’s called ‘bootstrapping.’)

In fact, the record of expert prediction in economics and politics – the areas markets care about – is little better than chimps throwing darts. This is the Tetlock problem, which has been inescapable for any research firm or hedge fund since he published his book in 2005. Why pay big bucks to hire chimps?

But the use of algorithms in markets and formal planning in corporations has also produced catastrophe. It isn’t just the massive failure of many models during the global financial crisis. The most rigorously quant-based hedge funds are still trailing the indices, and it seems the advantage quant techniques once afforded is actually becoming a vulnerability as so many firms use the same kind of VaR models. So what’s the right answer?

Linear models perform better than people in many situations because they reduce or eliminate the variable error. Here’s how psychologist Jonathan Baron explains why simplistic models usually outperform the very person or judge they are based on, in a chapter on quantitative judgment in his classic text on decision-making, Thinking and Deciding:

Why does it happen? Basically, it happens because the judge cannot be consistent with his own policy .. He is unreliable, in that he is likely to judge the same case differently on different occasions (unless he recognizes the case the second time). As Goldberg (1970) puts it…. ‘If we could remove some of this human unreliability by eliminating the random error in his judgements, we should thereby increase the validity of the resulting predictions.’ (p406)
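Goldberg’s point can be sketched with a toy simulation. Everything here is assumed for illustration: two cues, a judge whose underlying policy weights them 0.6 and 0.4, and random inconsistency added to each individual judgement. The model is fitted to the judge’s own judgements, never to the outcomes, which is the sense of ‘bootstrapping’ used above.

```python
import math
import random
from statistics import fmean

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def fit_two_cue_model(x1, x2, target):
    """Ordinary least squares of target on two cues (2x2 normal equations)."""
    m1, m2, mt = fmean(x1), fmean(x2), fmean(target)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1t = sum((u - m1) * (t - mt) for u, t in zip(x1, target))
    s2t = sum((v - m2) * (t - mt) for v, t in zip(x2, target))
    det = s11 * s22 - s12 ** 2
    b1 = (s1t * s22 - s2t * s12) / det
    b2 = (s2t * s11 - s1t * s12) / det
    return mt - b1 * m1 - b2 * m2, b1, b2

rng = random.Random(42)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical world: outcomes depend on the cues plus irreducible noise.
outcomes = [0.6 * u + 0.4 * v + rng.gauss(0, 0.5) for u, v in zip(x1, x2)]
# Hypothetical judge: a sound policy applied inconsistently from case to case.
judgements = [0.6 * u + 0.4 * v + rng.gauss(0, 1.0) for u, v in zip(x1, x2)]

b0, b1, b2 = fit_two_cue_model(x1, x2, judgements)  # model of the judge
model_of_judge = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]

judge_validity = pearson(judgements, outcomes)
model_validity = pearson(model_of_judge, outcomes)
print(f"judge vs outcomes:          r = {judge_validity:.2f}")
print(f"model of judge vs outcomes: r = {model_validity:.2f}")
```

The fitted model recovers the judge’s policy and applies it identically every time, so it tracks outcomes better than the judge it was built from. Note what the sketch takes for granted: the judge’s policy has no systematic error. If the policy itself is biased, the model reproduces that bias faithfully.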

But algorithms and planning and formal methods find it much harder to deal with systematic error. This is why, traditionally, quantitative methods have been largely confined to lower and middle-management tasks like inventory or production scheduling or routine valuation. Dealing with systematic error requires the ability to recognize and learn.

Formal optimization does extremely well in static, well-understood, repetitive situations. But once the time period lengthens, so that change becomes more likely, and the complexity of the situation increases, formal techniques can produce massive systematic errors. The kind that kill companies and careers.

What’s the upshot? It isn’t an argument against formal models. I’m not remotely opposed to quantitative techniques. But it is a very strong argument for looking at boundary conditions for the applicability of techniques to different problems and situations. It’s like the Tylenol test I proposed here: two pills cure a headache. Taking fifty pills at once will probably kill you.

It is also a very strong argument for looking carefully at perception, recognition and reaction to evidence as the core of any attempt to find errors and blind spots. It is essential to have a way to identify and control systematic errors as well as variable errors. Many companies try to have a human layer of judgment as a kind of check on the models for this reason. But that’s where all the deeper problems of decision-making, like confirmation bias and culture, rear their heads. The only real way to deal with the problem is to have an outside firm which looks for those specific problems.

You can’t seek alpha or outperformance by eliminating variable error any more. That’s being done, as we speak, by a million computers running algorithms. Markets are very good at that. The only way to get extra value is to look for systematic errors.


2017-05-11T17:32:44+00:00 May 12, 2014|Adaptation, Decisions, Human Error, Lens Model, Quants and Models, Risk Management|

How can experts know so much, and predict so badly?

That’s a question that recurs all the time, including in the aftermath of the Fed decision last week.  But the question isn’t inspired by last week’s events.

In fact, it’s part of the title of a classic paper in decision research, over 20 years old, by Colin Camerer and Eric Johnson, The Process-Performance Paradox in Expert Judgment: How Can Experts Know So Much and Predict So Badly? (1991). I’ve talked before about the increasing evidence that most predictions in economics and politics are terrible.

Evidence on poor expert prediction has been accumulating for fifty years, which is why we have to find better ways to do this. That’s why I am focusing on the way people make decisions, not simply looking at recent data or forecasts.

In most situations, expert judgment can be outperformed by a very simple linear model. That is why I’ve been interested in developing such a model for economic decisions, as one major tool in a decision arsenal.

Note that this means a particular kind of model. “Model” is one of those weasel words that can mean ten different things to different people. We’re not talking about a parsimonious economic model, of rational agents optimizing subject to constraints; or an economic forecast model which attempts to simulate the economy; or a valuation model, to calculate the value of a bond or other asset; or a risk model which examines correlations. It’s definitely not big data or data-mining. Using the wrong kind of model without awareness of its weaknesses can produce catastrophe, as the 2008 crisis demonstrated.

Instead, it means a model to weigh and combine evidence. In essence, it’s a slightly more sophisticated version of a list of pros and cons. Indeed, more complicated models actually do worse than very basic approaches. Just having a model with the signs, plus or minus, correct often does better than models which use the most advanced math.
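As a rough sketch of what “just getting the signs right” means, the snippet below builds a score by standardizing each cue and adding them up with weights of +1 or -1. The data-generating weights (0.5, -0.3, 0.2) and the noise level are invented for illustration; the comparison is against a score that uses those true weights, which no real forecaster would know.

```python
import math
import random
from statistics import fmean, pstdev

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 so cues are comparable."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

def unit_weight_score(cues, signs):
    """Add up standardized cues, each weighted only by +1 or -1."""
    z = [standardize(c) for c in cues]
    n_cases = len(cues[0])
    return [sum(sign * col[i] for sign, col in zip(signs, z))
            for i in range(n_cases)]

rng = random.Random(3)
n = 2000
cues = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
true_weights = (0.5, -0.3, 0.2)  # hypothetical, unknown in practice
criterion = [sum(w * c[i] for w, c in zip(true_weights, cues))
             + rng.gauss(0, 0.7) for i in range(n)]

signs_only = unit_weight_score(cues, signs=(+1, -1, +1))
true_weighted = [sum(w * c[i] for w, c in zip(true_weights, cues))
                 for i in range(n)]

unit_validity = pearson(signs_only, criterion)
ideal_validity = pearson(true_weighted, criterion)
print(f"signs only:   r = {unit_validity:.2f}")
print(f"true weights: r = {ideal_validity:.2f}")
```

In this setup the signs-only score gives up very little against the ideal weights. The human contribution is entirely in the inputs: choosing the three cues and knowing which way each one points.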

Why would such a simple model usually do better than the most credentialed, famous, knowledgeable people, from i-banks to Nobel laureates? Why does a very dumb equation do better than the most renowned experts in just about all fields that have been studied, from clinical medical judgment to buying bullets for police departments to selecting the best applicants for college?

Camerer and Johnson, in the paper above, suggest a major part of the answer is that experts use configural rules, in which the impact of one variable depends on the values of other variables. For example, “look at X if Y and Z are positive.” Partly as a result, experts do not weigh the different cues independently and consistently. They zoom in on a small subset of cues, and treat each situation as if it is unique and special.

Why, then,  have experts at all? The answer is people are still indispensable to recognizing what matters.

One of the most famous things ever written in the field is Robyn Dawes’s 1979 paper, The Robust Beauty of Improper Linear Models in Decision Making. He reviews the grim evidence on expert performance. But, he says,

.. people are important. The statistical model may integrate the information in an optimal manner, but it is always the individual (judge, clinician, subjects) who chooses variables. Moreover, it is the human judge who knows the directional relationship between the predictor variables and the criterion of interest, or who can code the variables in such a way that they have clear directional relationships.

What are experts good at?

In summary, proper linear models work for a very simple reason. People are good at picking out the right predictor variables and at coding them in such a way that they have a conditionally monotone relationship with the criterion. People are bad at integrating information from diverse and incomparable sources. Proper linear models are good at such integration when the predictions have a conditionally monotone relationship to the criterion.

In fact, Fed and other economic policy is more difficult terrain for models than most other fields, so a model can’t just stand on its own. One reason is that what Camerer and Johnson refer to as “broken leg” situations are more common in economic policy. A statistical model predicting whether Fred will go to the movies tonight will fail compared to someone who knows that Fred actually broke his leg this morning for the first time ever. In addition, the relationships in policy decisions may sometimes not be conditionally monotone – they may “cross.” And the amount of relevant statistical evidence of similar situations is often limited.

The upshot is that the most important thing is to have better ways to recognize what matters – awareness – and better ways to weight and test your views. We have to think about how people think.

In fact, many if not most people in markets do the opposite. They limit or distort the variables they look at, often by confining their attention just to easily quantified variables, or data which is reported by BLS or their Bloomberg terminal. And they fail to have any way to calibrate or test their views.

Expert judgment is important, but it’s important to be aware of where the potential blind spots are. There are patterns of misperception. One to watch out for is too much focus on cues specific to a single situation, and on configural rules. I’ll come to some others in the next post.


2017-05-11T17:32:51+00:00 September 26, 2013|Decisions, Expertise, Lens Model, Monetary Policy, Perception, Quants and Models|

Should you be replaced by an algorithm?

If a computer can do your job, you’re probably not going to keep it much longer. So if you’ve been following the eurozone crisis, the suggestion that an algorithm can make better predictions than you can about issues like Cyprus is not welcome news.

That is just what this recent David Brooks column in the NYT suggests. It deals with Philip Tetlock’s findings about experts and successful forecasts. (I mention his earlier book in the motivating framework for Alucidate here.) Tetlock finds “foxes”, who look at issues from multiple angles, do much better at prediction than “hedgehogs”, who prefer grand theories or single perspectives.

Tetlock has since run another tournament with the Intelligence Advanced Research Projects Agency. According to Brooks’ article,

In these discussions, hedgehogs disappeared and foxes prospered. That is, having grand theories about, say, the nature of modern China was not useful. Being able to look at a narrow question from many vantage points and quickly readjust the probabilities was tremendously useful.

The researchers also developed an algorithm to weigh results.

The Penn/Berkeley team also came up with an algorithm to weigh the best performers. Let’s say the top three forecasters all believe that the chances that Italy will stay in the euro zone are 0.7 (with 1 being a certainty it will and 0 being a certainty it won’t). If those three forecasters arrive at their judgments using different information and analysis, then the algorithm synthesizes their combined judgment into a 0.9. It makes the collective judgment more extreme.

This algorithm has been extremely good at predicting results. Tetlock has tried to use his own intuition to beat the algorithm but hasn’t succeeded.
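The column does not give the formula, so the following is only a guess at the general shape of such an algorithm: average the forecasters’ log-odds, then multiply by a factor greater than one to push the pooled probability away from 0.5. The function names and the exponent of 2.6 are assumptions, chosen here solely because they reproduce the 0.7 to 0.9 example in the quote.

```python
import math

def logit(p):
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Log-odds back to probability."""
    return 1.0 / (1.0 + math.exp(-z))

def extremized_pool(probabilities, exponent):
    """Average the log-odds, then scale them by `exponent`.

    exponent == 1 is a plain log-odds average; exponent > 1 makes the
    pooled forecast more extreme than the individual forecasts.
    """
    mean_log_odds = sum(logit(p) for p in probabilities) / len(probabilities)
    return inv_logit(exponent * mean_log_odds)

forecasts = [0.7, 0.7, 0.7]                         # the three top forecasters
plain = extremized_pool(forecasts, exponent=1.0)
pooled = extremized_pool(forecasts, exponent=2.6)   # hypothetical setting
print(f"plain average: {plain:.2f}, extremized: {pooled:.2f}")
```

The logic of the exponent is the condition stated in the quote: if the three forecasters reached 0.7 from different information, their agreement is itself evidence. If they all read the same sources, extremizing would simply manufacture overconfidence.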

So the algorithm does better than the experts at predicting the chances of things like a Cyprus or eurozone crisis.

In one sense, this is no surprise. Most people in the industry know even a simple average of the Blue Chip economic forecasts tends to perform better than any single forecaster. If you have a productive meeting of smart people, you most often get better outcomes than just one person alone.
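The consensus effect is easy to reproduce in a toy panel. The setup is invented: ten hypothetical forecasters, each with a personal bias and independent noise around the true figure. It is not Blue Chip data.

```python
import random
from statistics import fmean

def mean_abs_error(forecasts, actuals):
    return fmean([abs(f - a) for f, a in zip(forecasts, actuals)])

rng = random.Random(7)
periods, panel_size = 200, 10
actuals = [rng.gauss(2.0, 1.0) for _ in range(periods)]

# Each hypothetical forecaster has a personal bias plus independent noise.
biases = [rng.gauss(0, 0.5) for _ in range(panel_size)]
panel = [[a + bias + rng.gauss(0, 1.0) for a in actuals] for bias in biases]

# The consensus is the simple average of the panel in each period.
consensus = [fmean(column) for column in zip(*panel)]

individual_maes = [mean_abs_error(f, actuals) for f in panel]
consensus_mae = mean_abs_error(consensus, actuals)
beaten = sum(consensus_mae < m for m in individual_maes)

print(f"consensus error: {consensus_mae:.2f}")
print(f"average individual error: {fmean(individual_maes):.2f}")
print(f"forecasters beaten by the consensus: {beaten} of {panel_size}")
```

Averaging works here because the individual errors are independent and largely cancel. A bias shared by the whole panel would pass straight through to the consensus, which is the systematic error that no amount of averaging removes.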

In fact, it’s not a new issue either. We’ve known for over fifty years that simple mechanical algorithms can do better than skilled experts in some fields. Paul Meehl demonstrated in 1954 that algorithms outperformed skilled physicians and surgeons on diagnosis, as they can combine different information more reliably.

But that largely applies to situations which do not change. Human anatomy is much the same as it was fifty years ago. The functioning of the international financial system is not.

One of the marks of expertise is to notice what kinds of change and what kinds of data are relevant. It isn’t the information. It is the sense of what matters that is important.

And there is a counterpoint to this algorithmic theme: apparent sophistication and algorithms can also conceal gibberish, as I noted in this post the other day. If people become over-reliant on “black box” models, it can produce catastrophic results. Sophisticated credit scoring in mortgage origination arguably led to the near-collapse of the financial system. The old saw about computers is still true: garbage in, garbage out. No algorithm can replace the need to think for yourself.

What this does underline is that what people need, more than anything else, is not basic information. You don’t have any comparative advantage there. Information is commoditized, tweeted, automated, pervasive, and a computer algorithm can probably replace you in weighing it up. The few items of information that really are still tradable are also most likely classed as “material” and “nonpublic.” And by the time you hear about them they are likely priced in already in any case.

It is the perception of what matters in a situation and what is being overlooked that is essential, and which cannot easily be replaced by a program.  Once the situation is standardized and formulated, you’re superfluous. There is much more to say about this, and I’ll come back to it.