Archive for the ‘statistics’ tag
I’m a bit of a connoisseur of this type of thing, and so I’m embarrassed that I just today found an utterly fantastic plain-English argument from Alex Tabarrok about why you should discount almost every news story about a really interesting new finding by scientists. (I’m a connoisseur of this kind of thing because of the number of intelligent people who seem to treat every new study about a wonder-substance or agent-of-death as meaningful.) These guidelines are a good summary:
1) In evaluating any study try to take into account the amount of background noise. That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.
2) Bigger samples are better. (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).
3) Small effects are to be distrusted.
4) Multiple sources and types of evidence are desirable.
5) Evaluate literatures not individual papers.
6) Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.
(via Tabarrok himself, in a shorter but good post about a specific study’s failure)
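Guideline 1 is easy to demonstrate with a simulation: test enough true-null hypotheses and a few will clear the usual significance bar by chance alone. The setup below (1,000 "studies" of a fair coin) is my own illustration, not Tabarrok's:

```python
# Simulation of guideline 1: when many true-null hypotheses are tested,
# a few percent will look "significant" purely by chance.
import random

random.seed(42)

def null_study(n=100):
    """One 'study' of a nonexistent effect: n fair coin flips, return heads."""
    return sum(random.random() < 0.5 for _ in range(n))

# Two-sided rejection region for Binomial(100, 0.5): call a study
# "significant" when the head count falls outside 40..60 (roughly alpha = 0.04).
false_positives = sum(1 for _ in range(1000)
                      if not 40 <= null_study() <= 60)

print(f"{false_positives} of 1000 null studies looked significant")
```

Every one of those "significant" results is pure noise, which is exactly why the amount of background hypothesis-testing matters when you read about a single striking finding.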
Another in the large pile of “most things about wine are bullshit” stories. This author did a statistical analysis:
Using descriptions of 3,000 bottles, ranging from $5 to $200 in price from an online aggregator of reviews, I first derived a weight for every word, based on the frequency with which it appeared on cheap versus expensive bottles. I then looked at the combination of words used for each bottle, and calculated the probability that the wine would fall into a given price range. The result was, essentially, a Bayesian classifier for wine.
(via more of what i like)
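The approach described, essentially a naive Bayes text classifier over tasting notes, can be sketched with toy data. The reviews and vocabulary below are invented for illustration; the real analysis used 3,000 aggregated reviews:

```python
# A toy naive Bayes classifier in the spirit of the wine analysis above.
# Training data is made up; word counts per price class act as the "weights."
from collections import Counter
import math

# Hypothetical training data: (description words, price class)
reviews = [
    ("oak tobacco velvety intense".split(), "expensive"),
    ("cuvee complex elegant tannins".split(), "expensive"),
    ("fruity pleasing refreshing value".split(), "cheap"),
    ("juicy good tasty fruity".split(), "cheap"),
]

counts = {"cheap": Counter(), "expensive": Counter()}
totals = Counter()
for words, label in reviews:
    counts[label].update(words)
    totals[label] += len(words)

vocab = {w for words, _ in reviews for w in words}

def log_prob(words, label):
    """Log P(label) + sum of log P(word | label), with add-one smoothing."""
    lp = math.log(sum(1 for _, l in reviews if l == label) / len(reviews))
    for w in words:
        lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
    return lp

def classify(description):
    words = description.split()
    return max(("cheap", "expensive"), key=lambda l: log_prob(words, l))

print(classify("fruity refreshing juicy"))    # words seen on cheap bottles
print(classify("velvety complex tannins"))    # words seen on expensive bottles
```

The add-one smoothing keeps a single unseen word from zeroing out a whole price class, a standard fix in this kind of classifier.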
Timothy Snyder offers some new details on the age-old question of “who was worse?” Doing the morbid calculus with new data leads to a result that turns the conventional wisdom (Hitler the eviler, Stalin the deadlier) on its head. (I wouldn’t pull such a long quote but that I understand not everyone likes to read NYRB articles):
All in all, the Germans deliberately killed about 11 million noncombatants, a figure that rises to more than 12 million if foreseeable deaths from deportation, hunger, and sentences in concentration camps are included. For the Soviets during the Stalin period, the analogous figures are approximately six million and nine million. These figures are of course subject to revision, but it is very unlikely that the consensus will change again as radically as it has since the opening of Eastern European archives in the 1990s.
Some scientific researchers are worried that the strength of experimental effects seems to decline over time. And I know science’s fallibility is something of an old saw around here, but until I see more smart people taking it seriously I doubt that will change. Jonah Lehrer’s conclusion pretty well captures what I want more people to realize:
We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.
The New York Times ran a story last week to warm a teetotaler’s heart:
No study, these critics say, has ever proved a causal relationship between moderate drinking and lower risk of death — only that the two often go together. It may be that moderate drinking is just something healthy people tend to do, not something that makes people healthy.
They ask us to compare angles, but we tend to underestimate acute angles, overestimate obtuse ones, and judge horizontally bisected angles as much larger than their vertical counterparts.
The piece’s thesis, that data visualization cannot save you, is certainly one worth taking to heart.
(via Idea of the Day)
This chart is impeccably executed.
Though you may find it depressing, this BuzzFeed post has amassed a very impressive collection of interesting charts about suicide (mostly in the United States).
The National Weather Service thinks it may have found a driver going 130 miles per hour around Chicago. Using weather Doppler radar. Who knew? As Gizmodo explains, it works something like this:
Sometimes, when a warm layer of air rolls in up above the surface, the beam from the Doppler radar can be deflected towards the ground—picking up traffic and other objects much like a police radar gun. The weather service alluded to the fact that the “speeder” could have been nothing more than noise, but it still makes you wonder how long it will be before they figure out how to bust motorists from space.
Psychology Today has a great article about the errors in reasoning that (vestigial) fear causes us to make. The ten:
- Risk and emotion are inseparable.
- Fear skews risk analysis in predictable ways.
- We underestimate threats that creep up on us.
- We prefer that which (we think) we can control.
- We substitute one risk for another.
- Using your cortex isn’t always smart.
- The “risk thermostat” varies widely.
- Risk arguments cannot be divorced from values.
- “Natural” risks are easier to accept.
- Worrying about risk is itself risky.
(via Lone Gunman)
In a simple chart, The Economist makes the interesting point that though the United States, Russia, and China fought fairly evenly for the most medals in 2004 — and most other games — it’s actually countries like The Bahamas, Australia, and Cuba that did the best per capita.
On April first, The Economist decided to teach its readers a special lesson about the power of unexpected parallels in statistics. Truly surprising.
Also of note, Mahalo Daily managed to land an interview with Steve Jobs.
Though I’m wary of most news organizations playing with economics or statistics, this Reuters story qualifies as both modestly interesting and completely plausible:
At U.S. warehouse club stores, a growing number of shoppers are giving up steak for cheaper chicken. Coffee sales are soaring at McDonald’s, while higher-priced Starbucks slows. Restaurants are serving fewer customers because more people are eating at home.
Stung by the housing slump, tightening credit terms, and rising inflation, U.S. households are finding ways to cut back, putting a damper on the consumer spending that is the driving force behind the economy.
The American has compiled some interesting data about the way we live today. What I found especially interesting, however, is this: in 1950, 29% reported bathing once a day, and 63% said less than that. In 1999, 75% reported bathing once a day, and only 21% said “less frequently.”
The New Yorker’s James Surowiecki argues that the conflicting (and imprecise) headline numbers in America’s official job-creation and unemployment reports lead to both confusion and unexpected market moves. From this comes some valuable wisdom:
As many studies have shown, people don’t have an intuitive understanding of things like margins of error and random sampling; they prefer to focus on a single number, even if it’s falsely precise, and so end up overemphasizing the report’s headline number. Investors are also subject to the so-called “salience bias”—high-profile information is weighted heavily even if it’s flawed. That’s why market moves in response to government reports are often surprisingly big—especially when, as now, they seem to substantiate investors’ worst fears. At this point, the market is locked in a hard-to-break feedback loop: the fact that traders act as if the jobs report were definitive makes it so. A little information can be a dangerous thing.
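The point about false precision can be made concrete with a back-of-the-envelope calculation of a survey estimate's margin of error. The sample size and rate below are illustrative stand-ins, not the actual survey design:

```python
# Why a single headline number is "falsely precise": a 95% margin of error
# for an unemployment-rate estimate from a simple random sample.
# Both p and n here are hypothetical round numbers.
import math

p = 0.055        # estimated unemployment rate, 5.5%
n = 60_000       # hypothetical number of respondents

standard_error = math.sqrt(p * (1 - p) / n)
margin = 1.96 * standard_error   # half-width of a 95% confidence interval

print(f"estimate: {p:.1%} +/- {margin:.2%}")
```

Even under these idealized assumptions the estimate carries real uncertainty, which is exactly the nuance a single headline number hides.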