EA - Making better estimates with scarce information by Stan Pinsent

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Making better estimates with scarce information, published by Stan Pinsent on March 22, 2023 on The Effective Altruism Forum.

TL;DR

I explore the pros and cons of different approaches to estimation. In general I find that:

- interval estimates are stronger than point estimates
- the lognormal distribution is better for modelling unknowns than the normal distribution
- the geometric mean is better than the arithmetic mean for building aggregate estimates

These differences are only significant in situations of high uncertainty, characterised by a high ratio between confidence interval bounds. Otherwise, simpler approaches (point estimates & the arithmetic mean) are fine.

Summary

I am chiefly interested in how we can make better estimates from very limited evidence. Estimation strategies are key to sanity-checks, cost-effectiveness analyses and forecasting.

Speed and accuracy are important considerations when estimating, but so is legibility; we want our work to be easy to understand. This post explores which approaches are more accurate and when the increase in accuracy justifies the increase in complexity.

My key findings are:

- Interval (or distribution) estimates are more accurate than point estimates because they capture more information. When dividing by an unknown of high variability (high ratio between confidence interval bounds) point estimates are significantly worse.
- It is typically better to model distributions as lognormal rather than normal. Both are similar in situations with low variability, but lognormal appears to better describe situations of high variability.
- The geometric mean is best for building aggregate estimates. It captures the positive skew typical of more variable distributions.

In general, simple methods are fine while you are estimating quantities with low variability.
The increased complexity of modelling distributions and using geometric means is only worthwhile when the unknown values are highly variable.

Interval vs point estimates

In this section we will find that for calculations involving division, interval estimates are more accurate than point estimates. The difference is most stark in situations of high uncertainty.

Interval estimates, for which we give an interval within which we estimate the unknown value lies, capture more information than a point estimate (which is simply what we estimate the value to be). Interval estimates often include the probability that the value lies within our interval (confidence intervals) and sometimes specify the shape of the underlying distribution. In this post I treat interval estimates and distribution estimates as the same thing.

Here I attempt to answer the following question: how much more accurate are interval estimates, and when is the increased complexity worthwhile?

Core examples

I will explore this through two examples which I will return to later in the post.

Fuel Cost: The amount I will spend on fuel on my road trip in Florida next month. The abundance of information I have about fuel prices, the efficiency of my car and the length of my trip means I can use narrow confidence intervals to build an estimate.

Inhabitable Planets: The number of planets in our galaxy with conditions that could harbour intelligent life. The lack of available information means I will use very wide confidence intervals.

Point estimates are fine for multiplication, lossy for division

Let's start with Fuel Cost. Using Squiggle (which uses lognormal distributions by default; see the next section for more on why), I enter 90% confidence intervals to build distributions for fuel cost per mile (USD per mile) and distance of my trip (miles). This gives me an expected fuel cost of 49.18 USD.

What if I had used point estimates?
I can check this by performing the same calculation using the expected values of each of the distrib...
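
The claim in this section's heading (point estimates are fine for multiplication, lossy for division) can be checked with a minimal sketch. It is written in Python rather than Squiggle and is not the author's model. It assumes that each 90% confidence interval is fitted by a lognormal whose 5th and 95th percentiles are the interval bounds, and that the quantities are independent. The intervals are hypothetical placeholders, not the figures behind the 49.18 USD result, and all function names are illustrative.

```python
import math
from statistics import NormalDist

# z-score bounding the central 90% of a standard normal (about 1.645).
Z90 = NormalDist().inv_cdf(0.95)


def lognormal_params(low, high):
    """Return (mu, sigma) of the lognormal with 5th/95th percentiles low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma


def expected_value(low, high):
    """E[X] for the fitted lognormal: exp(mu + sigma^2 / 2)."""
    mu, sigma = lognormal_params(low, high)
    return math.exp(mu + sigma ** 2 / 2)


def expected_reciprocal(low, high):
    """E[1/X] for the fitted lognormal: exp(-mu + sigma^2 / 2)."""
    mu, sigma = lognormal_params(low, high)
    return math.exp(-mu + sigma ** 2 / 2)


def expected_product(x, y):
    """E[X * Y] for independent lognormals given as (low, high) intervals."""
    mu_x, s_x = lognormal_params(*x)
    mu_y, s_y = lognormal_params(*y)
    return math.exp(mu_x + mu_y + (s_x ** 2 + s_y ** 2) / 2)


def division_understatement(low, high):
    """Ratio E[1/Y] / (1 / E[Y]).

    This is the factor by which dividing by the point estimate E[Y]
    understates the expected value of dividing by Y itself.
    """
    return expected_reciprocal(low, high) * expected_value(low, high)


# Hypothetical 90% intervals, chosen only for illustration.
cost_per_mile = (0.10, 0.14)   # USD per mile, narrow interval
trip_miles = (350, 450)        # miles, narrow interval
wide_unknown = (1, 100)        # an unknown spanning two orders of magnitude

# Multiplication: the distribution mean equals the product of point estimates.
print(expected_product(cost_per_mile, trip_miles))
print(expected_value(*cost_per_mile) * expected_value(*trip_miles))

# Division: the understatement factor grows with the width of the interval.
print(division_understatement(*cost_per_mile))
print(division_understatement(*wide_unknown))
```

Under these assumptions the understatement factor works out to exp(sigma^2), so it depends only on the ratio between the interval bounds. It stays close to 1 for the narrow interval and becomes large for the wide one.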
