I have to confess that I've never firmly understood life expectancy statistics. Razib has been putting up graphs on GNXP of life expectancy by county, such as this one of white male life expectancy (highest in southern Minnesota, lowest in the South, Oklahoma, and the Coal Belt). The maps are fun to look at, but when trying to make sense out of them, I realized that I don't understand the basics of what a life expectancy number even means. For example, where are people placed -- where they were born, where they lived most of their lives, where they died?

Or, say, there are two rural counties that are very similar but one has an old folks home and the other doesn't, so many of the old people in the two-county region end up dying in just one of the counties. What impact does that have on life expectancy statistics by county?

And then a reader sent me a simple question:

I have a numeracy question for you because I don't know anyone else to ask. I have been thinking about life expectancy which I can easily determine for myself and for my wife. I know that she will probably outlive me, but how do I compute the probability that she will do so? (Yes, I know that it is strictly incorrect to assign a probability to a single event, but if we had 100 similar mes and 100 similar hers, then what?)

If you are less baffled by these matters than me, please comment below.

My published articles are archived at iSteve.com -- Steve Sailer

## 51 comments:

For starters, see the IRS's life expectancy tables on p. 94 of this PDF.

I had a similar thought about Japanese life expectancy -- calculated as the highest in the world. People smoke like chimneys over here, so it seems like there would be a lot of youthful cancer deaths. On the other hand, some people end up living well past 100, and they must be skewing the average. If you control for the extremely long-lived, what does their life expectancy come out to?

"Third: You can use Amazon. Just click here"

That link does not work for me, you might want to test it yourself.

In regards to the question:

The person states he can estimate the life expectancy for himself and his wife.

Assume that this is the mean of a normal distribution. You'll have to find the standard deviation as well. It'll probably be around 15 years.

So this means, for a million yous and a million hers, the dataset of their lifespans will approximately follow a normal distribution with an average that you estimated for both you and your wife.

Then, simply subtract the two normal distributions. This results in a third normal distribution with a mean equal to the difference between your mean and your wife's. The standard deviation of this new normal distribution is sqrt(2)*(the estimated standard deviation, which I said was about 15). Then, use normdist to figure out the probability that this new normal distribution falls below zero.

Using the average life expectancy data for white females and males (80.1,75.3):

http://www.newjersey.gov/health/chs/lifexp/index.html

and assuming a standard deviation of 15 for both men and women,

I estimate there's a 60% chance your wife outlives you.

Of course, this is contingent on the lifespan data actually following a normal distribution. It probably follows it well, but of course there would be skewing towards more deaths to the right than to the left (e.g. old people die more than young).
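A minimal sketch of the calculation above, assuming the commenter's figures (means of 80.1 and 75.3, a standard deviation of 15 for each) and independent lifespans. The function name `prob_wife_outlives` is made up for illustration, and `math.erf` stands in for the spreadsheet's normdist.

```python
import math

def prob_wife_outlives(mean_wife, mean_husband, sd_wife, sd_husband):
    """P(wife's lifespan > husband's), assuming independent normal lifespans."""
    mean_diff = mean_wife - mean_husband
    # Variances add when one independent normal is subtracted from another.
    sd_diff = math.sqrt(sd_wife ** 2 + sd_husband ** 2)
    z = mean_diff / sd_diff
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = prob_wife_outlives(80.1, 75.3, 15.0, 15.0)
print(f"{p:.3f}")
```

With these inputs the result comes out at roughly 0.59, in line with the 60% estimate. It inherits every weakness of the normal assumption mentioned above.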

Here's a website showing a statistical analysis, but it doesn't have data for a female distribution so can't use it for this question.

http://people.hofstra.edu/Stefan_Waner/RealWorld/cprob/cprob4.html

http://gravityandlevity.wordpress.com/2009/07/08/your-body-wasnt-built-to-last-a-lesson-from-human-mortality-rates/

not a direct answer, but related and interesting

http://www.soa.org/research/pension/research-simple-life-calculator.aspx

The above links to a page with an Excel spreadsheet which will do some calculations for you -- life expectancy and various survival percentiles.

I wrote the spreadsheet, so let me know if you have any questions.

Oh, and there are loads and loads of mortality tables to be found here:

http://xtbml.soa.org:8080/xtbml/jsp/index.jsp

and here:

http://www.mortality.org/

There are lots of different populations/uses for these tables, and you need to be careful in your interpretation.

Life is a bucket which has a hole in its base so that the level of water in the bucket slowly falls over time until it drains.

Emptying the bucket can be put off by adding water to the system from external sources. You could top it up from a tap or another bucket of water.

Whatever you choose to do, delay is more effective when the bucket is full because rate of drainage from the bucket is inversely proportional to the volume of water in the bucket.

When it's full, drainage is a long way off, when it's nearly empty complete drainage of the bucket is hard to stop.

If two buckets of water with the same sized holes are filled to the brim what will determine which bucket empties first? This will be external inputs - the rate of feeding from the tap and other environmental influences.

Not all buckets come identical from the manufacturer. Some have bigger holes. Buckets also can spring leaks in addition to the one they start with.

The question I want to raise is: do prediction and questions of probability become easier if life is modeled like a leaky bucket?

Testosterone determines the basic size of the hole in the bucket. Wealth determines the basic rate of feeding from the tap.

To this basic model of life expectancy would be added variable inputs into the bucket or leak-springness factors: e.g. genetic propensity to disease, climate, neighbourhood, social status, IQ, education, care infrastructure availability.

To compare the probability for two individuals, run the model 1000 times with 1000 slightly randomised starting data for each. If there is anything to it you would get an "ensembles" signature distinct to the husband and wife.

The number usually given as "life expectancy" is life expectancy at birth. Because newborns have a rather high death rate, once you get to be a year or so old, your life expectancy FROM THAT AGE actually increases. Once you get into your twenties, it starts going down.

If your writer wants to know how to determine whether his wife will out live him, he can go to the tables David cited.

I read the paper here:

http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0030260

To answer one of your questions: "The researchers used figures from the US Census Bureau and the National Center for Health Statistics to calculate mortality (death) rates for the years 1982–2001. They took note of the county of residence and of the race of all the people who died during that period of time."

So they're going by county of residence, not county of birth.

The page is relatively easy for the non-expert to read. They made certain adjustments, but I don't think that would necessarily have made a huge impact.

I believe there has been a study done about how far people end up from their birthplace; from what I understand a lot of people really do stay put, relatively.

To your reader's question (I am not an actuary):

This entails a comparison between the left-truncated distributions for men and women. So if you were 40, the men's distribution would start at 40 on the left (and if your wife were 35, the women's distribution would start at 35).

It's safe to assume that the data will be discrete (e.g. a percentage value for each age beyond current age). Let's say that the distributions' values go to age 100 in steps of 1 year. So there are 66 values from the women's distribution and 61 values from the men's. Rank these 127 values from lowest to highest as a fractional rank (0 is the min and 1 is the max) or percentile (0 to 100).

Then the probability that you want (of your wife outlasting you) should be

Mean(women values) - Mean (men values) + 0.5 {0.5 is a tossup and happens if the 2 means are identical} If you were using percentiles, then the difference is added by 50.

I forgot one pre-processing step: normalize each left-truncated distribution first. So if the total percentage of men aged 40 or above is 70, divide the values in the men's distribution by 0.7; and if the percentage of women at or above 35 is 85, divide the women's values by 0.85. The ranking is done after both distributions are normalized (so that all values within each add up to 1 {or 100, if percentiles are used}).

"People smoke like chimneys over here so it seems like there would be a lot of youthful cancer deaths."

I'm on day whatever of cold turkey. 14 or something. I went looking for a way to calculate my odds of cancer to motivate myself to quit.

Worst idea ever. The calculator I found wouldn't let me enter an age under 50 or so, wouldn't let me enter a "years as a smoker" under 30 or so, etc. When you enter the minimum stats, you get a 1% chance of lung cancer. Basically you have to be old to die of lung cancer from smoking.

(Sorry, I thought I'd saved the link but I can't find it)

Maybe I should go look for the emphysema calculator.

~Svigor

A clarification/correction. The ranking can be done as follows.

1. Obtain left truncated life expectancy distributions for men and women (based on current age for yourself and your wife).

2. As these are histograms, normalize them by the total percentage of men (women) in the respective left truncated distributions.

3. Generate observations for each age (for men and women) in proportion to their normalized percentages. So if 5% of men (who reach 40), die at 50, then you can generate 5 values of 50 for men. So you will have 100 ages for men and 100 ages for women (if you want more precision as the percentage values for each age may have 2 or more digits of precision you can generate 10000 ages for each distribution to capture small differences)

4. Combine the men and women age of death values (say 20000 in all) and rank them (in percentiles)

5. Take the difference between the mean female age percentile rank and the mean male age percentile rank. Add 50 to this number to get the probability (in percent) of your wife outlasting you.

a further clarification: in the step when the 10000 ages per group are generated, convert it to years from present (so subtract 35 from the values generated for women and 40 from the values generated for men) so that the comparison is between expected years to live from present.
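The ranking procedure above can be sketched without generating and ranking 10,000 values. Comparing every pair of (wife's remaining years, husband's remaining years) directly and splitting ties evenly should give the same number as the mean-rank difference plus 50%, assuming the two lives are independent. The function names and the tiny death-age histograms below are invented purely for illustration; real inputs would come from a life table.

```python
def normalize(hist):
    """Rescale a left-truncated histogram so its shares add up to 1."""
    total = sum(hist.values())
    return {age: share / total for age, share in hist.items()}

def prob_wife_outlasts(wife_hist, wife_age, man_hist, man_age):
    """hist maps age at death -> share of people dying at that age,
    restricted to ages at or above the person's current age."""
    wife = normalize(wife_hist)
    man = normalize(man_hist)
    prob = 0.0
    for w_death, w_share in wife.items():
        for m_death, m_share in man.items():
            # Compare years from the present, not ages at death.
            w_years = w_death - wife_age
            m_years = m_death - man_age
            if w_years > m_years:
                prob += w_share * m_share
            elif w_years == m_years:
                prob += 0.5 * w_share * m_share  # a tie counts as a tossup
    return prob

# Made-up toy histograms (percent dying at each age), NOT real data.
toy_wife = {75: 30, 85: 70}   # wife currently 35
toy_man = {70: 50, 90: 50}    # husband currently 40
p = prob_wife_outlasts(toy_wife, 35, toy_man, 40)
print(p)
```

Working on the histograms directly avoids the rounding that comes from turning percentages into a fixed number of generated observations.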

Feeling mortal Steve?

Like educational performance, life expectancy statistics are a pretty blunt tool. I'm doubtful there's much individually meaningful information predicting your life span from location alone.

Lumping all white males together in a single statistic makes about as much sense as lumping the academic performance of WV hillbillies with Exeter prep school boys.

Life expectancy calculators take into account some genetic and behavioral factors like family history of heart disease/stroke and smoking/obesity/exercise. I wonder how much info is left out by leaving out zip code, IQ and SES (social economic status). Some calcs try to use IQ by proxy of highest level of education.

Let's say you can reasonably estimate the number of people living in a county at every age on a given date. Also let's say that you know how many people die in that county at every age in a given year (and can assume that everybody dies in the county they live in, or at least that the deviations from this are random).

Then you can find the death rate (and the survival rate, i.e. one minus the death rate) for every age.

Then by chaining those survival rates together you can find the probability that a person now born will live to any age, say 60. Note that this doesn't apply to any actual specific person; instead, it applies to a hypothetical construct who lives the year of age 0 in Dallas County, TX, in 2007; and also lives the year of age 1 in Dallas County, TX, in 2007; and also lives the year of age 59 in Dallas County, TX, in 2007.

From that, you can say the probability of living n years and then dying before age n+1, for all n. On average people dying in that year have lived roughly n+0.5 years (except for the first year of life where deaths are concentrated toward the start of the year, but let's keep it simple by ignoring this).

To use an example with unrealistically few numbers, let's say that nobody dies until age 40, then 30% of people die before age 41, then the survivors live to age 80 and then all die before age 81. In that case the life expectancy would be

30% * 40.5 + 70% * 80.5 = 68.5 years

Real calculations have more numbers but that is the principle.
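Here is a rough sketch of that chaining, assuming you already have a death rate for each single year of age. The function name `life_expectancy` and the list layout are my own choices for illustration. It uses the same simplification of crediting half a year to everyone who dies during a year, and it is run on the unrealistic toy numbers above.

```python
def life_expectancy(death_rates):
    """death_rates[n] = probability of dying between age n and n+1,
    given that the person is alive at age n."""
    alive = 1.0        # probability of still being alive at the current age
    expectancy = 0.0
    for age, rate in enumerate(death_rates):
        dying_this_year = alive * rate
        # People dying during this year are credited with age + 0.5 years.
        expectancy += dying_this_year * (age + 0.5)
        alive -= dying_this_year
    return expectancy

# Toy table: nobody dies until 40, 30% die at 40, the rest all die at 80.
rates = [0.0] * 81
rates[40] = 0.30
rates[80] = 1.0
toy = life_expectancy(rates)
print(toy)
```

On the toy table this reproduces 30% * 40.5 + 70% * 80.5. A real table would simply have a nonzero rate at every age.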

Life expectancy is one statistic you can derive from a life table.

A life table is a table showing the probability of death (or survival) for one more year given your age (or given your age, sex, race, etc). Life tables for a particular place (country, city, county, whatever) are calculated in the obvious way: probability of death for 65-year-old guys is # of such guys residing in the place who died / # of such guys residing in the place, each in the relevant index year.

To calculate life expectancy, you just assume that the death/survival rates in the life table for whatever index year you are using will continue to be correct for the rest of the life of the person you are calculating for. So, life expectancy at birth using a 2005 life table would be the average age at death of a bunch of people whose age-specific mortality rates are given by the 2005 life table and who are assumed to be alive at birth.

Thus, life expectancy is not really a measure of how long a person can be expected to live, rather it is a measure of current age adjusted mortality. Your "true" expected further lifespan is (on average) longer than your life expectancy, as long as medical progress is positive.

The calculation Steve's reader requested is also easy, given the life tables for a man and a woman.

Here is a link on life tables:

http://www.cdc.gov/nchs/products/life_tables.htm

Oh, on your specific questions, I am pretty sure that people are assigned the location in which they were living when they died --- which is almost but not quite where they died. If they cross a county line to go to the ER and die there (or die on vacation), they get assigned to the county where they reside, subject to the vagaries of death records.

So, if old people all die on vacation in Houston or go to Houston for treatment right before they die, that does not screw up Houston's life table. If sick old people all *move* to Houston right before they die, that does screw up Houston's life table.

The linked paper gives an overview of the construction. They used NCHS data on mortality, which had county and race information attached to get the death information and the census to estimate the starting population in each age group. Technical information is available in Method for Constructing Complete Annual U.S. Life Tables.

So, to answer your question: it's the county of death that goes into the calculation. There does not appear to be any adjustment for old age homes.

Sure, we can get a useful number by ignoring many of the traits that characterize the particular people of concern here, but current age, in addition to gender, is probably not among them. Anyhow, upon finding a satisfactorily applicable pair of average lifespans, I guess you might plug them into the density function listed here

http://en.wikipedia.org/wiki/Skellam_distribution

and integrate it over the positive real line using some kind of numerical technique (maybe you can find a way to get WolframAlpha to do that for you). But, honestly, that's just a wild guess.

For starters, it is perfectly legitimate to assign a probability to a single event. If you want to think about it as if multiple events could occur, that is a reasonable mental exercise to help you understand probability (and one that I find helps many of my students), but it is not necessary.

Whoops. Seems that that Skellam density applies only if the two parties concerned are approximately independent of one another, which is obviously not the case when they are partners.

Wikipedia gives a pretty good definition:

http://en.wikipedia.org/wiki/Life_expectancy#Calculating_life

I think they use mortality and population data to calculate the probability of being alive at age X and then being dead at age X + 1 using a life table (http://en.wikipedia.org/wiki/Life_table) and then add these up.

So for the calculations it does not matter where you live, only where you die.

You can approximate some of these answers with statistical distributions, but I believe insurers generally rely on standard mortality tables.

Here are some fun programs that allow you to answer some of your questions

http://demonstrations.wolfram.com/search.html?query=insurance&start=1&limit=40

In particular

http://demonstrations.wolfram.com/The2001CSOMortalityTables/

http://demonstrations.wolfram.com/LifeExpectancyInTheUSPopulation/

http://demonstrations.wolfram.com/LifeInsurancePricing/

You will be prompted to download the free Mathematica player to run these

http://www.wolfram.com/products/player/download.cgi

A little off-topic, but I have been told that if you throw out African-Americans and Aboriginal Hispanics, then e.g. the [normal old-fashioned blue-blooded Caucasian] American survival rate for cancer just dwarfs that of the average European [or Canadian or Japanese or Korean or anyone else].

I.e. if you have the choice between being treated for a medical condition in a socialized medicine country versus fleeing to [what's left of] free-market capitalistic medicine in the good ol' US of A, then it pays to flee as fast as your feet will take you.

[But don't worry, Rahm & David & Chuckie and Barnie will see to it that we lose even that most meager of luxuries.]

I'm no expert here, but I've always wondered about the robustness of these data. For example, I once lived in a large city with a fair number of hospitals. Using Medicare data, the feds published the crude mortality rates for all the hospitals. One hospital stood out as much higher than all the others. I happened to have staff privileges at that hospital and knew it to be very high quality. It turned out that this hospital had the city's only hospice unit, and all the hospice deaths were rolled into the hospital's numbers. The hospital was outraged and called on the feds to redo the numbers, but they seemed baffled by the whole thing and, of course, did nothing. So the hospital paid a private firm to recalculate the mortality data and found that its death rate was actually lower than the city average. I wonder how much of this stuff really goes on?

Steve, I noticed that you have never written about the Singularity, which will supposedly lead to radical life-extension. Theoretically, it is possible; however, I am not as optimistic as some (it may take more than 34-40 years).

I think life expectancy means something like the following.

In a large population, you see what fraction of those who turn, say, 47 are still alive to turn 48.

That observed fraction is defined to be the probability of survival from the 47th birthday until the 48th birthday for a person in that population. Likewise for other ages.

The probability of death at age 59 for a newborn is then calculated by multiplying all the survival probabilities for ages 0, 1, 2, ...,58 times the non-survival probability for age 59.

Life expectancy (for that population) would then be the average age at death under those calculated probabilities.

So, if in two populations, the fraction dying is the same for both populations at every age, the life expectancies would be equal, even if the age distributions are not equal.

Real actuaries may do something a bit more refined, but I'm pretty sure that's the basic idea.

Life expectancy stuff is often annoying. For example, speaking colloquially, people often say, "Life expectancy in 18th century England was 37," or even "The average person in 18th century England was dead at 37," or some such thing, but if you read 18th century books or books written about the 18th century, you see there are lots and lots of old people running around. The high infant and child mortality rate of the 18th century lowers the average for everybody, but if an 18th century European makes it to his teens, he will likely live about as long as a 20th century European.

http://www.psychologytoday.com/blog/the-scientific-fundamentalist/200811/common-misconceptions-about-science-ii-life-expectancy

Social Security Administration publishes tables of life expectancy at different ages. So you can see life expectancy for people who have already reached a certain age.

http://www.ssa.gov/OACT/STATS/table4c6.html

Similar tables can be found elsewhere as well.

I am an actuary. Life expectancies are based on something called mortality tables, which are tabulated probabilities of dying between your last and your next birthday at any given age. These tables are calculated for various segments of population. A whole bunch of them can be downloaded from here: http://www.soa.org/professional-interests/technology/tech-table-manager.aspx

The life expectancy at a given age, say 50, is the probability of surviving to your 51st birthday, plus the probability of surviving to your 52nd birthday, and so on. When they say simply "life expectancy" without mentioning the age they usually mean life expectancy at birth, i.e. age zero.

If the two counties in your example are governed by the same mortality table, their life expectancies would be identical, no matter the age distribution.
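The sum of survival probabilities described above can be sketched in a few lines. This is only an illustration: the function name `expectancy_from_survival` is hypothetical, and the mortality table is the made-up one from an earlier comment in this thread (nobody dies until 40, 30% die at 40, everyone else dies at 80), not a real table.

```python
def expectancy_from_survival(death_rates, start_age=0):
    """Add up the probabilities of reaching each later birthday,
    for a person who is alive at start_age."""
    alive = 1.0
    total = 0.0
    for rate in death_rates[start_age:]:
        alive *= 1.0 - rate   # probability of reaching the next birthday
        total += alive
    return total

# Toy table: death_rates[n] is the chance of dying between age n and n+1.
rates = [0.0] * 81
rates[40] = 0.30
rates[80] = 1.0
at_birth = expectancy_from_survival(rates)
at_50 = expectancy_from_survival(rates, 50)
print(at_birth, at_50)
```

A sum like this counts completed years only, so it comes out about half a year below a calculation that credits people with the part of the final year they lived through.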

The reader might want to compare the life spans of his parents to that of his wife's parents for more data to predict his vs. hers lifespan. Long-livedness does tend to run in families. But of course anything can happen in any one individual case.

I'll assume you have a mean and variance for both your age at death and hers. Assuming gaussian distributions for both, the joint distribution is bivariate normal (a 2-d gaussian). Integrate this distribution over the region above (or below) the line x=y. That's the probability you (or she) will outlive the other.

And it's only technically incorrect to consider the probability of a single event if you took statistics from a math professor. Most other fields which use probability have moved past this non-issue.

Take the death rate for the relevant group at each age, assume it remains constant forever, and calculate the expected value of the lifespan of a person born today. You can also use a recursive formula -- life expectancy of an N-year-old today is approximately p*(1+X) + (1-p)/2, where X is the life expectancy of an (N+1)-year-old today and p is the probability that an N-year-old will survive the next year. (The (1-p)/2 term is because even the ones who won't survive the year will live an average of 6 months more.)

http://en.wikipedia.org/wiki/Actuarial_notation

joint life contingency:

http://www.utstat.utoronto.ca/~sheldon/Exam-M-temp.pdf

http://www.business.uiuc.edu/ormir/JRI%20Dec%202000.pdf

Search on: joint life expectancy table.

The relevant technique is something called cohort survival. It requires a bit of matrix algebra. I studied it in grad school but never used it professionally. I taught statistics for several years but never taught matrix manipulations.

All of which is to explain (excuse ?) the fact that although I know it's the appropriate technique, I can't for the life of me remember how it works.

I can help.

Life expectancy is just the average time of death of a population, whatever that population is defined as. So regional life expectancy would only take into account where each person dies, not how long they have lived there or where they came from.

As for your reader's question:

First he needs to choose the probability distribution for the time-of-death random variable he is going to use.

He could use a normal distribution, or a uniform distribution (De Moivre's law), but it would probably be easiest to use an exponential distribution.

Here are the variables:

X = time of death of the man.

Y = time of death of female.

E[X] = life expectancy of man.

E[Y] = life expectancy of female.

Here is the algorithm:

The joint probability density function is going to be (using Excel notation)....

f(X,Y) = EXP[-(X/E[X] + Y/E[Y])]/(E[X]*E[Y])

All that needs to be done at this point is a double integral over the probability space where Y is greater than X. I would integrate the density function in this order, with these arguments:

1) X from 0 to Y

2) Y from zero to infinity

This method assumes independent lives, but given the information available it is a pretty good assumption. Anyway, hopefully that is helpful in addressing your questions.
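For what it's worth, the double integral above has a closed form. Here is a sketch of the working, using lowercase x and y as the integration variables for X and Y and assuming independent exponential lifetimes as stated:

```latex
\begin{align*}
P(Y > X)
  &= \int_0^\infty \int_0^y \frac{1}{E[X]\,E[Y]}\,
     e^{-\left(x/E[X] + y/E[Y]\right)} \, dx \, dy \\
  &= \int_0^\infty \frac{1}{E[Y]}\, e^{-y/E[Y]}
     \left(1 - e^{-y/E[X]}\right) dy \\
  &= 1 - \frac{E[X]}{E[X] + E[Y]} \\
  &= \frac{E[Y]}{E[X] + E[Y]}
\end{align*}
```

Plugging in the 75.3 and 80.1 figures quoted earlier in the thread would give 80.1 / 155.4, or roughly 52%. Keep in mind that an exponential distribution implies the same death rate at every age, so this is a crude approximation at best.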

The question of finding the probability of the wife outliving the husband doesn't seem to have a simple answer. This link gives an integral that is equal to this probability:

http://books.google.ca/books?id=ny2MJU4A3DIC&pg=PA467&lpg=PA467&dq=how+to+find+probability+that+husband+lives+longer+than+wife&source=bl&ots=NkI0yfe4xw&sig=IjyxuUhhHhTSfjcYfZXphDi1ePA&hl=en&ei=BeZ5SruuLJCoswOAs8jlBA&sa=X&oi=book_result&ct=result&resnum=2#v=onepage&q=&f=false

The functions are defined on the previous page (466). Sorry it is such a long URL.

Blessings you deserve for the confession of your "sin" of ignorance re life expectancy, something I don't understand either. Everyone's ignorant of something, but most prefer to feign omniscience.

Nietzsche dreamed of a culture where young intellectuals were taught: "above all, do not pretend!"

I had a history prof for a grad/undergrad course - a good man who I otherwise wanted to like - who told us to "fake it til you make it." What possible defense of this could there be?

"The question of finding the probability of the wife outliving the husband doesn't seem to have a simple answer."

Yes, because you're talking about taking two curves (or the tails of two curves) and then doing subtraction(s) of their area (and then looking at a ratio). You can't do it without integrating, and any formula is going to be an approximation as the curves don't follow a simple mathematical shape. For example, neither the Poisson nor the normal distribution is applicable.

If actuarial data are available, I don't see the need for messing around with theoretical distributions at all. One must simply be willing to apply the truncated male and female life expectancy distributions (from the present ages of husband and wife respectively) and calculate the dominance of the wife distribution over the husband distribution (assuming the wife is younger). The calculations require nothing more than simple algebra and perhaps Excel/SPSS (it is functionally the equivalent of the double integral for 2 continuous overlapping normal or other distributions that others are talking about here). I described them in a post here (but it hasn't appeared yet).

Oh, and feel free to unlock my spreadsheet to see how I did the calculations.

If you're a White American male be/live near Scandinavians, Canadians and/or Jews. In a pinch, Aleuts, Polynesians or (gasp) Hispanics will do.

Maybe the 'always fighting' Scots-Irish are doing too much always fighting?

Gold dust not coal dust?

Violins not fiddles?

Ha-Ha not Hee-Haw?

I am the one who posed the question to Steve, so I'll begin by thanking all of you above for your kind suggestions. Some were over my head, so maybe they are better than my own idea as follows:

I looked at the CDC website and found the life expectancy for a white male of my age (62). Using another table at that site I calculated the percent of white females of my wife's age (55) expected to live beyond the span given for me (20). As the result seemed intuitively plausible I accepted it. It looks as though there is a 78% probability that she'll be there to put a flower on my grave and only a 22% probability I'll be burying her - a fringe benefit of marrying a woman seven years younger.

BUT if you experts think I've gone about this all wrong, please blast away. (Comments that I should have married an even younger woman will be deemed hostile.)

"I.e. if you have the choice between being treated for a medical condition in a socialized medicine country versus fleeing to [what's left of] free-market capitalistic medicine in the good ol' US of A, then it pays to flee as fast as your feet will take you."

Rich Japanese fly to the USA for treatment for anything complicated. The Japanese system is good at prevention and care for small things but terrible at treatment of serious illnesses. That is one of the conundrums of socialized medicine vs the US system.

Man age 62. Wife age 55. Assume max age for man is 100.

Sum over y from 62 to 100:

(probability man lives to age y and dies by age y+1) * (probability wife lives past age 55+(y-62)+1).

This still is an approximation but is close enough for now.

To get the probability that a man dies between ages y and y+1 from a table of probabilities of surviving past age y, subtract each element from the one before it. If you can find the data in CSV format, you can download it and do the calculations.

In something close to actuarial notation:

sum[y:62 to 100] qm[y] * pw[y-62+55+1]

qm[y] is the probability that the man lives to age y and dies by age y+1

pw[z] is the probability that the woman lives past age z, here z = y-62+55+1

You can replace 100 by max age man.
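A minimal sketch of that sum, assuming you have already pulled two survival curves out of a life table and re-indexed them by years from now rather than by age. The name `prob_wife_outlives` and the list layout are hypothetical, and the short curves in the usage lines are toy numbers, not real mortality data. Like the sum above, it treats the two lives as independent and ignores the case where both die in the same year.

```python
def prob_wife_outlives(man_surv, wife_surv):
    """man_surv[k] = P(man is still alive k years from now), with man_surv[0] == 1
    and the last entry 0 (the maximum age); wife_surv is laid out the same way."""
    total = 0.0
    for k in range(len(man_surv) - 1):
        # Chance that the man dies during year k: the drop in his survival curve.
        man_dies_in_year = man_surv[k] - man_surv[k + 1]
        # Chance that the wife is still alive at the end of that year.
        wife_alive_after = wife_surv[k + 1] if k + 1 < len(wife_surv) else 0.0
        total += man_dies_in_year * wife_alive_after
    return total

# Toy survival curves, purely for illustration.
toy_man = [1.0, 0.5, 0.0]
toy_wife = [1.0, 0.8, 0.4, 0.0]
p = prob_wife_outlives(toy_man, toy_wife)
print(p)
```

For the couple in question, entry k of the man's list would be the chance of a 62-year-old man reaching age 62+k, and entry k of the wife's list the chance of a 55-year-old woman reaching age 55+k.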

A few years ago in Berkeley I ran into a black guy I'd played basketball with regularly in the 90's. He said, "I'm surprised you're still alive." He was referring to my habit of speaking to black guys sans the usual deference.

I didn't try to wade through all of the responses but here is an answer to the wife vs. husband life expectancy question as supplied by my friend the actuary (BTW, for what it is worth, this guy is an actuary's actuary who knows his stuff in spades):

"Oh, ok. Here's a general idea of how to calculate the odds that wife will die first, only slightly simplified. You break it up into pieces, or individual years, then add up the totals. We have estimated probabilities of death for each person for each age from their current ages and the rest of their lives. Calculate the chance that wife will die in the next year and husband will live. Put that aside. Calculate the chance that both of them will survive the next year, but wife will die in the following year and husband will survive. Put that aside. Calculate the chance that both of them will survive the next 2 years, but wife will die in the following year and husband will survive. Put that aside. You keep going until the probabilities of survival are so small that they are negligible. Then you add up all the "Put that asides." This ignores the case where both of them die in the same year, but that's just a refinement. Now that I think about it, maybe this is good enough for Steve Sailer, so send it to him if you like."