Update: The Engg professor said he's away and "other activities demand his attention at this time".
One math professor replied with some comments; let's take a look together.
First, what I sent to each of them:
Although we have never met, I hope you don't find this email too abrupt. I decided to contact you because you are an expert in Bayesian statistics and computation, I am wondering if I could trouble you for a minute to help me wrap my head around this "statistical inference puzzle" I made up, because I think I fell into a logic pitfall somewhere along the way.
Fundamentally this is an inference problem (I think), I will use a question I saw on mathisfun as the template for the set up. https://www.mathsisfun.com/data/probability-false-negatives-positives.html
Suppose you suspect that you are allergic to something, which 1% of the population does, you went to the clinic and did a test. The test turned out to be positive.
The test has a false positive rate of 10% and a false negative rate of 20%. What's the chance of you actually being allergic?
The standard 2x2 table would look like this
1% have it       Total   Test yes   Test no
Have allergy        10          8         2
Don't have         990         99       891
Total             1000        107       893
so, 107 people are test positive but only 8 people are actually allergic; even with a positive test, your chances of being actually allergic is only 7%, despite the test having only 20% false negative rate.
Now, I am going to apply the same method for "actually being a criminal". In this hypothetical case, the numbers are just an estimate with assumptions built in (let's assume they are appropriate), but that is not the main issue I am concerned about.
I am assuming 5% of the population are criminals. 5% as the false allegation rate. Assuming only about 1/3 of the crimes are reported, so I am using 2/3 as false negative rate.
The 2x2 table would look like this
5%               Total   Accused   Not accused
Criminals           50        17            33
Not criminals      950        48           902
Total             1000        65           935
so, 65 people are accused but only 17 people are actual criminals; that's 26% chance of a person being actual criminal when accused of a crime, despite the false allegation rate being only 5%.
Now my questions are (if you find this interesting enough to answer): Was my framing/formulation of the problem appropriate? What logic trap did I fall in which yielded this puzzling result? Under what circumstances can I or can't I set up the table like this?
--------------------------------------------------------------------
His reply (we will call him Math Professor #1, in case the other math professor replies later). Bold added.
"I really haven't time for a long discussion about this, so I will just make a few comments.
The situations are really essentially the same in the two examples. The numbers are different, but both illustrate the same sort of application of Bayes' Theorem. The takeaway is that conditional probability calculations (though perfectly logical) are often counter-intuitive and often seem to go against (most people's) common sense.
As far as I could see (from a quick glance), your calculations are correct (up to rounding error). You framed it fine. There is no logic trap. I think you are just confusing / mixing up two different conditional probability statements.
The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other.
In the first example, the ordering was one way. In the second example, it was reversed.
If you want to read more, I suppose the main related ideas for you to look into are conditional probability and Bayes' Theorem.
I have not read it myself, but this book has many favourable reviews:
https://www.amazon.com/Theory-That-Would-Not-Die/dp/0300188226"
So no, I was not wrong; my framing was correct. Let "The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other." sink in for a bit.
What it means: the false allegation rate (the chance that a non-criminal gets accused) is 5%; the chance that an accused person is actually guilty is 26%. They are internally consistent and tell the SAME story.
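For anyone who wants to check the two tables, here is a minimal Python sketch of the same calculation. The function name and parameter names are my own, made up for illustration; the inputs are the numbers from the email (a population of 1000, the base rate, the false negative rate, and the false positive rate).

```python
def posterior_from_table(population, base_rate, false_negative_rate, false_positive_rate):
    """Build the 'flagged' column of the 2x2 table and return
    (true positives, false positives, P(has condition | flagged))."""
    have = population * base_rate                   # e.g. allergic, or criminal
    have_not = population - have
    true_pos = have * (1 - false_negative_rate)     # correctly flagged
    false_pos = have_not * false_positive_rate      # wrongly flagged
    posterior = true_pos / (true_pos + false_pos)   # Bayes' Theorem in count form
    return true_pos, false_pos, posterior


# Allergy example: 1% base rate, 20% false negatives, 10% false positives.
allergy = posterior_from_table(1000, 0.01, 0.20, 0.10)

# Accusation example: 5% base rate, 2/3 "false negatives", 5% false allegations.
accused = posterior_from_table(1000, 0.05, 2 / 3, 0.05)

print(f"P(allergic | test positive) = {allergy[2]:.1%}")  # 7.5%
print(f"P(criminal | accused)       = {accused[2]:.1%}")  # 26.0%
```

The sketch keeps fractional people (16.67 and 47.5) instead of rounding to 17 and 48 as the table does, which is why the second figure comes out as 26.0% rather than 17/65, about 26.2%. Either way it is roughly 26%.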
These numbers are also consistent with what we can observe, such as the low rape conviction rate of 4-8%. For the longest time people have wondered how the conviction rate could be that low when the false allegation rate is only 5%; law enforcement and the justice system must all be rotten. To go from 95% to 4%! Unthinkable!
But no, that's not the case. Conviction rate = convicted cases / ALL cases, whether charged or not, so at best the conviction rate should not exceed the 26% rate of actually being guilty. If we were omniscient, every reported rape would result in a conviction, but sadly we are not.
Out of this 26%, not every case can be prosecuted (I am using about 30%, i.e. 2/6, or one in three), and of the cases that actually go to trial, only some result in conviction (50%).
What's 26/3/2? Lo and behold: ~4%.
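The same chain of multiplications, written out as a short Python sketch. The variable names are mine, and the three rates are the assumed values used above (26% guilty given accused, one in three prosecuted, half of trials ending in conviction), not measured data.

```python
# Assumed inputs, taken from the reasoning above (not measured data).
p_guilty_given_accused = 0.26   # from the accusation 2x2 table
p_prosecuted = 1 / 3            # share of those cases that can be prosecuted
p_convicted_at_trial = 0.5      # share of trials that end in a conviction

# Conviction rate relative to ALL reported cases: 26% / 3 / 2.
conviction_rate = p_guilty_given_accused * p_prosecuted * p_convicted_at_trial

print(f"Estimated conviction rate: {conviction_rate:.1%}")  # 4.3%
```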
Yes, to go from 26% to 4% is STILL BAD and SHOULD BE IMPROVED. But again, this tells the SAME story, just like the 26% and 5%. What I presented to you is indeed counterintuitive, but it is internally consistent and perfectly logical.