**Update** The Engg professor said he's away and that "other activities demand his attention at this time".

One math professor replied with some comments; let's take a look together.

First, what I sent to each of them:

*Although we have never met, I hope you don't find this email too abrupt. I decided to contact you because you are an expert in Bayesian statistics and computation. I am wondering if I could trouble you for a minute to help me wrap my head around this "statistical inference puzzle" I made up, because I think I fell into a logic pitfall somewhere along the way.*

Fundamentally this is an inference problem (I think). I will use a question I saw on mathsisfun as the template for the setup: https://www.mathsisfun.com/data/probability-false-negatives-positives.html

Suppose you suspect that you are allergic to something that 1% of the population is allergic to. You go to the clinic and take a test, and the test turns out to be positive.

The test has a false positive rate of 10% and a false negative rate of 20%. What's the chance of you actually being allergic?

The standard 2x2 table would look like this:

| 1% have it | Total | Test yes | Test no |
|---|---|---|---|
| Have allergy | 10 | 8 | 2 |
| Don't have | 990 | 99 | 891 |
| Total | 1000 | 107 | 893 |

So, 107 people test positive but only 8 of them are actually allergic; even with a positive test, your chance of actually being allergic is only about 7% (8/107), despite the test having only a 20% false negative rate.

Now, I am going to apply the same method for "actually being a criminal". In this hypothetical case, the numbers are just an estimate with assumptions built in (let's assume they are appropriate), but that is not the main issue I am concerned about.

I am assuming that 5% of the population are criminals and that the false allegation rate is 5%. Assuming only about 1/3 of crimes are reported, I am using 2/3 as the false negative rate.

The 2x2 table would look like this:

| 5% | Total | Accused | Not accused |
|---|---|---|---|
| Criminals | 50 | 17 | 33 |
| Not criminals | 950 | 48 | 902 |
| Total | 1000 | 65 | 935 |

So, 65 people are accused but only 17 of them are actual criminals; that's a 26% (17/65) chance of a person being an actual criminal when accused of a crime, despite the false allegation rate being only 5%.

Now my questions are (if you find this interesting enough to answer): **Was my framing/formulation of the problem appropriate? What logic trap did I fall in which yielded this puzzling result? Under what circumstances can I or can't I set up the table like this?**

--------------------------------------------------------------------

His reply (we will call him Math Professor #1, in case the other math professor replies later). Bold added.

*"I really haven't time for a long discussion about this, so I will just make a few comments.*

**The situations are really essentially the same in the two examples.** The numbers are different, but both illustrate the same sort of application of Bayes' Theorem. The takeaway is that conditional probability calculations (though perfectly logical) are often counter-intuitive and often seem to go against (most people's) common sense.

**As far as I could see (from a quick glance), your calculations are correct (up to rounding error). You framed it fine. There is no logic trap.** I think you are just confusing / mixing up two different conditional probability statements.

**The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other.**

In the first example, the ordering was one way. In the second example, it was reversed.

If you want to read more, I suppose the main related ideas for you to look into are conditional probability and Bayes' Theorem.

I have not read it myself, but this book has many favourable reviews:

https://www.amazon.com/Theory-That-Would-Not-Die/dp/0300188226

**So no, I was not wrong; my framing was correct.** Let "The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other." sink in for a bit.

What it means: the false allegation rate is 5%, and the chance that an accused person is actually guilty is 26%. The two numbers are internally consistent and tell the SAME story.
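To make the distinction concrete, here is a minimal Python sketch that rebuilds both 2x2 tables from the inputs in my email. The function name `two_by_two` and its parameter names are my own invention for illustration, and the counts are left unrounded, so they differ slightly from the rounded tables (47.5 falsely accused instead of 48, for example).

```python
def two_by_two(population, prevalence, false_positive_rate, false_negative_rate):
    """Build the 2x2 counts and the two conditional probabilities being compared."""
    have = population * prevalence                  # e.g. allergic / criminals
    have_not = population - have                    # e.g. not allergic / not criminals
    true_pos = have * (1 - false_negative_rate)     # have it AND flagged
    false_neg = have - true_pos                     # have it but NOT flagged
    false_pos = have_not * false_positive_rate      # don't have it but flagged
    true_neg = have_not - false_pos                 # don't have it and not flagged
    flagged = true_pos + false_pos                  # everyone who tests positive / is accused
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        # P(flagged | don't have it): the false positive / false allegation rate
        "p_flagged_given_not": false_pos / have_not,
        # P(have it | flagged): the number the table is really after
        "p_have_given_flagged": true_pos / flagged,
    }


# Inputs taken from the two examples in the email.
allergy = two_by_two(1000, prevalence=0.01, false_positive_rate=0.10, false_negative_rate=0.20)
crime = two_by_two(1000, prevalence=0.05, false_positive_rate=0.05, false_negative_rate=2 / 3)

for name, result in (("allergy", allergy), ("crime", crime)):
    print(
        f"{name}: P(flagged | not) = {result['p_flagged_given_not']:.1%}, "
        f"P(have | flagged) = {result['p_have_given_flagged']:.1%}"
    )
```

The two printed probabilities in each row condition on different events, which is why one can be 5% while the other is about 26% without any contradiction.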

These numbers are also *consistent* when we look at observables such as the low rape conviction rate: 4-8%. For the longest time people have wondered how the conviction rate could be that low when the false allegation rate is only 5%; surely law enforcement and the justice system must all be rotten. To go from 95% to 4%! Unthinkable!

But no, *that's not the case*. Conviction rate = convicted cases / ALL cases (whether charged or not), so at best the conviction rate should not exceed the 26% rate of actually being guilty. If we were omniscient every reported rape would result in a conviction, but sadly we are not.

Out of this 26%, not every case can be prosecuted (I am using 30%, 2/6), and of the cases that actually go to trial, only some result in conviction (50%).

What's 26/3/2? Lo and behold: ~4%.
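The same chain of fractions as a quick Python check. The variable names are mine, and the 1/3 prosecution share and 1/2 trial conviction share are the rough figures I assumed above, not measured data.

```python
# Rough assumptions from the post, not measured data.
guilty_given_accused = 0.26        # P(actually guilty | accused), from the 2x2 table
prosecuted_share = 1 / 3           # share of those cases that can be prosecuted
convicted_at_trial_share = 1 / 2   # share of prosecuted cases ending in conviction

# Multiply the stages together: 26% / 3 / 2.
conviction_rate = guilty_given_accused * prosecuted_share * convicted_at_trial_share

print(f"Estimated conviction rate: {conviction_rate:.1%}")
```

This lands just above 4%, at the low end of the observed 4-8% range.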

Yes, to go from 26% to 4% is STILL BAD and SHOULD BE IMPROVED. But again, this tells the SAME story, just like the 26% and 5%. What I presented to you is indeed counter-intuitive, but it is internally consistent and perfectly logical.