I understand you may find it nonsensical to account for 312000 accusations which were not reported, but they are not fabricated.
Well, yes. If an accusation is not reported, it has not been made. It is not an accusation.
Former Player is right. This is nonsensical and impossible. The number of accusations is by definition the number of accusations reported to authorities. And the number of false allegations is the subset of that number that are shown to be false.
So the number of false allegations made in 2012 can be assumed to be around 4350, applying the 5% historical rate to the 87000.
Or the number could be a total of 312000 false accusations based on a 0.1% false positive rate that we calculated earlier, applied to the population. I agree with this math. If the FP rate is correct, this is the number of false allegations we should see.
But both of these numbers can't be right. Which means we screwed up our math somewhere, and the false positive rate of 0.1% is wrong.
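To put the two estimates side by side, here is a quick Python sketch. It only uses figures already quoted in this thread (87000 reported cases, the 5% rate, and 0.1% applied to a population of 312M); the variable names are my own.

```python
# Figures quoted in this thread; variable names are illustrative only.
reported_2012 = 87_000        # accusations reported in 2012
false_share = 0.05            # assumed historical share of reports that are false
population = 312_000_000      # total population the 0.1% was applied to
fp_rate_earlier = 0.001       # the 0.1% false positive rate calculated earlier

# Estimate 1: false allegations as a share of reported accusations.
estimate_from_reports = round(reported_2012 * false_share)

# Estimate 2: false allegations as the earlier FP rate times the population.
estimate_from_fp_rate = round(population * fp_rate_earlier)

print(estimate_from_reports)   # 4350
print(estimate_from_fp_rate)   # 312000
print(estimate_from_fp_rate > reported_2012)  # True
```

The second estimate is larger than the total number of reported accusations, which is why both numbers cannot hold at once.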
I was mulling it over last night, and I believe I know where we went wrong. The numbers for false positive rate that we calculated earlier are nonsensical because we changed units mid-calculation. Let me explain.
total   positive   negative
   50         17         33   this part doesn't change
  950          1        949   this changes and 1-PPV = 1/18 ~5%
              18        982
But when you do the problem like this, you can derive the fp for the innocent group: 1/950, or about 0.1%.
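For anyone who wants to check the numbers in that table, here is a small Python sketch that reproduces its arithmetic. The names are my own, and the single false report is taken straight from the table rather than derived.

```python
# Reproduces the arithmetic of the 1000-man table; names are illustrative only.
men = 1000
assaulted = round(men * 0.05)        # 50 men assumed to have assaulted
never_assaulted = men - assaulted    # 950

reported = round(assaulted / 3)      # 17, since 2/3 are assumed not reported
unreported = assaulted - reported    # 33

false_reports = 1                    # taken as given from the table
true_negatives = never_assaulted - false_reports  # 949

accused_total = reported + false_reports          # 18
not_accused_total = unreported + true_negatives   # 982

one_minus_ppv = false_reports / accused_total     # 1/18
fp_rate = false_reports / never_assaulted         # 1/950

print(f"{one_minus_ppv:.1%}")  # 5.6%
print(f"{fp_rate:.2%}")        # 0.11%
```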
To start, we took a population of 1000 men, and applied an assumed statistic that 5% have assaulted in their lifetime to come up with 50 men who have assaulted sometime in the past, and 950 who have never assaulted.
So far, so good.
But then we took that 50 and changed it to 50 assaults, in order to apply the assumed statistic that 2/3 of assaults are not reported, and came up with 17 reported assaults and 33 non-reported assaults.
But the number of men who have ever assaulted, and the number of assaults in a year are not the same number. This is where we screwed up.
So the 949, the true negative number we calculated, is nonsensical because it was derived from nonsense inputs.
So, I'm going to try to find the false positive rate again, without making the same logical error.
Inputs: a false allegation share (1-PPV) of 5%, and 2/3 of cases not reported.
total   positive   negative
  285         95        190
  ???          5        ???
  ???        100        ???
I don't think we have enough information to finish the derivation. If there are 285 assaults, how many non-assaults do we have in the same time frame? How would we even define non-assaults? Number of consensual sexual encounters? Number of attempted sexual encounters that failed? Number of days that an assault didn't happen, multiplied by the at-risk population?
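Here is the same dead end as a Python sketch. It assumes a base of 100 reports, which is my reading of where the 95 and the 5 come from, and the names are my own. The count of non-assault events is deliberately left unknown, so the rate cannot be computed.

```python
# Sketch of the stalled derivation; the base of 100 reports is an assumption.
reports = 100
false_reports = reports * 5 // 100            # 5, from the 5% false allegation share
true_reports = reports - false_reports        # 95

# If 2/3 of assaults go unreported, reported assaults are 1/3 of all assaults.
total_assaults = true_reports * 3             # 285
unreported = total_assaults - true_reports    # 190

# The missing input: there is no agreed definition of a "non-assault" event.
non_assault_events = None

if non_assault_events is None:
    fp_rate = None                            # cannot be derived from these inputs
else:
    fp_rate = false_reports / non_assault_events

print(total_assaults, true_reports, unreported)  # 285 95 190
print(fp_rate)                                   # None
```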
Okay, try again with the 2012 population numbers.
Inputs: total reported cases 87000, false allegation rate of 5%, and male population of 156M (312M / 2), 95% of whom are non-rapists.
total    positive   negative
  7.8M      82650        ???
148.2M       4350        ???
  156M      87000        ???
If the first column is total population of men, the second column would be total men accused in a year. The third column should thus be total men not accused this year, and can be calculated with simple subtraction.
total    positive   negative
  7.8M      82650       7.7M
148.2M       4350     148.2M
  156M      87000     155.9M
So the false positive rate in this example is 4350 / 148.2M ≈ 0.003%
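The whole 2012 table can be rebuilt in a few lines of Python. This is only a sketch using the inputs stated above (156M men, 95% non-rapists, 87000 reports, 5% false); the names are my own.

```python
# Rebuilds the 2012 table from the stated inputs; names are illustrative only.
men = 156_000_000                  # 312M / 2
rapists = men * 5 // 100           # 7.8M, from the assumed 5%
non_rapists = men - rapists        # 148.2M

reported = 87_000                  # men accused in 2012
false_accused = reported * 5 // 100        # 4350
true_accused = reported - false_accused    # 82650

# Third column: men in each group who were not accused that year.
rapists_not_accused = rapists - true_accused           # about 7.7M
non_rapists_not_accused = non_rapists - false_accused  # about 148.2M
men_not_accused = men - reported                       # about 155.9M

# False positive rate: falsely accused men over all men who are non-rapists.
fp_rate = false_accused / non_rapists

print(rapists_not_accused, non_rapists_not_accused, men_not_accused)
print(f"{fp_rate:.4%}")  # 0.0029%
```

Unlike the 1000-man version, every cell here counts men in a single year, so the units stay consistent from start to finish.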