The Money Mustache Community

Other => Off Topic => Topic started by: anisotropy on October 04, 2018, 01:50:58 PM

Title: Statistics update
Post by: anisotropy on October 04, 2018, 01:50:58 PM
Hi Sol (and all others interested),

I have reached out to three professors at the local college whose main research interests are in Bayesian statistics and computations and simulations (2 math and 1 engg) to see if I had made a logical error when I formulated the inference problem.

Note I am effectively cold-calling them, or in this case, cold-emailing. I will be around for another two-three weeks until I move on to my next FIRE destination, where I will have limited connections of any kind. But if they don't get back to me within a week they prob can't be bothered to help.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 11:49:18 AM
Update

The Engg professor said he's away and "other activities demand his attention at this time"

One math professor replied with some comments, let's take a look together.

First, what I sent to each of them:

Although we have never met, I hope you don't find this email too abrupt. I decided to contact you because you are an expert in Bayesian statistics and computation. I am wondering if I could trouble you for a minute to help me wrap my head around this "statistical inference puzzle" I made up, because I think I fell into a logic pitfall somewhere along the way.

Fundamentally this is an inference problem (I think). I will use a question I saw on mathsisfun as the template for the setup: https://www.mathsisfun.com/data/probability-false-negatives-positives.html

Suppose you suspect that you are allergic to something that 1% of the population is allergic to. You go to the clinic and take a test, and the test comes back positive.
The test has a false positive rate of 10% and a false negative rate of 20%. What's the chance that you are actually allergic?

The standard 2x2 table would look like this

                 Total (1% have it)    Test yes    Test no
Have allergy                     10           8          2
Don't have                      990          99        891
Total                          1000         107        893

So 107 people test positive but only 8 of them are actually allergic; even with a positive test, your chance of actually being allergic is only about 7% (8/107), despite the test having only a 20% false negative rate.
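
The table above can be reproduced with a few lines of Python. This is only a sketch: the function name `two_by_two` and its parameter names are made up for illustration, and the inputs are the numbers from the problem statement (1% prevalence, 10% false positive rate, 20% false negative rate).

```python
def two_by_two(population, prevalence, fp_rate, fn_rate):
    """Build the 2x2 counts and the chance of having the condition given a positive test."""
    have = population * prevalence      # people who truly have the condition
    dont = population - have            # people who do not
    true_pos = have * (1 - fn_rate)     # have it and test positive
    false_neg = have * fn_rate          # have it but test negative
    false_pos = dont * fp_rate          # don't have it but test positive
    true_neg = dont * (1 - fp_rate)     # don't have it and test negative
    p_have_given_pos = true_pos / (true_pos + false_pos)
    return true_pos, false_neg, false_pos, true_neg, p_have_given_pos


allergy = two_by_two(population=1000, prevalence=0.01, fp_rate=0.10, fn_rate=0.20)
print([round(x, 3) for x in allergy])
```

The last value returned is 8/107, which is where the roughly 7% comes from.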

Now, I am going to apply the same method for "actually being a criminal". In this hypothetical case, the numbers are just an estimate with assumptions built in (let's assume they are appropriate), but that is not the main issue I am concerned about.
I am assuming 5% of the population are criminals and using 5% as the false allegation rate. Assuming only about 1/3 of crimes are reported, I am using 2/3 as the false negative rate.

The 2x2 table would look like this

                 Total (5% criminals)    Accused    Not accused
Criminals                          50         17             33
Not criminals                     950         48            902
Total                            1000         65            935

so, 65 people are accused but only 17 people are actual criminals; that's 26% chance of a person being actual criminal when accused of a crime, despite the false allegation rate being only 5%.
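
The same number can be checked directly with Bayes' Theorem instead of a table. A minimal sketch, assuming (as the table does) that the 5% is the chance a non-criminal gets accused and that 2/3 is the chance a criminal does not get accused; the function name is hypothetical.

```python
def p_criminal_given_accused(prior, fp_rate, fn_rate):
    """Bayes' Theorem for P(criminal | accused) under the table's assumptions."""
    hit = prior * (1 - fn_rate)          # criminal and accused
    false_alarm = (1 - prior) * fp_rate  # not a criminal but accused
    return hit / (hit + false_alarm)


p = p_criminal_given_accused(prior=0.05, fp_rate=0.05, fn_rate=2 / 3)
print(f"{p:.1%}")
```

The unrounded result is very slightly below the 17/65 in the table, because the table rounds 16.67 and 47.5 up to whole people.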

Now my questions are (if you find this interesting enough to answer): Was my framing/formulation of the problem appropriate? What logic trap did I fall in which yielded this puzzling result? Under what circumstances can I or can't I set up the table like this?


--------------------------------------------------------------------

His reply (whom we will call Math Professor #1 in case the other math professor replies later) Bold added.

"I really haven't time for a long discussion about this, so I will just make a few comments.

The situations are really essentially the same in the two examples. The numbers are different, but both illustrate the same sort of application of Bayes' Theorem. The takeaway is that conditional probability calculations (though perfectly logical) are often counter-intuitive and often seem to go against (most people's) common sense.

As far as I could see (from a quick glance), your calculations are correct (up to rounding error). You framed it fine. There is no logic trap. I think you are just confusing / mixing up two different conditional probability statements.

The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other.

In the first example, the ordering was one way. In the second example, it was reversed.

If you want to read more, I suppose the main related ideas for you to look into are conditional probability and Bayes' Theorem.

I have not read it myself, but this book has many favourable reviews:

https://www.amazon.com/Theory-That-Would-Not-Die/dp/0300188226


So no, I was not wrong, my framing was correct. Let "The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other." sink in for a bit.

what it means: False allegation rate is 5%; the person accused actually guilty 26%. They are internally consistent and tell the SAME story.

These numbers are also consistent when we look at the observable such as the low rape conviction rate: 4-8%. For the longest time people are wondering how could the conviction rate be that low when the false allegation rate is only 5%, the law enforcement and the justice system must all be rotten. To go from 95% to 4%! Unthinkable!

But no, that's not the case. Conviction rate = (convicted cases / ALL cases, whether charged or not), so at best the conviction rate should not exceed the being guilty rate of 26%. If we were omniscient every reported rape would result in a conviction, but sadly we are not.

Out of this 26% not every case can be prosecuted (I am using 30%, 2/6 (http://webarchive.nationalarchives.gov.uk/20100408125722/http://www.homeoffice.gov.uk/rds/pdfs05/hors293.pdf)), and out of the cases that actually go to trial, only some result in conviction (50% (https://www150.statcan.gc.ca/n1/pub/85-002-x/2017001/article/54870-eng.htm)).

What's 26/3/2? Lo and behold: ~4%.
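
Spelled out as a chain of fractions, the arithmetic looks like this. It is just a sketch of the calculation in this post: the variable names are made up, and the one-third and one-half are the rounded figures used above, not values taken from the linked reports.

```python
p_guilty_given_accused = 0.26   # from the 2x2 table
share_prosecuted = 1 / 3        # "26/3": roughly a third of cases prosecuted
share_convicted_at_trial = 1 / 2  # "/2": roughly half of trials end in conviction

expected_conviction_rate = (
    p_guilty_given_accused * share_prosecuted * share_convicted_at_trial
)
print(f"{expected_conviction_rate:.1%}")
```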

Yes, to go from 26% to 4% is STILL BAD and SHOULD BE IMPROVED. But again, this tells the SAME story just like the 26% and 5%. What I presented to you are indeed counter intuitive, but it is internally consistent and perfectly logical.
Title: Re: Statistics update
Post by: Davnasty on October 05, 2018, 12:16:51 PM
1% of the entire population is allergic to x. 10% of the entire population receives a false negative when tested for allergies to x.

5% of the entire male population is a rapist. 5% of the subset of the male population accused of being a rapist is proven to have been falsely accused.

Based on the inputs you gave to Math Professor #1, their response was correct. The inputs were not. If we were only testing the population that has been accused* of being allergic to x, the results would be different.

For the record I'm not stating that any of these numbers are accurate, only following the hypothetical situation presented

*snark
Title: Re: Statistics update
Post by: former player on October 05, 2018, 12:19:02 PM
Your 5% false accusation rate.  That's 5% of the accusations made are false?

So that's 5% of the accused population of 17, not the non-accused population of 950, right?  Because you can't have accusations within a population that you've just described as not accused, right?
Title: Re: Statistics update
Post by: sol on October 05, 2018, 12:25:58 PM
As we've previously tried to explain, your math is fine but you're solving the wrong problem.  We're not interested in the overlap of the two probabilities you have posed, we're only interested in which people who have already been accused are guilty, which is transparently given by fp in the problem statement without the necessity of any additional information, parallel probabilities, or other populations.

Chances are good that the people you contacted would tell you this too, if you just explained to them the real problem you are trying to solve about rapists.  They will understand what you do not, that the infection rate problem and the rapist problem are structurally dissimilar.
Title: Re: Statistics update
Post by: shenlong55 on October 05, 2018, 12:32:11 PM
I know it's called a "false allegation rate", which sounds similar to "false positive rate", but you need to look at how the actual numbers you're using (2-10%) were derived, not just what it's named.  I'll admit, I haven't looked at the actual paper, but the description on Wikipedia sure makes it seem like it's derived differently than a false positive rate.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 12:42:04 PM
As we've previously tried to explain, your math is fine but you're solving the wrong problem.  We're not interested in the overlap of the two probabilities you have posed, we're only interested in which people who have already been accused are guilty, which is transparently given by fp in the problem statement without the necessity of any additional information, parallel probabilities, or other populations.

Chances are good that the people you contacted would tell you this too, if you just explained to them the real problem you are trying to solve about rapists.  They will understand what you do not, that the infection rate problem and the rapist problem are structurally dissimilar.

The Professor was explicit, my framing was correct, and the two questions are essentially the same, so no, my conclusion was correct. If you feel this is an error on his part, you can do what I did, reach out to current scholars whose main research areas are in Bayesian Statistics and Computations, see what they have to say. Look, you and I both have advanced degrees involving statistics, and we are clearly butting heads.

We can sit here and argue forever, why not find someone impartial to judge, like I did?

Also, I did not identify so much of an "overlap", rather I instead identified given X what is Y. You have to understand, the 26% and 5% describe two different events like the professor said.

5% describes false allegation rate; 26% describes odds of a person being actual criminal when accused of a crime. They tell the same story.
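
To make the "two different events" concrete, here are the conditional probabilities read straight off the crime table. A sketch only: the counts are the rounded ones from the table, the variable names are invented, and the code takes the table's reading of the 5% (chance of being accused given not a criminal) as given. Whether the 5% figure should be read that way, or as the share of accusations that are false, is the point being argued in this thread, and the code does not settle it.

```python
# Rounded counts from the 2x2 crime table
criminal_accused, criminal_not_accused = 17, 33
innocent_accused, innocent_not_accused = 48, 902

all_innocent = innocent_accused + innocent_not_accused  # 950
all_accused = criminal_accused + innocent_accused       # 65

# Row-wise: of the non-criminals, what share are accused?
p_accused_given_innocent = innocent_accused / all_innocent

# Column-wise: of the accused, what share are non-criminals / criminals?
p_innocent_given_accused = innocent_accused / all_accused
p_criminal_given_accused = criminal_accused / all_accused

print(round(p_accused_given_innocent, 3),
      round(p_innocent_given_accused, 3),
      round(p_criminal_given_accused, 3))
```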
Title: Re: Statistics update
Post by: Davnasty on October 05, 2018, 12:57:00 PM
If we did in fact have a false accusation rate for the entire population of the US, perhaps we could use it as the input.

In 2012 there were 87,000 reported rapes; 87,000 * .05 = 4350
The average male life expectancy is 75.5; 75.5 * 4350 = 328,425
The US population in 2012 was 312,800,000; 328,425 / 312,800,000 = ~.001

.1% is closer to the actual (proven) false accusation rate even though my method for finding it was pretty sloppy. I realize the first line here suggests something anisotropy is still not in agreement with, but I thought looking at the numbers from another angle might help.
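
The back-of-envelope numbers above, as a short Python sketch. The variable names are made up; the inputs are the figures quoted in this post, and the method is the same rough one (it spreads a lifetime of false reports over the whole population).

```python
reported_rapes_2012 = 87_000
false_share = 0.05              # share of reports assumed false
male_life_expectancy = 75.5     # years
us_population_2012 = 312_800_000

false_per_year = reported_rapes_2012 * false_share
false_over_lifetime = false_per_year * male_life_expectancy
population_rate = false_over_lifetime / us_population_2012

print(round(false_per_year), round(false_over_lifetime), f"{population_rate:.2%}")
```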
Title: Re: Statistics update
Post by: JLee on October 05, 2018, 12:59:21 PM
what it means: False allegation rate is 5%; the person accused actually guilty 26%. They are internally consistent and tell the SAME story.

You're arguing that the true allegation rate is 95% and an accused person is actually guilty 26% of the time.

19 out of 20 are telling the truth, but 3 out of 4 accused are innocent?  There's no fucking way that math works.   
Title: Re: Statistics update
Post by: sol on October 05, 2018, 01:14:47 PM
The Professor was explicit, my framing was correct,

Yes, the answer was clear and your math is fine.

Quote
and the two questions are essentially the same

No, not even close.  Very much not the same, which is the whole problem here. This would be revealed to you if you would just ask the authority.  Why did you choose to ask about the infection rate problem, and not the rape problem?  Go ahead, repose the same question and then tell them that you think 75% of people accused of rape are innocent because everyone gets randomly accused of rape and only a small number of them are rapists, so there must be lots of false allegations out there.  If they're polite, they won't laugh at you.

If you won't believe all of us, maybe you'll believe them.

Quote
If you feel this is an error on his part,

Not on his part, on your part.  You've misapplied the solution to a non-analogous problem.  You keep saying "Brett Kavanaugh is just some random dude to me" but he is not some random dude, he's a specific individual who has been credibly accused of sexual assault.  The math used to determine the infection rate in whole populations is irrelevant, because you don't go around randomly accusing people of sexual assault the same way you randomly go around testing people for infection.

It's the wrong problem.  Your stats prof told you that you have correctly solved one problem, and I agree, but then you have falsely applied that solution to a different and only tangentially related problem.

If you don't believe me, just pose your real question to the stats prof. 

Don't claim a correct answer to a different problem means you have this problem right.

Also, you're pissing off lots of people on the forum with your stubborn adherence to something only a gross dude would try to advocate.  It's okay to say that you love stats, but mistook these two problems as identical when they are not, and aren't actually accusing the vast majority of sexual assault victims of being liars.  Because right now, that is exactly what you are doing.



Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 01:17:03 PM
If we did in fact have a false accusation rate for the entire population of the US, perhaps we could use it as the input.

In 2012 there were 87,000 reported rapes; 87,000 * .05 = 4350
The average male life expectancy is 75.5; 75.5 * 4350 = 328,425
The US population in 2012 was 312,800,000; 328,425 / 312,800,000 = ~.001

.1% is closer to the actual (proven) false accusation rate even though my method for finding it was pretty sloppy. I realize the first line here suggests something anisotropy is still not in agreement with, but I thought looking at the numbers from another angle might help.

Hi Davnasty,

I am not sure how to phrase this, so please bear with me. When we say false allegation rate being 5%, it means, there is 5% chance the allegation is false when an allegation is made. And that's it. This tells us nothing about the general population.

But together with a fn rate. We can tackle it as an inference problem, as I had done here.

Now there is a good reason to doubt 0.1% being the actual false accusation rate.

In statistics, when we frame these sort of problems, we often have to deal with type I and type II errors. Namely, false positive and false negative. They tend to be offsetting each other, what I mean by that is, the smaller type I, the bigger the type II and vice versa. When your fp is 0.1%, generally your fn would be extremely high, perhaps as high as 99%. I find this quite unrealistic, don't you?

But who knows, maybe it is 0.1% for the entire population. But you have to remember, this 0.1% tells a different story. It's no longer when an allegation is made, 5% being false. The 0.1% would mean in the population, 0.1% of the time, you would be accused of rape (randomly). You see how the orders change? I hope this helps.
 
JLee,

Please read the professor's comment. Most people do find it counterintuitive. If you still can't understand, I can't help you. Maybe read the book he recommended.
Title: Re: Statistics update
Post by: JLee on October 05, 2018, 01:19:35 PM
JLee,

Please read the professor's comment. Most people do find it counterintuitive. If you still can't understand, I can't help you. Maybe read the book he recommended.

If you were right, society would be rife with falsely accused people.  It isn't.

What is it about not understanding..?
Title: Re: Statistics update
Post by: sol on October 05, 2018, 01:27:41 PM
When we say false allegation rate being 5%, it means, there is 5% chance the allegation is false when an allegation is made. And that's it. This tells us nothing about the general population.

Brett Kavanaugh is not part of the general population, he is already accused.  You just agreed that there is a 5% chance that the allegation is false when an allegation is made.  An allegation has been made.  Do you still think there's a 5% chance he's innocent, or do you think it's a 75% chance he's innocent?

Quote
please read the professor's comment.

The prof's comments are totally irrelevant to the rape problem, because you didn't ask about that.  You asked about an infection rate problem which is not the same.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 01:27:56 PM
The Professor was explicit, my framing was correct,

Yes, the answer was clear and your math is fine.

Quote
and the two questions are essentially the same

No, not even close.  Very much not the same, which is the whole problem here. This would be revealed to you if you would just ask the authority.  Why did you choose to ask about the infection rate problem, and not the rape problem?  Go ahead, repose the same question and then tell them that you think 75% of people accused of rape are innocent because everyone gets randomly accused of rape and only a small number of them are rapists, so there must be lots of false allegations out there.  If they're polite, they won't laugh at you.

If you won't believe all of us, maybe you'll believe them.

Quote
If you feel this is an error on his part,

Not on his part, on your part.  You've misapplied the solution to a non-analogous problem.  You keep saying "Brett Kavanaugh is just some random dude to me" but he is not some random dude, he's a specific individual who has been credibly accused of sexual assault.  The math used to determine the infection rate in whole populations is irrelevant, because you don't go around randomly accusing people of sexual assault the same way you randomly go around testing people for infection.

It's the wrong problem.  Your stats prof told you that you have correctly solved one problem, and I agree, but then you have falsely applied that solution to a different and only tangentially related problem.

If you don't believe me, just pose your real question to the stats prof. 

Don't claim a correct answer to a different problem means you have this problem right.

Also, you're pissing off lots of people on the forum with your stubborn adherence to something only a gross dude would try to advocate.  It's okay to say that you love stats, but mistook these two problems as identical when they are not, and aren't actually accusing the vast majority of sexual assault victims of being liars.  Because right now, that is exactly what you are doing.

Real question as in replace crime with rape? Ya that would go well, hey we've never met but let's talk about rape.

Let me ask you this, what difference does the kind of crime make? Surely by your logic, the false accusation rate would be much higher when it comes to actual criminals and lower regarding the general population anyway no matter the crime??


And stop calling this gross or imply bad intention on my part, this is deeply offensive. I am telling you your view is wrong. If you don't believe me just reach out to a Bayesian Stat expert/scholar at your local college.

I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent." This has nothing to do with BK specifically. Given any person, the statement stands. The statement is simply a logical statement given 5% fp and 66% fn.

Like I said to Davnasty, the 0.1% would tell a different story. It's no longer when an allegation is made, 5% being false. The 0.1% would mean in the population, 0.1% of the time, you would be accused of rape (randomly). You see how the orders change? I hope this helps.

It is clear to me now you do not seek to discover objective reality but rather are more interested in "removing ammunition" from the other side, I find this partisan behavior rubbish to say the least.

Quote
You asked about an infection rate problem which is not the same.

and omg, you really missed the table I sent to him about crime? I am done, seriously.
Title: Re: Statistics update
Post by: sol on October 05, 2018, 01:29:49 PM
I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent."

No one is randomly accused of rape.  People can be randomly tested for infection.  No one is randomly accused of rape.  All math based on the assumption that random people are accused of rape is irrelevant to the situation in which a person has already been accused.


Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 01:32:30 PM
I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent."

No one is randomly accused of rape.  People can be randomly tested for infection.  No one is randomly accused of rape.  All math based on the assumption that random people are accused of rape is irrelevant to the situation in which a person has already been accused.

Go talk to a Bayesian Statistics expert at your local college as I have repeatedly suggested. I am done with you here.
Title: Re: Statistics update
Post by: FrugalToque on October 05, 2018, 01:34:10 PM

I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent." This has nothing to do with BK specifically. Given any person, the statement stands. The statement is simply a logical statement given 5% fp and 66% fn.


Why are you still going on about this?

One of the premises of your question is that the false accusation rate is 5%.  That means that the probability of a person being falsely accused of rape is 5%.

How can you ignore the premise of your question and play statistical games with this really, really basic fact?

Toque.
Title: Re: Statistics update
Post by: FrugalToque on October 05, 2018, 01:36:27 PM

I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent." This has nothing to do with BK specifically. Given any person, the statement stands. The statement is simply a logical statement given 5% fp and 66% fn.


Why are you still going on about this?

One of the premises of your question is that the false accusation rate is 5%.  That means that the probability of a person being falsely accused of rape is 5%.

How can you ignore the premise of your question and play statistical games with this really, really basic fact?

Toque.

What are you going to do next?  Re-apply the 26% to your original filter?

Then come out with a 75% false accusation rate?  Do you understand the fundamental, mathematical absurdity of what you're doing?

Toque.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 01:37:07 PM

I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent." This has nothing to do with BK specifically. Given any person, the statement stands. The statement is simply a logical statement given 5% fp and 66% fn.


Why are you still going on about this?

One of the premises of your question is that the false accusation rate is 5%.  That means that the probability of a person being falsely accused of rape is 5%.

How can you ignore the premise of your question and play statistical games with this really, really basic fact?

Toque.

Toque, with respect. Please read the professor's comment "the two numbers describe two different events."

One is false allegation, the other is guilty, they are NOT the same thing.

I seriously don't understand why it is so difficult for people to wrap their heads around this.

Also absurd? I have to say, you guys are the absurd ones. When one has two allegations against them, it goes up to 70% precisely because you would then replace the 5% population to 26%. This is just how the math works.

Which is why the legal system strongly leans on a PATTERN of behavior. Re: multiple allegation => much higher chance of being guilty.
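
The 70% figure comes from running the same update twice, feeding the first result back in as the new starting rate. A minimal sketch, assuming the same 5% and 2/3 rates apply to the second allegation and that the two allegations are independent of each other given guilt or innocence (both are assumptions of this sketch, and the function name is hypothetical).

```python
def update(prior, fp_rate=0.05, fn_rate=2 / 3):
    """One Bayes update: P(guilty | one more allegation) from the current prior."""
    hit = prior * (1 - fn_rate)          # guilty and accused
    false_alarm = (1 - prior) * fp_rate  # not guilty but accused
    return hit / (hit + false_alarm)


after_one = update(0.05)        # start from the 5% population rate
after_two = update(after_one)   # reuse the result as the new prior
print(f"{after_one:.0%} -> {after_two:.0%}")
```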
Title: Re: Statistics update
Post by: JLee on October 05, 2018, 01:38:36 PM

I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent." This has nothing to do with BK specifically. Given any person, the statement stands. The statement is simply a logical statement given 5% fp and 66% fn.


Why are you still going on about this?

One of the premises of your question is that the false accusation rate is 5%.  That means that the probability of a person being falsely accused of rape is 5%.

How can you ignore the premise of your question and play statistical games with this really, really basic fact?

Toque.

Toque, with respect. Please read the professor's comment "the two numbers describe two different events."

One is false allegation, the other is guilty, they are NOT the same thing.

I seriously don't understand why it is so difficult for people to wrap their heads around this.

My friend, we are all wondering the same thing about you right now.
Title: Re: Statistics update
Post by: sol on October 05, 2018, 01:42:16 PM
Go talk to a Bayesian Statistics expert at your local college as I have repeatedly suggested. I am done with you here.

Sounds like you are the one who needs to go ask your question.  I already know the answer.  For some reason, you are refusing to ask them the real question that you have, and are instead asking a different question.

Note that not once in your correspondences with the stats prof did you mention sexual assault or false allegations of sexual assault.  You only asked about infection rates, which is not the same problem.  For reasons I have tried to explain to you over and over again.

If you were to pose the actual problem you are claiming to have solved, they will set you straight.  Go ahead, I'll wait.
Title: Re: Statistics update
Post by: former player on October 05, 2018, 01:43:20 PM
This is your table -

The 2x2 table would look like this

                 Total    Accused    Not accused
Criminals           50         17             33
Not Criminals      950         48            902
Total             1000         65            935


The problem with this table is that you've laid it out the wrong way around.  You can't start the problem with who is a criminal and who is not, because who is a criminal comes after the accusation.  So you start with who is accused, and then of the accused population you look at who is guilty and who is not.  So, using your figures -

                 Total    Guilty    Innocent
Accused             17        16           1
Not accused        983        33         950
Total             1000        49         951
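
For comparison, this second table can be rebuilt in a few lines. A sketch only, with made-up variable names: it follows the layout above by starting from the 17 accused, treating the 5% as the share of accusations that are false, and keeping the 33 unaccused criminals from the first table.

```python
population = 1000
accused = 17
unaccused_guilty = 33
false_share_of_accusations = 0.05

accused_innocent = round(accused * false_share_of_accusations)  # 0.85 rounds to 1
accused_guilty = accused - accused_innocent
not_accused = population - accused
unaccused_innocent = not_accused - unaccused_guilty

p_guilty_given_accused = accused_guilty / accused
print(accused_guilty, accused_innocent, unaccused_guilty, unaccused_innocent,
      round(p_guilty_given_accused, 2))
```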


And really, you say you are an ally.   I don't know what's behind your fanaticism in trying to prove an unrealistic false accusation rate, but whatever it is is blinding you to logic and to what is needed to be an ally of victims of sexual violence.




Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 01:46:38 PM
I already know the answer. 

lol ok , then what's the harm in asking an expert that is impartial to judge it as I had done?

Reminds me of how BK didn't want to have a full FBI investigation.

By actual problem you mean exchange words crime with rape? You didn't answer my question, what difference does the kind of crime make? Surely by your logic, the false accusation rate would be much higher when it comes to actual criminals and lower regarding the general population anyway no matter the crime??
Title: Re: Statistics update
Post by: FrugalToque on October 05, 2018, 01:47:59 PM

I want to make it VERY CLEAR, as I have repeatedly done. All the 26% says is: "When a random person is accused of rape by a single alleger,there is a 26% chance  is actually innocent." This has nothing to do with BK specifically. Given any person, the statement stands. The statement is simply a logical statement given 5% fp and 66% fn.


Why are you still going on about this?

One of the premises of your question is that the false accusation rate is 5%.  That means that the probability of a person being falsely accused of rape is 5%.

How can you ignore the premise of your question and play statistical games with this really, really basic fact?

Toque.

Toque, with respect. Please read the professor's comment "the two numbers describe two different events."

One is false allegation, the other is guilty, they are NOT the same thing.

I seriously don't understand why it is so difficult for people to wrap their heads around this.

Also absurd? I have to say, you guys are the absurd ones. When one has two allegations against them, it goes up to 70% precisely because you would then replace the 5% population to 26%. This is just how the math works.

Which is why the legal system strongly leans on a PATTERN of behavior. Re: multiple allegation => much higher chance of being guilty.

How can we get this clear to you?  You're mashing things together you shouldn't without any clear reason why.

Probability of a randomly chosen man being a rapist:  5%
Average rapes per rapist: 6

So in a population of 1000 men, there are 50 rapists who commit 300 rapes.

Of those rapes, only a fraction are reported.  Statistically, according to certain sources, it's about 30%, or 90 in our case. (This sounds high to me, but it's the one I pulled from wikipedia).

So the police are going to get about 94 rape reports, 4 of which are false.  (There's your 5% false reporting rate)

Are you with us so far?

So in a population of 1000, 90 accusations will be real and an additional 4 will be made falsely.

Your odds of being falsely accused, as a random man, are about 4 out of 1000.

Toque.
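
The steps above as a rough Python sketch. The variable names are made up and the inputs are the figures quoted in this post. It also assumes each false report names a different man, which is an assumption of the sketch. Solving for the number of false reports that makes them exactly 5% of all reports gives a little under 5, in the same ballpark as the "about 4" above.

```python
men = 1000
rapist_share = 0.05
rapes_per_rapist = 6
report_rate = 0.30
false_share_of_reports = 0.05

rapists = men * rapist_share                 # 50
rapes = rapists * rapes_per_rapist           # 300
true_reports = rapes * report_rate           # 90
# If false reports are 5% of ALL reports, then false / (true + false) = 0.05
false_reports = true_reports * false_share_of_reports / (1 - false_share_of_reports)
total_reports = true_reports + false_reports

p_falsely_accused = false_reports / men      # chance a random man is falsely accused
print(round(total_reports, 1), round(false_reports, 1), f"{p_falsely_accused:.2%}")
```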
Title: Re: Statistics update
Post by: Kris on October 05, 2018, 01:48:46 PM
I already know the answer. 

lol ok , then what's the harm in asking an expert that is impartial to judge it as I had done?

Reminds me of how BK didn't want to have a full FBI investigation.

By actual problem you mean exchange words crime with rape? You didn't answer my question, what difference does the kind of crime make? Surely by your logic, the false accusation rate would be much higher when it comes to actual criminals and lower regarding the general population anyway no matter the crime??

LOL oh, brother.
Title: Re: Statistics update
Post by: sol on October 05, 2018, 01:53:38 PM
What are you going to do next?  Re-apply the 26% to your original filter?

Then come out with a 75% false accusation rate?  Do you understand the fundamental, mathematical absurdity of what you're doing?

Toque,

Anisotropy's math is correct for infection rates, which is a different problem unrelated to sexual assault allegations.

The reason a 5% false accusation rate can be twisted into a 75% false accusation rate is that he's artificially specified that only a tiny fraction of the population is guilty, and that everyone is equally accused whether they are guilty or not.  This scenario results in millions of people who are not guilty being falsely accused.  Then he lumps all of those falsely accused people into the same pool as the correctly accused people, and the guilty ones now make up a minority of the total population of accused people.

It's just math sleight of hand.  By assuming that everyone is accused, and that the real guilty rate is low, he's created an artificially high number of false allegations to dilute the pool and shrink the percentage of guilty among the accused.

It has no bearing whatsoever on whether or not a person who is accused of sexual assault is guilty, his claims to the contrary, because he's fundamentally misconstructed the problem.  It's a confusing situation though, so I would be forgiving if he weren't then using that sleight of hand to accuse millions of sexual assault survivors of being liars.  That last part kind of pisses me off though.

Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 01:53:57 PM
Alright, I can see we are talking past each other completely. I will stop arguing now as it's pretty much pointless.

I am curious, how many people here took 1st year/2nd year stat courses in college?
Title: Re: Statistics update
Post by: former player on October 05, 2018, 01:55:18 PM
Alright, I can see we are talking past each other completely. I will stop arguing now as it's pretty much pointless.

I am curious, how many people here took 1st year/2nd year stat courses in college?

I am curious, why are you ignoring my posts on this thread?
Title: Re: Statistics update
Post by: sol on October 05, 2018, 01:56:56 PM
Alright, I can see we are talking past each other completely. I will stop arguing now as it's pretty much pointless.

I am curious, how many people here took 1st year/2nd year stat courses in college?

I am curious, why are you ignoring my posts on this thread?

I am curious, why are you refusing to ask the stats prof the real question you have?

These are all rhetorical questions, right?
Title: Re: Statistics update
Post by: Kris on October 05, 2018, 01:57:02 PM
Alright, I can see we are talking past each other completely. I will stop arguing now as it's pretty much pointless.

I am curious, how many people here took 1st year/2nd year stat courses in college?

You do realize -- because they have already told you -- that the people in this thread who are pointing out your problem have advanced knowledge of statistics, right?

Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 02:02:42 PM
Sol,

for the last time, what difference does changing the word rape to crime make? regardless of the crime, if your logic holds, the guilty would be accused way more than the innocent. So what difference does it make? Even if Dabnasty's 0.1% idea turns out to be right. You realize it means: "An innocent person has a 0.1% chance of being accused of rape". VS "when a person is accused of rape, there is a 26% chance he is guilty."

The problem is we can not possibly know if the label "innocent" is applicable at the time of accusation.

Formerplayer, sorry I focused on Sol too much. But it's essentially the same story over and over again. The Stats describe different events.

Kris, and where exactly did you see that mentioned other than Sol's brief rundown of his background? Provide a quote please?
Title: Re: Statistics update
Post by: former player on October 05, 2018, 02:05:24 PM
Sol,

for the last time, what difference does changing the word rape to crime make? regardless of the crime, if your logic holds, the guilty would be accused way more than the innocent. So what difference does it make? Even if Dabnasty's 0.1% idea turns out to be right. You realize it means: "An innocent person has a 0.1% chance of being accused of rape". VS "when a person is accused of rape, there is a 26% chance he is guilty."

Formerplayer, sorry I focused on Sol too much. But it's essentially the same story over and over again. The Stats describe different events.

No, they describe the same events (same number accused, same number not accused), but mine come to a different result because they correctly apply the 5% to the accused number, not the unaccused number.


Really, this isn't a stats problem, it's a basic logic and comprehension problem.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 02:07:17 PM
former player,

I am going to quote the professor here
"The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other. "
Title: Re: Statistics update
Post by: sol on October 05, 2018, 02:13:21 PM
for the last time, what difference does changing the word rape to crime make?

It matters because you still didn't ask the correct question.  You asked about the chances of a random person "actually being a criminal", akin to a random person actually being infected.  This has nothing to do with whether or not an accused rapist is actually a rapist, because you're not asking about the chance a random person is actually a rapist, you're asking about the chance that an accused rapist is a rapist.

An accused rapist is not a random person.  You've moved it from a precondition to a conditional probability, and then tried to draw conclusions about that one person based on conclusions about everyone in the population.  But as we keep repeating for you, you have incorrectly assumed that everyone in the population is equally randomly accused, when in reality real rapists get accused of rape far more often than random people do. 

In fact, your own stats report that 95% of accused rapists are guilty and 5% are innocent, right?  You refused to answer this question above.  If Brett Kavanaugh is accused of sexual assault, do you think there is a 5% chance he is innocent or a 75% chance he is innocent?

Your answer basically depends on whether or not you think everyone in the population is randomly accused of sexual assault.  If you think everyone is randomly accused, then the likelihood of an accused person being innocent is high.  If you accept your own preconditional fp that only 5% of allegations are false, then the likelihood of an accused person being innocent is low.  In order to get from one to the other, you have to seriously misunderstand how this problem is set up.
Title: Re: Statistics update
Post by: former player on October 05, 2018, 02:13:41 PM
former player,

I am going to quote the professor here
"The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other. "


Only one of those events relates to the real world problem you are proposing as respects accused rapists, though.  And it's not the one which ends up with 48 innocent men out of 1000 accused of rape.  I mean, 48 out of 1000 is significantly higher than rates of cancer diagnosis - I think we all would have noticed that.
Title: Re: Statistics update
Post by: PathtoFIRE on October 05, 2018, 02:27:42 PM
Anisotropy, this seems simple to me that you are wrong. Now I'm not familiar with the actual statistics you are talking about, but sticking with your hypotheticals:

False positive rate (FP) = When testing a given population, what % of normal subjects test positive
Positive predictive value (PPV) = Given a positive result, what % are true positives (True positives / Total positives)

When someone tells me that the false rape allegation rate is somewhere around 5%, which of those two statistical values are they referring to? The false positive rate, or 1 - PPV. Given the way the stat is being stated, that 5% of rape allegations are false, or conversely that 95% of rape allegations are true, that sounds like 1 - PPV to me, not FP outright.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 02:30:26 PM
It matters because you still didn't ask the correct question.  You asked about the chances of a random person "actually being a criminal", akin to a random person actually being infected.  This has nothing to do with whether or not an accused rapist is actually a rapist, because you're not asking about the chance a random person is actually a rapist, you're asking about the chance that an accused rapist is a rapist.

An accused rapist is not a random person.  You've moved it from a precondition to a conditional probability, and then tried to draw conclusions about that one person based on conclusions about everyone in the population.  But as we keep repeating for you, you have incorrectly assumed that everyone in the population is equally randomly accused, when in reality real rapists get accused of rape far more often than random people do. 

In fact, your own stats report that 95% of accused rapists are guilty and 5% are innocent, right?  You refused to answer this question above.  If Brett Kavanaugh is accused of sexual assault, do you think there is a 5% chance he is innocent or a 75% chance he is innocent?

Your answer basically depends on whether or not you think everyone in the population is randomly accused of sexual assault.  If you think everyone is randomly accused, then the likelihood of an accused person being innocent is high.  If you accept your own preconditional fp that only 5% of allegations are false, then the likelihood of an accused person being innocent is low.  In order to get from one to the other, you have to seriously misunderstand how this problem is set up.

I have said many times, based on two allegations, BK's likelihood of being guilty is 70%, I don't understand why you keep saying I am refusing to answer this. I have done so plenty of times.

Quote
you're not asking about the chance a random person is actually a rapist, you're asking about the chance that an accused rapist is a rapist.

This is right, this is how inference works, you take what is known: population composition, fp, fn, and work out the likelihood of "Given X what is Y". In my case submitted to the professor: "Odds of an accused criminal being actually a criminal". We are just running in circles at this point.

What I calculated is the odds of an accused person (of any crime, given appropriate fp and fn and comp), being actually guilty of the crime accused of. That is all.

The framing is sound, there is no logical error I had committed. Once again, If you are 100% sure you are correct, what's the harm in seeking an impartial expert to judge?

pathtofire,
The key is to recognize 26% and 5% describe two different events. one is false allegation rate 5% given an allegation; the other is actually being guilty when accused by a single allegation 26%.



TO all that disagree with me, since Sol doesn't want to seek an impartial expert to be the judge, how about one of you take it up and talk to an expert in Bayesian Statistics and computations at your local college like I had done. This can be settled really easily.
Title: Re: Statistics update
Post by: Davnasty on October 05, 2018, 02:30:33 PM
Sol,

for the last time, what difference does changing the word rape to crime make? regardless of the crime, if your logic holds, the guilty would be accused way more than the innocent. So what difference does it make? Even if Dabnasty's 0.1% idea turns out to be right. You realize it means: "An innocent person has a 0.1% chance of being accused of rape". VS "when a person is accused of rape, there is a 26% chance he is guilty."

The problem is we can not possibly know if the label "innocent" is applicable at the time of accusation.

Formerplayer, sorry I focused on Sol too much. But it's essentially the same story over and over again. The Stats describe different events.

Kris, and where exactly did you see that mentioned other than Sol's brief rundown of his background? Provide a quote please?

Yes, but you got 26% using 5% as the input. If you used .1% as the input your result would be 94.4% chance he is guilty.

I should also point out some glaring mistakes in my math like using the whole US population rather than male only (accidentally) and including every year of the average male life when not all ages will be accused at the same rate (intentionally, cause picking an age felt weird)

And I'll also point out that 94.4% is not accurate because there is no guarantee that all accusations not proven false are in fact true. Most accusations are proven neither true nor false, which is something that's been completely glossed over throughout this discussion. (I hope I'm not the one misunderstanding this part)
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 02:35:08 PM
Sol,

for the last time, what difference does changing the word rape to crime make? regardless of the crime, if your logic holds, the guilty would be accused way more than the innocent. So what difference does it make? Even if Dabnasty's 0.1% idea turns out to be right. You realize it means: "An innocent person has a 0.1% chance of being accused of rape". VS "when a person is accused of rape, there is a 26% chance he is guilty."

The problem is we can not possibly know if the label "innocent" is applicable at the time of accusation.

Formerplayer, sorry I focused on Sol too much. But it's essentially the same story over and over again. The Stats describe different events.

Kris, and where exactly did you see that mentioned other than Sol's brief rundown of his background? Provide a quote please?

Yes, but you got 26% using 5% as the input. If you used .1% as the input your result would be 94.4% chance he is guilty.

I should also point out some glaring mistakes in my math like using the whole US population rather than male only (accidentally) and including every year of the average male life when not all ages will be accused at the same rate (intentionally, cause picking an age felt weird)

And I'll also point out that 94.4% is not accurate because there is no guarantee that all accusations not proven false are in fact true. Most accusations are proven neither true nor false, which is something that's been completely glossed over throughout this discussion. (I hope I'm not the one misunderstanding this part)

when you use 0.1% as fp, what you are stating is that the false allegation rate is 0.1%. This is no longer saying 0.1% of the population will be randomly accused of rape in their lifetime. This is saying when someone makes an accusation towards anyone, there is 0.1% chance the accusation is false.
Title: Re: Statistics update
Post by: PathtoFIRE on October 05, 2018, 02:36:14 PM
pathtofire,
The key is to recognize 26% and 5% describe two different events. one is false allegation rate 5% given an allegation; the other is actually being guilty when accused by a single allegation 26%.

Correct me if I'm wrong, since I was one of those ppl on the Kavanaugh thread trying to ignore the statistics sidebar, but didn't you derive the 26% figure? That was not an empirical finding of any sort reported in the media or literature, correct? That's why I used PPV, that is exactly the statistic we use in medicine when we want to know what percent of positive results are true positives (and subtracting PPV from 1 gives you the percent of false positives, which is what we're interested in for your hypothetical). To my reading, you are literally saying "the PPV is 5%" and then saying "the PPV is 26%". I think you think the 5% figure is a false positive rate, but it's not, it's the 1-PPV regarding all rape allegations in the studies that, again, I have not actually read.
Title: Re: Statistics update
Post by: shenlong55 on October 05, 2018, 02:41:44 PM
pathtofire,
The key is to recognize 26% and 5% describe two different events. one is false allegation rate 5% given an allegation; the other is actually being guilty when accused by a single allegation 26%.

Correct me if I'm wrong, since I was one of those ppl on the Kavanaugh thread trying to ignore the statistics sidebar, but didn't you derive the 26% figure? That was not an empirical finding of any sort reported in the media or literature, correct? That's why I used PPV, that is exactly the statistic we use in medicine when we want to know what percent of positive results are true positives (and subtracting PPV from 1 gives you the percent of false positives, which is what we're interested in for your hypothetical). To my reading, you are literally saying "the PPV is 5%" and then saying "the PPV is 26%". I think you think the 5% figure is a false positive rate, but it's not, it's the 1-PPV regarding all rape allegations in the studies that, again, I have not actually read.

+1
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 02:54:40 PM
pathtofire,
The key is to recognize 26% and 5% describe two different events. one is false allegation rate 5% given an allegation; the other is actually being guilty when accused by a single allegation 26%.

Correct me if I'm wrong, since I was one of those ppl on the Kavanaugh thread trying to ignore the statistics sidebar, but didn't you derive the 26% figure? That was not an empirical finding of any sort reported in the media or literature, correct? That's why I used PPV, that is exactly the statistic we use in medicine when we want to know what percent of positive results are true positives (and subtracting PPV from 1 gives you the percent of false positives, which is what we're interested in for your hypothetical). To my reading, you are literally saying "the PPV is 5%" and then saying "the PPV is 26%". I think you think the 5% figure is a false positive rate, but it's not, it's the PPV regarding all rape allegations in the studies that, again, I have not actually read.

The 26% was derived using the 2-10% false allegation rate, which I treated as fp; 2/3 as fn, since an estimated 2/3 of crimes are unreported; and a population composition of 5% rapists. These inputs are empirical numbers based on the literature I cited.

In terms of PPV, PPV = True positive / predicted total positive. So the 26% would be PPV.

The 5% is the false positive rate. FPR = false positive / (false positive + true negatives), or FPR = false positive / (sum of condition negatives). As condition negatives include both false positive and true negatives.

1-PPV is actually the false discovery rate, i.e., FDR = false positive / (false positives + true positives), or FDR = false positive / (sum of predicted positives). This is not the fp.
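
For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the calculation as described above. It assumes the 5% figure is used as a false positive rate, 2/3 as the false negative rate, and 5% as the prior; whether the 5% should be read that way is exactly what is disputed in this thread. The function name `posterior_guilty` is made up for illustration.

```python
def posterior_guilty(prior, fp, fn):
    """Bayes' rule for P(guilty | accused).

    prior: P(guilty) before any accusation
    fp:    P(accused | innocent), treated here as the false positive rate
    fn:    P(not accused | guilty), the false negative rate
    """
    true_pos = prior * (1 - fn)
    false_pos = (1 - prior) * fp
    return true_pos / (true_pos + false_pos)


one = posterior_guilty(prior=0.05, fp=0.05, fn=2 / 3)
# Second allegation: reuse the first posterior as the new prior
# (assumption: the two allegations are independent).
two = posterior_guilty(prior=one, fp=0.05, fn=2 / 3)

print(f"one allegation: {one:.0%}, two allegations: {two:.0%}")
```

Changing `fp` or `prior` shows how sensitive the result is to those two inputs.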
Title: Re: Statistics update
Post by: gaja on October 05, 2018, 02:56:03 PM
Alright, I can see we are talking past each other completely. I will stop arguing now as it's pretty much pointless.

I am curious, how many people here took 1st year/2nd year stat courses in college?

I did. And I have also taught math and statistics in high school and junior high. Your error is typical of students who misunderstand the text and therefore get the wrong end result, and get really frustrated by a low score on the test, since their calculations are correct.

Maybe a simpler language will help?

"1000 men live in Oaktown. 5 % of these are accused of rape in 1983. Of these accused rapists, 5 % are innocent. 65 % of the rapists in Oaktown are never accused.
a) How many men in Oaktown are accused of rape? (answer: 50)
b) How many of the accused men are innocent? (answer: 2-3)
c) What is the probability that a man accused of rape in Oaktown is guilty? (answer: 95 %)
d) What is the probability that a random man in Oaktown is a rapist? (answer: 13.6 %)
e) What is the probability that a random man in Oaktown will be innocently accused of rape? (answer: 0.25 %)"
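
The answers can be checked with a few lines of Python. This is only a sketch of the Oaktown exercise as worded above; the variable names are illustrative.

```python
men = 1000
accused = men * 0.05                   # a) 5% of the men are accused
innocent_accused = accused * 0.05      # b) 5% of the accused are innocent
guilty_accused = accused - innocent_accused

p_guilty_given_accused = guilty_accused / accused   # c)

# 65% of rapists are never accused, so the guilty accused are the other 35%.
rapists = guilty_accused / (1 - 0.65)
p_rapist = rapists / men                            # d)

p_innocent_and_accused = innocent_accused / men     # e)

print(f"a) {accused:.0f}  b) {innocent_accused:.1f}  c) {p_guilty_given_accused:.0%}")
print(f"d) {p_rapist:.1%}  e) {p_innocent_and_accused:.2%}")
```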

Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 03:03:38 PM
I noticed there's an error in B, instead of b) How many of the accused men are innocent? (answer: 2-3).

It should read: how many of the accused rapist are innocent, because you explicitly stated Of these accused rapists, 5 % are innocent.

But notice, all of your statements are different from:

Given a man accused of rape, what are the odds of him being innocent.

This is different from yours:

What is the probability that a random man in Oaktown will be innocently accused of rape? (answer: 0.25 %). In your case, you are assuming him to be innocent. Rewording: Given a man in Oaktown is innocent, what are the odds of him being accused of rape.

In my statement, it's simply, Given a man (without being known innocent or guilty) accused of rape, what are the odds of him being innocent.

Given a man in Oaktown is innocent, what are the odds of him being accused of rape.
vs
Given a man (without being known innocent or guilty) accused of rape, what are the odds of him being innocent.

Very different events.


Title: Re: Statistics update
Post by: PathtoFIRE on October 05, 2018, 03:11:38 PM
To continue my thought, a false positive rate is dependent on the characteristics of the test itself and the cutoff values. PPV is influenced by the population.

So you have a screening test, and you set your cutoff values such that you get a false positive rate of 5%.

1000 healthy volunteers --> test applied = 950 negative results (true negatives), 50 positive results (false positives)
False positive rate = # of false positives / (# of false positives + # of true negatives) [restated: FP/(FP+TN)]
= 5%

Then you test 100 patients with the disease and get these results

100 known patients --> test applied = 90 positive results (true positives), 10 negative (false negatives)
False negative rate = false negatives / (false negatives + true positives) [restated: FN/(FN+TP)]
=10%

Disease prevalence is 5%.

Now you test two sets of populations, one a random cohort, and the other a mix of patients suspected to have disease.

Random 1000 people (950 should be healthy, 50 should have disease)
               Total    Positive    Negative
Healthy ppl    950      48 (FP)     902 (TN)
Disease        50       45 (TP)     5 (FN)

PPV = TP/(TP+FP) = 45 / (45+48) = 45 / 93 = 48%     [1 - PPV = 52%]
NPV = TN/(TN+FN) = 902 / (902+5) = 902 / 907 = 99.4%

Now instead test 100 people suspected of having disease based on symptoms, but only half do.
               Total    Positive    Negative
Disease        50       45 (TP)     5 (FN)
No disease     50       3 (FP)      47 (TN)

PPV = 45 / (45+3) = 45/48 = 94% (1 - PPV = 6%)
NPV = 47 / (47+5) = 47/52 = 90%
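
A small Python sketch of the two scenarios, using the rounded counts from the tables above. The helper names `ppv` and `npv` are just illustrative.

```python
def ppv(tp, fp):
    """Positive predictive value: share of positive results that are true positives."""
    return tp / (tp + fp)


def npv(tn, fn):
    """Negative predictive value: share of negative results that are true negatives."""
    return tn / (tn + fn)


# Same test (5% false positive rate, 10% false negative rate), two cohorts.
cohorts = {
    "random 1000": dict(tp=45, fp=48, tn=902, fn=5),
    "suspected 100": dict(tp=45, fp=3, tn=47, fn=5),
}

for name, c in cohorts.items():
    print(f"{name}: PPV={ppv(c['tp'], c['fp']):.1%}, NPV={npv(c['tn'], c['fn']):.1%}")
```

The test characteristics are identical in both rows; only the share of true cases in the tested group differs, and that alone moves the PPV.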


What everyone else is saying is that we are talking about the second situation. Given any one allegation within the cohort of all allegations made, what is the positive predictive value, and others are saying that the research says 95% (5% being 1-PPV). If you decided to take the methods that those studies used to determine a true allegation from a false one (testimony, police reports, convictions, etc), and then applied those while accusing a large random group of people, then yes, the PPV of those methods would go down, like in my first example, and I think that is what you are actually arguing. But everyone else is saying that these actual studies are not talking about a random 1000 strangers, but instead are looking at the actual cases of allegations, and from there they determined that only 5% could be called false.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 03:22:07 PM
pathtofire,

Your arguments are similar to gaja's. IF we knew which group we are dealing with given each random person, then yes, you folks would be right.

The problem is, we don't. Does this make sense? I believe this is where most of the confusion is coming from.

When you calculate the odds when accused, you have already assumed the individual is in one of the subsets.

I am saying, without knowing which subset the individual is in, what are the odds of him being guilty when accused.

Think about it, if we knew the person was in the rapist group, then ya, the odds would be super high he's actually guilty.
If we knew the person was in the innocent group, then ya, the odds would be very low he's actually guilty. 0.1% in dabn's case.

But the problem is we don't know. What I calculated was this: given a person is accused (and we have no idea which group he's in), what are his odds of being guilty.

And EVERYONE's error here is assuming the person is in the rapist group to begin with.
Title: Re: Statistics update
Post by: Glenstache on October 05, 2018, 03:27:30 PM
This is some gold-star arguing about people being wrong on the internet.

This has a lot of great elements:
1. Appeal to authority
2. Arcane tangent
3. Highly emotional content
4. Repetition of arguments
5. Poorly formed analogy

There should be more effort in including past comments in replies to make it harder to read though. There is lots of room for improvement in that category.

And the whole thing boils down to whether or not the same stats apply to a general population and the sub-population of those accused of rape.  In all of this discussion, that is the only assumption that matters. Pretty much everyone else here knows how to multiply probabilities, so it isn't a question of understanding method.
Title: Re: Statistics update
Post by: PathtoFIRE on October 05, 2018, 03:31:17 PM
pathtofire,

Your arguments are similar to gaja's. IF we knew which group we are dealing with given each random person, then yes, you folks would be right.

The problem is, we don't.

Exactly, the point is you only do the deep dive when you have a credible accusation, which is what kinda happened here (I'd argue the dive wasn't so deep). And we don't "test" large random populations for potential sexual assault, we just don't, we wait for some signs and symptoms to develop before instituting those tests, so I'd argue that Kavanaugh's situation is nothing like my first example and the picture you have been trying to paint, but much closer to the second, and therefore I trust the PPV and NPV of an actual investigation (again, not sure we got one here to be honest).
Title: Re: Statistics update
Post by: sol on October 05, 2018, 03:34:24 PM
I am saying, without knowing which subset the individual is in, what are the odds of him being guilty when accused.

Except that one of those subsets doesn't exist.

Innocent people are not routinely accused of rape the same way that healthy people are routinely screened for infection, so the math you've presented isn't relevant.  You're solving the wrong problem.

If you remove the nonexistent subset of random people who are accused of rape, your math would make a lot more sense.  Would be easier to follow, too.

For the third (or maybe fourth?) time now, do you believe there is a 5% chance or a 75% chance that a person with a single accusation is innocent?  Because you started out saying 75% innocent, then briefly started saying 5% innocent, and now you appear to be back to 75% innocent.  Keep in mind that if you say 75% innocent, you're also saying 75% of self-identified sexual assault survivors are liars.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 03:38:51 PM
pathtofire,

Your arguments are similar to gaja's. IF we knew which group we are dealing with given each random person, then yes, you folks would be right.

The problem is, we don't.

Exactly, the point is you only do the deep dive when you have a credible accusation, which is what kinda happened here (I'd argue the dive wasn't so deep). And we don't "test" large random populations for potential sexual assault, we just don't, we wait for some signs and symptoms to develop before instituting those tests, so I'd argue that Kavanaugh's situation is nothing like my first example and the picture you have been trying to paint, but much closer to the second, and therefore I trust the PPV and NPV of an actual investigation (again, not sure we got one here to be honest).

Right, I agree with these points. Except my calculation didn't imply an exhaustive test was necessary. What it meant was: without first knowing if a person is guilty or innocent, what are the odds of him being guilty when accused.

You are saying, he wouldn't have been accused if he were innocent, because a guilty person is likely to be accused.

See my comparison again, these statements are consistent, and are both true:

Think about it, if we knew the person was in the rapist group, then ya, the odds would be super high he's actually guilty. 
If we knew the person was in the innocent group, then ya, the odds would be very low he's actually guilty. 0.1% in dabn's case.

But the problem is we don't know. What I calculated was this: given a person is accused (and we have no idea which group he's in), what are his odds of being guilty.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 03:44:10 PM

For the third (or maybe fourth?) time now, do you believe there is a 5% chance or a 75% chance that a person with a single accusation is innocent?  Because you started out saying 75% innocent, then briefly started saying 5% innocent, and now you appear to be back to 75% innocent.  Keep in mind that if you say 75% innocent, you're also saying 75% of self-identified sexual assault survivors are liars.

what? No, this assertion is wrong.

There is 75% chance a person with a single accusation is innocent, precisely because we don't know which group he's in.

The false allegation rate is 5%, because allegations are most often true. These two numbers are NOT mutually exclusive because they describe two different events.

I quote the professor again:
"The 26% and 5% are conditional probabilities for two different events, and either one may (in general) be larger or smaller than the other. "

Title: Re: Statistics update
Post by: sol on October 05, 2018, 03:44:45 PM
Given a person is accused (and we have no idea which group he's in), what are his odds of being guilty.

One of those groups does not exist.  There is no confusion about which group he is in.  You've totally made up an entire population of innocent people who get falsely accused of rape, and you've used those made up false accusations to dilute the incidence of true accusations of rape.

But as I keep repeating, people are not randomly accused of rape the way they are randomly tested for infection.  You cannot apply this approach to the problem of sexual assault allegations.  Your math is fine, your problem setup is wrong.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 03:46:55 PM
Given a person is accused (and we have no idea which group he's in), what are his odds of being guilty.

One of those groups does not exist.  There is no confusion about which group he is in.  You've totally made up an entire population of innocent people who get falsely accused of rape, and you've used those made up false accusations to dilute the incidence of true accusations of rape.

But as I keep repeating, people are not randomly accused of rape the way they are randomly tested for infection.  You cannot apply this approach to the problem of sexual assault allegations.  Your math is fine, your problem setup is wrong.

Which group does not exist? The innocent? Did you look at gaja's data? Or Dabn's? Or anyone that actually turned out to be exonerated for rape? The likelihood is very low, but this group EXISTS. Are you really this partisan that you are now just spewing wholly untrue statements?

But since rapists remain a minority in the population (I used 5%, you can use 10% or 15% or w/e), what I presented is the logical account.

IF you were right and the other group (the innocent) didn't exist, there would be NO FP.

Here is a way to help you think about it arithmetically:

A small % of population are rapists, their chance of being falsely accused is very low and they are 95% guilty.
A large % of population are not-rapists, their chance of being falsely accused is extremely low and being guilty is lower still. (0.1% using dabn's number)

If we knew which group a person being accused is from, then yes, the stat appropriate for that group would be used here. The problem is we don't know. So intuitively we know the odds must be between 0.1% and 95%.

The fact that most of the population are NOT rapists means the 0.1% get much more weight in the averaging process (remember, we don't know which group the person is in). Hence the chance of guilty being 26%.
Title: Re: Statistics update
Post by: sol on October 05, 2018, 04:11:27 PM
which group does not exist? The innocent?

No, not the innocent, the large population of random innocent people who are falsely accused, the group from which you fabricate all of these hypothetical false accusations that you then use to suggest that most people who are accused are innocent.  Sexual assault allegations just don't work the same way random infection screenings do.  You don't test everyone at random, so you never create this huge group of false positives.

Quote
If we knew which group a person being accused is from,

You're still incorrectly applying the test to everyone, when you say this.  You are assuming everyone gets accused at random.  In reality, most people who are accused of rape are accused correctly because, you know, they committed a rape and gave someone a reason to accuse them.  Most people who have not committed a rape do not get accused, because nobody goes around to random people trying to decide if they should be accused of rape or not.
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 04:37:39 PM
Sol, I did not fabricate anything. If you don't believe my numbers, let's use Dabnasty's from reply #7

Quote
In 2012 there were 87,000 reported rapes; 87,000 * .05 = 4350
The average male life expectancy is 75.5; 75.5 * 4350 = 328,425
The US population in 2012 was 312,800,000; 328,425 / 312,800,000 = ~.001

What dabnasty did was he looked at actual data from 2012, and obtained the number of reported rapes (87000), and the hypothetical false accusation cases in a given year (328425), given the 5% fp rate.

so now we have true positive / (TP + NP) (which includes all real cases) in a given year.

what is 87000 / 328425?

lo and behold: 26.5%

This is the ULTIMATE validation in my defense.  Look, none of this is fabricated, what I did was based on bayes' statistics, calculated the likelihood of being guilty is 26% in THEORY.

Dabnasty's numbers matched almost exactly in practice 26.5%. What more do you want?

Look, none of this is fabricated. I did it right.
Title: Re: Statistics update
Post by: Caroline PF on October 05, 2018, 04:58:28 PM

Here is a way to help you think about it arithmetically:

A small % of population are rapists, their chance of being falsely accused is very low and they are 95% guilty.
So to paraphrase what you're saying, in the group of rapists, only 95% are actually rapists.


A large % of population are not-rapists, their chance of being falsely accused is extremely low and being guilty is lower still. (0.1% using dabn's number)
And here, in the group of non-rapists, 0.1% are actually rapists.


That makes no sense.

I'll give you the first one. It's possible that the guy who raped 6 women is falsely accused by a 7th woman, therefore leading to the possibility that a rapist could have a false accusation.

But the second one is absolutely impossible. There's no way for a non-rapist to be actually guilty of rape.

If your math allows for an impossibility, then there must be something wrong with the math.


And I think PathToFire is right. You have taken a positive predictive value (actually 1-ppv), and falsely called it the false positive rate.


Here's another:
Sol, I did not fabricate anything. If you don't believe my numbers, let's use Dabnasty's from reply #7

Quote
In 2012 there were 87,000 reported rapes; 87,000 * .05 = 4350
The average male life expectancy is 75.5; 75.5 * 4350 = 328,425
The US population in 2012 was 312,800,000; 328,425 / 312,800,000 = ~.001

What dabnasty did was he looked at actual data from 2012, and obtained the number of reported rapes (87000), and the hypothetical false accusation cases in a given year (328425), given the 5% fp rate.

so now we have real / hypo false (which includes all real cases)

what is 87000 / 328425?

lo and behold: 26.5%

This is the ULTIMATE validation in my defense.  Look, none of this is fabricated, what I did was based on bayes' statistics, calculated the likelihood of being guilty is 26% in THEORY.

Dabnasty's numbers matched almost exactly in practice 26.5%. What more do you want?

Look, none of this is fabricated. I did it right.

Using actual numbers, you are saying that there were 328425 false accusations in a year. But only 87000 reported rapes.

How can you have more false accusations than total accusations?
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 05:07:57 PM

So to paraphrase what you're saying, in the group of rapists, only 95% are actually rapists.

And here, in the group of non-rapists, 0.1% are actually rapists.

That makes no sense.


Right, I screwed up. This is what happens when you spend all day arguing with people who have no clue how Bayes works (not you Caroline PF). Let me try it again.

A 5% false allegation rate means 5% of allegations are false. It says nothing about the groups.

the 0.1% means if one is from the innocent group, there is a 0.1% chance he would be wrongly accused.

Again, sorry for the mix up. Clearly I've spent too much time on this today and I am no longer thinking clearly.

I labeled those as hypo false cases in a year given the 5% false allegation rate. Not false accusations.

Here is the way to look at it, we KNOW there were 87000 cases reported, but based on the 5% false allegation rate, we calculate there COULD be 328425 cases in a year.

If we knew the case from the 87000 pile, then it ends there. Problem is we don't know. so we must consider the possibility it came from the pile outside of the 87000 too. Hence the 26%.
Title: Re: Statistics update
Post by: Caroline PF on October 05, 2018, 05:15:58 PM
I labeled those as hypo false cases in a year given the 5% false allegation rate. Not false accusations.

What do you mean by this? I don't understand what you are saying here.
Title: Re: Statistics update
Post by: Caroline PF on October 05, 2018, 05:33:00 PM
I labeled those as hypo false cases in a year given the 5% false allegation rate. Not false accusations.

Here is the way to look at it, we KNOW there were 87000 cases reported, but based on the 5% false allegation rate, we calculate there COULD be 328425 cases in a year.

If we knew the case from the 87000 pile, then it ends there. Problem is we don't know. so we must consider the possibility it came from the pile outside of the 87000 too. Hence the 26%.

I'm sorry, I'm just not following you here. Could you explain the groups that these numbers are representing in plain English to me?

For instance, my understanding is that the number 87000 represents the total number of allegations (both true and false) in that year.

What group does the number 328425 represent?
Title: Re: Statistics update
Post by: sol on October 05, 2018, 05:38:10 PM
Quote
The average male life expectancy is 75.5; 75.5 * 4350 = 328,425

What dabnasty did was he looked at actual data from 2012, and obtained the number of reported rapes (87000), and the hypothetical false accusation cases in a given year (328425), given the 5% fp rate.

so now we have true positive / (TP + NP) (which includes all real cases) in a given year.

what is 87000 / 328425?

lo and behold: 26.5%

This is the ULTIMATE validation in my defense.

Lol, I find this hilarious.  Now you're explicitly explaining why what you did is wrong, and still claiming vindication.

Here's a quick reminder since you haven't gotten the message yet:  the entire male US population is not equivalently subject to rape accusations.  As a general rule, rapists get accused of rape and nonrapists don't.
Title: Re: Statistics update
Post by: Glenstache on October 05, 2018, 05:39:22 PM
Here is the way to look at it, we KNOW there were 87000 cases reported, but based on the 5% false allegation rate, we calculate there COULD be 328425 cases in a year.

If we knew the case from the 87000 pile, then it ends there. Problem is we don't know. so we must consider the possibility it came from the pile outside of the 87000 too. Hence the 26%.

This is the fundamental flaw in your analysis. You assume that the false positive rate applies at the same value to the accused population from which the false positive rate is derived and to the entire population. That is not correct. As you said, if an allegation comes from the 87,000 pile (the number actually accused) then it ends there. There is a strong bias in the accused data set. Of course if you accuse a person from the general population you will get a bigger likelihood of being innocent. But that is an inane question to ask within the given context. Kavanaugh is clearly in the "87,000" group for the purposes of this discussion and, as you said above, that is where the analysis should stop.
Title: Re: Statistics update
Post by: sol on October 05, 2018, 05:47:38 PM
Of course if you accuse a person from the general population you will get a bigger likelihood of being innocent. But that is an inane question to ask within the given context.

That's the whole shtick, here.  Anisotropy's math is predicated on the underlying assumption, as part of the way that the problem is phrased, that millions of Americans are falsely accused of rape.  He thinks that the false accusation rate applied to the population size gives you the number of false accusations, and doesn't recognize that this would only be true if everyone in the population were accused.

Stats is definitely counterintuitive sometimes.  This is not one of those cases.  This is just a poorly formed problem.
Title: Re: Statistics update
Post by: Caroline PF on October 05, 2018, 05:55:40 PM
I labeled those as hypo false cases in a year given the 5% false allegation rate. Not false accusations.

Here is the way to look at it, we KNOW there were 87000 cases reported, but based on the 5% false allegation rate, we calculate there COULD be 328425 cases in a year.

If we knew the case from the 87000 pile, then it ends there. Problem is we don't know. so we must consider the possibility it came from the pile outside of the 87000 too. Hence the 26%.

I'm sorry, I'm just not following you here. Could you explain the groups that these numbers are representing in plain English to me?

For instance, my understanding is that the number 87000 represents the total number of allegations (both true and false) in that year.

What group does the number 328425 represent?

To clarify my question further, my understanding of the number 328425, based on Dabnasty's numbers, is the number of false accusations (4350) in one year multiplied by 75.5 years. So it is the total number of false allegations in 75 years; or the number of false allegations in the US over an average male's lifespan, in order to calculate his odds of ever being falsely accused in his lifetime.

But you seem to be applying that number to a single year.
we calculate there COULD be 328425 cases in a year.

So I'm confused.
number of accusations in one year / number of false accusations in 75 years
doesn't seem to give a useful number.
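
A minimal Python sketch of the arithmetic in question, using only the figures quoted from Dabnasty's reply #7. The variable names are illustrative, not anything from the thread. It retraces how 328,425 is built from a one-year count multiplied by 75.5 years, and then forms the disputed ratio so the mismatch in time frames is visible.

```python
# Figures quoted in the thread (Dabnasty, reply #7); names are illustrative.
reported_2012 = 87_000            # reported rapes in 2012 (one year)
false_per_year = reported_2012 * 5 // 100   # 5% assumed false -> 4350 per year
life_expectancy_years = 75.5
us_population_2012 = 312_800_000

# Numerator of the lifetime-risk estimate: false accusations over 75.5 years.
lifetime_false = false_per_year * life_expectancy_years   # 328425.0

# What Dabnasty computed: chance of ever being falsely accused in a lifetime.
lifetime_risk = lifetime_false / us_population_2012        # about 0.001

# The disputed ratio: a ONE-YEAR count divided by a 75.5-YEAR count.
mixed_ratio = reported_2012 / lifetime_false                # about 0.265

print(f"false accusations per year:      {false_per_year}")
print(f"false accusations in 75.5 years: {lifetime_false:.0f}")
print(f"lifetime risk:                   {lifetime_risk:.4f}")
print(f"one-year / 75.5-year ratio:      {mixed_ratio:.1%}")
```

The 26.5% comes out of dividing quantities measured over different periods, which is the objection raised above; the sketch reproduces the number, it does not validate it.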
Title: Re: Statistics update
Post by: sol on October 05, 2018, 06:41:16 PM
So I'm confused.
number of accusations in one year / number of false accusations in 75 years
doesn't seem to give a useful number.

It's not just you.  I'm not sure anisotropy understands what's going on here either, no matter how many times he says "I did it right".
Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 06:46:51 PM
The age/year is moot, as long as there is a non-zero rate at which the "innocent specific" population could be falsely accused, in this case 0.1%. That incidentally ties back to laserlady's post in the original thread, which produced a 5% overall fp rate. I might write separately about why and how they are related, because they are two sides of the same coin.

you can work backwards to find the number of hypothetical false positives in a given year IF you know the innocent specific fp rate and the population base.

once you have that, then true positive / (fp + tp) will give you the likelihood of a person (without knowing which group he belongs) being guilty when accused.

In short:
If you know the fp for criminal group, population breakdown, and fn, you can do bayes to get the answer

If you know the fp for innocent group, population base, and true positives, you can do it this way too. As long as a fp for innocent group exists, you are implicitly agreeing with the notion one could be accused by chance.

If your assumptions are valid for a given year, they will agree with each other. I have said all there is to say, I don't know what else to add.
Title: Re: Statistics update
Post by: Caroline PF on October 05, 2018, 08:42:29 PM
In short:
If you know the fp for criminal group, population breakdown, and fn, you can do bayes to get the answer

If you know the fp for innocent group, population base, and true positives, you can do it this way too. As long as a fp for innocent group exists, you are implicitly agreeing with the notion one could be accused by chance.

This is where I disagree with you. I don't think we know the false positive rate.


You are saying that the 5% false allegation rate is equivalent to the false positive rate.

The rest of us are saying that the 5% false allegation rate is equivalent to 1-ppv.

We're not disagreeing with your Bayesian math. We're disagreeing with your initial assumption of what the 5% is defining.


But, hey, let's go back to the original numbers, and figure out how the original 5% that we're fighting over was calculated.

Using Google, I found the following publication which explains where they got the false accusation rate of 2-8%.
https://www.nsvrc.org/sites/default/files/Publications_NSVRC_Overview_False-Reporting.pdf
Diving into the studies they based this publication on, I found some of the real life data that they used:
Quote
For example, in a multi-site study of 8 U.S. communities involved in the “Making
a Difference” (or “MAD”) Project, data were collected by law enforcement
agencies for all sexual assault reports received in an 18-24 month period. Of the
2,059 cases that were included in the study, 140 (7%) were classified as false.

For example, Clark and Lewis (1977) examined case files for all 116 rapes
investigated by the Toronto Metropolitan Police Department in 1970. As a result,
they concluded that seven cases involved (6%) false reports made by victims.

Grace, Lloyd, and Smith (1992) conducted a similar analysis of the evidence in all
348 rape cases reported to police in England and Wales during the first three
months of 1985. After reviewing the case files, reports from forensic examiners,
and the statements of victims and suspects, 8.3% were determined to constitute
false allegations.

A similar study was then again sponsored by the Home Office in 1996 (Harris &
Grace, 1999). This time, the case files of 483 rape cases were examined, and
supplemented with information from a limited number of interviews with sexual
assault victims and criminal justice personnel. However, the determination that a
report was false was made solely by the police. It is therefore not surprising that
the estimate for false allegations (10.9%) was higher than those in other studies
with a methodology designed to systematically evaluate these classifications.

The largest and most rigorous study that is currently available in this area is the
third one commissioned by the British Home Office (Kelly, Lovett, & Regan,
2005). The analysis was based on the 2,643 sexual assault cases (where the
outcome was known) that were reported to British police over a 15-year period of
time. Of these, 8% were classified by the police department as false reports. Yet
the researchers noted that some of these classifications were based simply on the
personal judgments of the police investigators, based on the victim’s mental
illness, inconsistent statements, drinking or drug use. These classifications were
thus made in violation of the explicit policies of their own police agencies. The
researchers therefore supplemented the information contained in the police files
by collecting many different types of additional data, including: reports from
forensic examiners, questionnaires completed by police investigators, interviews
with victims and victim service providers, and content analyses of the statements
made by victims and witnesses. They then proceeded to evaluate each case using
the official criteria for establishing a false allegation, which was that there must
be either “a clear and credible admission by the complainant” or “strong
evidential grounds” (Kelly, Lovett, & Regan, 2005). On the basis of this analysis,
the percentage of false reports dropped to 2.5%.

Finally, another large-scale study was conducted in Australia, with the 850 rapes
reported to the Victoria police between 2000 and 2003 (Heenan & Murray, 2006).
Using both quantitative and qualitative methods, the researchers examined 812
cases with sufficient information to make an appropriate determination, and found
that only 2.1% of these were classified as false reports.

All of the studies used a total number of accusations, and the number that were false. All rates were calculated by dividing false accusations by total accusations.

So, if accusations are positive tests, you now know the false positives and the true positives. You know nothing about the negative tests (those who were not accused)


You have been saying that the 5% is referencing the false positive rate. The definition of false positive rate is FP/(FP + TN).

In none of these studies did they have a true negative number, therefore they could not calculate a false positive rate. In real world terms, in order to calculate the false positive rate, we would need the number of innocent men accused divided by all the innocent men in that population. No study attempted to calculate this number.

All studies calculated the number of false positives divided by the number of total positives. Which is the definition of 1-ppv.
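
The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up here, the counts are the MAD Project figures quoted above, and every report not classified as false is treated as true, which is a simplification.

```python
def one_minus_ppv(false_pos, true_pos):
    """Share of positives (accusations) that are false: FP / (FP + TP)."""
    return false_pos / (false_pos + true_pos)


def false_positive_rate(false_pos, true_neg):
    """Share of actual negatives that test positive: FP / (FP + TN)."""
    return false_pos / (false_pos + true_neg)


# MAD Project figures quoted above: 2,059 reports, 140 classified as false.
total_reports = 2059
false_reports = 140
# Simplifying assumption: every report not classified false counts as true.
true_reports = total_reports - false_reports

mad_one_minus_ppv = one_minus_ppv(false_reports, true_reports)
print(f"1-PPV from the study counts: {mad_one_minus_ppv:.1%}")

# The false positive rate needs a true-negative count (innocent people who
# were NOT accused). None of the quoted studies reports one, so
# false_positive_rate() cannot be evaluated from their data alone.
```

The first function only needs accusation counts, which the studies have. The second needs the number of innocent people who were never accused, which they do not.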

Title: Re: Statistics update
Post by: anisotropy on October 05, 2018, 09:33:25 PM
I have to think about what you said Caroline PF. But even if you are right, and 5% is indeed 1-PPV like you said, I think we can still do the calculation.

1. we know the population composition, and 2. we know (assumed, really) the false negative rate, being crimes not reported. And this reverts back to what laserlady proposed some days ago.
           
total           positive        negative
50              17              33              this part doesn't change
950             1               949             this changes and 1-PPV = 1/18 ~5%
1000            18              982

But when you do the problem like this, you can derive the fp for the innocent group. 1/950, and when we apply it to the general population with known 87000 reported (assumed to be true positive). We still get the probabilities of being guilty to be 20ish% given accused.

Title: Re: Statistics update
Post by: Caroline PF on October 05, 2018, 10:01:04 PM
I have to think about what you said Caroline PF. But even if you are right, and 5% is indeed 1-PPV like you said, I think we can still do the calculation.

1. we know the population composition, and 2. we know (assumed, really) the false negative rate, being crimes not reported. And this reverts back to what laserlady proposed some days ago.
           
total           positive        negative
50              17              33              this part doesn't change
950             1               949             this changes and 1-PPV = 1/18 ~5%
1000            18              982

But when you do the problem like this, you can derive the fp for the innocent group. 1/950, and when we apply it to the general population with known 87000 reported (assumed to be true positive). We still get the probabilities of being guilty to be 20ish% given accused.

Quick note: 87000 reported is total number accused in 2012. It is the total positive (equivalent to the 18 in your box), not just the true positive.

I agree with your derived false positive rate based on these numbers of 1/950.
But how do you get 20% probability being guilty given those numbers that you wrote? Can you show me the math you used?

The way I see it, if you are accused, you are by definition in the positive test group, and either a false positive (innocent) or a true positive (guilty). You cannot be a true negative or a false negative, because the negative group is the group that wasn't accused of anything.
Title: Re: Statistics update
Post by: anisotropy on October 06, 2018, 12:22:28 AM
Sorry I was out.

If, like you say, 87000 is the total positive, then the true positives for the year would be 87000/18*17 = 82166.
With a false positive rate of 1/950, out of every 950 people 1 would be accused falsely.
If we apply that to the population of 312M, of which 95% are not rapists, we would expect to get 312000 false accusations: 312M * 0.95 / 950.

The issue is that even with a low false accusation rate of 1/950, given the population base and relatively small amount of criminal ratio, the true positives will become only a small portion.

Your view is that if accused, you are automatically from either the rapist group or the non-rapist group. I agree with you here. We just don't know exactly which one.

Rapist group: We know these cases account for ~95% of the 87000. If we KNEW one is from the rapist group, he would be guilty.
Not-rapist group: We know these cases account for ~5% of the 87000. If we KNEW one is from the nonrapist group, he would be innocent.

So far I think we are in agreement. But now we need to deal with the ~0.1% of the false accusation rate.

I understand you may find it nonsensical to account for 312000 accusations which were not reported, but they are not fabricated. Because we are not dealing with cases any longer. If our knowledge were perfect or that we had 100% fpr / 0 false accusation, there would be no need to do this calculation, but we are not omniscient.

Each of the 82166 cases is also a true positive accusation. Given one is accused (without knowing where it comes from, as we have no way of knowing which group the case came from), the odds of the case coming from the rapist group (hence guilty) is true positive accu / (false positive accu + TP accu).

The odds of being from the accusation rapist group (therefore guilty) is 82166 / (312000 false accusations + 82166 true accu.) =21%. Note how 82166/87000 = ~95%. They are describing two different events.

One 95% of reported cases are guilty (rapist cases). Another is when accused (not a case), there is a 21% chance of being guilty. They are both true.

I think if we do the problem by treating the 5% as fpr instead of fp, we are making the distinction that we will be working with rates and would need to be tied back to the actual population and need to work with actual case#. With fp, we can work on it as long as the population composition is known.


Title: Re: Statistics update
Post by: former player on October 06, 2018, 01:41:54 AM

I understand you may find it nonsensical to account for 312000 accusations which were not reported, but they are not fabricated.

Well, yes.  If an accusation is not reported, it has not been made.  It is not an accusation.  (Is it a dead parrot?  Or is that just this thread?)

Do you ever stop to think logically about what you are writing?

You seem to be trying to treat these so-called 312000 false accusations as a double negative in order to bring them into the calculation.  Which is nonsense.  You can say "this would be a false accusation if it had been made".  You can't then say it has been made when it hasn't, which is what you are doing.


Jesus wept.
Title: Re: Statistics update
Post by: anisotropy on October 06, 2018, 01:47:30 AM
No, each case is an accusation, but the reverse is not true.

An accusation does not need to be a case. With 0.1% false accusation rate assumption that was built in, we need to apply that to the population. Otherwise, what does that number even mean?
Title: Re: Statistics update
Post by: former player on October 06, 2018, 02:05:58 AM
No, each case is an accusation, but the reverse is not true.

An accusation does not need to be a case. With 0.1% false accusation rate assumption that was built in, we need to apply that to the population. Otherwise, what does that number even mean?


Quite.  It doesn't mean anything because if an accusation has not been made it does not exist.

https://www.youtube.com/watch?v=DkQhK8O9Jik
Title: Re: Statistics update
Post by: Caroline PF on October 06, 2018, 08:32:18 AM

I understand you may find it nonsensical to account for 312000 accusations which were not reported, but they are not fabricated.

Well, yes.  If an accusation is not reported, it has not been made.  It is not an accusation. 

Former Player is right. This is nonsensical and impossible. The number of accusations are by definition the number of accusations reported to authorities. And the number of false allegations are the subset of that number that are shown to be false.

So the number of false allegations made in 2012 can be assumed to be around 4350, applying the 5% historical rate to the 87000.

Or the number could be a total of 312000 false accusations based on a 0.1% false positive rate that we calculated earlier, applied to the population. I agree with this math. If the FP rate is correct, this is the number of false allegations we should see.

But both of these numbers can't be right. Which means we screwed up our math somewhere, and the false positive rate of 0.1% is wrong.
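
A short Python sketch of that contradiction, using only the figures already in this thread (87,000 reports, 5% classified false, a 312M population that is 95% non-rapist, and the derived 1/950 rate). Variable names are illustrative.

```python
total_reported = 87_000                      # all accusations in 2012
observed_false = total_reported * 5 // 100   # 5% of reports -> 4350

population = 312_000_000
non_rapists = population * 95 // 100         # 296,400,000
# Applying the derived 1/950 "false positive rate" to every non-rapist:
implied_false = non_rapists // 950           # 312,000

# anisotropy's true positives and resulting share, as computed in the thread
true_pos = total_reported * 17 // 18         # 82,166
share_guilty = true_pos / (implied_false + true_pos)

print(f"false accusations from the reports: {observed_false}")
print(f"false accusations implied by 1/950: {implied_false}")
print(f"implied count exceeds ALL reports:  {implied_false > total_reported}")
print(f"share guilty under the 1/950 rate:  {share_guilty:.1%}")
```

The implied 312,000 false accusations is larger than the 87,000 accusations that were actually recorded, so the 1/950 rate and the 5% figure cannot both describe the same year.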


I was mulling it over last night, and I believe I know where we went wrong. The numbers for false positive rate that we calculated earlier are nonsensical because we changed units mid calculation. Let me explain.

         
total           positive        negative
50              17              33              this part doesn't change
950             1               949             this changes and 1-PPV = 1/18 ~5%
1000            18              982

But when you do the problem like this, you can derive the fp for the innocent group. 1/950,

To start, we took a population of 1000 men, and applied an assumed statistic of 5% have assaulted in their lifetime to come up with 50 men who have assaulted sometime in the past, and 950 who have never assaulted.

So far, so good.

But then we took that 50 and changed it to 50 assaults, in order to apply the assumed statistic that 2/3 assaults are not reported. And came up with 17 reported assaults and 33 non-reported assaults.

But the number of men who have ever assaulted, and the number of assaults in a year are not the same number. This is where we screwed up.

So the 949, the true negative number we calculated, is nonsensical because it was derived from nonsense inputs.


So, I'm going to try to find the false positive rate again, without making the same logical error.

Inputs: false allegations (1-ppv) of 5% and 2/3 of cases not reported.

total           positive        negative
285             95              190
???             5               ???
???             100             ???

I don't think we have enough information to finish the derivation. If there are 285 assaults, how many non-assaults do we have in the same time frame? How would we even define non-assaults? Number of consensual sexual encounters? Number of attempted sexual encounters that failed? Number of days that an assault didn't happen, multiplied by the at-risk population?

Okay, try again with the 2012 population numbers.
Inputs: total reported cases 87000, false allegation rate of 5%, and male population of 156M (312M / 2), 95% of whom are non-rapists.

total           positive        negative
7.8M            82650           ???
148.2M          4350            ???
156M            87000           ???

If the first column is total population of men, the second column would be total men accused in a year. The third column should thus be total men not accused this year, and can be calculated with simple subtraction.

total           positive        negative
7.8M            82650           7.7M
148.2M          4350            148.2M
156M            87000           155.9M

So the false positive rate in this example is 4350 / 148.2M = 0.003%
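
The table above can be reproduced with a few lines of Python. This is a sketch under the same inputs stated in the post (156M men, 5% assumed to be rapists, 87,000 accused, 5% of accusations false); the variable names are illustrative.

```python
male_population = 156_000_000
rapists = male_population * 5 // 100          # 7,800,000
non_rapists = male_population - rapists       # 148,200,000

accused = 87_000                              # total positives in the year
false_accused = accused * 5 // 100            # 4350 (false positives)
true_accused = accused - false_accused        # 82650 (true positives)

# Negative column by subtraction: men in each group who were not accused.
rapists_not_accused = rapists - true_accused            # false negatives
non_rapists_not_accused = non_rapists - false_accused   # true negatives

# False positive rate: FP / (FP + TN) = FP / all non-rapists
fpr = false_accused / non_rapists
# Positive predictive value: TP / (TP + FP) = TP / all accused
ppv = true_accused / accused

print(f"FPR = {fpr:.4%}")   # roughly 0.003%
print(f"PPV = {ppv:.0%}")   # 95%
```

With this setup the false positive rate is tiny and the positive predictive value stays at 95%, because the 5% was entered as a share of accusations rather than as a share of all innocent men.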
Title: Re: Statistics update
Post by: anisotropy on October 06, 2018, 09:33:05 AM
So.... based on your reasoning, given a person is accused, what are the odds they are guilty? Just 95%? That doesn't seem very Bayesian to me. I am not saying you are wrong, btw.

Ya, I was thinking about it last night too. I think I will actually go to the local college next week and see if I could talk to a prof, not necessarily the same prof. I will present to them the following, just like I had done the first time. In addition to the questions below, I want to find out specifically which treatment of "5% as the false allegation rate" is appropriate, i.e., as fp or fpr. And of course, the "fabricated" false accusation in the population thing from the previous post. Oh, the things I would do for knowledge. Stay tuned, thanks.

Now, I am going to apply the same method for "actually being a criminal". In this hypothetical case, the numbers are just an estimate with assumptions built in (let's assume they are appropriate), but that is not the main issue I am concerned about.
I am assuming 5% of the population are criminals. 5% as the false allegation rate. Assuming only about 1/3 of the crimes are reported, so I am using 2/3 as false negative rate.

The 2x2 table would look like this

                  population (5% criminals)    accused    not accused
Criminals         50                           17         33
Not Criminals     950                          48         902
Total             1000                         65         935

so, 65 people are accused but only 17 people are actual criminals; that's 26% chance of a person being actual criminal when accused of a crime, despite the false allegation rate being only 5%.

Now my questions are (if you find this interesting enough to answer): Was my framing/formulation of the problem appropriate? What logic trap did I fall in which yielded this puzzling result? Under what circumstances can I or can't I set up the table like this?
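
For reference, a minimal Python sketch that reads the rates back out of this 2x2 table. It uses only the four cell counts above; the variable names are illustrative. It shows which quantity equals 5% in this particular setup.

```python
# Cell counts from the 2x2 table above (per 1000 people)
tp = 17    # criminals accused
fn = 33    # criminals not accused
fp = 48    # non-criminals accused
tn = 902   # non-criminals not accused

ppv = tp / (tp + fp)            # P(criminal | accused)
one_minus_ppv = fp / (tp + fp)  # share of accusations that are false
fpr = fp / (fp + tn)            # P(accused | not criminal)
fnr = fn / (tp + fn)            # P(not accused | criminal)

print(f"PPV   = {ppv:.1%}")
print(f"1-PPV = {one_minus_ppv:.1%}")
print(f"FPR   = {fpr:.1%}")
print(f"FNR   = {fnr:.1%}")
```

In this table the 5% appears as the false positive rate (48 of 950), while the share of accusations that are false comes out near 74%. So the 26% result depends on reading the 5% figure as an FPR rather than as 1-PPV.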
Title: Re: Statistics update
Post by: Caroline PF on October 06, 2018, 10:20:51 AM
So.... based on your reasoning, given a person is accused, what are the odds they are guilty? just 95%? That doesnt seem very Bayesian to me, I am not saying you are wrong btw.

Yeah, it doesn't seem very Bayesian, because it doesn't require Bayesian calculations.


You are asking for the following: given someone is accused of a crime, what are the chances he is actually guilty? So what part of the Bayesian square are you looking to find? The false positive rate? The sensitivity? The specificity?

You are taking a positive test (accused of crime) and wanting to know what percentage are true positives (actually guilty). That is the Positive Predictive Value that you are looking for.

Quote
Positive predictive value is the probability that subjects with a positive screening test truly have the disease.
Negative predictive value is the probability that subjects with a negative screening test truly don't have the disease.

I already showed you that the real world data is calculating the 1-PPV, and calling it the "false accusation rate".

So you are asking the following:
Given a 1-PPV of 5%, what is the PPV?

Simple calculation, which doesn't require Bayesian calculations, despite being based on Bayesian numbers.


So if you do go talk to another professor, don't focus on your calculations, because you have been doing them right. Focus instead on asking about where the real world numbers fit in the grid, and which output you are looking for. The example that you created for the professor had correct math, but I believe that the assumptions were faulty. The professor responded that your math was correct, and that FPR and PPV were different numbers because they are different concepts. That is all correct. But he never addressed your assumptions, and that's the real question that we have been discussing in this thread.
Title: Re: Statistics update
Post by: maizefolk on October 06, 2018, 10:41:03 AM
So.... based on your reasoning, given a person is accused, what are the odds they are guilty? just 95%? That doesnt seem very Bayesian to me, I am not saying you are wrong btw.

You can certainly take a bayesian approach, but if you did you'd want to feed into it the assumptions of a population of about 250,000 rapists, each with a 1/3 chance of being accused, and about 159,750,000 innocent men, each with an approximately 1 in 40,000 chance of being accused.

This is based on the following assertions:

-160M men living in the USA
-87,000 accusations of rape
-5% of accusations of rape are later shown to be false
-1/3 of rapes are reported/result in accusations.

From those four assertions we can calculate:

1. The number of true positives (total accusations of rape * (1 - the proportion of accusations of rape which are false)).
2.  The number of false positives (total accusations of rape * the proportion of accusations of rape which are false)
3. The number of false negatives (number of true positives * 2) <-- since if 2/3rds of rapes are unreported then the number of false negatives must be 2x the number of true positives.
4. The number of true negatives (total men in the USA - the sum of the first three categories).

This gives a confusion matrix that looks like this.

Code: [Select]
                    Rapists             Non-rapists
Accused              82,650 (TP)              4,350 (FP)
Not Accused         165,300 (FN)       159,747,700 (TN)

So for non-rapists the chance that they are accused is 0.000027* (or ~1 in 40,000) and for rapists the chance that they are accused of rape is .33 (or 1 in 3). If you plug these numbers into the Bayesian formula you're already using, you should get an answer which is correct (or at least as correct as the accuracy of the four assertions listed above).

This also simplifies by assuming a rapist will only attack one person per year. In reality, studies suggest that even among rapists and those who commit sexual assault, a small number of perpetrators are responsible for a large proportion of total assaults.

*This doesn't quite match Caroline PF's number just because we started with slightly different numbers of assumed total men.
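The four-step calculation above can be reproduced with a short Python sketch. It uses only the four assertions listed in the post; the variable names are illustrative, and exact fractions are used so the counts come out as whole numbers.

```python
from fractions import Fraction

# The four assertions from the post (taken as given, not verified here).
men = 160_000_000
accusations = 87_000
false_share = Fraction(5, 100)     # share of accusations later shown false (1 - PPV)
reported_share = Fraction(1, 3)    # share of rapes that result in an accusation

true_pos = accusations * (1 - false_share)                     # step 1
false_pos = accusations * false_share                          # step 2
false_neg = true_pos * (1 - reported_share) / reported_share   # step 3: 2x the true positives
true_neg = men - (true_pos + false_pos + false_neg)            # step 4

fpr = false_pos / (false_pos + true_neg)          # P(accused | non-rapist)
sensitivity = true_pos / (true_pos + false_neg)   # P(accused | rapist)
ppv = true_pos / (true_pos + false_pos)           # P(rapist | accused)

print(f"TP={int(true_pos):,} FP={int(false_pos):,} FN={int(false_neg):,} TN={int(true_neg):,}")
print(f"P(accused | non-rapist) = {float(fpr):.7f} (about 1 in {float(1 / fpr):,.0f})")
print(f"P(accused | rapist)     = {float(sensitivity):.2f}")
print(f"P(rapist | accused)     = {float(ppv):.2f}")
```

The false positive rate works out to a little under 1 in 37,000, which is the same order as the "~1 in 40,000" quoted above, while the positive predictive value is 95% by construction.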
Title: Re: Statistics update
Post by: Caroline PF on October 06, 2018, 10:47:53 AM
Here is how I would word the question to the statistics professor:


Assume there is a type of crime. When retrospective studies were done on accusations of this crime, they concluded that 5% of accusations turned out to be false accusations.

Let's also assume that 5% of the population will commit this crime in their lifetimes.

Further, let's assume that only 1/3 of criminals will ever be accused of this crime in their lifetime.*

I am interested to know what the likelihood is of an innocent man being accused of this crime (false positive rate?), and what the likelihood is that the accused is guilty (positive predictive value?).

How would I fill in the following square?

                           accused      not accused
criminal           50        ???            ???
non-criminal      950        ???            ???
                 1000        ???            ???

In other words, let the professor fill the table in and see if it matches the numbers that you filled in.


* I made this number up. I have no idea what the real world number is. 2/3 rapes not reported is not the same as 2/3 rapists never reported. So the numbers derived here would not necessarily apply to the real world.
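One way the square could be filled in under these three assumptions is sketched below in Python. The key step is reading "5% of accusations turned out to be false" as FP / (TP + FP) = 0.05, which is an interpretation of the wording above, not a result from the professor. Variable names are illustrative and the made-up inputs are the ones in the post.

```python
from fractions import Fraction

population = 1000
prevalence = Fraction(5, 100)                   # 5% commit the crime in their lifetime
accused_share_of_criminals = Fraction(1, 3)     # made-up number from the post
false_share_of_accusations = Fraction(5, 100)   # 5% of accusations are false (1 - PPV)

criminals = population * prevalence             # 50
non_criminals = population - criminals          # 950

true_pos = criminals * accused_share_of_criminals
false_neg = criminals - true_pos
# FP / (TP + FP) = 0.05  implies  FP = TP * 0.05 / 0.95
false_pos = true_pos * false_share_of_accusations / (1 - false_share_of_accusations)
true_neg = non_criminals - false_pos

fpr = false_pos / non_criminals           # P(accused | innocent)
ppv = true_pos / (true_pos + false_pos)   # P(guilty | accused)

print(f"criminal:     accused {float(true_pos):7.2f}   not accused {float(false_neg):7.2f}")
print(f"non-criminal: accused {float(false_pos):7.2f}   not accused {float(true_neg):7.2f}")
print(f"false positive rate = {float(fpr):.5f}, PPV = {float(ppv):.2f}")
```

With these inputs fewer than one innocent person per 1000 is accused, and the chance that an accused person is guilty is 95%, because the 5% was placed in the accused column rather than the non-criminal row.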
Title: Re: Statistics update
Post by: shenlong55 on October 06, 2018, 11:23:53 AM
So.... based on your reasoning, given a person is accused, what are the odds they are guilty? just 95%? That doesnt seem very Bayesian to me, I am not saying you are wrong btw.

Yeah, it doesn't seem very Bayesian, because it doesn't require Bayesian calculations.


You are asking for the following: given someone is accused of a crime, what are the chances he is actually guilty? So what part of the Bayesian square are you looking to find? The false positive rate? The sensitivity? The specificity?

You are taking a positive test (accused of crime) and wanting to know what percentage are true positives (actually guilty). That is the Positive Predictive Value that you are looking for.

Quote
Positive predictive value is the probability that subjects with a positive screening test truly have the disease.
Negative predictive value is the probability that subjects with a negative screening test truly don't have the disease.

I already showed you that the real world data is calculating the 1-PPV, and calling it the "false accusation rate".

So you are asking the following:
Given a 1-PPV of 5%, what is the PPV?

Simple calculation, which doesn't require Bayesian calculations, despite being based on Bayesian numbers.


So if you do go talk to another professor, don't focus on your calculations, because you have been doing them right. Focus instead on asking about where the real world numbers fit in the grid, and which output you are looking for. The example that you created for the professor had correct math, but I believe that the assumptions were faulty. The professor responded that your math was correct, and that FPR and PPV were different numbers because they are different concepts. That is all correct. But he never addressed your assumptions, and that's the real question that we have been discussing in this thread.

Thank you!
Title: Re: Statistics update
Post by: anisotropy on October 06, 2018, 12:52:22 PM
The professor said my "framing was correct", which I understood to be my assumptions were valid. But I will check again as you suggested.


In other words, let the professor fill the table in and see if it matches the numbers that you filled in.


That's a fair suggestion, thanks.
Title: Re: Statistics update
Post by: anisotropy on October 06, 2018, 02:38:29 PM
UPDATE

I am now 99% sure I have been wrong this whole time. I sincerely apologize to all. It was not my intention to be malicious or gross, I had genuinely believed what I said was true. The odds of being guilty off a single accusation is not 26%, it's much higher. I am sorry.

I want to thank Caroline PF and Maize especially as their step-by-step guidance made my error glaringly clear (even to a halfwit such as myself).

Regards,
Title: Re: Statistics update
Post by: Caroline PF on October 06, 2018, 03:19:06 PM
Anisotropy,

Thank you for being willing to engage in discussion with me. I am not a statistics expert by any means, and I found that I learned a lot about statistics through my exchanges with you.

Maybe I'll even improve my scores on my re-certification exams, as they usually include questions on bayesian statistics. :)
Title: Re: Statistics update
Post by: former player on October 06, 2018, 03:36:44 PM
Thank you for the apology. Many would not have had the guts to make one, even anonymously on the internet, and I appreciate that you have done it.
Title: Re: Statistics update
Post by: shenlong55 on October 06, 2018, 04:40:01 PM
Thank you for the apology. Many would not have had the guts to make one, even anonymously on the internet, and I appreciate that you have done it.

+1
Title: Re: Statistics update
Post by: shuffler on October 06, 2018, 05:08:06 PM
Thank you for the apology. Many would not have had the guts to make one, even anonymously on the internet, and I appreciate that you have done it.

+1
+1

... though perhaps you should also voice it, briefly, on the BK thread.
Your "vindicated by the professor" post over there was ... rather over the top.
Title: Re: Statistics update
Post by: anisotropy on October 06, 2018, 05:36:10 PM
Thank you for the apology. Many would not have had the guts to make one, even anonymously on the internet, and I appreciate that you have done it.

+1
+1

... though perhaps you should also voice it, briefly, on the BK thread.
Your "vindicated by the professor" post over there was ... rather over the top.

Done
Title: Re: Statistics update
Post by: shuffler on October 06, 2018, 05:54:51 PM
Thank you for the apology. Many would not have had the guts to make one, even anonymously on the internet, and I appreciate that you have done it.

+1
+1

... though perhaps you should also voice it, briefly, on the BK thread.
Your "vindicated by the professor" post over there was ... rather over the top.

Done
Forthright.  Well done.  Thanks.
Title: Re: Statistics update
Post by: golden1 on October 06, 2018, 06:58:35 PM
Heh, I have been reading along with fascination.

Anisotropy has done this in arguments before, when we were talking about racism and he refused to deal with any history prior to some arbitrary date that he specified.  He has a need to control the discussion and to be correct, and can only do that by irrationally defining his argument to some arbitrary standard.

It’s honestly not worth arguing here.  I would not trust this person to be able to frame the question to the professor correctly or without bias.

What an odd person.
Title: Re: Statistics update
Post by: anisotropy on October 06, 2018, 07:36:43 PM
Heh, I have been reading along with fascination.

Anisotropy has done this in arguments before, when we were talking about racism and he refused to deal with any history prior to some arbitrary date that he specified.  He has a need to control the discussion and to be correct, and can only do that by irrationally defining his argument to some arbitrary standard.

It’s honestly not worth arguing here.  I would not trust this person to be able to frame the question to the professor correctly or without bias.

What an odd person.

We did not discuss "racism" per se, rather, more about how progressive helped Trump win. I did not want to conflate that issue with race related topics because those should be discussed on their own.

I have been reading a slew of books from your list, including Stamped. I have not changed my views expressed in the progressive helped Trump win thread.

Like everyone else, I won't know I am wrong until I do.

This one was a serious mea culpa. I mistook P(A|B) for P(B|A), which was not correct.
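The mix-up can be written out explicitly. As a sketch in the thread's own terms, let A mean "accused" and G mean "guilty" (labels chosen here for illustration). The 5% figure from retrospective studies describes the second quantity below, while the original table used it as the first:

```latex
\begin{align*}
\text{false positive rate:}\quad
  & P(A \mid \neg G) = \frac{FP}{FP + TN} \\
\text{share of accusations that are false:}\quad
  & P(\neg G \mid A) = \frac{FP}{FP + TP} = 1 - \mathrm{PPV} \\
\text{Bayes' theorem:}\quad
  & P(G \mid A) = \frac{P(A \mid G)\,P(G)}{P(A \mid G)\,P(G) + P(A \mid \neg G)\,P(\neg G)}
\end{align*}
```

When guilt is rare in the population, TN is far larger than TP, so the two conditional probabilities can differ by orders of magnitude even though they share the same numerator.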
Title: Re: Statistics update
Post by: Gin1984 on October 06, 2018, 08:52:42 PM
Here is how I would word the question to the statistics professor:


Assume there is a type of crime. When retrospective studies were done on accusations of this crime, they concluded that 5% of accusations turned out to be false accusations.

Let's also assume that 5% of the population will commit this crime in their lifetimes.

Further, let's assume that only 1/3 criminals will ever be accused of this crime in their lifetime.*

I am interested to know what the likelihood is of an innocent man being accused of this crime (false positive rate?), and what the likelihood is that the accused is guilty (positive predictive value?).

How would I fill in the following square?

                           accused      not accused
criminal           50        ???            ???
non-criminal      950        ???            ???
                 1000        ???            ???

In other words, let the professor fill the table in and see if it matches the numbers that you filled in.


* I made this number up. I have no idea what the real world number is. 2/3 rapes not reported is not the same as 2/3 rapists never reported. So the numbers derived here would not necessarily apply to the real world.

Just a side note here: when psychology researchers describe rape, 30% of the undergraduate male students in the studies (on average) admit to raping. So using 5% may not be a good idea.
Title: Re: Statistics update
Post by: golden1 on October 07, 2018, 07:05:16 AM
Right, and you refused to incorporate any facts into your world view prior to 1964.

I have no proof that you actually are reading anything, just as we have no proof that you are talking to professors. This could all be trolling or just virtue signalling on your part.

The thing that really gets me about you is the fucking arrogance.  Anyone who doesn't agree with you is less intelligent or "doesn't get it".  Maybe you have high IQ, but you seem to struggle with EQ quite a bit.  If you are trying to sway people to your arguments, you seem to be failing, pretty dramatically.  You might want to do some introspection on why that is. 
Title: Re: Statistics update
Post by: radram on October 07, 2018, 09:37:42 AM
UPDATE
I want to thank Caroline PF and Maize especially as their step-by-step guidance made my error glaringly clear (even to a halfwit such as myself).

Anyone else wonder why Sol got absolutely nothing. He tried 3 ways from Sunday to explain what you were doing, and how to do it correctly. When you finally took his advice and posed another question to your expert, they "showed you the way".

Even if you hate Sol for other reasons, I found it a little rude to ignore his efforts.

And thank you for publicly apologizing here AND the original thread. That just does not happen anymore, and should.

I must admit, I enjoy very much playing with numbers, but this discussion was creeping me out. I am glad it was moved to another thread.

Title: Re: Statistics update
Post by: anisotropy on October 07, 2018, 10:10:40 AM
Right, and you refused to incorporate any facts into your world view prior to 1964.

I have no proof that you actually are reading anything, just as we have no proof that you are talking to professors. This could all be trolling or just virtue signalling on your part.

The thing that really gets me about you is the fucking arrogance.  Anyone who doesn't agree with you is less intelligent or "doesn't get it".  Maybe you have high IQ, but you seem to struggle with EQ quite a bit.  If you are trying to sway people to your arguments, you seem to be failing, pretty dramatically.  You might want to do some introspection on why that is.

lol ok
Title: Re: Statistics update
Post by: anisotropy on October 07, 2018, 10:13:21 AM
UPDATE
I want to thank Caroline PF and Maize especially as their step-by-step guidance made my error glaringly clear (even to a halfwit such as myself).

Anyone else wonder why Sol got absolutely nothing. He tried 3 ways from Sunday to explain what you were doing, and how to do it correctly. When you finally took his advice and posed another question to your expert, they "showed you the way".

Even if you hate Sol for other reasons, I found it a little rude to ignore his efforts.

And thank you for publicly apologizing here AND the original thread. That just does not happen anymore, and should.

I must admit, I enjoy very much playing with numbers, but this discussion was creeping me out. I am glad it was moved to another thread.

I do not hate Sol or anything like that. I thanked CPF and Maize because it was specifically through their replies I realized the mistake I had made. I did not receive the same "benefits" from reading Sol's posts, unfortunately.

Regardless, Sol was right, what I originally said was indeed counterfactual. I am sorry Sol.

Title: Re: Statistics update
Post by: sol on October 07, 2018, 10:30:56 AM
I am sorry Sol.

No worries.  I'm glad you came around.  If my explanations weren't clear enough to convince you, I have no hard feelings about someone else doing a better job than I did. 

Despite having this math question resolved, I think it's important to take a moment to reflect on this discussion and the way it went down, because you made a lot of people very uncomfortable with your absolute insistence that sexual assault survivors are mostly liars.  It's possible to be mathematically correct and still a huge asshole, and admitting your math mistake in this instance doesn't change that.
Title: Re: Statistics update
Post by: anisotropy on October 07, 2018, 12:05:20 PM
I was a huge asshole.

I was a huge asshole precisely because I insisted what was counterfactual to be "objectively true". I inadvertently spread lies and created unjustified doubt and confusion. I became someone I despise the most.

There was a huge problem with my treatment of false allegation rate, as many had pointed out. But I did not understand at the time.

As golden1 pointed out, hubris played a big role, I have no excuse for that.
Title: Re: Statistics update
Post by: Watchmaker on October 08, 2018, 02:49:55 PM
Wow, that was quite a thread.

I'm glad several of you stuck it out with anisotropy to help him understand his mistake. Kudos to Caroline PF and Maize for finally getting through, but also for Sol and the others for continuing to try. (I also appreciate how this conversation was moved to a new thread when people objected to it in the BK one).

Anisotropy -  Admitting your error was a good move (and I imagine a hard one). It doesn't cancel out the fact that you were 1) very incredibly wrong 2) arrogant 3) and a jerk (these last two would have been true even if your math wasn't wrong). I truly hope you learn from this and use it as an opportunity to grow.

Jeez, you people actually have me feeling (a tiny, tiny bit) optimistic.
Title: Re: Statistics update
Post by: anisotropy on October 08, 2018, 03:47:47 PM

Anisotropy -  Admitting your error was a good move (and I imagine a hard one). It doesn't cancel out the fact that you were 1) very incredibly wrong 2) arrogant 3) and a jerk (these last two would have been true even if your math wasn't wrong). I truly hope you learn from this and use it as an opportunity to grow.


Nothing cancels past actions out to those affected.

I always admit my errors freely when proven wrong, it is easy for me to do. I have no baggage and no identity in an argument. The only thing that matters is if the situation/argument is factual.

Therefore, being arrogant and a jerk is often an inexorable outcome from where I stand, for truth must be defended no matter what.

In the case that I am wrong (like this one), I simply switch sides with the same (if not higher) level of conviction. I do what I think is right; people's perception (of me) means little to me. That being said, I do need to keep my hubris in check; I succumb to it from time to time.
Title: Re: Statistics update
Post by: sol on October 08, 2018, 03:56:28 PM
Jeez, you people actually have me feeling (a tiny, tiny bit) optimistic.

I got the exact opposite impression.  Anisotropy continued to argue that sexual assault victims are mostly liars right through the nationally televised testimony of a sexual assault victim, up until the moment the perpetrator was confirmed to the supreme court and the whole thing became moot, then changed his mind. 

Maybe that was a genuine change of heart, but it sure does look like awfully convenient timing.
Title: Re: Statistics update
Post by: Watchmaker on October 08, 2018, 03:59:09 PM

Anisotropy -  Admitting your error was a good move (and I imagine a hard one). It doesn't cancel out the fact that you were 1) very incredibly wrong 2) arrogant 3) and a jerk (these last two would have been true even if your math wasn't wrong). I truly hope you learn from this and use it as an opportunity to grow.


Nothing cancels past actions out to those affected.

I always admit my errors freely when proven wrong, it is easy for me to do. I have no baggage and no identity in an argument. The only thing that matters is if the situation/argument is factual.

Therefore, being arrogant and a jerk is often an inexorable outcome from where I stand, for truth must be defended no matter what.

In the case that I am wrong (like this one), I simply switch sides with the same (if not higher) level of conviction. I do what I think is right; people's perception (of me) means little to me. That being said, I do need to keep my hubris in check; I succumb to it from time to time.


I think you're missing a couple lessons here. Namely:

Perhaps the fact that you can be so astonishingly wrong about such a simple thing that you claimed to understand well means that you should approach your future opinions more humbly.

And there's the issue that, even if you had been correct, you brought your "facts" to the wrong audience, in the wrong way, at the wrong time. You hurt people needlessly. You can serve truth and be kind at the same time. This time, you managed neither. Next time, try for both.

...

And there goes my optimism.  That was short lived.
Title: Re: Statistics update
Post by: anisotropy on October 08, 2018, 04:03:10 PM

Maybe that was a genuine change of heart, but it sure does look like awfully convenient timing.

lol ok. It was right here

Quote
Simple calculation, which doesn't require Bayesian calculations, despite being based on Bayesian numbers.

and

Quote
This gives a confusion matrix that looks like this.

Code: [Select]

                    Rapists             Non-rapists
Accused              82,650 (TP)              4,350 (FP)
Not Accused         165,300 (FN)       159,747,700 (TN)


So for non-rapists the chance that they are accused is 0.000027* (or ~1 in 40,000) and for rapists the chance that they are accused of rape is .33 (or 1 in 3).

I started to see where I went wrong. I wish I had known better and understood sooner. But that's the nature of arguing on the internet via posts. If we had been face to face, someone would have pointed out my error (i.e., that I took P(A|B) to be P(B|A)) very quickly and in a convincing manner.
Title: Re: Statistics update
Post by: Caroline PF on October 08, 2018, 06:44:12 PM
Therefore, being arrogant and a jerk is often an inexorable outcome from where I stand, for truth must be defended no matter what.
I started to see where I went wrong. I wish I had known better and understood sooner. But that's the nature of arguing on the internet via posts. If we had been face to face, someone would have pointed out my error (i.e., that I took P(A|B) to be P(B|A)) very quickly and in a convincing manner.

So are you saying it's Sol's fault for not being convincing enough? /s


Read through the thread again, now that you know what your mistake was. Multiple people tried in multiple ways to explain it to you. I honestly don't know why you listened to me. I merely restated PathToFire's statements.

Wanting to defend the truth is noble, but you need to be absolutely sure you have the truth. And that is where the humility comes in, because you can never be absolutely sure. You need to be able to say to yourself, "I'm pretty sure I'm right, but there's a chance I'm wrong. And the best way to find out if I'm wrong is by listening to other people and trying to understand it from their point of view."

The problem was not the internet vs face-to-face. The problem was listening.
Title: Re: Statistics update
Post by: anisotropy on October 08, 2018, 09:35:16 PM
Quote
The problem was listening.

👍

It just didn't come through for some reason. The fault was entirely mine.
Title: Re: Statistics update
Post by: Villanelle on October 10, 2018, 02:07:43 AM

Anisotropy -  Admitting your error was a good move (and I imagine a hard one). It doesn't cancel out the fact that you were 1) very incredibly wrong 2) arrogant 3) and a jerk (these last two would have been true even if your math wasn't wrong). I truly hope you learn from this and use it as an opportunity to grow.


Nothing cancels past actions out to those affected.

I always admit my errors freely when proven wrong, it is easy for me to do. I have no baggage and no identity in an argument. The only thing that matters is if the situation/argument is factual.

Therefore, being arrogant and a jerk is often an inexorable outcome from where I stand, for truth must be defended no matter what.

In the case that I am wrong (like this one), I simply switch sides with the same (if not higher) level of conviction. I do what I think is right; people's perception (of me) means little to me. That being said, I do need to keep my hubris in check; I succumb to it from time to time.


I think you're missing a couple lessons here. Namely:

Perhaps the fact that you can be so astonishingly wrong about such a simple thing that you claimed to understand well means that you should approach your future opinions more humbly.

And there's the issue that, even if you had been correct, you brought your "facts" to the wrong audience, in the wrong way, at the wrong time. You hurt people needlessly. You can serve truth and be kind at the same time. This time, you managed neither. Next time, try for both.

...

And there goes my optimism.  That was short lived.

This.  So much this.  Read this and attempt to internalize it.  Because frankly, I get the sense you are still too arrogant and proud to see that there were errors here beyond just an incorrectly applied formula or an inability to understand the logic behind a calculation.   The errors in empathy and compassion and tone and listening are far more egregious in my book than the failure to use the correct math formula, but you don't really address those.  You apologize for the math.  Had your math been correct, frankly, you were still an insensitive ass.
Title: Re: Statistics update
Post by: sol on October 15, 2018, 02:53:58 PM
someone would have pointed out my error ie, I took P(A|B) to be P(B|A)

For all of the math nerds that so loved this thread, today's xkcd is about anisotropy's mea culpa:   https://xkcd.com/2059/

(https://imgs.xkcd.com/comics/modified_bayes_theorem.png)
Title: Re: Statistics update
Post by: RetiredAt63 on October 16, 2018, 02:05:57 PM
Oh my, the p(C) is a killer, isn't it?