Hey GuitarStv,
I struggled for about an hour to reject the null hypothesis for each category using the raw data. I was checking whether I had made a typo when I saw footnote 4 of the paper:
"We have used a significance level of .10 for some tests where: 1) the results support the hypothesis and we are consequently more willing to reject the null hypothesis of no difference; 2) our hypothesis is strongly supported theoretically and by empirical results in other studies that use lower significance levels; 3) our small n may be obscuring large differences; and 4) the gravity of an increased risk of Type I error is diminished in light of the benefit of decreasing the risk of a Type II error. " p.288
Err, what? I don't buy the reasons, especially when they used 0.05 for some scores and 0.10 for others. That's like saying we will lower our "significance" standard if the result agrees with our hypothesis and with other studies that we like. Given how small the n already is, this arbitrary decision to use 0.10 instead of 0.05 for some scores really makes the whole study much weaker.
Then I found this: the paper states that perceived male vs perceived female reaches the 0.05 threshold (pg. 298), but in the actual histogram it drops back to only 0.10 (pg. 299). WHAT?
Using the 0.05 threshold, as is common in pretty much all studies, I tried to identify an outlier (in the aggregate) and found none. In other words, no instructor bias and no gender bias.
To reiterate, the difference only shows when we compare ONLY perceived female vs perceived male (at 0.10), but disappears when we compare Actual male vs Actual female, or when we compare perceived female vs aggregate.
If it were truly gender-based bias, we would expect the female identities to consistently score lower in all of these comparisons:
1. Actual female vs Actual male
2. Actual female vs perceived male
3. Perceived female vs perceived male (the paper concedes it does not reach 0.05, pg.299)
These differences are all absent, so the authors' conclusion is quite weak in my opinion.
On the other hand, the Female TA got an avg score of 4.16 (0.98) while the Male TA got 3.82 (1.11), which is not significant at 0.05 but actually is significant if we choose to use their 0.10 threshold, lol.
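For anyone who wants to rerun the TA comparison from the summary numbers, here is a rough Python sketch of a Welch (unequal-variance) t-test that reports the verdict at both 0.05 and 0.10. It assumes the values in parentheses are standard deviations. The group sizes are NOT given in this post, so the n values below are placeholders I made up; swap in the real ones from the paper. The function names are my own, and the p-value comes from a simple numerical integration, so treat it as an approximation.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b   # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps                            # steps must be even
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                         # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

def verdicts(p, alphas=(0.05, 0.10)):
    """Map each significance level to True if p falls below it."""
    return {alpha: p < alpha for alpha in alphas}

# PLACEHOLDER group sizes, not taken from the paper.
N_FEMALE_TA = 20
N_MALE_TA = 20

t, df = welch_t(4.16, 0.98, N_FEMALE_TA, 3.82, 1.11, N_MALE_TA)
p = two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.3f}, verdicts = {verdicts(p)}")
```

The point of returning both verdicts side by side is that any p-value between 0.05 and 0.10 flips from "not significant" to "significant" purely on the choice of threshold. If you have SciPy installed, scipy.stats.ttest_ind_from_stats with equal_var=False should give the same answer without the hand-rolled integration.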
PS: sorry Guitar, I missed your post. Yes, the scores changed, but we have to remember the change could be due to different groups of students making the ratings, or to random variation, since N is so small to begin with. In fact, the changes you noted are not large enough to reject the null at the 0.05 level.
Regarding:
If there was no bias, you would expect the perceived female to decrease by about 1.9% and the percieved male to increase by about 1.9%. That would mirror the initial instructor assessments for the two when sex was known
I don't understand, can you elaborate please? Why would the score of the perceived female decrease and the score of the perceived male increase? I am saying there is a bias, and the bias is actually against the instructor (quality of work), not gender, provided that we lower our standard to the 0.10 threshold.