In all this discussion I am reminded of the change in orchestra hiring when candidates began auditioning behind a screen. Suddenly women were hired. Before that, no one thought they could play some instruments (the cello, for one) well. I forget where I read this, but it was certainly an "aha" moment.
We have all seen the accounts where papers/reports that were identical except for the names were judged better if the name was male, right? And online TAs were rated better if the name was male, even though they were actually the same person.
And on these forums I have seen people be extremely surprised when someone with a neutral name turned out to be female; they were assumed to be male because of their interests and writing style.
All this to say that we all have lots of built-in assumptions we aren't aware of; they were embedded in us before we were old enough to analyze them. And they are culture-specific: I had an aha moment of my own over one of The Last Jedi characters.
The Moss-Racusin study definitely falls into this category, and is likely the best known, too. I think it has merits, but as I mentioned in my reply to Toque, I wish the set-up were more rigorous and the study pool bigger.
The TA-rating study's set-up was even worse: the four study groups had a grand total of 43 students (Table 1 in the paper), so each group had barely more than ten students providing the ratings. (A note for people unfamiliar with the process: this is how you critique a study scientifically, from the ground up, starting with the experimental design.)
https://www.researchgate.net/publication/269288475_What%27s_in_a_Name_Exposing_Gender_Bias_in_Student_Ratings_of_Teaching

The students were from the University of North Carolina; a better design would include another group from elsewhere. Yes, it's an online course, but odds are the students were predominantly from North Carolina.
The grading scale itself was also problematic: students were given a choice of 1-5, with no scores in between (i.e., no 2.5 allowed). In this set-up, each one-point difference creates a 20% swing in the final result. Preferably, the scoring scheme should be at least 1-10.
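To see why such small groups matter, here is a quick back-of-the-envelope sketch of how wide a confidence interval gets with roughly eleven raters per group. The sample size follows from the 43-students-in-four-groups figure above; the standard deviation is an assumed, plausible spread for 1-5 ratings, not a number from the paper.

```python
# Sketch: CI half-width for a group mean rating, under assumed values.
# n follows from the ~43 students split across four groups; sd = 1.0
# is an assumption for illustration, not a figure from the study.
import math

def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * sd / math.sqrt(n)

n = 11    # roughly 43 students / 4 groups
sd = 1.0  # assumed spread of 1-5 ratings
hw = ci_half_width(sd, n)
print(f"95% CI: mean +/- {hw:.2f} points on a 1-5 scale")
```

Under these assumptions the interval is roughly plus-or-minus 0.6 points, so two group means would have to differ by well over half a rating point before their error bars stopped overlapping.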
When you look at the actual data, most people will cry foul over how the "perceived female" TA got the shaft and scored lower than the "perceived male" TA, but COMPLETELY IGNORE that the score differences between the REAL female and male TAs were minimal.
I quote the paper itself:
"When looking at the individual questions as well as the student ratings index, there are no significant differences between the ratings of the actual male and female instructor." (p. 298)

In fact, the only rating that was "significantly" below the mean (and that significance is itself suspect) was the perceived-female TA's. If this were truly pure gender-based bias, we would expect BOTH the perceived and the real female TA to receive poor scores. That did not happen. And what about the fact that the female instructor consistently scored better than the male instructor? That suggests the male instructor was the weaker instructor overall and partially bears the blame for the perceived-female TA's low scores.
This idea is further supported by the histogram (Figure 1): the perceived-male and real-female conditions (the same person) had very similar scores, with confidence intervals (error bars) that overlap by quite a bit, and we see the same effect comparing the perceived-female and real-male conditions (again the same person), whose error bars also overlap.
Most people touting the validity of this study seem to ignore, or never even noticed, this, and instead focus on an ideology-fueled march.
The results actually suggest some disconnect in the quality of the students in the perceived-female TA's group, and in the quality of teaching between the REAL male and female TAs. Whatever pro-gender-bias conclusion one can draw from this study is extremely weak.
I know most of you probably don't understand, or don't even care, why the things I mentioned matter, but they are extremely important: if we hope to fix the problem, we have to identify the right cause, not just slap a blanket "gender bias" sticker on every circumstance.
Don't just groupthink and follow the crowd. Read the studies yourself; critique the set-up, the method, and the data; be a jerk and ask tough questions. That's how we fix things in the long run.
P.S. GuitarStv, recall my suggestion that tech might one day fix the trait-based problem and level the playing field. Do you think this might be an experiment toward that idea, given that the real female TA consistently outscored the real male TA?