Authors:
(1) Amador Durán, SCORE Lab, I3US Institute, Universidad de Sevilla, Sevilla, Spain ([email protected]);
(2) Pablo Fernández, SCORE Lab, I3US Institute, Universidad de Sevilla, Sevilla, Spain ([email protected]);
(3) Beatriz Bernárdez, I3US Institute, Universidad de Sevilla, Sevilla, Spain ([email protected]);
(4) Nathaniel Weinman, Computer Science Division, University of California, Berkeley, Berkeley, CA, USA ([email protected]);
(5) Aslı Akalın, Computer Science Division, University of California, Berkeley, Berkeley, CA, USA ([email protected]);
(6) Armando Fox, Computer Science Division, University of California, Berkeley, Berkeley, CA, USA ([email protected]).
Context. Pair programming has been found to increase student interest in Computer Science, particularly for women, and would therefore appear to be a way to help remedy the under-representation of women in the field. However, one reason for this under-representation is the unwelcoming climate created by gender stereotypes applied to engineers in general, and to software engineers in particular, which assume that men perform better than their women peers. If this same bias is present in pair programming, it could work against the goal of improving gender balance in computing.
Objective. In a remote setting in which students cannot directly observe the gender of their peers, we aim to explore whether Software Engineering students behave differently when the perceived gender of their remote pair programming partners changes, searching for differences in (i) the perceived productivity compared to solo programming; (ii) the partner’s perceived technical competency compared to their own; (iii) the partner’s perceived skill level; (iv) the interaction behavior, such as the frequency of source code additions, deletions, validations, etc.; and (v) the type and relative frequencies of dialog messages used for collaborative behavior in a chat window. Although there are some studies on pair programming performance and gender pair combination, to the best of our knowledge there are no studies on the impact of gender stereotypes and bias within the pairs themselves.
Method. We have developed an online platform (twincode) that randomly classifies students into gender-balanced groups, arranges them in pairs for remote pair programming (sharing an editor window and a chat window), and can selectively deceive one or both partners regarding the gender of the other via the use of a clearly gendered avatar. Several behaviors are automatically measured during the pair programming process, together with two questionnaires and a semantic tagging of the pairs’ conversations.
We will perform a series of experiments to identify the effect, if any, of possible gender bias in remote pair programming interactions. Students in the control group will have no information about their partner’s gender; students in the treatment group will receive such information but will be selectively deceived about their partner’s true gender. To analyze the data, apart from checking the reliability of the questionnaire data using Cronbach’s alpha and the Kaiser criterion, for each response variable we will (i) compare control and experimental groups for the score distance between two in-pair tasks; then, using the data from the experimental group only, we will (ii) compare scores using the partner’s perceived gender as a within-subjects variable; and (iii) analyze the interaction between the partner’s perceived gender (within-subjects) and the subject’s gender (between-subjects). For the (i) and (ii) analyses we will use t-tests, whereas for the (iii) analyses we will use mixed-model ANOVAs.
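As an illustration of two of the statistics named above, the following is a minimal sketch rather than the study's analysis code. It computes Cronbach's alpha for a questionnaire and the t statistic of a paired comparison, on the assumption that the within-subjects analysis (ii) uses a paired t-test. The function names and all sample values are hypothetical, and no p-value is computed.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for questionnaire data.

    `responses` holds one row per respondent and one column per item.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the same subjects measured under two conditions."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples of at least two scores")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(differences)
    if spread == 0:
        raise ValueError("differences have zero variance")
    # Mean difference divided by its standard error (n - 1 degrees of freedom).
    return mean(differences) / (spread / sqrt(len(differences)))


# Made-up data: three respondents answering three perfectly consistent items.
questionnaire = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(questionnaire)

# Made-up scores of four subjects under two hypothetical conditions.
scores_condition_a = [5, 6, 7, 8]
scores_condition_b = [4, 4, 6, 6]
t_value = paired_t_statistic(scores_condition_a, scores_condition_b)
```

In practice a statistics package would normally be used for these tests, since it also reports p-values and supports the mixed-model ANOVAs planned for analysis (iii).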
Pair programming is an increasingly popular collaboration paradigm that has been shown to be an effective tool in Computer Science education as measured by positive influence on grades, class performance, confidence, productivity, and motivation to stay [6], especially for women [19, 24]. In pair programming, two partners work closely together to solve a programming task. As such, their ability to engage with each other is key. However, these interactions are influenced by implicit gender bias [12, 18], such as assuming women are less technically competent [18]. This is a widely observed phenomenon even in highly structured settings [6, 13]. Social sciences research indicates that the behavior of an individual is affected by the behavior of their peers [8]. Therefore, implicit gender bias based on the perception of peers may affect one’s behavior, potentially influencing the pair programming experience.
In this work, in a non-colocated (i.e., remote) environment in which the gender of the peers cannot be directly observed, our goal is to explore whether Software Engineering students change their behavior when the perceived gender of their remote pair programming partners changes from man to woman or vice versa. Note that, while we recognize that many students may identify as neither men nor women, our initial exploration focuses primarily on interactions between students who identify as one of these, so that we can better align our findings with the existing literature on implicit gender bias. Potential bias in interactions involving gender-fluid, non-gender-conforming, or nonbinary students is a rich and complex topic deserving its own subsequent study.
To achieve our goal, we plan to search for differences not only in the perceived productivity of pair programming compared to solo programming, the partner’s perceived technical competency compared to their own, and the partner’s perceived skill level, but also in the interaction behavior, i.e., the frequency of source code additions, deletions, validations, etc., and the type and relative frequency of dialog messages used for collaborative behavior.
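To make the notion of relative frequency concrete, the following is a minimal sketch of how the share of each interaction event or message tag could be computed for one session. It is not the twincode implementation: the function name, the event names, and the message tags are hypothetical placeholders, and the actual tagging scheme is the one described in Section 2.

```python
from collections import Counter


def relative_frequencies(labels):
    """Map each distinct label to its share of all observed labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No observations: there are no frequencies to report.
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical editor events logged for one subject during a task.
events = ["addition", "addition", "deletion", "validation"]
event_share = relative_frequencies(events)

# Hypothetical tags assigned to the same subject's chat messages.
message_tags = ["question", "suggestion", "question", "acknowledgement"]
tag_share = relative_frequencies(message_tags)
```

Working with shares rather than raw counts makes sessions of different lengths comparable, which matters when subjects send very different numbers of messages.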
To get early feedback on the infrastructure supporting our proposal, we ran two pilot studies, one at each university, with a limited number of students, where we could check the comprehensibility of the questionnaires used to gather subjective data, the applicability of the message tagging (described in Section 2), and the capabilities of the twincode platform, which is briefly described below.