Tag Archives: statistics

Exploring Faster Screening with Fewer Tests via Bayesian Group Testing



How does one find a needle in a haystack? During World War II, that question took on a very concrete form when doctors wondered how to efficiently detect diseases among those who had been drafted into the war effort. Inspired by this challenge, Robert Dorfman, a young statistician at that time (later to become a Harvard professor of economics), proposed in a seminal paper a two-stage approach to detect infected individuals: individual blood samples are first pooled in groups of four before being tested for the presence or absence of a pathogen. If a group tests negative, then it is safe to assume that everyone in the group is free of the pathogen. In that case, the reduction in the number of required tests is substantial: an entire group of four people has been cleared with a single test. On the other hand, if a group tests positive, which is expected to happen rarely if the pathogen’s prevalence is small, at least one person within that group must be positive; therefore, a few more tests are needed to determine which individuals are infected.
Left: Sixteen individual tests are required to screen 16 people — only one person’s test is positive, while 15 return negative. Right: Following Dorfman’s procedure, samples are pooled into four groups of four individuals, and tests are executed on the pooled samples. Because only the second group tests positive, 12 individuals are cleared and only those four belonging to the positive group need to be retested. This approach requires only eight tests, instead of the 16 needed for an exhaustive testing campaign.
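To make the savings concrete, here is a minimal sketch (ours, not from Dorfman’s paper or the post’s code) of the expected number of tests per person under Dorfman pooling, assuming perfect tests and independent infections:

```python
def dorfman_tests_per_person(p, g):
    """Expected tests per person under Dorfman pooling: one pooled test shared
    by g people, plus g individual retests when the pool is positive, which
    happens with probability 1 - (1 - p)**g at prevalence p."""
    return 1.0 / g + (1.0 - (1.0 - p) ** g)

# With prevalence 1/16 and groups of four, screening 16 people takes about
# 7.6 tests in expectation, in line with the figure's scenario (exactly one
# infected person), which needs 4 pooled tests plus 4 retests = 8 tests.
print(16 * dorfman_tests_per_person(1 / 16, 4))  # ~7.64
```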
Dorfman’s proposal triggered many follow-up works with connections to several areas in computer science, such as information theory, combinatorics, or compressive sensing, and several variants of his approach have been proposed, notably those leveraging binary splitting or side knowledge on individual infection probability rates. The field has grown to the extent that several sub-problems are now recognized, each deserving a literature of its own. Some algorithms are tailored for the noiseless case in which tests are perfectly reliable, whereas others consider the more realistic case where tests are noisy and may produce false negatives or positives. Finally, some strategies are adaptive, proposing groups based on test results already observed (including Dorfman’s, since it re-tests individuals who appeared in positive groups), whereas others stick to a non-adaptive setting in which groups are known beforehand or drawn at random.

In “Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design”, we present an approach to group testing that can operate in a noisy setting (i.e., where tests can be mistaken) and that decides adaptively, by looking at past results, which groups to test next, with the goal of converging on a reliable detection as quickly, and with as few tests, as possible. Large-scale simulations suggest that this approach may yield significant improvements over both adaptive and non-adaptive baselines, and that it is far more efficient than individual tests when disease prevalence is low. As such, it is particularly well suited for situations that require large numbers of tests to be conducted with limited resources, as may be the case for pandemics such as COVID-19. We have open-sourced the code to the community through our GitHub repo.

Noisy and Adaptive Group Testing in a Non-Asymptotic Regime
A group testing strategy is an algorithm that is tasked with guessing who, among a list of n people, carries a particular pathogen. To do so, the strategy provides instructions for pooling individuals into groups. Assuming a laboratory can execute k tests at a time, the strategy will form a k × n pooling matrix that defines these groups. Once the tests are carried out, the results are used to decide whether sufficient information has been gathered to determine who is or is not infected, and if not, how to form new groups for another round of testing.
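To illustrate this data structure (a toy sketch, not the open-sourced package’s API), the figure’s four groups of four can be encoded as a 4 × 16 binary pooling matrix, with one row per group and one column per individual:

```python
import numpy as np

n, k = 16, 4                              # 16 individuals, 4 tests per round
pooling = np.zeros((k, n), dtype=bool)    # rows are groups, columns are individuals
for g in range(k):
    pooling[g, 4 * g:4 * (g + 1)] = True  # consecutive groups of four, as in the figure

state = np.zeros(n, dtype=bool)
state[5] = True                           # one infected individual, in the second group

# Noiseless readout: a pooled test is positive iff its group
# contains at least one infected sample.
results = (pooling.astype(int) @ state.astype(int)) > 0
print(results)                            # [False  True False False]
```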

We designed a group testing approach for the realistic setting where the testing strategy can be adaptive and where tests are noisy — the probability that the test of an infected sample is positive (sensitivity) is less than 100%, as is the specificity, the probability that a non-infected sample returns negative.
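Continuing the toy sketch above, a noisy test can be simulated by flipping each pooled outcome according to these two error rates (the helper below is illustrative; the 85%/97% values are the ones used in the benchmark later in the post):

```python
import numpy as np

rng = np.random.default_rng(0)
sensitivity, specificity = 0.85, 0.97  # values from the benchmark section below

def noisy_test(pool_is_positive, rng):
    """A truly positive pool reads positive with probability `sensitivity`;
    a truly negative pool reads positive with probability 1 - `specificity`."""
    p_positive = sensitivity if pool_is_positive else 1.0 - specificity
    return rng.random() < p_positive

truth = [False, True, False, False]    # pooled ground truth from the sketch above
observed = [noisy_test(t, rng) for t in truth]
```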

Screening More People with Fewer Tests Using Bayesian Optimal Experimental Design
The strategy we propose proceeds the way detectives would investigate a case. They first form several hypotheses about who may or may not be infected, using evidence from all tests (if any) that have been carried out so far, as well as prior information on the infection rate (a). Using these hypotheses, our detectives produce an actionable item to continue the investigation, namely a next wave of groups that may help in validating or invalidating as many hypotheses as possible (b), and then loop back to (a) until the set of plausible hypotheses is small enough to unambiguously identify the target of the search. More precisely,
  1. Given a population of n people, an infection state is a binary vector of length n that describes who is infected (marked with a 1), and who is not (marked with a 0). At a certain time, a population is in a given state (most likely a few 1’s and mostly 0’s). The goal of group testing is to identify that state using as few tests as possible. Given a prior belief on the infection rate (the disease is rare) and test results observed so far (if any), we expect that only a small share of those infection states will be plausible. Rather than evaluating the plausibility of all 2^n possible states (an extremely large number even for small n), we resort to a more efficient method to sample plausible hypotheses using a sequential Monte Carlo (SMC) sampler. Although quite costly by common standards (a few minutes using a GPU in our experimental setup), we show in this work that SMC samplers remain tractable even for large n, opening new possibilities for group testing. In short, in return for a few minutes of computations, our detectives get an extensive list of thousands of relevant hypotheses that may explain tests observed so far.

  2. Equipped with a relevant list of hypotheses, our strategy proceeds, as detectives would, by selectively gathering additional evidence. If k tests can be carried out at the next iteration, our strategy will propose to test k new groups, which are computed using the framework of Bayesian optimal experimental design. Intuitively, if k=1 and one can only propose a single new group to test, there would be a clear advantage in building that group such that its test outcome is as uncertain as possible, i.e., with a probability of returning positive as close to 50% as possible, given the current set of hypotheses. Indeed, to progress in an investigation, it is best to maximize the surprise factor (or information gain) provided by new test results, as opposed to using them to further confirm what we already hold to be very likely. To generalize that idea to a set of k>1 new groups, we score this surprise factor by computing the mutual information between the outcomes of these “virtual” group tests and the distribution of hypotheses. We also consider a more involved approach that computes the expected area under the ROC curve (AUC) one would obtain from testing these new groups, again using the distribution of hypotheses. The maximization of these two criteria is carried out using a greedy approach, resulting in two group selectors, GMIMAX and GAUCMAX (greedy maximization of mutual information or AUC, respectively).
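To give a flavor of how such a selector can work, here is a minimal sketch of the k=1 case: given weighted infection hypotheses (e.g., the particles produced in step 1), it greedily grows a single group so as to maximize the mutual information between the noisy pooled-test outcome and the infection state. This is a simplified stand-in for GMIMAX, not the package’s implementation:

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) outcome."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def mutual_information(group, particles, weights, sens=0.85, spec=0.97):
    """I(test outcome; infection state) for one pooled test of `group`,
    estimated over weighted hypothesis particles (an S x n binary array)."""
    pool_positive = particles[:, group].any(axis=1)
    p_pos = np.where(pool_positive, sens, 1.0 - spec)  # P(positive | hypothesis)
    marginal = float(weights @ p_pos)                  # P(positive) under all hypotheses
    return binary_entropy(marginal) - float(weights @ binary_entropy(p_pos))

def greedy_group(particles, weights, max_size=10):
    """Grow one group greedily, adding whichever individual most increases
    the expected information gain of its pooled test."""
    group, best = [], 0.0
    while len(group) < max_size:
        candidates = [i for i in range(particles.shape[1]) if i not in group]
        scores = [mutual_information(group + [i], particles, weights) for i in candidates]
        j = int(np.argmax(scores))
        if scores[j] <= best:
            break                                      # no candidate improves the score
        group, best = group + [candidates[j]], scores[j]
    return group
```

In the paper, GMIMAX scores k > 1 virtual groups jointly under the same criterion, and GAUCMAX swaps mutual information for the expected AUC.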
The interaction between a laboratory (wet_lab) carrying out testing, and our strategy, composed of a sampler and a group selector, is summarized in the following drawing, which uses names of classes implemented in our open source package.
Our group testing framework describes an interaction between a testing environment, the wet_lab, whose pooled test results are used by the sampler to draw thousands of plausible hypotheses on the infection status of all individuals. These hypotheses are then used by an optimization procedure, group_selector, which figures out what groups may be the most relevant to test in order to narrow down on the true infection status. Once formed, these new groups are then tested, closing the loop. At any point in the procedure, the hypotheses formed by the sampler can be averaged to obtain the probability of infection for each patient, and a decision on whether a patient is infected can be made by thresholding these probabilities at a certain confidence level.
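Putting the pieces together, the loop described above might be driven by something like the following sketch. The component names echo the package’s classes, but the method names and signatures here are assumptions for illustration, not the actual API:

```python
import numpy as np

def screen(wet_lab, sampler, group_selector, cycles, k, threshold=0.5):
    """Hypothetical driver: sample hypotheses, pick k informative groups,
    test them, repeat; then threshold the per-patient marginals."""
    groups, results = [], []
    for _ in range(cycles):
        # (a) Draw weighted infection hypotheses consistent with tests so far.
        particles, weights = sampler.draw(groups, results)
        # (b) Propose the k next groups expected to be most informative.
        new_groups = group_selector.select(particles, weights, k)
        # The wet lab tests the pooled samples, closing the loop.
        results += list(wet_lab.test(new_groups))
        groups += list(new_groups)
    # Average the final hypotheses into marginal infection probabilities.
    particles, weights = sampler.draw(groups, results)
    marginals = weights @ particles.astype(float)
    return marginals > threshold  # infected or not, at the chosen confidence level
```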
Benchmarking
We benchmarked our two strategies GMIMAX and GAUCMAX against various baselines in a wide variety of settings (infection rates, test noise levels), reporting performance as the number of tests increases. In addition to simple Dorfman strategies, the baselines we considered included a mix of non-adaptive strategies (origami assays, random designs) complemented at later stages with the so-called informative Dorfman approach. Our approaches significantly outperform the others in all settings.
We executed 5,000 simulations on a sample population of 70 individuals with an infection rate of 2%. We assumed sensitivity/specificity values of 85%/97% for tests on groups of maximal size 10, which are representative of current PCR machines. This figure demonstrates that our approach outperforms the other baselines with as few as 24 tests (up to 8 tests used in each of 3 cycles), including both adaptive and non-adaptive varieties, and performs significantly better than individual tests (plotted in the sensitivity/specificity plane as a hexagon, requiring 70 tests), highlighting the savings potential offered by group testing. See the preprint for other setups.
Conclusion
Screening a population for a pathogen is a fundamental problem, and one we face acutely during the current COVID-19 pandemic. Nearly 80 years ago, Dorfman proposed a simple approach that is still used by various institutions today. Here, we have proposed a method to extend the basic group testing approach in several ways. Our first contribution is to adopt a probabilistic perspective, and form thousands of plausible hypotheses of infection distributions given test outcomes, rather than trust test results to be 100% reliable as Dorfman did. This perspective allows us to seamlessly incorporate additional prior knowledge on infection, such as when we suspect some individuals to be more likely than others to carry the pathogen, based for instance on contact tracing data or answers to a questionnaire. This gives our algorithms, which can be compared to detectives investigating a case, the advantage of knowing the most likely infection hypotheses that agree with prior beliefs and the tests carried out so far. Our second contribution is to propose algorithms that can take advantage of these hypotheses to form new groups, and therefore direct the gathering of new evidence, to narrow down on the "true" infection hypothesis as quickly as possible, and close the case with as little testing effort as possible.

Acknowledgements
We would like to thank our collaborators on this work, Olivier Teboul, in particular, for his help preparing figures, as well as Arnaud Doucet and Quentin Berthet. We also thank Kevin Murphy and Olivier Bousquet (Google) for their suggestions at the earliest stages of this project, as well as Dan Popovici for his unwavering support pushing this forward; Ignacio Anegon, Jeremie Poschmann and Laurent Tesson (INSERM) for providing us background information on RT-PCR tests and Nicolas Chopin (CREST) for giving guidance on his work to define SMCs for binary spaces.

Source: Google AI Blog


Google Summer of Code 2019 (Statistics Part 2)

2019 has been an epic year for Google Summer of Code as we celebrated 15 years of connecting university students from around the globe with 201 open source organizations big and small.

We want to congratulate our 1,134 students who completed GSoC 2019. Great work, everyone!

Now that GSoC 2019 is over we would like to wrap up the program with some more statistics to round out the year.

Student Registrations

We had 30,922 students from 148 countries register for GSoC 2019 (that’s a 19.5% increase in registrations over last year, the previous record). Interest in GSoC clearly continues to grow and we’re excited to see it growing in all parts of the world.

For the first time ever we had students register from Bhutan, Fiji, Grenada, Papua New Guinea, South Sudan, and Swaziland.

Universities

The 1,276 students accepted into the GSoC 2019 program hailed from 658 universities, 164 of which had students participating in GSoC for the first time.

Schools with the most accepted students for GSoC 2019:

University # of Accepted Students
Indian Institute of Technology, Roorkee 48
International Institute of Information Technology - Hyderabad 29
Birla Institute of Technology and Science, Pilani (BITS Pilani) 27
Guru Gobind Singh Indraprastha University (GGSIPU Dwarka) 20
Indian Institute of Technology, Kanpur 19
Indian Institute of Technology, Kharagpur 19
Amrita University / Amrita Vishwa Vidyapeetham 14
Delhi Technological University 11
Indian Institute of Technology, Bombay 11
Indraprastha Institute of Information and Technology, New Delhi 11

Mentors

Each year we pore over gobs of data to extract some interesting statistics about the GSoC mentors. Here’s a quick synopsis of our 2019 crew:
  • Registered mentors: 2,815
  • Mentors with assigned student projects: 2,066
  • Mentors who have participated in GSoC for 10 or more years: 70
  • Mentors who have been a part of GSoC for 5 years or more: 307
  • Mentors that are former GSoC students: 691
  • Mentors that have also been involved in the Google Code-in program: 498
  • Percentage of new mentors: 35.84%
GSoC 2019 mentors are from all parts of the world, representing 81 countries!

Every year thousands of GSoC mentors help introduce the next generation to the world of open source software development—for that we are forever grateful. We cannot stress enough that without our invaluable mentors the GSoC program would not exist. Mentorship is why GSoC has remained strong for 15 years; the relationships built between students and mentors have helped sustain the program and many of these communities. Sharing their passion for open source, our mentors have paved the road for generations of contributors to enter open source development.

Thank you to all of our mentors, organization administrators, and all of the “unofficial” mentors that help in our open source organization’s communities. Google Summer of Code is a community effort and we appreciate each and every one of you.

By Stephanie Taylor, Google Open Source

Reflecting on Google Code-in 2018

Google Code-in (GCI), our contest introducing 13-17 year olds to open source software development, wrapped up last December with impressive numbers: 3,124 students from 77 countries completed 15,323 tasks!

These students spent 7 weeks working online with 27 open source organizations, writing code, writing and editing documentation, designing UI elements and logos, conducting research, developing videos teaching others about open source software, as well as finding (and fixing!) hundreds of bugs.

Overview

  • 2,164 students completed three or more tasks (earning a Google Code-in 2018 t-shirt)
  • 17% of students were girls
  • 23% of the participants from the USA were girls
  • 79% of students were first time participants in GCI
  • We saw very large increases in the number of students from Austria, Indonesia, Malaysia, Pakistan, and Taiwan

Student Age

Participating Schools

Students from 1,673 schools competed in this year’s contest. Many students learn about GCI from their friends or teachers and continue to spread the word to their classmates. This year the 5 schools with the most students completing tasks in the contest were:
School Name Number of Student Participants Country
Dunman High School 110 Singapore
Indus E.M High School 73 India
Sacred Heart Convent Senior Secondary School 69 India
Amity International School Sec-46 Gurgaon 36 India
Bhartiya Vidya Bhavan Vidyashram Pratap Nagar 27 India

Countries

We are pleased to have 9 countries with first-time Winners and Finalists: Winners from Georgia, Macedonia, the Philippines, South Africa and Spain, and Finalists from Israel, Luxembourg, Nepal and Pakistan.

The chart below displays the 10 countries with the most students completing at least 1 task.

What's Next

In June we will welcome all 54 grand prize winners to the San Francisco Bay Area for a fun-filled trip. The trip includes the opportunity for students to meet with one of the mentors they worked with during the contest. Students will also take part in an awards ceremony, meet with Google engineers to hear about new and exciting projects, tour the Google campuses, and enjoy a fun day exploring San Francisco.

We are thrilled that Google Code-in was so popular this year. We hope to continue to grow and expand this contest in the future to introduce even more teenagers to the world of open source software development.

Thank you again to the heroes of this program: the 789 mentors from 57 countries that guided students through the program and welcomed them into their open source communities.

By Saranya Sampat, Google Open Source

Magnificent mentors of Google Summer of Code 2018

Mentors are the heart and soul of the Google Summer of Code (GSoC) program and have been for the last 14 years. Without their hard work and dedication, there would be no Google Summer of Code. These volunteers spend 4+ months guiding their students to create the best quality project possible while welcoming them into their communities – answering questions and providing help at all hours of the day, including weekends and holidays.

Thank you mentors and organization administrators! 

Each year we pore over heaps of data to extract some interesting statistics about the GSoC mentors. Here’s a quick synopsis of our 2018 crew:
  • Registered mentors: 2,819
  • Mentors with assigned student projects: 1,996
  • Mentors who have participated in GSoC for 10 or more years: 46
  • Mentors who have been a part of GSoC for 5 years or more: 272
  • Mentors that are former GSoC students: 627
  • Mentors that have also been involved in the Google Code-in program: 474
  • Percentage of new mentors: 36.5%
GSoC 2018 mentors are from all parts of the world, hailing from 75 countries!

If you want to see the stats for all 75 countries, check out this list.


Another fun fact about our 2018 mentors: they range in age from 15 to 80 years old!
  • Average mentor age: 34
  • Median mentor age: 33
  • Mentors under 18 years old: 26*
GSoC mentors help introduce the next generation to the world of open source software development – for that we are very grateful. To show our appreciation, we invite two mentors from each of the 206 participating organizations to attend our annual mentor summit at the Google campus in Sunnyvale, California. It’s three days of community building, lively debate, learning best practices from one another, working to strengthen open source communities, good food, and lots and lots of chocolate.

Thank you to all of our mentors, organization administrators, and all of the “unofficial” mentors that help in the various open source organization’s communities. Google Summer of Code is a community effort and we appreciate each and every one of you.

Cheers to yet another great year!

By Stephanie Taylor, Google Open Source

* Most of these 26 young GSoC mentors started their journey in Google Code-in, our contest for 13-17 year olds that introduces young students to open source software development.

Google Summer of Code 2018 statistics part 2

Now that Google Summer of Code (GSoC) 2018 is underway and students are wrapping up their first month of coding, we wanted to bring you some more statistics on the 2018 program. Lots and lots of numbers follow:

Organizations

Students are working with 206 organizations (the most we’ve ever had!), 41 of which are participating in GSoC for the first time.

Student Registrations

25,873 students from 147 countries registered for the program, a 25.3% increase over the program's previous high, set back in 2017. There are 9 new countries with students registering for the first time: Angola, Bahamas, Burundi, Cape Verde, Chad, Equatorial Guinea, Kosovo, Maldives, and Mali.

Project Proposals

5,199 students from 101 countries submitted a total of 7,209 project proposals. 70.5% of the students submitted 1 proposal, 18.1% submitted 2 proposals, and 11.4% submitted 3 proposals (the max allowed).

Gender Breakdown

11.63% of accepted students are women, a 0.25 percentage point increase from last year. We are always working toward making our programs and open source more inclusive, and we collaborate with organizations and communities that help us improve every year.

Universities

The 1,268 students accepted into the GSoC 2018 program hailed from 613 universities, of which 216 have students participating for the first time in GSoC.

Schools with the most accepted students for GSoC 2018:
University Country Students
Indian Institute of Technology, Roorkee India 35
International Institute of Information Technology - Hyderabad India 32
Birla Institute of Technology and Science, Pilani (BITS Pilani) India 23
Indian Institute of Technology, Kharagpur India 22
Birla Institute of Technology and Science Pilani, Goa campus / BITS-Pilani - K.K.Birla Goa Campus India 18
Indian Institute of Technology, Kanpur India 16
University of Moratuwa Sri Lanka 16
Indian Institute of Technology, Patna India 14
Amrita University India 13
Indian Institute of Technology, Mandi India 11
Indraprastha Institute of Information and Technology, New Delhi India 11
University of Buea Cameroon 11
BITS Pilani, Hyderabad Campus India 11
Another post with stats on our awesome GSoC mentors will be coming soon!

By Stephanie Taylor, Google Open Source

Google Summer of Code 2018 statistics part 1

Since 2005, Google Summer of Code (GSoC) has been bringing new developers into the open source community every year. This year we accepted 1,264 students from 62 countries into the 2018 GSoC program to work with a record 206 open source organizations this summer.

Students are currently participating in the Community Bonding phase of the program where they become familiar with the open source projects they will be working with. They also spend time learning the codebase and the community’s best practices so they can start their 12 week coding projects on May 14th.

Each year we like to share program statistics about the GSoC program and the accepted students and mentors involved in the program. Here are a few stats:
  • 88.2% of the accepted students are participating in their first GSoC
  • 74.4% of the students are first time applicants

Degrees

  • 76.18% of accepted students are undergraduates, 17.5% are master's students, and 6.3% are getting their PhDs.
  • 73% are Computer Science majors, 4.2% are mathematics majors, 17% are other engineering majors (electrical, mechanical, aerospace, etc.)
  • We have students in a variety of majors including neuroscience, linguistics, typography, and music technologies.

Countries

This year there are four students who are the first to be accepted into GSoC from their home countries of Kosovo (three students) and Senegal. A complete list of accepted students and their countries is below:
Country Students | Country Students | Country Students
Argentina 5 | Hungary 7 | Russian Federation 35
Australia 10 | India 605 | Senegal 1
Austria 14 | Indonesia 3 | Serbia 1
Bangladesh 3 | Ireland 1 | Singapore 8
Belarus 3 | Israel 2 | Slovak Republic 2
Belgium 3 | Italy 24 | South Africa 1
Brazil 19 | Japan 7 | South Korea 2
Bulgaria 2 | Kosovo 3 | Spain 21
Cameroon 14 | Latvia 1 | Sri Lanka 41
Canada 31 | Lithuania 5 | Sweden 6
China 52 | Malaysia 2 | Switzerland 5
Croatia 3 | Mauritius 1 | Taiwan 3
Czech Republic 4 | Mexico 4 | Trinidad and Tobago 1
Denmark 1 | Morocco 2 | Turkey 8
Ecuador 4 | Nepal 1 | Uganda 1
Egypt 12 | Netherlands 6 | Ukraine 6
Finland 3 | Nigeria 6 | United Kingdom 28
France 22 | Pakistan 5 | United States 104
Germany 53 | Poland 3 | Venezuela 1
Greece 16 | Portugal 10 | Vietnam 4
Hong Kong 3 | Romania 10
There were a record number of students submitting proposals for the program this year -- 5,199 students from 101 countries.

In our next GSoC statistics post we will delve deeper into the schools, gender breakdown, mentors, and registration numbers for the 2018 program.

By Stephanie Taylor, Google Open Source

Google Code-in 2017: more is merrier!

Google Code-in (GCI), our contest introducing 13-17 year olds to open source software development, wrapped up last month with jaw-dropping numbers: 3,555 students from 78 countries completed an impressive 16,468 tasks! That’s 265% more students than last year, the previous high in the contest’s 7-year history!

These students spent 7 weeks working online with 25 open source organizations, writing code, writing and editing documentation, designing UI elements and logos, conducting research, developing videos teaching others about open source software, as well as finding (and fixing!) hundreds of bugs.

General Statistics

  • 65.9% of students completed three or more tasks (earning a Google Code-in 2017 t-shirt)
  • 17% of students were girls
  • 27% of the participants from the USA were girls
  • 91% of the students were first time participants

Student Age

Participating Schools

Students from 2,060 schools competed in this year’s contest. Many students learn about GCI from their friends or teachers and continue to spread the word to their classmates. This year the 5 schools with the most students completing tasks in the contest were:

School Name Number of Student Participants Country
Dunman High School 140 Singapore
Sacred Heart Convent Senior Secondary School 43 India
Indus E.M High School 27 India
Jayshree Periwal International School 25 India
Union County Magnet High School 18 United States

Countries

We are pleased to have 7 new countries participating in GCI this year: Bolivia, Botswana, Guinea, Guyana, Iceland, Kyrgyzstan, and Morocco! The chart below displays the ten countries with the most students completing at least 1 task.


In June we will welcome all 50 grand prize winners to the San Francisco Bay Area for a fun-filled trip. The trip includes the opportunity for students to meet with one of the mentors they worked with during the contest. Students will also take part in an awards ceremony, meet with Google engineers to hear about new and exciting projects, tour the Google campuses, and enjoy a fun day exploring San Francisco.

Keep an eye on the Google Open Source Blog in the coming weeks for posts from mentoring organizations describing their experience and the work done by students.

We are thrilled that Google Code-in was so popular this year. We hope to continue to grow and expand this contest in the future to introduce even more teenagers to the world of open source software development. 

Thank you again to the heroes of this program: the 704 mentors from 62 countries that guided students through the program and welcomed them into their open source communities.

By Stephanie Taylor, Google Code-in Team

Google Code-in is breaking records

It’s been an incredible (and incredibly busy!) three weeks for the 25 mentor organizations participating in Google Code-in (GCI) 2017, our seven-week global contest designed to introduce teens to open source software development. Participants complete bite-sized “tasks” in topics that include coding, documentation, UI/UX, quality assurance and more. Volunteer mentors from each open source project help participants along the way.

The number of registered students has already surpassed the 2016 total, and we are less than halfway to the finish! We’re thrilled that high school students are embracing GCI like never before.

Check out some of the statistics below (current as of Thursday, December 14):
  • Total registered students: 6,146
  • Number of students who have completed at least one task: 1,573 (51% of those students have completed more than 3 tasks, earning them a GCI t-shirt)
  • Total number of tasks completed: 5,499
  • Most tasks completed by one student: 39

Top 5 Countries by Tasks Completed

Countries Represented by Mentors and Students



Of course, GCI wouldn’t be possible without the effort of the more than 725 mentors and organization administrators. Based in 65 countries, mentors answer questions, review submissions, and approve tasks for students at all hours of the day -- and sometimes night! They work tirelessly to help encourage and guide the next generation of open source contributors.

Every year we express our gratitude to the mentors and organization administrators. We are particularly grateful for them given how many more students are participating in GCI this year. Thank you all, and hang in there!

By Mary Radomile, Google Open Source

Understanding Bias in Peer Review



In the 1600s, a series of practices came into being that are known collectively as the “scientific method.” These practices encoded verifiable experimentation as a path to establishing scientific fact. Scientific literature arose as a mechanism to validate and disseminate findings, and standards of scientific peer review developed as a means to control the quality of entrants into this literature. Over the course of the development of peer review, one key structural question remains unresolved to the current day: should the reviewers of a piece of scientific work be made aware of the identity of the authors? Those in favor argue that such additional knowledge may allow the reviewer to set the work in perspective and evaluate it more completely. Those opposed argue instead that the reviewer may form an opinion based on past performance rather than the merit of the work at hand.

Existing academic literature on this subject describes specific forms of bias that may arise when reviewers are aware of the authors. In 1968, Merton proposed the Matthew effect, whereby credit goes to the best established researchers. More recently, Knobloch-Westerwick et al. proposed a Matilda effect, whereby papers from male-first authors were considered to have greater scientific merit than those from female-first authors. But with the exception of one classic study performed by Rebecca Blank in 1991 at the American Economic Review, there have been few controlled experimental studies of such effects on reviews of academic papers.

Last year we had the opportunity to explore this question experimentally, resulting in “Reviewer bias in single- versus double-blind peer review,” a paper that just appeared in the Proceedings of the National Academy of Sciences. Working with Professor Min Zhang of Tsinghua University, we performed an experiment during the peer review process of the 10th ACM Web Search and Data Mining Conference (WSDM 2017) to compare the behavior of reviewers under single-blind and double-blind review. Our experiment ran as follows:
  1. We invited a number of experts to join the conference Program Committee (PC).
  2. We randomly split these PC members into a single-blind cadre and a double-blind cadre.
  3. We asked all PC members to “bid” for papers they were qualified to review, but only the single-blind cadre had access to the names and institutions of the paper authors.
  4. Based on the resulting bids, we then allocated two single-blind and two double-blind PC members to each paper.
  5. Each PC member read his or her assigned papers and entered reviews, again with only single-blind PC members able to see the authors and institutions.
At this point, we closed our experiment and performed the remainder of the conference reviewing process under the single-blind model. As a result, we were able to assess the difference in bidding and reviewing behavior of single-blind and double-blind PC members on the same papers. We discovered a number of surprises.

Our first finding shows that compared to their double-blind counterparts, single-blind PC members tend to enter higher scores for papers from top institutions (the finding holds for both universities and companies) and for papers written by well-known authors. This suggests that a paper authored by an up-and-coming researcher might be reviewed more negatively (by a single-blind PC member) than exactly the same paper written by an established star of the field.

Digging a little deeper, we show some additional findings related to the “bidding process,” in which PC members indicate which papers they would like to review. We found that single-blind PC members (a) bid for about 22% fewer papers than their double-blind counterparts, and (b) bid preferentially for papers from top schools and companies. Finding (a) is especially intriguing; with no author information, double-blind reviewers have less to go on, arguably making the job of weighing the merit of each paper more difficult. Yet, the double-blind reviewers bid for more work, not less, than their single-blind counterparts. This suggests that double-blind reviewers become more engaged in the review process. Finding (b) is less surprising, but nonetheless enlightening: in the presence of author names and institutions, this information is incorporated into the reviewers’ bids. All else being equal, the odds that single-blind reviewers bid on papers from top institutions are about 15 percent above parity.
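For readers less used to odds, here is a short worked example of what “above parity” means here (the bid probability below is illustrative, not a number from the study):

```python
# If a double-blind reviewer bids on a given top-institution paper with
# probability 0.20, odds 15% above parity for single-blind reviewers imply:
p_double = 0.20
odds_double = p_double / (1 - p_double)     # 0.25
odds_single = 1.15 * odds_double            # 0.2875
p_single = odds_single / (1 + odds_single)
print(round(p_single, 3))                   # 0.223, i.e., about 2 points higher
```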

We also studied whether the actual or perceived gender of authors influenced the behavior of single-blind versus double-blind reviewers. Here the results are a little more nuanced. Compared to double-blind reviewers, we saw about a 22% decrease in the odds that a single-blind reviewer would give a female-authored paper a favorable review, but due to the smaller count of female-authored papers this result was not statistically significant. In an extended version of our paper, we consider our study as well as a range of other studies in the literature and perform a “meta-analysis” of all these results. From this larger pool of observations, the combined results do show a significant finding for the gender effect.

To conclude, we see that the practice of double-blind reviewing yields a denser landscape of bids, which may result in a better allocation of papers to qualified reviewers. We also see that reviewers who see author and institution information tend to bid more for papers from top institutions, and are more likely to vote to accept papers from top institutions or famous authors than their double-blind counterparts. This offers some evidence to suggest that a particular piece of work might be accepted under single-blind review if the authors are famous or come from top institutions, but rejected otherwise. Of course, the situation remains complex: double-blind review imposes an administrative burden on conference organizers, reduces the opportunity to detect several varieties of conflict of interest, and may in some cases be difficult to implement due to the existence of pre-prints or long-running research agendas that are well-known to experts in the field. Nonetheless, we recommend that journal editors and conference chairs carefully consider the merits of double-blind review.

Please take a look at our full paper for more details of our study.

The Mentors of Google Summer of Code 2017

Every year, we pore over oodles of data to extract the most interesting and relevant statistics about the Google Summer of Code (GSoC) mentors. Mentors are the bread and butter of our program - without their hard work and dedication, there would be no GSoC. These volunteers spend 12 weeks (plus a month of community bonding) tirelessly guiding their students to create the best quality project possible and welcoming them into their communities - answering questions and providing help at all hours.

Here’s a quick snapshot of our 2017 group:
  • Total mentors: 3,439
  • Mentors assigned to an active project: 1,647
  • Mentors who have participated in GSoC over 10 years: 22
  • Percentage of new mentors: 49%
GSoC 2017 mentors are a worldly group, hailing from 69 countries on 6 continents - we’re still waiting on a mentor from Antarctica… Anyone?

Interested in the data? Check out the full list of countries.
Some interesting factoids about our mentors:
  • Average age: 39
  • Youngest: 15*
  • Oldest: 68
  • Most common first name: Michael (there are 40!)
GSoC mentors help to introduce the next generation to the world of open source software development — for that we are very grateful. To show our appreciation, we invite two mentors from each of the 201 participating organizations to attend the annual mentor summit at the Google campus in Sunnyvale, California. It’s three days of food, community building, lively debate and lots of fun.

Thank you to everyone involved in Google Summer of Code. Cheers to yet another great year!

By Mary Radomile, Google Open Source

* Say what? 15 years old!? Yep! We had 12 GSoC mentors under the age of 18. This group of enthusiastic teens started their journey in our sister program, Google Code-in, an open source coding competition for 13-17 year olds. You can read more about it at g.co/gci.