Author Archives: Open Source Programs Office

Announcing an Open Source ADC board for BeagleBone

Cross posted on the Google Research Blog
Working with electronics, we often find ourselves soldering up a half-baked electronic circuit to detect some sort of signal. For example, last year we wanted to measure the strength of a carrier. We started with traditional analog circuits: amplifier, filter, envelope detector, threshold. You can see some of our prototypes in the image below; they get pretty messy.


While there's a certain satisfaction in taming a signal using the physical properties of capacitors, coils of wire and transistors, it's usually easier to digitize the signal with an Analog to Digital Converter (ADC) and manage it with Digital Signal Processing (DSP) instead of electronic parts. Tweaking software doesn't require a soldering iron, and lets us modify signals in ways that would require impossible analog circuits.


There are several standard solutions for digitizing a signal: connect a laptop to an oscilloscope or Data Acquisition System (DAQ) via USB or Ethernet, or use the onboard ADCs of a maker board like an Arduino. The former are sensitive and accurate, but also big and power hungry. The latter are cheap and tiny, but slower, and have enough RAM for only milliseconds' worth of high-speed sample data.


That led us to investigate single board computers like the BeagleBone and Raspberry Pi, which are small and cheap like an Arduino, but have specs like a smartphone.  And crucially, the BeagleBone's system-on-a-chip (SoC) combines a beefy ARMv7 CPU with two smaller Programmable Realtime Units (PRUs) that have access to all 512MB of system RAM.  This lets us dedicate the PRUs to the time-sensitive and repetitive task of reading each sample out of an external ADC, while the main CPU lets us use the data with the GNU/Linux tools we're used to.


The result is an open source BeagleBone cape we've named PRUDAQ.  It's built around the Analog Devices AD9201 ADC, which samples two inputs simultaneously at up to 20 megasamples per second, per channel.  Simultaneous sampling and high sample rates make it useful for software-defined radio (SDR) and scientific applications where a built-in ADC isn't quite up to the task.  
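To see why the 512MB of RAM matters at these sample rates, here is a rough back-of-the-envelope sketch in Python. It assumes each sample is stored in two bytes and that the entire 512MB is available as a buffer; both are simplifying assumptions for illustration, not properties of the PRUDAQ sample code.

```python
# Rough estimate of how quickly a dual-channel ADC fills a RAM buffer.
# Assumptions (for illustration only): 2 bytes stored per sample, and
# the whole 512 MB of system RAM usable as a capture buffer.

def fill_time_seconds(sample_rate_hz, channels, bytes_per_sample, buffer_bytes):
    """Return the seconds until the buffer is full at a constant sample rate."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return buffer_bytes / bytes_per_second

SAMPLE_RATE_HZ = 20_000_000        # 20 megasamples per second, per channel
CHANNELS = 2                       # both inputs sampled simultaneously
BYTES_PER_SAMPLE = 2               # assumed storage size per sample
BUFFER_BYTES = 512 * 1024 * 1024   # 512 MB

data_rate = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
seconds = fill_time_seconds(SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_SAMPLE, BUFFER_BYTES)

print(f"Data rate: {data_rate / 1e6:.0f} MB/s")
print(f"Buffer full after about {seconds:.1f} s")
```

Under these assumptions the buffer lasts for seconds rather than milliseconds, which is the practical difference between this setup and a small maker board.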


Our open source electrical design and sample code are available on GitHub, and GroupGets has boards ready to ship for $79. We were also fortunate to have help from Google intern Kumar Abhishek. He added PRUDAQ support to BeagleLogic, his Google Summer of Code project, which performs much better than our sample code.


We started PRUDAQ for our own needs, but quickly realized that others might also find it useful. We're excited to get your feedback through the email list.  Tell us what can be done with inexpensive fast ADCs paired with inexpensive fast CPUs!

Posted by Jason Holt, Software Engineer

Lessons from Professors’ Open Source Software Experience (POSSE) 2016


From Google Summer of Code to Google Code-in, the Open Source Programs Office does a lot to get students involved with open source. In order to learn more about supporting open source in academia, I attended the NSF-funded Professors' Open Source Software Experience (POSSE) in Philadelphia. It was a great opportunity for us to better understand the challenges instructors face in weaving open source into their curriculum and to hear solutions on how to bridge the gap.

Almost 30 university professors and community college lecturers attended the 3-day workshop. During the workshop, attendees worked in small groups, getting hands-on experience incorporating humanitarian free and open source software (HFOSS) into their teaching. Professors were able to talk, mingle and share best practices throughout the event.

The POSSE workshop is led by Heidi Ellis, Professor, Department of Computer Science and Information Technology at Western New England University, and Greg Hislop, Professor of Software Engineering and Senior Associate Dean for Academic Affairs at Drexel University. Heidi and Greg took over running POSSE five years after Red Hat began the program as an outreach effort to the higher education community. Red Hat continues as a collaborator in the effort. Around 40 university and community college professors participate in the program every year with over 100 individuals attending the workshop in the last four years.

Here are some of the challenges professors shared:
  • Very little guidance on how to bring FOSS into the classroom. No standard curriculum / syllabus available to reference. 
  • Time investment required to change the curriculum.
  • Will not be rewarded for teaching FOSS courses.
  • Will not get funds to travel for workshops/conferences unless it’s to present a paper at a conference.
  • Many administrations aren’t aware that adding open source is beneficial for students since more and more companies use open source and expect their new hires to be familiar with it.

The next POSSE will be Nov 17-19. Faculty who are interested in attending are encouraged to apply.

We also discussed a number of open source programs that are currently working to engage students with open source software development:

Thanks to Heidi, Greg and the FOSS2Serve team for organizing POSSE 2016! We look forward to taking what we’ve learned and using it to better support FOSS education in academia.

By Feiran Helen Hu, Open Source Programs Office

GitHub on BigQuery: Analyze all the code



Google, in collaboration with GitHub, is releasing an incredible new open dataset on Google BigQuery. So far you've been able to monitor and analyze GitHub's pulse since 2011 (thanks GitHub Archive project!) and today we're adding the perfect complement to this. What could you do if you had access to analyze all the open source software in the world, with just one SQL command?

The Google BigQuery Public Datasets program now offers a full snapshot of the content of more than 2.8 million open source GitHub repositories in BigQuery. Thanks to our new collaboration with GitHub, you'll have access to analyze the source code of almost 2 billion files with a simple (or complex) SQL query. This will open the doors to all kinds of new insights and advances that we're just beginning to envision.

For example, let's say you're the author of a popular open source library. Now you'll be able to find every open source project on GitHub that's using it. Even more, you'll be able to guide the future of your project by analyzing how it's being used, and improve your APIs based on what your users are actually doing with it.
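As a hedged illustration of the "who is using my library" idea, the Python sketch below builds (but does not run) a standard SQL query that counts, per repository, the files whose contents mention a given snippet. The table names `github_repos.files` and `github_repos.contents`, the columns `repo_name`, `id` and `content`, and the helper `build_usage_query` are our assumptions about the dataset layout and are not taken from this post; check the dataset schema before relying on them.

```python
# Builds (but does not execute) a BigQuery standard SQL query that lists
# repositories whose files contain a given snippet of text.
# Table and column names below are assumptions about the public dataset.

FILES_TABLE = "bigquery-public-data.github_repos.files"
CONTENTS_TABLE = "bigquery-public-data.github_repos.contents"

def build_usage_query(limit=20):
    """Return SQL that counts matching files per repository.

    The search text is supplied separately as the named query
    parameter @needle, so it is never spliced into the SQL string.
    """
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT f.repo_name, COUNT(*) AS matching_files\n"
        f"FROM `{FILES_TABLE}` AS f\n"
        f"JOIN `{CONTENTS_TABLE}` AS c ON f.id = c.id\n"
        "WHERE STRPOS(c.content, @needle) > 0\n"
        "GROUP BY f.repo_name\n"
        "ORDER BY matching_files DESC\n"
        f"LIMIT {limit}"
    )

print(build_usage_query(limit=10))
```

A query like this scans file contents across the whole dataset, so it is worth checking the estimated bytes processed before running it.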

On the security side, we've seen how the most popular open source projects benefit from having multiple eyes and hands working on them. This visibility helps projects get hardened and buggy code cleaned up. What if you could search for errors with similar patterns in every other open source project? Would you notify their authors and send them pull requests? Well, now you can.

Some concepts to keep in mind while working with BigQuery and the GitHub contents dataset:

To learn more, read GitHub's announcement and try some sample queries. Share your queries and findings in our reddit.com/r/bigquery and Hacker News posts. The ideas are endless, and I'll start collecting tips and links to other articles on this post on Medium.

Stay curious!

More statistics from Google Summer of Code 2016

Google Summer of Code (GSoC) 2016 is officially at its halfway point. Mentors and students have just completed their midterm evaluations and it’s time for our second stats post. This time we take a closer look at our participating students.

First, we’d like to highlight the universities with the most student participants. Congratulations are due to the International Institute of Information Technology - Hyderabad for claiming the top spot for the third consecutive year!

| Country | School | 2016 Accepted Students | 2015 Accepted Students | 12 Year Total |
|---|---|---|---|---|
| India | International Institute of Information Technology - Hyderabad | 50 | 62 | 252 |
| Sri Lanka | University of Moratuwa | 29 | 44 | 320 |
| Romania | University POLITEHNICA of Bucharest | 24 | 14 | 155 |
| India | Birla Institute of Technology and Science Pilani, Goa Campus | 22 | 15 | 110 |
| India | Birla Institute of Technology and Science, Pilani Campus | 22 | 18 | 116 |
| India | Indian Institute of Technology, Bombay | 18 | 13 | 75 |
| India | Indian Institute of Technology, Kharagpur | 15 | 8 | 92 |
| India | Indian Institute of Technology, Roorkee | 15 | 8 | 57 |
| India | Indraprastha Institute of Information Technology | 15 | 7 | 27 |
| India | Amrita Institute Of Technology & Science, Amritapuri | 13 | 5 | 33 |
| India | Indian Institute of Technology, Guwahati | 13 | 5 | 38 |
| Cameroon | University of Buea | 12 | 10 | 26 |
| India | Delhi Technological University | 12 | 9 | 60 |
| India | Indian Institute of Technology BHU Varanasi | 12 | 12 | 37 |
| Germany | TU Munich | 11 | 7 | 45 |


Next, we are proud to announce that 2016 marks the largest number of female GSoC participants to date: 12% of accepted students are female, up 2.2% from 2015. This is good progress, but we are certain we can do better in the future to diversify our program. The Google Open Source team will continue our outreach to many organizations, for example, Grace Hopper and Black Girls Code, to increase this number even more in 2017. If you have any suggestions of organizations we should work with, please let us know in the comments.

Finally, each year we like to look at the majors of students. As expected, the most common area of study for our participants is Computer Science (approximately 78%), but this year we have a wide variety of studies including Linguistics, Law, Music Technology and Psychology. The majority of our students this year are undergraduates (67%), followed by Master's (23%) and then PhD students (9%).



Although reviewing GSoC statistics each year is great fun, we want to stress that being “first place” is not the point of the program. Our goal is to get more and more students involved in creating free and open source software. We hope Google Summer of Code encourages contributions to projects that have the potential to make a difference worldwide. Congratulations to the students from all over the globe and keep up the good work!

By Mary Radomile, Open Source Programs Office

Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source

Originally posted on the Google Research Blog

By Slav Petrov, Senior Staff Research Scientist

At Google, we spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways. Today, we are excited to share the fruits of our research with the broader community by releasing SyntaxNet, an open-source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems. Our release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface, an English parser that we have trained for you and that you can use to analyze English text.

Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language, and that can explain the functional role of each word in a given sentence. Because Parsey McParseface is the most accurate such model in the world, we hope that it will be useful to developers and researchers interested in automatic extraction of information, translation, and other core applications of NLU.

How does SyntaxNet work?

SyntaxNet is a framework for what’s known in academic circles as a syntactic parser, which is a key first component in many NLU systems. Given a sentence as input, it tags each word with a part-of-speech (POS) tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree. These syntactic relationships are directly related to the underlying meaning of the sentence in question. To take a very simple example, consider the following dependency tree for Alice saw Bob:


This structure encodes that Alice and Bob are nouns and saw is a verb. The main verb saw is the root of the sentence and Alice is the subject (nsubj) of saw, while Bob is its direct object (dobj). As expected, Parsey McParseface analyzes this sentence correctly, but also understands the following more complex example:


This structure again encodes the fact that Alice and Bob are the subject and object respectively of saw, and in addition that Alice is modified by a relative clause with the verb reading, that saw is modified by the temporal modifier yesterday, and so on. The grammatical relationships encoded in dependency structures allow us to easily recover the answers to various questions, for example: whom did Alice see? who saw Bob? what had Alice been reading about? or when did Alice see Bob?
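To make the idea of reading answers off a dependency structure concrete, here is a minimal Python sketch for the simple sentence Alice saw Bob. The parse is written by hand from the description above rather than produced by SyntaxNet, and the `Token` type and `dependent` helper are hypothetical names used only for this illustration.

```python
from typing import NamedTuple, Optional

class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    word: str
    pos: str     # coarse part-of-speech tag
    head: int    # index of the governing word, 0 for the root
    label: str   # dependency relation to the head

# Hand-written parse of "Alice saw Bob" (not SyntaxNet output).
PARSE = [
    Token(1, "Alice", "NOUN", 2, "nsubj"),
    Token(2, "saw", "VERB", 0, "ROOT"),
    Token(3, "Bob", "NOUN", 2, "dobj"),
]

def dependent(parse, head_word, label) -> Optional[str]:
    """Return the first word attached to head_word with the given relation."""
    head_indices = {t.index for t in parse if t.word == head_word}
    for t in parse:
        if t.head in head_indices and t.label == label:
            return t.word
    return None

print("Who saw Bob?       ", dependent(PARSE, "saw", "nsubj"))
print("Whom did Alice see?", dependent(PARSE, "saw", "dobj"))
```

Each question becomes a lookup of one labeled edge: the subject edge answers "who saw Bob?" and the object edge answers "whom did Alice see?".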

Why is Parsing So Hard For Computers to Get Right?

One of the main problems that makes parsing so challenging is that human languages show remarkable levels of ambiguity. It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures. A natural language parser must somehow search through all of these alternatives, and find the most plausible structure given the context. As a very simple example, the sentence Alice drove down the street in her car has at least two possible dependency parses:


The first corresponds to the (correct) interpretation where Alice is driving in her car; the second corresponds to the (absurd, but possible) interpretation where the street is located in her car. The ambiguity arises because the preposition in can either modify drove or street; this example is an instance of what is called prepositional phrase attachment ambiguity.

Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same. Multiple ambiguities such as these in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence. Usually the vast majority of these structures are wildly implausible, but are nevertheless possible and must be somehow discarded by a parser.

SyntaxNet applies neural networks to the ambiguity problem. An input sentence is processed from left to right, with dependencies between words being incrementally added as each word in the sentence is considered. At each point in processing many decisions may be possible—due to ambiguity—and a neural network gives scores for competing decisions based on their plausibility. For this reason, it is very important to use beam search in the model. Instead of simply taking the first-best decision at each point, multiple partial hypotheses are kept at each step, with hypotheses only being discarded when there are several other higher-ranked hypotheses under consideration. An example of a left-to-right sequence of decisions that produces a simple parse is shown below for the sentence I booked a ticket to Google.

Furthermore, as described in our paper, it is critical to tightly integrate learning and search in order to achieve the highest prediction accuracy. Parsey McParseface and other SyntaxNet models are some of the most complex networks that we have trained with the TensorFlow framework at Google. Given some data from the Google-supported Universal Treebanks project, you can train a parsing model on your own machine.
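The beam search idea described above can be sketched in a few lines of Python. This is a toy, not SyntaxNet's implementation: the action names and probabilities in `TOY_MODEL` are made up, and a lookup table stands in for the neural network that would score each decision.

```python
import math

# Made-up scores standing in for a neural network: for each sequence of
# decisions taken so far, the probability of each possible next decision.
TOY_MODEL = {
    (): {"attach_to_street": 0.6, "attach_to_drove": 0.4},
    ("attach_to_street",): {"next_1": 0.5, "next_2": 0.5},
    ("attach_to_drove",): {"next_1": 0.9, "next_2": 0.1},
}

def next_scores(prefix):
    """Return {action: log-probability} for the next decision."""
    return {a: math.log(p) for a, p in TOY_MODEL[prefix].items()}

def beam_search(score_fn, num_steps, beam_size):
    """Keep the beam_size best partial hypotheses after every step."""
    beam = [((), 0.0)]  # (decisions so far, total log-probability)
    for _ in range(num_steps):
        candidates = [
            (prefix + (action,), total + logp)
            for prefix, total in beam
            for action, logp in score_fn(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]  # discard lower-ranked hypotheses
    return beam

greedy_actions, greedy_score = beam_search(next_scores, 2, beam_size=1)[0]
beam_actions, beam_score = beam_search(next_scores, 2, beam_size=2)[0]

print("greedy:", greedy_actions, round(math.exp(greedy_score), 2))
print("beam:  ", beam_actions, round(math.exp(beam_score), 2))
```

With a beam of one, the search commits to the locally best first decision and cannot recover. With a beam of two, the initially lower-ranked hypothesis survives long enough to win overall, which is the behavior the paragraph on beam search describes.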

So How Accurate is Parsey McParseface?

On a standard benchmark consisting of randomly drawn English newswire sentences (the 20-year-old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach. While there are no explicit studies in the literature about human performance, we know from our in-house annotation projects that linguists trained for this task agree in 96-97% of the cases. This suggests that we are approaching human performance, but only on well-formed text. Sentences drawn from the web are a lot harder to analyze, as we learned from the Google WebTreebank (released in 2011). Parsey McParseface achieves just over 90% parse accuracy on this dataset.
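One common way to score a parser on individual dependencies is to count the fraction of words whose predicted head matches the reference annotation. The Python sketch below shows that calculation on a hand-made example; the head indices are our own illustrative parse of Alice drove down the street in her car, and this may differ in detail from the exact metric used for the numbers above.

```python
def attachment_accuracy(gold_heads, predicted_heads):
    """Fraction of words whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted parses must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)

# Words:  Alice drove down the street in her car
# Index:    1     2     3   4    5     6   7   8
# Each entry is the index of that word's head (0 marks the root).
# These head assignments are hand-made for illustration.
GOLD = [2, 0, 2, 5, 3, 2, 8, 6]        # "in" attaches to "drove"
PREDICTED = [2, 0, 2, 5, 3, 5, 8, 6]   # "in" wrongly attaches to "street"

print(attachment_accuracy(GOLD, PREDICTED))
```

A single wrong prepositional phrase attachment costs one word out of eight here, which shows how a parse can score well on this measure while still getting the meaning of a sentence wrong.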

While the accuracy is not perfect, it’s certainly high enough to be useful in many applications. The major sources of errors at this point are examples such as the prepositional phrase attachment ambiguity described above, which require real-world knowledge (e.g. that a street is not likely to be located in a car) and deep contextual reasoning. Machine learning (and in particular, neural networks) has made significant progress in resolving these ambiguities. But our work is still cut out for us: we would like to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts.

To get started, see the SyntaxNet code and download the Parsey McParseface parser model. Happy parsing from the main developers, Chris Alberti, David Weiss, Daniel Andor, Michael Collins & Slav Petrov.

CCTZ v2.0 — now with more civil time

Last September we announced an open source project called CCTZ, a C++ library that enables computing with arbitrary time zones. Today we're announcing CCTZ v2.0 which introduces a new civil time library. Civil time is a legally recognized representation of time used by humans (i.e., year, month, day, hour, minute and second). The most common example of a civil time is a time zone independent date. In version 2.0, CCTZ's time zone and new civil time libraries cooperate with the standard C++ <chrono> library to give programmers a complete (and simple!) framework in which to reason about and solve even the most complicated time programming problems.
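The difference between a civil time and a specific instant can be illustrated outside of CCTZ. The sketch below uses Python's standard `datetime` module, not the CCTZ API (which is C++), and the fixed UTC-7 offset is an assumption chosen to keep the example free of time zone database dependencies.

```python
from datetime import datetime, timedelta, timezone

# A civil time: only the fields a human reads off a calendar and a clock.
civil = datetime(2016, 3, 25, 19, 0, 0)   # no time zone attached

# Two fixed-offset zones. Real zones have rules that change over time;
# fixed offsets are used here only to keep the sketch self-contained.
utc = timezone.utc
utc_minus_7 = timezone(timedelta(hours=-7), "UTC-7")

# Interpreting the same civil time in each zone gives two different instants.
in_utc = civil.replace(tzinfo=utc)
in_utc_minus_7 = civil.replace(tzinfo=utc_minus_7)
gap = in_utc_minus_7 - in_utc

# Going the other way: the civil time shown in UTC-7 at the first instant.
civil_in_utc_minus_7 = in_utc.astimezone(utc_minus_7).replace(tzinfo=None)

print("Gap between the two instants:", gap)
print("Civil time in UTC-7 at the UTC instant:", civil_in_utc_minus_7)
```

The civil time on its own does not identify a moment; it only does so once a time zone is supplied, which is the kind of mental model the paragraph above describes.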

To learn more, please check out the project page on GitHub. Pay particular attention to the fundamental concepts section which establishes a simple, cross-platform and language agnostic mental model that will help you reason about time programming challenges with ease and confidence. And don't forget to subscribe to the new CCTZ mailing list to ask questions and learn about future announcements.

by Greg Miller and Bradley White, Google Engineering

Google Summer of Code marches on!

Google Summer of Code 2016 (GSoC) is well underway and we’ve already seen some impressive numbers — all record highs!
  • 18,981 total registered students (up 36% from 2015)
  • 17.34% female registrants
  • 142 countries
  • 5,107 students submitting 7,543 project proposals

Student proposals are currently being reviewed by over 2,300 mentors and organization administrators from the 180 participating mentor organizations. We will announce accepted students on April 22, 2016 on the Open Source blog and on the program site.

Last week, members of the Google Open Source Programs team attended FOSSASIA in Singapore, Asia’s premier open technology event, to talk about GSoC and Google Code-in. There, we met dozens of former GSoC and GCI students and mentors who were excited to embark on another great year. To learn more about Google Summer of Code, please visit our program site.


By Stephanie Taylor, Open Source Programs

Google Code-in 2015 Wrap Up: Sustainable Computing Research Group (SCoRe)

For the next several weeks, we will be showcasing wrap up posts from the 12 organizations that participated as mentor organizations for Google Code-in 2015. This week we feature SCoRe, an open source research project based in Sri Lanka.

The Sustainable Computing Research Group (SCoRe) at University of Colombo School of Computing conducts research covering various aspects of wireless sensor networks, embedded systems, digital forensics, information security, mobile applications and e-learning. The goal of our research is to generate computing solutions by identifying low cost methodologies and strategies that lead to sustainability. The solutions that come out of the sustainable computing research projects conducted at SCoRe lab are important for developing countries like Sri Lanka.

Inspired by our participation in Google Summer of Code (GSoC), for the very first time, SCoRe lab participated in Google Code-in 2015 (GCI), with 13 other open source organizations around the world. We offered 250 claimable tasks for students and we had 27 mentors, mentoring students who successfully completed 164 tasks! We also gained active contributors to SCoRe: students who continued to contribute to our open source projects even after the contest ended.

The tasks covered code, user interface, research, quality assurance, outreach and documentation. 44 students completed at least one task with us this year and eight students completed at least three tasks with us to earn a GCI t-shirt. Six students completed over ten  tasks each in competition to become grand prize winners.

However, among these students, we had to choose the ones who we felt had the most impactful contributions. We’d like to congratulate the two grand prize winners from SCoRe: Brayan Alfaro and Anesu Mafuvadze.

Below is a comment received from a student who participated:

“It was my pleasure working with you and the SCoRe Community. This contest helped me to enhance my knowledge in software development...I gained a lot of knowledge through the tasks I did. My mentors guided me every time and I would gladly work with this community in the future. I would love to contribute to you in every possible way.”

We give our special thanks to our mentors who voluntarily worked throughout the contest around their busy schedules and vacation plans. We’d also like to thank all the students who actively participated and contributed to our organization. SCoRe was pleased to be selected as a mentoring organization for GCI 2015 and we hope to participate in both GSoC and GCI again in future!

By Dilushi Piumwardane, GCI mentor, SCoRe

Something different — code up hardware in Google Summer of Code

In 1983, the same year I was born, a company called Altera was founded and created the EP300, their first reprogrammable logic device. The event was considered a major step towards the development of devices we now call “Field Programmable Gate Arrays” or FPGAs for short. In the following 33 years, FPGAs would go from extremely expensive devices found only in high end military and telecommunications equipment, to something even a student can afford.

The EP300 in all its glory

FPGAs are exciting because they make the development process for hardware the same as software. Developers are able to create designs in a hardware description language (HDL), compile and then use them almost instantly! They make hardware code. Turning hardware into code makes it easy for open source developers to share, collaborate and improve the hardware in ways that would have been extremely hard, or even impossible in the past. 

There were 180 open source organizations accepted to participate in Google Summer of Code 2016 (GSoC), and it is exciting to see several of the organizations using these technologies. I've highlighted some of the different types of hardware coding opportunities in GSoC this year below. (Anything I've missed? Feel free to add it in the comments section below!)

In the area of CPU architectures, OpenRISC and its spiritual successor, RISC-V, are attempting to make truly open hardware at the most fundamental level. In 2016 you could help with this goal by participating in GSoC with either the FOSSi Foundation or the lowRISC project.


Not content with the existing HDLs, both the ArchC organization and MyHDL organization (a sub-organization of the Python group), are attempting to make it easier to create these hardware designs. MyHDL is particularly cool because Python is normally considered to be as far away from hardware as you can get.


My own project, TimVideos.us, is using much of the work from these other projects to develop high speed video processing hardware for conference and user group recording (or maybe even video DJing).

Imagine developing hardware in the same way you write code. With FPGAs you can — and GSoC has numerous opportunities to create hardware using this exciting technology. With only 7 days left to submit your application, you better get cracking!


By Tim ‘mithro’ Ansell, Software Engineer on Chrome by day, open source hardware hacker by night

Student applications now open for Google Summer of Code!

Are you a university student looking to learn more about open source software development? Look no further than Google Summer of Code (GSoC) and spend your summer break working on an exciting open source project, learning how to write code.
For twelve years running, GSoC has given participants a chance to work on an open source software project entirely online. Students, who receive a stipend for their successful contributions, are paired with mentors who can help address technical questions and concerns throughout the program. Former GSoC participants have told us that the real-world experience they’ve gained during the program has not only sharpened their technical skills, but has also boosted their confidence, broadened their professional network and enhanced their resumes.

Students who are interested can submit proposals on the  program site now through Friday, March 25 at 19:00 UTC. The first step is to review the 180 open source projects and find project ideas that appeal to you. Since spots are limited, we recommend a strong project proposal to help increase your chances of selection. Our Student Manual provides lots of helpful advice to get you started on choosing an organization and crafting a great application. 

For ongoing information throughout the application period and beyond, see the Google Open Source Blog, join our Google Summer of Code discussion lists or join us on internet relay chat (IRC) at #gsoc on Freenode.

Good luck to all the open source coders out there, and remember to submit your proposals early — you only have until Friday, March 25 at 19:00 UTC to apply!


By Mary Radomile, Google Open Source team