Getting to know a research intern: Paul Rubenstein

Research teams are embedded throughout Google, allowing our discoveries to affect billions of users each day. From creating experiments and prototyping implementations to designing new architectures, our team members and interns work on real-world problems in areas including artificial intelligence, data mining, natural language processing, hardware and software performance analysis, compiler improvements for mobile platforms, core search, and much more.

Google offers a variety of opportunities for students who wish to gain industry experience. Through our Getting to know a research intern series, we provide a glimpse into some of these opportunities as well as the impactful projects research students at Google work on. Today we’re featuring Paul Rubenstein, from the University of Cambridge.

Tell us about yourself and your research topic. How did you end up working in this area?
I first studied math at the University of Cambridge and went on to get master’s degrees in computational biology and machine learning. I then joined the Cambridge-Tuebingen PhD program, where I am now in my final year. In the first two years of my PhD, I worked mostly on theoretical aspects of causal inference. Generally, causal inference is about learning causal structure in the world from a mixture of observational data (passive observation of the world) and interventional data (where you perform experiments and see what happens).

In the second half of my PhD, I’ve been working on representation learning (where one tries to learn lower-dimensional features of high-dimensional inputs, such as images, that are useful for transfer to other tasks), generative modelling, disentanglement, and some learning theory. Representation learning has been the broad topic of my research internship at Google.

This is your second internship at Google. Why did you apply the first time, and why did you decide to come back?
I applied for my first internship because I was interested to see how machine learning is used and developed in an industrial context. I was really impressed by several things about both my team and Google generally: the incredible infrastructure and computational resources, the plethora of interesting problems with practical impact, the encouragement of academic publishing, and the fact that Google is generally a great place to go to work each day.

For these reasons and more, I decided to apply for a second internship. This year, I was with the Brain team in Zurich, focusing on fundamental machine learning research. Being on this team is as close as I imagine it gets to being in an academic lab while in industry: people have a lot of freedom in choosing their research topics and writing papers, and having a research impact is the main goal. Yet there are several advantages over my experience of academia. The level of software engineering skill (and the presence of dedicated software engineers collaborating on the projects) leads to shared code bases that enable prototyping and experimenting at large scale much more easily and quickly than in typical academic labs. These factors, combined with a more collaborative atmosphere, lead to the undertaking of larger-scale, potentially more impactful projects.

What project was your internship focused on? What was the outcome of your research? 
In the first half of the internship, I worked on understanding the theoretical underpinnings of some recently proposed representation learning algorithms. This line of research led to a research paper, “On Mutual Information for Representation Learning”, which is currently under submission at the International Conference on Learning Representations (ICLR), one of the top machine learning conferences. In the second half, I worked on new algorithms for representation learning. This work is ongoing, and the resulting paper hasn’t been published yet.

Did you write your own code? What advice do you have for future interns?
Yes. Coding at Google is a little different from what I was used to in academia in two main ways. The first is that a lot of code is shared, and as a result, good software engineering practices are followed! This also results in larger code bases that are a lot more complex than was typical in my PhD. The second is that you have access to a large amount of cutting-edge computational resources. This means that it is possible to run very large-scale experiments.

My advice to future interns is that once you’ve started, there are many Google-specific things that have to be learned, so when you inevitably get stuck on something, the best thing to do is to ask someone for help. Asking questions is encouraged because it is the fastest way to improve your productivity and thus the productivity of your team!

What key skills have you gained from your time at Google? What impact has this internship experience had on your research?
My software engineering skills have definitely improved a lot as a result of working here. I’ve also learned a lot about how organisations and teams can be structured and managed in order to be most productive. I have learned a great deal about areas of research that I hadn’t worked in before the internship, and I hope to continue my research in these areas after my internship ends. The exposure to good software engineering practices has had a big impact in that it has facilitated my research in more practical areas involving lots of coding, in contrast to the more theoretical research I did earlier in my PhD.

Looking back on your experiences now: Why should a PhD student apply for an internship at Google? Any advice to offer?
My main reasons for doing an internship at Google are:

You will be exposed to very interesting problems that you may not see elsewhere.
You will work with and learn from colleagues who are experts in their fields.
As I may have mentioned once or twice already, your software engineering skills will improve a lot!
It’s incredible how much you can achieve and learn in a 3-4 month internship at Google.

To prepare for coding interviews, I recommend the book Cracking the Coding Interview (though some chapters might not be relevant). I typed out my solutions in a Google Doc to match the real interview experience as closely as possible. For more practice questions, there are many websites that have libraries of example coding interview questions; you can find many of them on Google's Tech Dev Guide.

To prepare for a research interview, I recommend practicing talking about your research at a high level to people who might know only the basics of your area. You should also review the basics of machine learning and deep learning, e.g. be able to explain basic concepts such as empirical risk minimization/generalisation/overfitting, common architectures (MLPs/convolutions), and training techniques (SGD/momentum/Adam).
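
As a revision aid for the training techniques mentioned above, here is a minimal sketch of the three update rules in plain Python. Everything in it is an illustrative assumption rather than anything from the internship: the toy loss f(w) = (w - 3)^2, the function names, and the hyperparameters are all made up for the example. Because the toy loss has no minibatch noise, the "SGD" here is really plain gradient descent.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3.
    return 2.0 * (w - 3.0)

def sgd(w, steps=200, lr=0.1):
    # Plain gradient step: move against the gradient, scaled by the learning rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def momentum(w, steps=200, lr=0.1, beta=0.9):
    # Heavy-ball momentum: accumulate a decaying sum of past gradients.
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)
        w -= lr * v
    return w

def adam(w, steps=2000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: running averages of the gradient (m) and squared gradient (v),
    # with bias correction for their zero initialization.
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

if __name__ == "__main__":
    for name, optimizer in [("sgd", sgd), ("momentum", momentum), ("adam", adam)]:
        print(name, optimizer(0.0))
```

Starting each optimizer from w = 0 and comparing how close the result gets to 3 is a quick way to check that you can explain what each extra term (the velocity, the second-moment estimate, the bias correction) is doing.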