Interview: Python Development at edX with Robert Raposa and Ned Batchelder
In this interview with Robert Raposa and Ned Batchelder, software architects at edX, we’re going to look under the hood of the edX project, where more than 95% of the entire codebase is in Python. Robert is a core contributor to the Open edX LMS and Studio products, as well as their supporting infrastructure. Ned is on the Open edX team, advocating for the community using the software. We’re going to learn about the project, how they develop, what their technology preferences are, and some of the reasons they chose Python as the main language. Many edX core developers are using PyCharm, so we’ll also learn what developers value the most in it.
– Hi Robert and Ned, could you tell us a little bit about yourselves?
Robert: Since getting my CS degree, I have spent my entire career working on software related to education. I started off coding software used for management training. I then switched to a product for preschool administrators and teachers. At that point, I decided to try teaching math for a year. It turns out that writing software to facilitate teaching is much easier than teaching, so I next worked on a K-12 LMS for many years before finally landing at edX as a software architect. Like most edX employees, I was excited to join a nonprofit with an incredible mission and a commitment to open source.
Ned: I’ve been deeply embedded in the Python community for a long time. I loved edX’s mission, and also its open source approach.
– For those of us who are only slightly familiar with edX, could you tell us what it’s all about?
Robert: edX was started by MIT and Harvard in 2012. Today, it is one of the leading MOOC providers, and the only one that is both a nonprofit and open source. Its mission is to increase access to education for everyone, everywhere. We offer courses from partners at many of the top universities across the globe, as well as from other nonprofits and institutions.
The core of the Open edX platform, the open source platform upon which edX.org is built, includes two products: the Learning Management System (LMS) and Studio. Studio is an authoring tool used to create the courses that are run and taught through the LMS. There are also many supporting subsystems, for example, for accessing analytics or for discovering courses. These subsystems are also made available through the larger Open edX infrastructure.
– Tell us about how edX is organized as an open source project.
Ned: There are over 800 sites running Open edX, with over 15,000 courses available across the globe. We are making a push to find more adopters, and they are easy to find, so the numbers keep going up! As a nonprofit, we provide free online education. We also provide free tools for others to run their own online education. This is all part of our mission to increase access to education for everyone, everywhere.
All of the Open edX source code can be found on GitHub at https://github.com/edx. Here you will find 206 repositories, including many Django packages and other libraries that edX has created to support the entire development lifecycle. Well over 90% of what edX develops is open source. Most of the proprietary code concerns how edX markets and lists its courses, and Open edX includes an open source version of that functionality for our community members to use.
We’re a little different from many other open source projects, in that we run a site ourselves with the code we are writing. This creates some unique pressures. One of our challenges as an open source project is, how can we keep our own site running well while also delivering this complex suite of software to other sites, and accepting contributions from them? We don’t know the best answer yet, but we’re continuing to work on it.
– How big is the edX core team?
Ned: We have about 80 engineers in Cambridge. We’re up to approximately 150 if you add contractors and regular contributors in the community. Given the pervasiveness of Python in our work, I’d say nearly every engineer is a Python developer.
– What are the main languages and technologies generally used at edX?
Robert: Python is used for the vast majority of code for Open edX. We use Django for our web applications, including the Django REST framework.
On the front-end, we have a lot of legacy Backbone.js and Underscore.js, but are slowly moving more and more to React. We also use Sass and Bootstrap.
edX.org is hosted in AWS. Some example technologies we host there include Memcached, Elasticsearch, MySQL, and MongoDB. We use a mix of Cloudflare and CloudFront for CDN.
For development, continuous integration, and deployment, we use a mix of Docker, GitHub, Jenkins, GoCD, Asgard, and Terraform, among others.
Finally, like any undertaking this large, we’ve got our special snowflakes like our Ruby-based discussion service that no one wants to work on except to rewrite it in Python, which still hasn’t happened.
– What’s so special about Python and why is it so widely used at edX?
Ned: Python has strong web development tools, and makes it easy to build quickly. We wanted to provide extensibility in the platform, so using an approachable language that would let people add packages to their installations was essential. The edX Studio authoring tool lets course teams add code to their courses, to randomize or grade assessments, so a dynamic language that let us execute that code was also really powerful.
Python being an open-source language, with a strong culture of open-source tools, has also been important for us, as an open-source project. It lets us affect the tooling we rely on, and it means that we can attract contributors who are already familiar with the Python world.
One last factor in choosing Python: edX started as an MIT project, and MIT’s teaching language is Python. Never underestimate the power of becoming familiar among people at a strong institution like MIT!
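To make the point about running course-team code concrete, here is a minimal sketch, not edX's actual implementation, of how a dynamic language lets a platform execute author-supplied Python to randomize and grade a problem. The problem text, the `check` function, and the `run_problem` helper are hypothetical names chosen for illustration. Note that `exec` provides no isolation at all; a real platform would need to run untrusted author code in a sandbox.

```python
import random

# Hypothetical course-team code, stored with a problem as plain text.
# It randomizes the question and defines how answers are graded.
AUTHOR_CODE = """
a = random.randint(1, 10)
b = random.randint(1, 10)
question = f"What is {a} + {b}?"

def check(student_answer):
    return int(student_answer) == a + b
"""


def run_problem(code: str, seed: int) -> dict:
    """Execute author code with a per-learner random seed.

    Hypothetical helper for illustration only. exec() is NOT a sandbox;
    untrusted code must be isolated in a real deployment.
    """
    # Expose a seeded Random instance under the name `random`, so each
    # learner gets a stable but possibly different variant of the problem.
    namespace = {"random": random.Random(seed)}
    exec(code, namespace)
    return namespace


if __name__ == "__main__":
    problem = run_problem(AUTHOR_CODE, seed=42)
    print(problem["question"])
    correct = problem["a"] + problem["b"]
    print(problem["check"](str(correct)))  # True
    print(problem["check"]("0"))  # False, since a + b is at least 2
```

Seeding per learner is one plausible design choice: the same learner sees the same variant on every visit, while different learners can get different numbers.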
– Do you also do some data analysis or ML with Python?
Robert: In addition to using Python with Django, we also use Python for various scripts, linters, testing frameworks, and data analysis. My colleague Cale tells me he uses a combination of IPython notebooks, pandas, and ggplot for his analysis work.
– Which Python version is currently in use at edX?
Ned: We are still using Python 2, but we’ve been making some advances toward Python 3 where and when we can. As part of our Django upgrade process, we recently introduced tox in most of our repos to test against the various combinations of Python and Django versions. We’ll likely be switching to Python 3 along with the rest of the Django community as it drops support for Python 2.
– What, in your opinion, are the main development challenges for edX developers today?
Robert: Keeping track of such a large codebase. We are trying to introduce more and more best practices as we can, and move more and more of the codebase in the right direction, but we have a lot of legacy to work with at this point. Like many development efforts, we have a big monolith as one of the many components of our architecture, and we are trying to work towards an architecture that is split enough, but not too much. It is a balancing act that is difficult to get right.
– There are many core developers at edX using PyCharm for their development. How does PyCharm help them be more productive?
Robert: We probably have about 40 developers using PyCharm. Many of the other developers use some combination of tools like Sublime Text, Vim, and pdb. When I pair with someone who doesn’t use PyCharm, I often find myself asking how they can stand not being able to jump to the definition of a method.
Many people choose PyCharm for its debugging capabilities, as well as for having an editor that understands Python. When you watch someone debug in a modern IDE, it is hard not to want to do the same. Those of us who use PyCharm rely heavily on debugging, refactoring, autocompletion, version control integration, and find definition or class, and PyCharm has great support for all of these.
Over the last year, we’ve migrated our development environment from Vagrant to Docker. This happened in tandem with PyCharm adding more and more Docker support. There have been some hiccups on this front, but it is nice to still be able to debug.
– Does PyCharm help boost team productivity on your project?
Ned: We really have a mix of people who like and dislike full IDEs. For those of us who like them, myself included, they definitely improve our workflow. Those who don’t like them don’t think they do.
Since we need to accommodate both types of engineers, we generally rely on tools that work outside the IDE and run with our continuous builds to enforce rules like code style. Some of these tools have been difficult to fully integrate with PyCharm; integrating them would give us the best of both worlds.
– What about your individual productivity?
Robert: I have always used visual IDEs and can’t imagine why anyone would not want to. I find it much easier to do so many different things in PyCharm. I know I could use tools like grep, but why wouldn’t I want to search for code and be able to edit it and jump to definitions all in one seamless flow? I even like using the visual tools for resolving git conflicts.
There are also many things I do from the command line. It definitely has its place. I just find so many things that go more smoothly in PyCharm.
– How did you personally first learn about PyCharm?
Robert: I came to edX with a Java background and no experience with Python. At my last job we mostly standardized on Eclipse, but even that changed over time because you get what you pay for, I guess.
Other engineers at edX were using PyCharm, so that’s what I started to use, and I found it does a pretty decent job. I am often surprised by how much it can do given that Python is a dynamic language rather than a statically compiled one.
– Thank you for the interview, Robert and Ned!