Interview: Python Development at edX with Robert Raposa and Ned Batchelder

In this interview with Robert Raposa and Ned Batchelder, software architects at edX, we’re going to look under the hood of the edX project, where more than 95% of the entire codebase is in Python. Robert is a core contributor to the Open edX LMS and Studio products, as well as their supporting infrastructure. Ned is on the Open edX team, advocating for the community using the software. We’re going to learn about the project, how they develop, what their technology preferences are, and some of the reasons they chose Python as the main language. Many edX core developers are using PyCharm, so we’ll also learn what developers value the most in it.


– Hi Robert and Ned, could you tell us a little bit about yourselves?

Robert: After getting my CS degree, I have spent my entire career working on software related to education. I started off coding software used for management training, and then switched to a product for preschool administrators and teachers. At that point, I decided to try teaching math for a year. It turns out that writing software to facilitate teaching is much easier than teaching, so I next worked on a K-12 LMS for many years before finally landing at edX as a software architect. Like most edX employees, I was excited about joining a nonprofit with an incredible mission and a commitment to open source.

Ned: I’ve been deeply embedded in the Python community for a long time. I loved edX’s mission, and also its open source approach.

– For those of us who are only slightly familiar with edX, could you tell us what it’s all about?

Robert: edX was initially started by MIT and Harvard in 2012. Today, it is one of the leading MOOC providers, and the only one that is both a nonprofit and open source. Its mission is to increase access to education for everyone, everywhere. We offer courses from our partners, including many of the top universities across the globe, as well as other nonprofits and institutions.

The core of the Open edX platform, the open source platform upon which edX.org is built, includes two products: the Learning Management System (LMS) and Studio. Studio is an authoring tool used to create the courses that are run and taught through the LMS. There are also many supporting subsystems, for example, for accessing analytics or for discovering courses. These subsystems are also made available through the larger Open edX infrastructure.

– Tell us about how edX is organized as an open source project.

Ned: There are over 800 sites running Open edX, with over 15,000 courses available across the globe. We are making a push to find more adopters, and they are easy to find, so the numbers keep going up! As a non-profit, we provide free online education. We also provide free tools for others to run their own online education. This is all part of our mission to increase access to education for everyone everywhere.

All of the Open edX source code can be found on GitHub at https://github.com/edx. There you will find 206 repositories, including many Django packages and other libraries that edX has created to support the entire development lifecycle. Well over 90% of what edX develops is open source. Most of the proprietary code concerns how edX markets and lists its courses, and Open edX includes an open source version used by our community members.

We’re a little different from many other open source projects, in that we run a site ourselves with the code we are writing. This creates some unique pressures. One of our challenges as an open source project is, how can we keep our own site running well while also delivering this complex suite of software to other sites, and accepting contributions from them? We don’t know the best answer yet, but we’re continuing to work on it.

– How big is the edX core team?

Ned: We have about 80 engineers in Cambridge. We’re up to approximately 150 if you add contractors and regular contributors in the community. Given the pervasiveness of Python in our work, I’d say nearly every engineer is a Python developer.

– What are the main languages and technologies generally used at edX?

Robert: Python is used for the vast majority of code for Open edX. We use Django for our web applications, including the Django REST framework.

On the front-end, we have a lot of legacy Backbone.js and Underscore.js, but are slowly moving more and more to React. We also use Sass and Bootstrap.

EdX.org is hosted on AWS. Some example technologies we host there include Memcached, Elasticsearch, MySQL, and MongoDB. We use a mix of Cloudflare and CloudFront for our CDN.

For development, Continuous Integration, and deployments we use a mix of Docker, GitHub, Jenkins, GoCD, Asgard, and Terraform, among others.

Finally, like any undertaking this large, we’ve got our special snowflakes like our Ruby-based discussion service that no one wants to work on except to rewrite it in Python, which still hasn’t happened.

– What’s so special about Python and why is it so widely used at edX?

Ned: Python has strong web development tools, and makes it easy to build quickly. We wanted to provide extensibility in the platform, so using an approachable language that would let people add packages to their installations was essential. The edX Studio authoring tool lets course teams add code to their courses, to randomize or grade assessments, so a dynamic language that let us execute that code was also really powerful.

Python being an open-source language, with a strong culture of open-source tools, has also been important for us, as an open-source project. It lets us affect the tooling we rely on, and it means that we can attract contributors who are already familiar with the Python world.

One last factor in choosing Python: edX started as an MIT project, and MIT’s teaching language is Python. Never underestimate the power of becoming familiar among people at a strong institution like MIT!

– Do you also do some data analysis or ML with Python?

Robert: In addition to using Python with Django, we also use Python for various scripts, linters, testing frameworks, and data analysis. My colleague Cale tells me he used a combination of IPython notebooks, pandas, and ggplot for his analysis work.

– Which Python version is currently in use at edX?

Ned: We are still using Python 2. We’ve been making some advances toward Python 3 where and when we can. As part of our Django upgrade process, we recently introduced tox in most of our repos to test against various combinations of Python and Django versions. We’ll likely be switching to Python 3 with the rest of the Django community as they drop support for Python 2.

– What, in your opinion, are the main development challenges for edX developers today?

Robert: Keeping track of such a large codebase. We are trying to introduce more and more best practices as we can, and move more and more of the codebase in the right direction, but we have a lot of legacy to work with at this point. Like many development efforts, we have a big monolith as one of the many components of our architecture, and we are trying to work towards an architecture that is split enough, but not too much. It is a balancing act that is difficult to get right.

– There are many core developers at edX using PyCharm for their development. How does PyCharm help them be more productive?

Robert: We probably have about 40 developers using PyCharm. Many of the other developers end up using some combination of tools like Sublime Text, Vim, and pdb. When I need to pair with someone who doesn’t use PyCharm, I often find myself asking how they can stand not being able to jump to the definition of a method.

Many people choose PyCharm for its debugging capabilities, as well as for having an editor that understands Python. When you watch someone debug in a modern IDE, it is hard not to want to be able to do the same. Those of us on PyCharm rely on debugging, refactoring, autocompletion, version control integration, and jumping to a definition or class, and PyCharm has great support for all of these.

Over the last year, we’ve migrated our development environment from Vagrant to Docker. This happened in tandem with PyCharm adding more and more Docker support. There have been some hiccups on this front, but it is nice to still be able to debug.

– Does PyCharm help boost team productivity on your project?

Ned: We really have a mix of people that like and dislike full IDEs. For those who like it, like myself, it definitely improves our workflow. For those who don’t, they think it doesn’t.

Since we need to accommodate both types of engineers, we generally rely on tools that work outside the IDE and run with our continuous builds to enforce rules like code style. Some of these tools have been difficult to fully integrate with PyCharm, but doing so would give us the best of both worlds.

– What about your individual productivity?

Robert: I have always used visual IDEs and can’t imagine why anyone would not want to. I find it much easier to do so many different things in PyCharm. I know I could use tools like grep, but why wouldn’t I want to search for code and be able to edit it and jump to definitions all in one seamless flow? I even like using the visual tools for resolving git conflicts.

There are also many things I do from the command line. It definitely has its place. I just find so many things that go more smoothly in PyCharm.

– How did you personally first learn about PyCharm?

Robert: I came to edX with a Java background and no experience with Python. At my last job we mostly standardized on Eclipse, but even that changed over time because you get what you pay for, I guess.

Other engineers at edX were using PyCharm, so that’s what I started to use, and I found it does a pretty decent job. I am often surprised by how much it can do given that Python is not compiled.

– Thank you for the interview, Robert and Ned!

If you want to learn more about Robert’s and Ned’s experiences, follow them on GitHub: Robert’s GitHub and Ned’s GitHub.


PyCharm 2018.1.2

PyCharm 2018.1.2 is out: download PyCharm now from our website.

What’s New

Docker Compose Improvements

Our Docker Compose interpreter in PyCharm 2018.1.1 started your application service together with its dependencies, but left those dependencies running after shutting down the application. This has now been changed to match the command-line behavior, and PyCharm will shut down your dependencies as well. Haven’t tried Docker Compose interpreters yet? Learn how on our blog, with Django on Windows or with Flask on Linux.

Docker Compose users on Windows will be happy to learn that we’re now using named pipes to connect to the Docker daemon, which resolves an issue where some users were unable to run their scripts.

Further Improvements

  • The Python Console now receives focus when it’s opened
  • Various improvements to database support: columns that show the result of a custom function in MSSQL are now correctly highlighted, and more. Did you know that PyCharm Professional Edition includes all database features from DataGrip, JetBrains’ SQL IDE?
  • Improvements in optimizing Python imports
  • Various issues regarding React lifecycles have been resolved
  • Read more in our release notes


Python 3.7: Introducing Data Classes

Python 3.7 is set to be released this summer, so let’s have a sneak peek at some of the new features! If you’d like to play along at home with PyCharm, make sure you get PyCharm 2018.1 (or later if you’re reading this from the future).

There are many new things in Python 3.7: various character set improvements, postponed evaluation of annotations, and more. One of the most exciting new features is support for the dataclass decorator.

What is a Data Class?

Most Python developers will have written many classes that look like this:
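(A generic sketch; the class and its fields are invented for illustration:)

    class Article:
        def __init__(self, title: str, author: str, word_count: int):
            self.title = title
            self.author = author
            self.word_count = word_count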

Data classes help you by automatically generating dunder methods for simple cases. For example, an __init__ that accepts those arguments and assigns each to self. The small example above could be rewritten like this:
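(The same sketch as a data class:)

    from dataclasses import dataclass


    @dataclass
    class Article:
        title: str
        author: str
        word_count: int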

A key difference is that type hints are actually required for data classes. If you’ve never used a type hint before: they allow you to mark what type a certain variable should be. At runtime, these types are not checked, but you can use PyCharm or a command-line tool like mypy to check your code statically.

So let’s have a look at how we can use this!

The Star Wars API

You know a movie’s fanbase is passionate when a fan creates a REST API with the movie’s data in it. One Star Wars fan has done exactly that, and created the Star Wars API. He’s actually gone even further, and created a Python wrapper library for it.

Let’s forget for a second that there’s already a wrapper out there, and see how we could write our own.

We can use the requests library to get a resource from the Star Wars API:
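(A sketch using requests; swapi.co was the API’s public host at the time of writing:)

    import requests

    response = requests.get('https://swapi.co/api/films/1/')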

This endpoint (like all swapi endpoints) responds with a JSON message. Requests makes our life easier by offering JSON parsing:
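(A single method call gives us a dictionary:)

    film_data = response.json()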

And at this point we have our data in a dictionary. Let’s have a look at it (shortened):
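(Abbreviated output; the list fields contain URLs of related resources:)

    {'title': 'A New Hope',
     'episode_id': 4,
     'director': 'George Lucas',
     'release_date': '1977-05-25',
     'characters': ['https://swapi.co/api/people/1/', '...'],
     'url': 'https://swapi.co/api/films/1/',
     ...}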

Wrapping the API

To properly wrap an API, we should create objects that our wrapper’s user can use in their application. So let’s define an object in Python 3.6 to contain the responses of requests to the /films/ endpoint:
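(A condensed sketch; the real endpoint has more fields, all following the same pattern:)

    from datetime import datetime
    from typing import List

    import dateutil.parser


    class StarWarsMovie:
        def __init__(self, title: str, episode_id: int, director: str,
                     release_date: datetime, characters: List[str], url: str):
            self.title = title
            self.episode_id = episode_id
            self.director = director
            self.release_date = release_date
            self.characters = characters
            self.url = url

            # The API sends dates as strings, so parse them here
            if type(self.release_date) is str:
                self.release_date = dateutil.parser.parse(self.release_date)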

Careful readers may have noticed a little bit of duplicated code here. Not so careful readers may want to have a look at the complete Python 3.6 implementation: it’s not short.

This is a classic case of where the data class decorator can help you out. We’re creating a class that mostly holds data, and only does a little validation. So let’s have a look at what we need to change.

Firstly, data classes automatically generate several dunder methods. If we don’t specify any options to the dataclass decorator, the generated methods are: __init__, __eq__, and __repr__. Python by default (not just for data classes) will implement __str__ to return the output of __repr__ if you’ve defined __repr__ but not __str__. Therefore, you get four dunder methods implemented just by changing the code to:
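(The same condensed class as a data class:)

    from dataclasses import dataclass
    from datetime import datetime
    from typing import List


    @dataclass
    class StarWarsMovie:
        title: str
        episode_id: int
        director: str
        release_date: datetime
        characters: List[str]
        url: str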

We removed the __init__ method here to make sure the data class decorator can add the one it generates. Unfortunately, we lost a bit of functionality in the process. Our Python 3.6 constructor didn’t just define all values, but it also attempted to parse dates. How can we do that with a data class?

If we were to override __init__, we’d lose the benefit of the data class. Therefore a new dunder method was defined for any additional processing: __post_init__. Let’s see what a __post_init__ method would look like for our wrapper class:
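(Added inside the StarWarsMovie data class; dateutil does the parsing, as in the Python 3.6 version:)

        def __post_init__(self):
            if type(self.release_date) is str:
                self.release_date = dateutil.parser.parse(self.release_date)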

And that’s it! We could implement our class using the data class decorator in under a third of the number of lines we’d need without it.

More goodies

By using options with the decorator, you can tailor data classes further for your use case. The default options are:
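(The decorator’s signature with its documented defaults:)

    @dataclass(init=True, repr=True, eq=True, order=False,
               unsafe_hash=False, frozen=False)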

  • init determines whether to generate the __init__ dunder method.
  • repr determines whether to generate the __repr__ dunder method.
  • eq does the same for the __eq__ dunder method, which determines the behavior for equality checks (your_class_instance == another_instance).
  • order actually creates four dunder methods (__lt__, __le__, __gt__, and __ge__), which determine the behavior for less-than and greater-than comparisons. If you set this to True, you can sort a list of your objects.

The last two options determine whether or not your object can be hashed. This is necessary (for example) if you want to use your class’s objects as dictionary keys. A hash should remain constant for the life of the object, otherwise the dictionary will not be able to find your objects anymore. The default implementation of a data class’s __hash__ function returns a hash over all the fields of the data class. Therefore it’s only generated by default if you also make your objects read-only (by specifying frozen=True).

By setting frozen=True, any write to your object will raise an error. If you think this is too draconian, but you know your objects’ fields will never change anyway, you can specify unsafe_hash=True instead. The authors of the data class decorator recommend you don’t, though.

If you want to learn more about data classes, you can read the PEP or just get started and play with them yourself! Let us know in the comments what you’re using data classes for!


PyCharm 2018.1.2 RC

We’re happy to announce that the release candidate of PyCharm 2018.1.2 is available for download on our Confluence page.

What’s New

Docker Compose Improvements

Our Docker Compose interpreter in PyCharm 2018.1.1 started your application service together with its dependencies, but left those dependencies running after shutting down the application. This has now been changed to match the command-line behavior, and PyCharm will shut down your dependencies as well. Haven’t tried Docker Compose interpreters yet? Learn how on our blog, with Django on Windows or with Flask on Linux.

Docker Compose users on Windows will be happy to learn that we’re now using named pipes to connect to the Docker daemon, which resolves an issue where some users were unable to run their scripts.

Further Improvements

  • The Python Console now receives focus when it’s opened
  • Various improvements to database support: columns that show the result of a custom function in MSSQL are now correctly highlighted, and more. Did you know that PyCharm Professional Edition includes all database features from DataGrip, JetBrains’ SQL IDE?
  • Improvements in optimizing Python imports
  • Various issues regarding React lifecycles have been resolved
  • Read more in our release notes

Interested?

Download PyCharm now. The release candidate is not an EAP version, so if you’d like to try out the Professional Edition, you’ll either need an active license or you’ll receive a 30-day trial period. The Community Edition is free and open source software and can be used without restrictions (apart from the Apache License’s terms).

If you have any comments on our RC version (or any other version of PyCharm), please reach out to us! We’re @pycharm on Twitter, and you can of course always create a ticket on YouTrack, our issue tracker.


Webinar: “Set Theory and Practice: Grok Pythonic Collection Types” with Luciano Ramalho

With PyCon US coming up, we wanted to squeeze in one more webinar, one themed towards PyCon. Luciano Ramalho, long-time Python speaker and teacher, PyCon luminary, and author of one of the best recent Python books, will join us to talk about Python’s data model.

  • Thursday, May 3
  • 5:00 PM – 6:00 PM CEST (11:00 AM – 12:00 PM EDT)
  • Register here
  • Aimed at intermediate Python developers


Luciano’s Fluent Python book from O’Reilly gives deep treatment to this topic, and Luciano is focusing on one aspect: Python’s collection types. It’s a real pleasure for me personally to have Luciano on our webinar: we’ve been friends for many years and he’s one of the truly kind people who make our community remarkable.

Speaking to You

Luciano Ramalho is a Principal Consultant at ThoughtWorks and the author of Fluent Python. He is the co-founder of the Brazilian Python Association and of Garoa Hacker Clube and a longtime web pioneer.

-PyCharm Team-
The Drive to Develop


Webinar Recording: “Getting the Most Out of Django’s User Model” with Julia Looney

Yesterday we hosted Julia Looney for a webinar on Django user models. Julia has spoken on this topic at recent conferences and we were fortunate to have her with us. Julia’s slides, repositories, and the recording are now available.

During the webinar, Julia gave an overview of three options for custom user models (a minimal sketch of the last option follows the list):

  • Proxy Model
  • One-to-One Relationship
  • Custom User Model
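For a taste of the third option, here is a minimal sketch of a custom user model (the app, model, and field names are invented, not taken from Julia’s materials):

    # models.py
    from django.contrib.auth.models import AbstractUser
    from django.db import models


    class User(AbstractUser):
        # Extra fields on top of Django's built-in username/email/password
        favorite_editor = models.CharField(max_length=100, blank=True)

    # settings.py: point Django at the custom model
    # AUTH_USER_MODEL = 'accounts.User'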

-PyCharm Team-
The Drive to Develop


PyCharm Edu 2018.1: Going Beyond Python



Back in 2014, we launched PyCharm Educational Edition with the vision to provide a free, open-source tool that would familiarize Python learners with real developer experience from the very start, and would offer teachers an easy way to share code practice exercises. Since then, we’ve received a lot of positive feedback from both students and teachers, which helped us improve PyCharm Edu a lot. Now, it’s time to go beyond Python.

Please welcome Java and Kotlin learning and teaching support available inside IntelliJ IDEA and Android Studio!



PyCharm Scientific Mode with Code Cells

You can use code cells to divide a Python script into chunks that you can individually execute, maintaining the state between them. This means you can re-run only the part of the script you’re developing right now, without having to wait for reloading your data. Code cells were added to PyCharm 2018.1 Professional Edition’s scientific mode.

To try this out, let’s have a look at the raw data from the Python Developer Survey 2017 that was jointly conducted by JetBrains and the Python Software Foundation.

To start, let’s create a scientific project. After opening PyCharm Professional Edition (Scientific mode is not available in the Community Edition), choose to create a new project, and then select ‘Scientific’ as the project type:

[Screenshot: creating a new Scientific project]

A scientific project will by default be created with a new Conda environment. Of course, for this to work you need to have Anaconda installed on your computer. The scientific project also creates a folder structure for your data.

If we want to analyze data, we’ll first need to go get some data. Please download the CSV file from the ‘Raw Data’ section of the developer survey results page. Afterward, place it in the data folder that was created in the scientific project’s scaffold.

Extract, Transform, Load

Our first challenge will be to load the file. The easiest way to do this would be to run:
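(A sketch; the file name is a placeholder for whatever you called the downloaded CSV:)

    import pandas as pd

    df = pd.read_csv('data/python_devs_survey_2017_raw_data.csv')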

So let’s run this. After writing this code in the main.py file that was created for us with the project, right-click anywhere in the file and choose ‘Run’. We should see a Python console appear at the bottom of our screen after the script completes execution. On the right-hand side, we should see the variable overview with our DataFrame. Click ‘View as DataFrame’ to inspect it:

[Screenshot: the DataFrame after the initial load]

We can see the structure of the CSV file here. The columns have headings like “Java:What other language(s) do you use?”. These columns are the result of multiple-choice answers: respondents were asked ‘What other language(s) do you use?’ and could select multiple answers. If an answer was selected, that string is inserted. Otherwise the string ‘NA’ is inserted (if you open the CSV file directly, you’ll be able to see a lot of ‘NA’ values).

If you scroll through the DataFrame a little more, you’ll see that in some cases Pandas was able to correctly infer the data types, but in many cases the data in this shape would be fairly unwieldy to work with.

To make the data easier to work with, we could recode columns after the read_csv call, and fix things. A better way is to configure the read_csv call with various parameters.

In this step, we’d like to make sure that our columns will be named in a way that’s easier to work with, and to make sure that the data types are all correct. To do this, we can use several parameters of read_csv (a small example follows the list):

  • names – allows us to specify the names of the columns (instead of reading them from the CSV file). We need to pay attention to the fact that if we specify this parameter, Pandas will import the header column as a data row by default. We can prevent that by explicitly specifying header=0, which indicates that the 0th row (the first row) is a header row.
  • dtype – enables us to specify a datatype per column, as a dict. Pandas will cast values in these columns to the specified datatype.
  • converters – functions that receive the raw value of the cell for a specified column, and return the desired value (with the desired datatype).
  • usecols – allows us to specify exactly which columns to import.
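Put together, a call using these parameters could look roughly like this (the column names are invented):

    df = pd.read_csv(
        'data/python_devs_survey_2017_raw_data.csv',
        names=['is_python_main', 'uses_java'],   # one entry per CSV column
        header=0,                                # the first row is a header, not data
        dtype={'is_python_main': 'category'},
        converters={'uses_java': lambda value: value != 'NA'},
        usecols=['is_python_main', 'uses_java'],
    )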


For more details, see the documentation for read_csv. Or, just write pd.read_csv in PyCharm to see it in the documentation tool window (this works if you have scientific mode enabled; if not, use Ctrl+Q to see the documentation).

The disadvantage of these parameters is that they take lists and dicts, which become very unwieldy for datasets with many columns. As our dataset has over 150 columns, it would be a pain to write them inline. Also, the information for one column would be spread among these parameters, making it hard to see what is being done to a column.

One great thing about analyzing data with Pandas is that we can use all features of the Python language. So let’s create a data dictionary with plain Python objects, and then use some Python magic to transform these to the structures Pandas needs.

The Data Dictionary

To recap, for every column we want to know what name we want to give it, and how to encode its values. We also want the ability to drop a column.

Let’s create a separate file to hold our data dictionary: survey_data_dictionary.py. In this file, we define a class that describes what we want to do with a column:
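(One way to sketch it; the fields mirror the read_csv parameters we want to drive:)

    from typing import Callable, Optional


    class ColumnDescription:
        def __init__(self, raw_name: str, name: Optional[str] = None,
                     dtype=None, converter: Optional[Callable] = None,
                     keep: bool = True):
            self.raw_name = raw_name    # column header as it appears in the CSV
            self.name = name            # friendlier name, if we want to rename
            self.dtype = dtype          # optional dtype for the column
            self.converter = converter  # optional value-conversion function
            self.keep = keep            # False means: drop this column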

Now we can make a big list of all of our columns, and describe one-by-one what to do with them. To make our lives easier, we can use Pandas to get a list of the current names of the columns. Run in the Python console:
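(Assuming df still holds the result of our initial read_csv:)

    for column in df.columns:
        print('#' + column)  # no space after '#', so the regex below captures the exact name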

This will print the full name of every column as a Python comment. Copy & paste the full list into the data dictionary Python file after the class definition. We can now use regex replacement to create instances of our ColumnDescription class.

Open the Replace tool (Ctrl+R or Edit | Find | Replace), and make sure to check ‘Regex’ to enable regex mode. Enter #([^\n]+) as the regex to find. This looks for the ‘#’ character, followed by one or more characters that are not a newline. Everything between the parentheses is captured into a group, which we can then use in the replacement (use $1 for the first capture group).

As the replacement, type the following (use Ctrl+Shift+Enter to create newlines):
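(Assuming the ColumnDescription sketch from earlier:)

    ColumnDescription(
        raw_name="$1"
    ),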

Make sure you’ve indented the middle line and used double quotes, and then click “Replace all”. Now we’ve created a lot of ColumnDescription objects. We’ll need them in a list for Pandas, so let’s wrap them in a list now. Write DATA_DICTIONARY = [ before the first ColumnDescription call, and ] at the end of the file. Choose Code | Reformat Code to properly indent all of the ColumnDescription calls.

At this point, we can go back to our main.py and feed this data structure into the read_csv call. Let’s start by adding the ability to rename columns, which we do with the names parameter. This should be a list of strings whose length equals the number of columns in the CSV file.

We can use a Python list comprehension to extract the names from our list of ColumnDescription objects:
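(A sketch, reusing the rename-or-keep logic:)

    names = [column.name if column.name else column.raw_name
             for column in DATA_DICTIONARY]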

At this point we can provide this list to read_csv. We also need to remember to specify the header row explicitly so that Pandas doesn’t import the header row as a data row:
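(With the same placeholder file name as before:)

    df = pd.read_csv('data/python_devs_survey_2017_raw_data.csv',
                     names=names,
                     header=0)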

If we run this code, we should see nothing has changed. To see if it worked, let’s go back to our data dictionary and add a name to the first column:
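(For example, giving the first survey question an invented short name:)

    ColumnDescription(
        raw_name="Is Python the main language you use for your current projects?",
        name="is_python_main"
    ),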

After re-running main.py, we should now see that the first column has been renamed. Crack open a bottle of champagne to celebrate your success!

Let’s provide the other metadata from DATA_DICTIONARY to read_csv with a combination of list comprehensions and dict comprehensions:
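(A sketch following the same pattern; each comprehension keys on the column’s display name:)

    display_names = [column.name if column.name else column.raw_name
                     for column in DATA_DICTIONARY]
    dtypes = {name: column.dtype
              for name, column in zip(display_names, DATA_DICTIONARY)
              if column.dtype is not None}
    converters = {name: column.converter
                  for name, column in zip(display_names, DATA_DICTIONARY)
                  if column.converter is not None}
    usecols = [name for name, column in zip(display_names, DATA_DICTIONARY)
               if column.keep]

    df = pd.read_csv('data/python_devs_survey_2017_raw_data.csv',
                     names=display_names,
                     header=0,
                     dtype=dtypes,
                     converters=converters,
                     usecols=usecols)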

Now all that’s left to do is to populate the rest of the data dictionary. Unfortunately, this is manual work; there’s no way for Pandas to know the design of our survey. If you want to follow along with the rest of the blog post without writing the entire data dictionary, you can grab a complete one from the GitHub repo.

For those columns that are either ‘NA’ or the name of the selected columns, we can create a small helper function that will convert these to booleans:
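(A minimal helper:)

    def to_bool(value):
        # Unselected answers are stored as the string 'NA'
        return value != 'NA'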

We can then specify this helper function as the converter for a column like this:
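(Using the multiple-choice column we saw earlier:)

    ColumnDescription(
        raw_name="Java:What other language(s) do you use?",
        name="uses_java",
        converter=to_bool
    ),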

Another type of data that’s fairly common in surveys is categorical data: multiple choice, single answer. We can specify Pandas’ CategoricalDtype as the data type for those columns:
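(For example, for the first survey question; CategoricalDtype lives in pandas.api.types:)

    from pandas.api.types import CategoricalDtype

    ColumnDescription(
        raw_name="Is Python the main language you use for your current projects?",
        name="is_python_main",
        dtype=CategoricalDtype(categories=[
            "Yes",
            "No, I use Python as a secondary language",
            "No, I don't use Python for my current projects",
        ])
    ),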

Cleaning up our Data

Although our columns are now looking good, we may want to make some additional changes to our data. In the Python developer survey, the first question was:

Is Python the main language you use for your current projects?

  • Yes
  • No, I use Python as a secondary language
  • No, I don’t use Python for my current projects

All respondents who selected they don’t use Python were excluded from most of the rest of the survey. So we should drop these data points for our analysis. Let’s create a new code cell, and start cleaning up our data.

Code cells are defined simply by creating a comment that starts with #%%. The rest of the comment is the header of the cell, which you see when you collapse it:
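(For example:)

    #%% Clean up data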

As long as you have the scientific mode enabled in PyCharm Professional, you should see a dividing line appear, and a green ‘play’ icon to run the cell.

It’s fairly easy to select data in Pandas, so let’s complete our cell:
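(A sketch, assuming the is_python_main name from our data dictionary:)

    #%% Clean up data
    df = df[df['is_python_main'] !=
            "No, I don't use Python for my current projects"]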

As the remaining choices are basically “Yes” and “No”, we can also turn the remaining data into a boolean:
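(Still inside the same cell:)

    df['is_python_main'] = df['is_python_main'] == 'Yes'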

Analyzing our Data

In our survey, users were asked what they thought the ratio is between the number of Python developers creating web applications and the number of developers doing data science. To make things interesting, they were also asked what they thought the most popular opinion would be. Let’s see whether people think the rest of the world agrees with them.

The questions look like this:

Please think about the total number of Python Web Developers in the world and the total number of Data Scientists using Python.

What do you think is the ratio of these two numbers?

Python Web Developers ( ) 10:1 ( ) 5:1 ( ) 2:1 ( ) 1:1 ( ) 1:2 ( ) 1:5 ( ) 1:10 Python data scientists

What do you think would be the most popular opinion?

Python Web Developers ( ) 10:1 ( ) 5:1 ( ) 2:1 ( ) 1:1 ( ) 1:2 ( ) 1:5 ( ) 1:10 Python data scientists


Make sure you’ve specified categorical data types in the data dictionary for both questions.

We can now go ahead and create a new code cell to start our analysis. Let’s start by getting the value counts:
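(Assuming the two ratio questions were given the invented names ratio_self and ratio_most_popular in the data dictionary:)

    #%% Ratio value counts
    counts_self = df['ratio_self'].value_counts(sort=False)
    counts_popular = df['ratio_most_popular'].value_counts(sort=False)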

We’re disabling sorting here to maintain the order that we’ve specified using the categorical data type. If we run this cell with the green play icon, we can then click ‘View as Series’ in the variable overview to have a glance at our data.

[Screenshot: viewing the counts as a Series]

We can also use Matplotlib to get a graphical overview of the data:
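(A rough sketch of a grouped bar chart; the repository’s version is more polished:)

    import matplotlib.pyplot as plt

    positions = range(len(counts_self))
    plt.bar([p - 0.2 for p in positions], counts_self, width=0.4, label='Self')
    plt.bar([p + 0.2 for p in positions], counts_popular, width=0.4,
            label='Most popular')
    plt.xticks(list(positions), counts_self.index)
    plt.legend()
    plt.show()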

[Plot: distributions of the self-reported and ‘most popular’ ratio answers]

See the GitHub repository for the exact code used to generate the plot.

We can see in the plot that there’s a difference between what individual respondents thought the ratio was, and what they thought the most popular opinion was. So let’s dive a little deeper: how big is this difference?

Exploring Further

Although the data points are categorical, they represent numbers, so we can turn them into numbers and look at the numeric difference. Let’s create a new code cell, and calculate the difference:
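(A sketch; mapping each category label to a single number is our own choice of encoding:)

    #%% Numeric difference
    ratio_to_number = {'10:1': 10, '5:1': 5, '2:1': 2, '1:1': 1,
                       '1:2': 1 / 2, '1:5': 1 / 5, '1:10': 1 / 10}

    mean_self = df['ratio_self'].map(ratio_to_number).mean()
    mean_popular = df['ratio_most_popular'].map(ratio_to_number).mean()
    print(mean_self, mean_popular)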

After running this cell, we can compare the two means.

Turns out the difference in means isn’t very large. However, the distributions are fairly different. We can see in the plot that the ‘self’ distribution has a peak at the 5:1 web dev:data scientist point, whereas the ‘Most popular’ distribution trades some votes from 5:1 to 1:1. Fun fact: this same survey found about a 1:1 distribution between web developers and data scientists with its Python usage questions.


To see whether or not we have a significant difference, we can use a Chi-Square test. The scipy.stats package contains a method to calculate this statistic. So let’s create a last code cell to finish this investigation:
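(A sketch; we scale the expected counts so both distributions have the same total:)

    #%% Chi-square test
    from scipy import stats

    expected = counts_self / counts_self.sum() * counts_popular.sum()
    print(stats.chisquare(counts_popular, f_exp=expected))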

This results in: Power_divergenceResult(statistic=294.72519972505006, pvalue=1.1037599850410103e-60).

In other words, the probability that this is the result of random chance is about 1 in 10⁶⁰, and we can conclude this is a statistically significant difference.

What’s Next?

We’ve just shown how to ingest a fairly large CSV file into Pandas, and how to handle the conversion of data from its raw form to a form that’s easier to analyze. For the example, we looked into what respondents think the Python ecosystem looks like. And we’ve confirmed that people think that others have a different opinion from themselves (also, water is wet).

Now it’s your turn! Download the CSV (and you may want to grab the data dictionary from this blog post’s repo) and let us know what interesting things you discover in the Python developer survey! It contains many interesting data points: what people use Python for, what their job roles are, what packages they use, and more.


PyCharm Hotfix for Pip 10.0 and IPython 6.3.0 compatibility

Pip 10.0 is close to being released, and changes parts of its API. Several older versions of PyCharm are incompatible with the newer version. If you’d like to use PyCharm with the new version of pip, please update PyCharm.

In addition, IPython 6.3.0 caused a ValueError when a script was run with the “Show command line afterwards” option. This bug has also been resolved.

The new versions of PyCharm are:

  • 2016.3.5
  • 2017.1.7
  • 2017.2.6
  • 2017.3.5

If you’re using a version with a minor update (last number) lower than those mentioned, please update PyCharm. If you wish to update to the latest version (2018.1.1 at the time of writing), get it here. If you wish to get one of the hotfixed older versions, you can find the appropriate release on our previous versions page.


PyCharm 2018.1.1

We’re happy to announce that PyCharm 2018.1.1 is now available from our website!

Please update manually

[Screenshot: Check for Updates]

Due to an error on our side, PyCharm 2018.1 will not inform you that the new version is available. Please either download PyCharm 2018.1.1 from our website or alternatively click Help | Check for Updates (on macOS: PyCharm | Check for Updates).

What’s new

Improved @dataclass support

Data classes are a Python 3.7 feature that many developers are looking forward to. As Python 3.7 isn’t final yet, some details were changed recently. After an update to the PEP, we’ve made improvements to our support:

  • The hash parameter was changed to unsafe_hash. This parameter forces @dataclass to generate the __hash__ dunder method, even if the frozen parameter is set to False. In this case, the hash of your class can change if you make any changes to your class’s fields, and a changing hash will break data structures that depend on it, like dict (see the sketch after this list). The Python core developers wanted to highlight that this should be done very carefully.
  • We now warn you if you use a parameter to generate a method which you’ve already defined.
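A quick illustration of the first point, with a hypothetical class:

    from dataclasses import dataclass


    @dataclass(unsafe_hash=True)
    class Point:
        x: int
        y: int


    p = Point(1, 2)
    d = {p: 'a label'}
    p.x = 5        # mutating a field changes hash(p)
    print(p in d)  # False: the dict can no longer find the key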

Further Improvements

  • Pip 10.0 is now supported. A compatibility issue with IPython 6.3.0 was resolved.
  • Matplotlib plots are now correctly shown for scripts executed with “Run in Python Console”.
  • Django or Flask is now installed if you create a new project of those types on a remote machine.
  • And more: read the release notes for details
