Analyzing Data in Amazon Redshift with Pandas

Redshift is Amazon Web Services’ data warehousing solution. They’ve extended PostgreSQL to better suit large datasets used for analysis. When you hear about this kind of technology as a Python developer, it just makes sense to then unleash Pandas on it. So let’s have a look to see how we can analyze data in Redshift using a Pandas script!

Setting up Redshift

If you haven’t used Redshift before, you should be able to get the cluster up for free for 2 months. As long as you make sure that you don’t use more than 1 instance, and you use the smallest available instance.

To play around, let’s use Amazon’s example dataset, and to keep things very simple, let’s only load the ‘users’ table. Configuring AWS is a complex subject, and they’re a lot better at explaining how to do it than we are, so please complete the first four steps of the AWS tutorial for setting up an example Redshift environment. We’ll use PyCharm Professional Edition as the SQL client.

Connecting to Redshift

After spinning up Redshift, you can connect PyCharm Professional to it by heading over to the database tool window (View | Tool Windows | Database), then use the green ‘+’ button, and select Redshift as the  data source type. Then fill in the information for your instance:

Redshift Connection

Make sure that when you click the ‘test connection’ button you get a ‘connection successful’ notification. If you don’t, make sure that you’ve correctly configured your Redshift cluster’s VPC to allow connections from on port 5439.

Now that we’ve connected PyCharm to the Redshift cluster, we can create the tables for Amazon’s example data. Copy the first code listing from here, and paste it into the SQL console that was opened in PyCharm when you connected to the database. Then execute it by pressing Ctrl + Enter, when PyCharm asks which query to execute, make sure to select the full listing. Afterward, you should see all the tables in the database tool window:

Database Tool Window

To load the sample data, go back to the query window, and use the Redshift ‘load’ command to load data from an Amazon S3 bucket into the database:

Redshift Query

The IAM role identifier should be the identifier for the IAM role you’ve created for your Redshift cluster in the second step in the Amazon tutorial. If everything goes right, you should have about 50,000 rows of data in your users table after the command completes.

Loading Redshift Data into a Pandas Dataframe

So let’s get started with the Python code! In our example we’ll use Pandas, Matplotlib, and Seaborn. The easiest way to get all of these installed is by using Anaconda, get the Python 3 version from their website. After installing, we need to choose Anaconda as our project interpreter:

Anaconda Root Env

If you can’t find Anaconda in the dropdown, you can click the settings “gear” button, and then select ‘Add Local’ and find your Anaconda installation on your disk. We’re using the root Anaconda environment without Conda, as we will depend on several scientific libraries which are complicated to correctly install in Conda environments.

Pandas relies on SQLAlchemy to load data from an SQL data source. So let’s use the PyCharm package manager to install sqlalchemy: use the green ‘+’ button next to the package list and find the package. To make SQLAlchemy work well with Redshift, we’ll need to install both the postgres driver, and the Redshift additions. For postgres, you can use the PyCharm package manager to install psycopg2. Then we need to install sqlalchemy-redshift to teach SQLAlchemy the specifics of working with a Redshift cluster. This package is unfortunately not available in the default Anaconda repository, so we’ll need to add a custom repository.

To add a custom repository click the ‘Manage Repositories’ button, and then use the green ‘+’ icon to add the ‘conda-forge’ channel. Afterwards, close the ‘Manage Repositories’ screen, and install sqlalchemy-redshift. Now that we’ve done that, we can start coding!

Conda Forge Channel

To show how it’s done, let’s analyze something simple in Amazon’s dataset, the users dataset holds fictional users, and then indicates for every user if they like certain types of entertainment. Let’s see if there’s any correlation between the types of entertainment.

As always, the full code is available on GitHub.

Let’s open a new Python file, and start our analysis. At first, we need to load our data. Redshift is accessed just like a regular PostgreSQL database, just with a slightly different connection string to use the redshift driver:

Also note that Redshift by default listens on port 5439, rather than Postgres’ 5432.

After we’ve connected we can use Pandas’ standard way to load data from an SQL database:

The dataset holds users’ preferences as False, None, or True. Let’s interpret this as True being a ‘like’, None being ambivalent, and False being a dislike. To make a correlation possible, we should convert this into numeric values:

And now we’re ready to calculate the correlation matrix, and present it. To present it we’ll use Seaborn’s heatmap. We’ll also create a mask to only show the bottom half of the correlation matrix (the top half mirrors the bottom).

After running this code, we can see that there are no correlations in the dataset:

Redshift Example Correlations

Which is strong evidence for Amazon’s sample dataset being a sample dataset. QED.

Fortunately, PyCharm also works great with real datasets in Redshift. Let us know in the comments what data you’re interested in analyzing!

Posted in Tutorial | Tagged , , , , | 1 Comment

Support a Great Partnership: PyCharm and Django Team up Again

Last June (2016) JetBrains PyCharm partnered with the Django Software Foundation to generate a big boost to Django fundraising. The campaign was a huge success. Together we raised a total of $50,000 for the Django Software Foundation!

This year we hope to repeat that success. During the two-week campaign, buy a new PyCharm Professional Edition individual license with a 30% discount code, and all the money raised will go to the DSF’s general fundraising and the Django Fellowship program.

Promotion details

pycharm-django-blog-postUp until Aug 28th, you can effectively donate to Django by purchasing a New Individual PyCharm Professional annual subscription at 30% off. It’s very simple:

1. When buying a new annual PyCharm subscription in our e-store, on the checkout page, сlick “Have a discount code?”.
2. Enter the following 30% discount promo code:
Alternatively, just click this shortcut link to go to the e-store with the code automatically applied
3. Fill in the other required fields on the page and click the “Place order” button.

All of the income from this promotion code will go to the DSF fundraising campaign 2017 – not just the profits, but actually the entire sales amount including taxes, transaction fees – everything. The campaign will help the DSF to maintain the healthy state of the Django project and help them continue contributing to their different outreach and diversity programs.

Read more details on the special promotion page.


“Django has grown to be a world-class web framework, and coupled with PyCharm’s Django support, we can give tremendous developer productivity,” says Frank Wiles, DSF President. “Last year JetBrains was a great partner for us in support of raising money for the Django Software Foundation, on behalf of the community, I would like to extend our deepest thanks for their generous help. Together we hope to make this a yearly event!”

If you have any questions, get in touch with Django at or JetBrains at

Posted in Sales and Promotions | Tagged , | 6 Comments

PyCharm 2017.2.1 out now

We’re happy to announce that PyCharm 2017.2.1 is now available from our website


  • Docker: Python console volume issue fix, and various other fixes
  • Debugging: concurrency visualization for Python 3.6, and console fixes
  • Inspections: variable type comparison bugs
  • Namespace package resolution bug on remote interpreters
  • JavaScript: Go to Declaration bug fix, and more
  • See details for all fixes in the release notes

To update PyCharm get the new version from our website. You can also update by choosing Help | Check for Updates in PyCharm, or use JetBrains Toolbox to keep all your JetBrains tools up to date.

To keep up with the latest news on PyCharm, please subscribe to our blog posts (leave your email address in the box on the right), or follow us on Twitter.

-PyCharm Team
The Drive to Develop

Posted in Release Announcements | 2 Comments

SSH Agent: Simplify SSH keys

Security is important, and that’s why we all use key pairs when authenticating to our servers. And as is best practice, we also protect our private key with a passphrase, don’t we? Ideally the passphrase is long and complicated, and therefore a massive pain to type.

So how can we be lazy while still maintaining security? SSH Agent is the answer! Rather than typing the passphrase to your key individually for every connection, we can type it only once when we turn the computer on, and then SSH agent will keep the decrypted key safe in memory.

Let’s have a look how to do this on Windows, Mac, and Linux. Feel free to skip ahead to the section that’s relevant for your OS. At the very end I’ll show how to setup PyCharm to connect using the key in SSH Agent, which is the same on all platforms.

On Windows

The default tool for SSH connections on Windows is PuTTY. Let’s walk through generating a key and then using it with PuTTY.

When you download PuTTY, you don’t just get PuTTY, you get various other tools as well. We’ll use one of those to generate a key to use with SSH later. Open PuTTYgen, and click the generate button to start creating a key:


After simulating an earthquake with your mouse, you should have a shiny new SSH key.

PuTTYgen after key generation

Give it a secure passphrase, and then save the private key somewhere safe. At this point you should copy the public key to the ~/.ssh/authorized_keys file on the computer you’d like to connect to. If you’d like a computer to try this with, you can create a VM using the Vagrantfile here. If you’re using the VM, choose ‘Save public key’ to overwrite in the files subfolder, and then run vagrant up to create the virtual machine.

Now that we have a Linux machine with the public key installed, let’s connect to it. To manually use the key, we can configure PuTTY to connect to our box. We will connect to localhost, on port 2222 (the port that Vagrant exposes SSH to by default). And then in the menu on the left select SSH, and then Auth. In the settings you can configure a key file to use:

Key Configuration in PuTTY

When you click open now, you should be prompted for the username you’d like to log in as. If you’re using the sample box, the username is ‘example’. At this point you’ll need to enter the passphrase for the private key:

PuTTY connected

If you right-click the title bar, and choose Duplicate Session, you’ll need to re-enter the passphrase. Let’s see if we can make this a little more efficient.

A tool that comes with PuTTY is Pageant. What this does is hold your private key, and as long as it’s running, it can provide the unlocked private key to PuTTY (and other tools like PyCharm). After you start Pageant, you may notice that nothing seems to happen. Pageant lives in the Windows task bar on the bottom right of your screen. Right-click Pageant’s icon, and you can choose ‘Add Key’ to add your key to Pageant:

Task Area

After finding your key on the filesystem, you’ll need to enter the key’s passphrase. However, after this is done, you won’t need to enter it again until Pageant is stopped (for example, when you shutdown your computer).

So when we duplicate our session, we can see that we don’t need to enter a passphrase anymore:

PuTTY Before and After

Although it’s possible to configure Pageant to start with Windows, it might be more interesting to use another tool: KeeAgent. If you’re already using KeePass to store your passwords, you can use the KeeAgent plugin to make KeePass work just like Pageant. And then when you unlock your KeePass workspace, your private key will automatically be unlocked as well. Read how to set it up here.

On macOS

To create a key on macOS, open a Terminal window, and run:

Make sure to add a passphrase for your key.

After this finishes you’ll have both the private key file, and the public key file (with a .pub extension). Let’s get a Linux computer to try our new key with. Clone this repository, copy the .pub file over in the files subdirectory, and then run vagrant up to create a VM (you’ll need Vagrant for this).

When the VM has started, we can use ssh to connect to it:

We need to disable host key checking, as otherwise SSH will see a mismatch between your computer’s key and the VM’s key. When you run this command, you’ll be prompted for the key’s passphrase. So let’s see if we can get rid of this prompt.

macOS comes with a Keychain application which stores your passwords in the background. To see the stored passwords, use the Keychain Access application. We can store the passphrase there by using:

On version of macOS before Sierra, you’re done now. On macOS Sierra and later, you’ll need to do some additional steps: see this answer on StackOverflow.

After adding it to your Keychain, you should see that whenever you run the SSH command again, it’ll take the passphrase from your Keychain.

On Linux

On Linux you can use the ssh-add command, after running it once and entering your passphrase, you won’t have to re-enter your passphrase until you log out:

Without any parameters this will add the key in ~/.ssh/id_rsa. If you get an error ‘Could not open a connection to your authentication agent’, start ssh-agent by running: eval $(ssh-agent -s). To make this easier, you could add it to your ~/.bashrc, or see if your distribution or desktop environment has a tool that support SSH Agent. For example GNOME Keyring supports ssh-agent.

Using Your Key in PyCharm Professional Edition

After storing your key in SSH Agent (or a compatible tool), you can connect to a Python interpreter on a remote machine using that key for authentication. Please keep in mind that remote interpreters are only available in PyCharm Professional Edition. So let’s open up PyCharm, and go to the interpreter settings (File | Settings | Project Interpreter). Then use the gear icon, and choose ‘Add Remote’.

Add Remote

Then choose “SSH Credentials”, enter the hostname, port, and username. By setting authentication method to ‘Authentication agent’ you’re instructing PyCharm to take the credentials from SSH Agent or a compatible tool.

SSH Credentials

Finally, just click OK to connect, and you’ll be using the Python interpreter on the machine or VM you’re connecting to.

Posted in Tutorial | Tagged , , , , | 1 Comment

PyCharm 2017.2.1 RC out now

After the 2017.2 release, we haven’t been sitting still, and have fixed a lot of bugs:

  • Docker: Python console volume issue fix, and various other fixes
  • Debugging: concurrency visualization for Python 3.6, and console fixes
  • Inspections: variable type comparison bugs
  • Namespace package resolution bug on remote interpreters
  • JavaScript: Go to Declaration bug fix, and more
  • See details for all fixes in the release notes

You can try the release candidate of PyCharm 2017.2.1 now: get it from Confluence.

To keep up with the latest news on PyCharm, please subscribe to our blog posts (leave your email address in the box on the right), or follow us on Twitter.

-PyCharm Team
The Drive to Develop

Posted in Release Announcements | Tagged | 3 Comments

Webinar Recording: Teaching Python 3.6 with Games

Yesterday Paul Craven, computer science professor and creator of the Arcade library for Python 2d games, gave a webinar about teaching Python to university students using game writing. The recording is now available:

During the webinar, Paul discussed his techniques for teaching introductory programming, an approach which he has refined over the years based on seeing what works and what doesn’t work. He covered:

  • The iterative process of producing teaching materials
  • Having a good toolchain (PyCharm for writing Sphinx docs, GitHub, ReadTheDocs, etc.)
  • How PyCharm’s IDE features impacted the quality of what students did (PEP 8, function completion, spell checking, etc.)
  • The effect Python 3.5/3.6 type hinting had on the learning/teaching experience
  • How games as a topic made learning fun, and how he adjusted the game library to make it more teachable
  • Teaching self-sufficiency by learning to browse APIs
  • The importance of already-working game examples

Paul then covered a bit of Arcade and the kinds of games it makes easy to write.

The webinar lasted a bit over an hour and had one of our highest number of questions posed during the session. We’ll try to get Paul back again in the future and go more in depth on game writing.

If you have any questions or comments about the webinar, feel free to leave them in the comments below, or you can reach us on Twitter. Paul Craven is on Twitter as well, his Twitter handle is @professorcraven.

-PyCharm Team
The Drive to Develop

Posted in Webinar | 7 Comments

Using Docker Compose on Windows in PyCharm

By popular demand, PyCharm 2017.2 Professional Edition expands its Docker Compose support to those of you who run Windows. Let’s take a look and see how this works!

In our last Docker Compose post, we created a guest book in Flask. This time we’ll take a simple todo app in Django, and dockerize it. The starting point today will be a Django todo app which works locally, see the code on GitHub.

Setting Up Docker on Windows

If you don’t have Docker installed yet, you’ll need to make a decision about which version to install:

  • Are you using anything other than Windows 10 Pro or Enterprise, or do you have Virtualbox, VMware, or anything other than Hyper-V installed: get Docker Toolbox and Virtualbox.
  • If you’re on Windows 10 Pro or Enterprise, and you have either Hyper-V or no virtualization software installed: get Docker for Windows.

The reason for this is that Docker for Windows is based on Microsoft’s Hyper-V virtualization technology. Hyper-V is a seriously cool bit of tech that wraps your Windows in a hypervisor, rather than installing a hypervisor within Windows. What this means is that effectively you’ll be using a VM when you’re using your computer. Hypervisors are unable to run on a VM, so when you enable Hyper-V on Windows, you can’t run any other VM software anymore.

Setting Up Docker Toolbox

If you installed Docker for Windows, you can skip this section.

Docker Toolbox works by redirecting all your Docker commands to a Docker instance running either on a local VM, or on a cloud service. Today, let’s set up a Virtualbox VM on our local computer. Run with a cmd window:

Let’s verify that it works by connecting our command-line Docker. To do so, we need to run this cryptic looking command in cmd:

To see what it does, run docker-machine env default --shell=cmd, it will output several commands to set environment variables that configure Docker and Docker Compose. The long command above simply runs these all.

At this point, if you run docker run hello-world you should see a cheerful message that confirms that everything works:

Hello World Container Windows

Running Django in Docker

For our Django app we’ll need to create two containers: a database container, and a container which holds our actual application. We’ll use Docker Compose to link the containers together.

Let’s get started with writing our Dockerfile:

This is enough configuration to start Django, however, before we can proceed we should also make sure that we configure our database. For this we should write a compose file where we add both our Django service, and our postgres service:

The postgres image is easily configured with environment variables, for details read the image’s page on Docker hub. In this case we’re only setting the password, leaving the defaults for username and database. If you want to persist data when the container is destroyed, you’d need to create a named volume for the /var/lib/postgresql/data folder.

After adding these files, let’s just head over to our Django to configure our new database credentials:

In a Docker Compose project, you can connect to linked containers by their service name unless you’ve specified an alias in the link section of the compose file. In this case we wrote:

Therefore we should tell Django to look for the db host. Although I’m hardcoding it here for simplicity, ideally you’d get this configuration from environment variables.

This is all the configuration we need to do, and we can get started with building our images now.

Let’s first let PyCharm know where to find Docker. Go to Settings | Build, Execution, Deployment | Docker, and make sure that your Docker is configured. If you’re using Docker for Windows, and there’s no Docker listed, just click the green ‘+’ icon, and the defaults should be correct. If you’re using Docker machine, select the ‘Docker Machine’ radio button, and select the correct machine in the dropdown:

Docker-Machine Settings in PyCharm

After that’s set up, we can go and add our Docker run configuration, go to the ‘Edit Run Configurations’ screen, and add a Docker Deployment run configuration.

Add Docker Deployment Run Config

Let’s name it Rebuild Images, and in the Deployment field, select the compose file:

Docker Compose Run Configuration in PyCharm

Now when we run this configuration, we should see that all the layers are pulled from Docker hub, and both the database and Django are started.

Setting up the Python Remote Docker Interpreter

Now to make sure that we can debug our Django project, let’s configure PyCharm to use the Python interpreter within our Docker container. To do so go to Settings | Project Interpreter, and use the gear icon to select Add Remote:

Interpreter Settings

Choose the Docker Compose interpreter type, and make sure the docker-compose.yml file is selected. The service you choose under ‘service’ is the service you want to debug with this run configuration, when you start it, all services will still be started either way. As the only Python service is ‘web’, let’s select that here:

Add Compose Interpreter

Afterwards you should see that PyCharm detected the packages we configured in requirements.txt, and the path mappings:


Now we can add a normal Django server run configuration, just make sure to set host to ‘’ to make sure that we listen to requests coming from outside the Docker container.

Now first run the migrations, by going to Tools | Run task, and then writing migrate. After this command has completed, we can use the regular run and debug icons in PyCharm to run and debug our Django project. So let’s run it!

To see our Django application in the browser, go to http://localhost:8000 if you’re using Docker for Windows. If you’re using Docker Machine, we’ll first need to check on which IP our Docker Machine is running, run docker-machine ip default on the command line. In my case this is, so I’ll go to in the browser.

Now if you see a message “DisallowedHost at /”, go to Django’s, and find ALLOWED_HOSTS. During development we can change this to: ALLOWED_HOSTS = [‘*’] to disable this check. Please make sure you appropriately configure it when running in production however.

When everything works, we can add a breakpoint, and debug as usual:

Debug Django

Posted in Tutorial | Tagged , , , | 12 Comments

PyCharm Edu 4 is Out, Enhanced for Both Learners and Educators

PyCharm Edu 4 is out now! This update brings a better user experience to both learners and educators, making the product’s use as simple as possible, whether it is used for learning, or for teaching.

PyCharm 4 is out

First of all, we’ve changed the welcoming UI. Now you begin by choosing your role, Learner or Educator. Depending on your choice, you get access to the courses you can join as a learner and can practice with the help of simple and effective “fill in the missing code” exercises. Or, you can create your own code practice tasks and integrated tests as an educator.


With PyCharm Edu 4, learners can now easily choose the course to join and start learning thanks to the new Welcoming UI.

Learn more about PyCharm Edu for learners.


Educators can now easily manage and share their learning materials thanks to better integration with Stepik.

Learn more about PyCharm Edu for educators.


Please let us know what you think about PyCharm Edu! Share your feedback here in the comments, on twitter or report your findings on YouTrack.

Your PyCharm Edu Team

Posted in Education, Release Announcements, Screencast | Tagged , | 2 Comments

PyCharm 2017.2 Out Now: Docker Compose on Windows, SSH Agent and more

PyCharm 2017.2 is out now! Get it today for Docker Compose support on Windows, SSH Agent, Azure Databases, and Amazon Redshift support.

Get it from our website

  • We’ve added some small improvements for editing Python files: a quick fix to change the signature of a function you’re calling, an inspection to make sure your Python strings formatted with str.format() work correctly, and auto-completion for type hints
  • Docker Compose is additionally supported on Windows (this feature is available only in PyCharm Professional Edition)
  • PyCharm 2017.2 supports using SSH Agent to handle your SSH private keys. Compatible tools like Pageant on Windows are also supported. (only in Professional Edition)
  • Database tools fully support connecting to Amazon Redshift and Azure Databases (only in Professional Edition)
  • Run inline SQL on multiple data source (only in Professional Edition)
  • Improvements for Version Control, JavaScript, and HiDPI support (JavaScript support is available only in Professional Edition)
  • And more, see our what’s new page for details

Get PyCharm 2017.2 now from our website!

Please let us know what you think about PyCharm! You can reach us on Twitter, Facebook, and by leaving a comment on the blog.

PyCharm Team
-The Drive to Develop

Posted in Release Announcements | Tagged | 23 Comments

Interview: Paul Craven on Python Gaming and Teaching

Writing games in Python is fun, so how about using it to teach computer programming? Paul Craven is both a professor and the creator of Arcade, a 2d game library for Python. He’s doing a webinar with us next week, so we talked to him about teaching Python, using Python 3 type hints, why another Python game library, and how IDEs fit into teaching.


Thanks a bunch for doing the webinar next week. First, can you tell us a little bit about yourself?

I worked in the IT industry for 15 years before switching to teaching at Simpson College, a small 4-year college in Iowa. My main interest has been getting first-time programmers to realize that programming can be fun. That moment when a student cheers out loud because they finally figured out how to get sprites to move correctly on the screen? It is a beautiful thing to see.

You teach programming, and you created the Arcade game library to help. Before talking about Arcade, can you explain the motivation behind having a framework that you can teach?

Teaching is like engineering. Each semester you work to improve how you teach students. I had been using the Pygame library. But I wanted a library that I could improve on based on what I saw from students. For example:

Function and parameter names that students intuitively understand. Each year I had to teach them “set_mode” opens a window. Why not just have the function called “open_window”?
Support for functions students which ask for. In Pygame drawing ellipses with thick borders always had a moire pattern because of a bug in the library. And you can’t tilt ellipses when you draw them. Every year I have a student that wants to draw a football. And each year they were frustrated that it looked awful. I wanted a library where it just worked.
Students would download a graphic for their sprite. But it would be too large and there was no easy way to scale the image. That always resulted in hours of wasted time explaining how to do the scaling. What if the library just supported scaling?

After a while I collected such a long list of things like that, I decided to create a game library that where I wouldn’t have to teach “around” these issues.

Beyond using it for teaching, can you talk a bit about Arcade? What is it, how is it different, who might want to use it?

Arcade is great for sprite-based 2D games. It is simple and easy to get started with the library. There is no need to learn a complex framework. If you’ve wanted to create a game for fun, but Kivy, Unity, Unreal would just take more time to learn than what you’ve got, Arcade is a better choice. If you want to quickly create a scientific visualization without a lot of overhead code, Arcade can help you there too.

Arcade uses OpenGL and Pyglet. With OpenGL acceleration, it can draw a lot of sprites fast.

I use Arcade for my PyCharm tutorials and Arcade’s Python 3.6 type hinting is a big reason. Can you talk about your experience, as a library author and teacher, with type hinting?

New programmers often get confused when calling functions and methods. What data does it expect? And when the program doesn’t run, the students aren’t yet experts in reading stack traces. So they are stuck with a 500 line program that doesn’t work and they don’t know why. Frustrating.

Type hinting can sometimes tell students that they are passing unexpected data to the function. It does this before they run the program. Before they’ve even moved to writing the next line of code. It seems trivial, but it’s not. I found students able to create better, more complex programs because PyCharm and type hinting kept them from that error and allowed them to move on.

You also use PyCharm CE with your students. What’s been your experience having beginners start with an IDE?

I’ve taught students with an IDE and without an IDE. The biggest advantage is how the IDE can help students catch errors early. PyCharm’s built-in PEP-8 checking is huge. Also, built in spell-checking! Imagine trying to read program comments written without a spell-checker from today’s student. Students come up with some interesting ways to spell words.

Posted in Education, Interview | Tagged | Leave a comment