Creating a Python Development Environment on Amazon EC2

In the last two blog posts of this series we discussed how to set up a local VM-based development environment for a cloud application, and then built a Flask-RESTful app within this environment. Today, we’ll take our app to AWS, and we’ll set up a remote development environment.

The environment we’ll describe here is configured for development, not production. If you’re interested in seeing how to prepare this application for production, let me know in the comments!

This blog post was written on Ubuntu; it should work equally well on macOS. On Windows, there are some difficulties with the SSH configuration for Terraform.

The Environment

We’ll create a two-tier environment on AWS: a web server, and a database server. To keep in line with best practices, we’ll open only the ports that are absolutely necessary. For the same reason, we’ll add a third EC2 instance to act as a ‘management host’. We’ll use it as an SSH bastion to connect to the other machines, and we’ll also run Ansible configuration changes from this box.

Infrastructure Overview

As only the management and web hosts need to be exposed to the internet, we can put the database host in a private subnet. Not shown in the diagram is the NAT gateway that lets the DB host reach the internet. Without it, installing PostgreSQL on that host would be very hard.

Terraform

Now that we know the configuration we want on AWS, we need a way to make it happen. We could manually go into the AWS Console and configure everything from there. However, that would be hard to reproduce, and impossible to version control.

The next options come from AWS themselves: the CLI and CloudFormation. The CLI makes scripting easy, but it’s hard to make scripts that are idempotent and allow for easy changes. CloudFormation is a solution that allows us to describe a desired infrastructure in JSON which can then be applied to AWS.

Terraform is software from HashiCorp, the company behind Vagrant and Consul, that allows us to write our desired state in a language that’s a lot more user-friendly than CloudFormation JSON. As it’s what a lot of the cool kids are using today, we’ll use it for the grouporder project.

Describing the Configuration

I’ve split the code into two repositories: the project itself, and the infrastructure. The infrastructure consists of three files: the network setup, the instances, and some additional provisioning details. Terraform reads all files in the directory and essentially combines them into one big file. Splitting the files just makes it easier for us to find things.

The networking setup creates the public and private subnets within our virtual private cloud (VPC). By creating a new VPC this application is fully separated from other applications in my AWS account. Furthermore, we’ll have internal IP addresses we can use to communicate between the EC2 instances for this project.

Any EC2 instance launched in the public subnet will receive both a private IP address (in the 10.0.x.x range) and a public IP address. The instances in the private subnet will only get a private IP.
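To make the addressing concrete, here is a minimal Python sketch using the standard ipaddress module. The CIDR blocks are assumptions chosen for illustration (a 10.0.0.0/16 VPC with one /24 block per subnet); the real values are in the networking file of the infrastructure repository.

```python
import ipaddress
from typing import Optional

# Assumed CIDR blocks, for illustration only.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public": ipaddress.ip_network("10.0.0.0/24"),
    "private": ipaddress.ip_network("10.0.1.0/24"),
}


def find_subnet(address: str) -> Optional[str]:
    """Return the name of the subnet that contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, network in SUBNETS.items():
        if ip in network:
            return name
    return None


# Every subnet has to fit inside the VPC's address range.
subnets_fit_in_vpc = all(net.subnet_of(VPC_CIDR) for net in SUBNETS.values())

print(subnets_fit_in_vpc)
print(find_subnet("10.0.1.38"))
```

Under these assumptions, a private address such as 10.0.1.38 falls inside the private subnet, and an address outside the VPC range matches no subnet at all.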

Let’s have a closer look at the configuration of the database host (which is in the private subnet):

It is all pretty simple. We first look up the AMI ID of a current Ubuntu 16.04 LTS image and then we describe the instance we want to launch. It is easy to see here how Terraform allows us to link pieces of our configuration together. In the networking file I’ve defined an aws_subnet, which I gave the name private_subnet, and we’re accessing its id attribute to instruct Terraform to launch this EC2 instance in that subnet.

Let’s take a look at the security group for our database server, as it’s the most interesting one:

We’re allowing SSH and SQL connections in, but only from the security groups we specify. For example, an SSH connection from the web host would be denied, even though the web host is in the same subnet as the management host. Because we’re writing the security groups out in full, Terraform needs us to state explicitly that our DB host can communicate with the outside world.
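The effect of tying ingress rules to a source security group, rather than to a subnet, can be modelled in a few lines of Python. This is only a toy model, not how AWS evaluates rules internally. The group names and the choice of which group may use which port are assumptions for illustration; 5432 is PostgreSQL’s default port.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IngressRule:
    port: int
    source_group: str  # traffic is matched on the sender's security group


# Assumed rule set for the database host (hypothetical group names).
DB_INGRESS = [
    IngressRule(port=22, source_group="management"),  # SSH from the bastion
    IngressRule(port=5432, source_group="web"),       # PostgreSQL from the web host
]


def is_allowed(rules: List[IngressRule], port: int, source_group: str) -> bool:
    """A connection is allowed only if some rule matches both port and group."""
    return any(r.port == port and r.source_group == source_group for r in rules)


print(is_allowed(DB_INGRESS, 22, "management"))
print(is_allowed(DB_INGRESS, 22, "web"))
```

The subnet never appears in the check, which is why the web host’s SSH attempt fails even though it shares a subnet with the management host.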

For the sake of ‘brevity’ I won’t go through the rest of the Terraform configuration here, but it’s all in the repo, and if you have any questions, let us know in the comments!

Applying the Configuration

Please note that we’re using a NAT gateway, which is not included in AWS’s free tier, so starting this configuration will cost you money. The total cost should be under 10 cents per hour, with the NAT gateway surprisingly accounting for over half of it.

Now that that’s out of the way, let’s get started. To be able to do this you’ll need:

  • An AWS account
  • An IAM user with the appropriate permissions (at least AmazonEC2FullAccess and AmazonVPCFullAccess), and an access key for that user
  • Terraform: download it, and then place the binary in a directory that’s on your PATH

Terraform will use the AWS credentials you’ve configured for the AWS CLI. If you don’t have the CLI installed, you can manually create the file ~/.aws/credentials, containing your access key ID and secret access key under a [default] profile.
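A minimal sketch of that file’s layout, generated with Python’s configparser. The key values are placeholders; the section name and the two option names follow the AWS shared credentials file format. The sketch only renders the text and leaves writing it to disk to you.

```python
import configparser
import io


def render_credentials(access_key_id: str, secret_access_key: str,
                       profile: str = "default") -> str:
    """Render the text of an AWS shared credentials file with one profile."""
    # Interpolation is disabled so that a '%' in a key is kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values: substitute the access key of your own IAM user.
credentials_text = render_credentials("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")
print(credentials_text)
```

Save the printed text as ~/.aws/credentials, and keep that file out of version control.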

Then check out the Terraform files from the infrastructure repository, and open them in PyCharm. Be sure to get the ‘HashiCorp Terraform’ plugin, to get code completion in the Terraform files.

When starting a new project, or when you add new providers (like ‘AWS’ and ‘Templating’) in Terraform, you’ll need to run terraform init to make Terraform configure these. So open the Terminal in PyCharm (Alt+F12) and run terraform init inside the project folder.

Before applying, you need to make sure that you have a private key loaded in your SSH agent and have uploaded the public key to AWS. Without the key, you won’t be able to provision or access the EC2 machines.

After Terraform has initialized, you can run terraform apply. It first checks the current state on AWS, and then asks whether to apply the changes needed to make AWS match the state described in the Terraform files. When you start the command, Terraform will ask for any variables that the configuration requires. For the grouporder-aws configuration, two variables are required: the desired AWS region, and the name of your AWS key pair. Please keep in mind that key pairs are region-specific, so you should choose the region where you’ve uploaded your key.

If you installed the PyCharm plugin, you can use a run configuration for this. Otherwise, just run terraform apply on the command line, and answer yes when it asks whether to apply the changes.

Infrastructure and Software

The Terraform files configure the infrastructure and then also kick off provisioning. For this setup, we’re checking out the grouporder repository on the management machine, and then using Ansible to set up the software on all three EC2 instances.

The Ansible configuration is mostly in the grouporder repository, which also contains a Vagrantfile to make it possible to have everything run on a single VM for local development. Ansible “roles” enable us to describe certain server behaviors, which we can choose to deploy to machines as described in the inventory file.

To tell Ansible about our AWS configuration, we’re using Terraform to fill out a template of an Ansible inventory file, which is then transferred to the server:
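As a rough illustration of the idea (not the repository’s actual template), the same fill-in-the-blanks step can be sketched in Python with string.Template. The group names and the web host’s address are hypothetical; in the real setup Terraform substitutes the private IPs of the instances it has just created.

```python
from string import Template

# Hypothetical inventory layout: one Ansible group per tier.
INVENTORY_TEMPLATE = Template(
    "[web]\n"
    "$web_ip\n"
    "\n"
    "[database]\n"
    "$database_ip\n"
)


def render_inventory(web_ip: str, database_ip: str) -> str:
    """Fill the inventory template with the instances' private IP addresses."""
    return INVENTORY_TEMPLATE.substitute(web_ip=web_ip, database_ip=database_ip)


# Example addresses; 10.0.0.12 is made up for this sketch.
inventory = render_inventory(web_ip="10.0.0.12", database_ip="10.0.1.38")
print(inventory)
```

Because the management host sits inside the VPC, the inventory can refer to the other machines by their private addresses.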

Although Ansible is used to configure the Python environment on the web server, we’re not actually checking out the code there, as we’ll connect to the machines with PyCharm soon to be able to start developing on this environment.

Setting up PyCharm

Now that we have the environment spun up in AWS, we’d like to get started with developing on the cloud. So let’s hook up PyCharm!

Due to our network settings, we’ll need to connect through the management host to both our web and database hosts. PyCharm 2017.3 can read an SSH config file to set up this kind of connection. Let’s use a Terraform template (ssh_config.tmpl in the repository) to generate a section of an SSH config file, which we can then copy over:

After Terraform completes, we get an ssh_config.out file. Open this file and copy-paste its contents into your ~/.ssh/config file.
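To give an idea of the shape such a section can take, the Python sketch below prints one Host block per machine, with the web and database hosts reached through the management host. The ‘Web’ and ‘database’ aliases and the ubuntu user come from this post. The ‘management’ alias, the example addresses, and the use of ProxyCommand with ssh -W are assumptions, so the generated ssh_config.out may differ.

```python
from typing import Optional


def host_block(alias: str, hostname: str, user: str = "ubuntu",
               via: Optional[str] = None) -> str:
    """Render one Host entry; 'via' names the bastion to tunnel through."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
    ]
    if via is not None:
        # ssh -W forwards stdin/stdout to the target through the bastion.
        lines.append(f"    ProxyCommand ssh -W %h:%p {via}")
    return "\n".join(lines)


# Example addresses only: 203.0.113.10 is a documentation address,
# and 10.0.0.12 is made up for this sketch.
ssh_config = "\n\n".join([
    host_block("management", "203.0.113.10"),
    host_block("Web", "10.0.0.12", via="management"),
    host_block("database", "10.0.1.38", via="management"),
])
print(ssh_config)
```

With entries like these in ~/.ssh/config, running ssh database on your own machine hops through the management host without any extra flags.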

If you already checked out grouporder for last week’s blog post, you can open that. If not: open the grouporder repo in PyCharm (VCS | Checkout from Version Control | GitHub). Make sure to mark the grouporder subfolder as a sources folder: right-click the folder | Mark directory as | Sources root. If you skip this step, the PYTHONPATH will be set incorrectly when we execute our code later.

After opening the project, go to Tools | Deployment | Configuration, and add a new SFTP server. As we’ve defined “Web” as a Host in the SSH config file, we can just use ‘Web’ for the hostname here. Choose ‘OpenSSH Config and authentication agent’ as the authentication type, and type ‘ubuntu’ as the username. We’ll use our remote home folder as the root path: /home/ubuntu.

Deployment Settings

Then, on the mappings page, add a mapping between the project folder on your machine, and a subfolder of the root path configured on the ‘Connection’ page:

Deployment Mappings

This will place our code in /home/ubuntu/grouporder. After this configuration, make sure that the following option is checked: Tools | Deployment | Automatic upload (always). By doing this, all changes we make will automatically be uploaded to our remote machine. Finally, right-click the project root folder in the project tool window, and choose ‘Upload to web’ from the context menu to upload the initial version.

We’ll now need to look up the internal IP address of our database server. You can find it in the SSH config, after HostName. For me this happens to be 10.0.1.38, which makes the database connection string: postgresql://grouporder:hunter2@10.0.1.38/grouporder.

We’ll use a run configuration to apply the migrations to the database. Choose module name pgmigrate, the migrations subfolder as working directory, and provide as script parameters: migrate -t latest --conn postgresql://grouporder:hunter2@10.0.1.38/grouporder. For details, see the previous blog post. pgmigrate will not print anything if it successfully applied the migrations.
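If your own password contains characters such as ‘@’ or ‘/’, the connection string needs percent-encoding. The Python sketch below builds the string and the script parameters from their parts. The helper name is hypothetical, and the values are the ones used in this post.

```python
from urllib.parse import quote, urlsplit


def build_conn(user: str, password: str, host: str, database: str) -> str:
    """Build a PostgreSQL connection URI, percent-encoding user and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{database}"
    )


conn = build_conn("grouporder", "hunter2", "10.0.1.38", "grouporder")

# The script parameters from the run configuration, as an argument list.
pgmigrate_params = ["migrate", "-t", "latest", "--conn", conn]

# Splitting the URI again is a quick way to check each component.
parts = urlsplit(conn)
print(conn)
print(parts.hostname, parts.username, parts.path)
```

For the values in this post nothing needs escaping, so the result matches the connection string shown above character for character.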

Let’s connect to the database to make sure that we’ve correctly applied the migrations. However, we didn’t expose the database to the world, so how can we do this? The answer is SSH tunneling!

Create a new database connection: View | Tool Windows | Database, then use the green ‘+’ icon to add a PostgreSQL data source. On the ‘General’ page, we’ll connect as if we’re on the database box, so the host is localhost (the database and username are both ‘grouporder’, password is ‘******’). Then we’ll go over to the SSH/SSL page, and use database as the hostname. This is one of the hosts defined in the Terraform ssh_config segment. Just make sure to choose ‘OpenSSH config and authentication agent’ as the Auth type:

DB Connection AWS

You may want to rename it from grouporder@localhost to Grouporder AWS or something else that makes it clear that we’re not actually connecting to localhost.

After connecting, we should see all the tables in the database tool window.
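PyCharm opens the tunnel for us, but the same thing can be done by hand for other database clients. The Python sketch below only assembles the ssh command line. It assumes the ‘database’ alias from the SSH config, PostgreSQL’s default port 5432 on the remote side, and an arbitrary free local port (5433 here).

```python
from typing import List


def tunnel_command(ssh_host: str, local_port: int,
                   remote_port: int = 5432) -> List[str]:
    """Build an ssh command that forwards a local port to the remote database.

    -N: open no remote shell, only forward ports.
    -L: listen on local_port and forward to localhost:remote_port, where
        'localhost' is resolved on the remote side (the database host).
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", ssh_host]


command = tunnel_command("database", local_port=5433)
print(" ".join(command))
```

While that command is running, a client on your machine can connect to localhost on port 5433 and reach PostgreSQL on the database host.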

At this point we’ve fully configured our development environment on AWS, and we can create a regular run configuration for our Flask application. Don’t forget to set host to 0.0.0.0 so that Flask listens on all network interfaces; otherwise we won’t be able to reach it from outside the web host. So let’s get started:

Grouporder AWS Complete

If you want to play around with the application a little, see the end of the previous blog post. Everything that worked on Vagrant should work the same on AWS now.

And that concludes the third part of our developing for the cloud blog series. If you’re interested in more, let us know in the comments! For example, if there’s enough interest, I could write a blog post about making a production-ready version of this application.


6 Responses to Creating a Python Development Environment on Amazon EC2

  1. Hi,

    First of all – GREAT POST!
    I really enjoyed all the steps described and fitting my tools of choice together (Vagrant, Ansible + PyCharm). I was not aware of this functionality in the IDE.

    Obviously myself and my colleagues are interested in looking at production ready environment setup. What’s particularly interesting is if we can separate dev from prod within the same project?

    Another bit is whether or not there is a way to add some checks/protection from making accidental modifications in production env (if one mistakenly takes it for dev).

    Last thing – it would be great if you guys could tweet these kind of posts to make them more visible. I just came across this one by accident and it is a worthwhile lecture for anyone looking at taking advantage of new PyCharm features and proper dev in general.

    Best regards
    Michal

    • Ernst Haagsman says:

      Thank you very much! I can look into making a production-ready configuration in January.

      To prevent accidental modifications in prod: 1. deploy prod code only through CI, 2. if you need to connect to the prod DB, change the console backgrounds to a very obvious color (like red) to make it hard to change something in the wrong DB by mistake.

      We’ve been tweeting these blog posts from the @pycharm Twitter account, do you mean we should tweet them from @jetbrains as well?

      Regards,
      Ernst

  2. Dutch Masters says:

    Just wanted to agree, these posts about more advanced usages of Pycharm are great.

    The only reason I know about them is that i have an RSS feed where I see it.. Many devs don’t look at twitter but maybe a youtube promo or something.

    • Paul Everitt says:

      I’d like to follow up on your point…when you say “many devs”, do you know a lot of people that still do RSS/Atom?

  3. Harvey says:

    Tweet them and put them on YouTube for N00bs!

  4. Roman Kazakov says:

    Thank you for these series! It was very useful for me to get comfortable with PyCharm. And I’m waiting for your post about how to make production ready version.
