Data

How to Run SQL from Jupyter Notebook – Two Easy Ways

Read this post in other languages:
简体中文

Why you need to combine SQL and Python inside Jupyter notebooks

SQL is extremely good for data retrieval and calculating basic statistics, whereas Python comes into its own when you need in-depth, flexible exploratory data analysis or data science. What if you could use both programming languages inside of one tool?

In this post you will learn two easy ways to use Python and SQL from the Jupyter notebooks interface and create SQL queries with a few lines of code. These two methods are almost database-agnostic, so you can use them for any SQL database of your choice: MySQL, Postgres, Snowflake, MariaDB, Azure, etc.

Method 1: Using Pandas Read SQL Query

Step 1: Install a Python package to connect to your database

We suggest installing the following packages:

  • PostgreSQL database: ! pip install psycopg2-binary. Make sure to install psycopg2-binary, because it will also take care of the dependencies required.
  • Snowflake database: ! pip install snowflake-connector-python.
  • MySQL database: ! pip install mysql-connector-python.

To install packages in Datalore you can also use the Environment manager, which will make the packages persistent when you reopen the notebook later.

Datalore is a collaborative data science notebook in the cloud, tailored for data science and machine learning. You can start with a free Community plan and upgrade as you go!

Open tutorial in Datalore

Step 2: Create a database connection in Jupyter

Connect a database to a Jupyter notebook

Run the sample code below to connect to the MySQL database. You can find sample code for connecting to PostgreSQL and Snowflake databases in this tutorial.

import os
import psycopg2 as ps
import pandas as pd

conn = ps.connect(
    host=os.environ["db_host"],
    port=os.environ["db_port"],
    dbname=os.environ["db_name"],
    user=os.environ["db_user"],
    password=os.environ["db_password"])

Tip: To store the credentials, we are using environment variables, called Secrets in Datalore. This helps prevent unintentional leaks of your credentials when you share your Jupyter notebooks or your screen with someone.

If you can’t connect to your company’s databases from cloud tools, consider installing Datalore in a private cloud or on-premises.

Step 3: Run SQL queries using pandas

After you create a database connection you can execute your SQL select queries right away!

See the code sample below:

df = pd.read_sql_query("select * from <table>", con=conn)

Run SQL query using pandas

Voila! Just run the code cells and you will get the results saved to a pandas dataframe that you can continue working on with Python.

Run query and visualize in Datalore

Method 2: Using SQL cells in Datalore notebooks

Recently we integrated native SQL cells and database connections inside Python notebooks in Datalore.

Instead of writing boilerplate Python code to connect to a database, you can now create a connection once from the UI and then reuse it in multiple notebooks. After creating a connection, you’ll be able to browse the database schema, which can be extremely helpful for writing SQL queries.

When typing SQL code, you will get smart code completion for SQL syntaxes and table/column names.

After executing the code cell, the result will be automatically saved to a pandas dataframe and you can seamlessly continue working on it with Python.

Have we mentioned that you can collaborate on SQL code in real time in Datalore?

When you are done you can publish results as a report and share your data story via link with the whole world.

Try in Datalore

That’s all for now! We hope your life as a data scientist becomes slightly easier with these simple methods to use SQL inside Jupyter notebooks. To get more tips on using Jupyter notebooks, make sure to subscribe to our blog updates and follow us on Twitter!

Happy data-sciencing!

The Datalore team

Links:

  1. Tutorial in Datalore
  2. Register for Datalore Community
  3. Learn about Datalore Enterprise