Data Science How-To's

How to Use Jupyter Notebooks in PyCharm

PyCharm is one of the most well-known data science tools, offering excellent out-of-the-box support for Python, SQL, and other languages. PyCharm also provides integrations for Databricks, Hugging Face and many other important tools. All these features allow you to write good code and work with your data and projects faster. 

PyCharm Professional’s support for Jupyter notebooks combines the interactive nature of Jupyter notebooks with PyCharm’s superior code quality and data-related features. This blog post will explore how PyCharm’s Jupyter support can significantly boost your productivity.

Watch this video to get a comprehensive overview of using Jupyter notebooks in PyCharm and learn how you can speed up your data workflows. 

Speed up data analysis

Get acquainted with your data

When you start working on your project, it is extremely important to understand what data you have, including information about the size of your dataset, any problems with it, and its  patterns. For this purpose, your pandas and Polars DataFrames can be rendered in Jupyter outputs in Excel-like tables. The tables are fully interactive, so you can easily sort one or multiple columns and browse and view your data, you can choose how many rows will be shown in a table and perform many other operations.

The table also provides some important information for example:

  • You can find the the size of a table in its header.
  •  You can find the data type symbols in the column headers.
  • You can also use JetBrains AI Assistant to get information about your DataFrame by clicking on the icon.

Easily spot issues with the data

After getting acquainted with your data, you need to clean it. This an important step, but it is also extremely time consuming because there are all sorts of problems you could find, including missing values, outliers, inconsistencies in data types, and so on. Indeed, according to the State of Developer Ecosystem 2023 report, nearly 50% of Data Professionals dedicate 30% of their time or more to data preparation. Fortunately, PyCharm offers a variety of features that streamline the data-cleaning process.

Some insights are already available in the column headers. 

First, we can easily spot the amount of missing data for each column because it is highlighted in red. Also, we may be able to see at a glance whether some of our columns have outliers. For example, in the bath column, the maximum value is significantly higher than the ninety-fifth percentile. Therefore, we can expect that this column has at least one outlier and requires our attention.

Additionally, you might suspect there’s an issue with the data if the data type does not match the expected one. For example, the header of the total_sqft column below is marked with the symbol, which in PyCharm indicates that the column contains the Object data type. The most appropriate data type for a column like total_sqft would likely be float or integer, however, so we may expect there to be inconsistencies in the data types within the column, which could affect data processing and analysis. After sorting, we notice one possible reason for the discrepancy: the use of text in data and ranges instead of numerical values.

So, our suspicion that the column had data-type inconsistencies was proven correct. As this example shows, small details in the table header can provide important information about your data and alert you to issues that need to be addressed, so it’s always worth checking.You can also use no-code visualizations to gather information about whether your data needs to be cleaned. Simply click on the icon in the top-left corner of the table. There are many available visualization options, including histograms, that can be used to see where the peaks of the distribution are, whether the distribution is skewed or symmetrical, and whether there are any outliers.

Of course, you can use code to gather information about your dataset and fix any problems you’ve identified. However, the mentioned low-code features often provide valuable insights about your data and can help you work with it much faster.

Code faster 

Code completion and quick documentation

A significant portion of a data professional’s job involves writing code. Fortunately, PyCharm is well known for its features that allow you to write code significantly faster. For example, local ML-powered full line code completion can provide suggestions for entire lines of code.

Another useful feature is quick documentation, which appears when you hover the cursor over your code. This allows you to gather information about functions and other code elements without having to leave the IDE.

Refactorings

Of course, working with code and data is an interactive process, and you may often decide to make some changes in your code – for example, to rename a variable. Going through the whole file or, in some cases, the entire project, would be cumbersome and time consuming. We can use PyCharm’s refactoring capabilities to rename a variable, introduce a constant, and make many other changes in your code. For example, in this case, I want to rename the DataFrame to make it shorter. I simply use the the Rename refactoring to make the necessary changes.

PyCharm offers a vast number of different refactoring options. To dive deeper into this functionality, watch this video.

Fix problems

It is practically impossible to write code without there being any mistakes or typos. PyCharm has a vast array of features that allow you to spot and address issues faster. You will notice the Inspection widget in the top-right corner if it finds any problems. 

For example, I forgot to import a library in my project and made several typos in the doc so let’s take a look how PyCharm can help here. 

First of all, the problem with the library import:

Additionally, with Jupyter traceback, you can see the line where the error occurred and get a link to the code. This makes the bug-fixing process much easier. Here, I have a typo in line 3. I can easily navigate to it by clicking on the blue text.

Additionally if you would like to get more information and suggestion how to fix the problem, you can use JetBrains AI Assistant by clicking on Explain with AI

Of course, that is just the tip of the iceberg. We recommend reading the documentation to better understand all the features PyCharm offers to help you maintain code quality.

Navigate easily

For the majority of cases, data science work involves a lot of experimentation, with the journey from start to finish rarely resembling a straight line.

During this experimentation process, you have to go back and forth between different parts of your project and between cells in order to find the best solution for a given problem. Therefore, it is essential for you to be able to navigate smoothly through your project and files. Let’s take a look at how PyCharm can help in this respect.

First of all, you can use the classic CMD+F (Mac) or CTRL+F (Windows) shortcut for searching in your notebook. This basic search functionality offers some additional filters like Match Case or Regex.

You can use Markdown cells to structure the document and navigate it easily.

If you would like to highlight some cells so you can come back to them later, you can mark them with #TODO or #FIXME, and they will be made available for you to dissect in a dedicated window.

Or you can use tags to highlight some cells so you’ll be able to spot them more easily.

In some cases, you may need to see the most recently executed cell; in this case, you can simply use the Go To option. 

Save your work

Because teamwork is essential for data professionals, you need tooling that makes sharing the results of your work easy. One popular solution is Git, which PyCharm supports with features like notebook versioning and version comparison using the Diff view. You can find an in-depth overview of the functionality in this tutorial.

Another useful feature is Local History, which automatically saves your progress and allows you to revert to previous steps with just a few clicks.

Use the full power of AI Assistant

JetBrains AI Assistant helps you automate repetitive tasks, optimize your code, and enhance your productivity. In Jupyter notebooks, it also offers several unique features in addition to those that are available in any JetBrains tool. 

Click the icon to get insights regarding your data. You can also ask additional questions regarding the dataset or ask AI Assistant to do something – for example, “write some code that solves the missing data problem”.

AI data visualization

Pressing the icon will suggest some useful visualizations for your data. AI Assistant will generate the proper code in the chat section for your data.

AI cell

AI Assistant can create a cell based on a prompt. You can simply ask it to create a visualization or do something else with your code or data, and it will generate the code that you requested. 

Debugger

PyCharm offers advanced debugging capabilities to enhance your experience in Jupyter notebooks. The integrated Jupyter debugger allows you to set breakpoints, inspect variables, and evaluate expressions directly within your notebooks. This powerful tool helps you step through your code cell by cell, making it easier to identify and fix issues as they arise. Read our blog post on how you can debug a Jupyter notebook in PyCharm for a real-life example.

Get started with PyCharm Professional

PyCharm’s Jupyter support enhances your data science workflows by combining the interactive aspects of Jupyter notebooks with advanced IDE features. It accelerates data analysis with interactive tables and AI assistance, improves coding efficiency with code completion and refactoring, and simplifies error detection and navigation. PyCharm’s seamless Git integration and powerful debugging tools further boost productivity, making it essential for data professionals.

Download PyCharm Professional to try it out for yourself! Get an extended trial today and experience the difference PyCharm Professional can make in your data science endeavors.Use the promo code “PyCharmNotebooks” at checkout to activate your free 60-day subscription to PyCharm Professional. The free subscription is available for individual users only.

Explore our official documentation to fully unlock PyCharm’s potential for your projects.

image description