Datalore

Collaborative data science platform for teams

Pandas Tutorial: 10 Popular Questions for Python Data Frames

Alena Guzharina

Pandas is one of the first libraries you will learn about when you start working with Python for data analysis and data science. The pandas library helps you work with datasets, transform and clean up your data, and get statistics.

In this tutorial, we will answer 10 of the most frequently asked questions people have when working with pandas. The questions covered in this tutorial mostly come from Stack Overflow.

Dataset

In the first part of this tutorial, we will work with a sample dataset of cities that contains each city’s population, land area, and population density.

Pandas loc and iloc

DataFrame.loc[] accesses a group of rows and columns by label or by a boolean array.

Let’s select the data for Mexico City.

Below we’ll print only the population of Mexico City.
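As a minimal sketch of both selections, the snippet below builds a tiny stand-in DataFrame. The column names and the figures are assumptions made up for illustration; they are not the tutorial's actual dataset.

```python
import pandas as pd

# Illustrative values only; not the tutorial's real data.
df = pd.DataFrame(
    {
        "City": ["Mexico City", "Tokyo", "Delhi"],
        "Country": ["Mexico", "Japan", "India"],
        "Population": ["20,000,000", "30,000,000", "25,000,000"],
    }
)

# Boolean mask inside .loc[]: keep the rows where City is "Mexico City".
mexico_row = df.loc[df["City"] == "Mexico City"]
print(mexico_row)

# Row mask plus a column label: only the Population value for that row.
mexico_population = df.loc[df["City"] == "Mexico City", "Population"]
print(mexico_population.iloc[0])
```

The first argument to .loc[] filters rows and the optional second argument picks columns by label.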

With .iloc[] you can select rows and columns by integer position.

A few things to keep in mind:

A plain : is used to select all data across rows/columns.
0:2 will select rows/columns 0 and 1. 2 is not included.
-1 will select the last element.
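The three rules above can be sketched on a small made-up DataFrame (column names and numbers are assumptions for illustration only):

```python
import pandas as pd

# Made-up sample; Population is in millions and purely illustrative.
df = pd.DataFrame(
    {
        "City": ["Mexico City", "Tokyo", "Delhi"],
        "Country": ["Mexico", "Japan", "India"],
        "Population": [20, 30, 25],
    }
)

all_rows_first_two_cols = df.iloc[:, 0:2]  # every row, columns 0 and 1
first_two_rows = df.iloc[0:2, :]           # rows 0 and 1, every column
last_row = df.iloc[-1]                     # last row, returned as a Series
last_col = df.iloc[:, -1]                  # last column, returned as a Series
```

Note that the end-exclusive slicing applies to .iloc[]; label slices with .loc[] include both endpoints.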

Renaming columns in pandas

Next we’ll rename the columns to make them easier to access in the future.

There are a few ways to do this:

Directly assigning a list of column names to df.columns.
Using df.rename to rename specific columns.
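Both approaches might look like the sketch below. The raw header names are hypothetical, since the original column names are not given in the text.

```python
import pandas as pd

# Hypothetical raw headers, chosen only to have something to rename.
df = pd.DataFrame(
    {"City Name": ["Tokyo"], "Population (people)": ["30,000,000"]}
)

# Option 1: replace every column name at once.
# The list length must match the number of columns.
df.columns = ["City", "Population"]

# Option 2: rename only the columns listed in the mapping.
# rename() returns a new DataFrame, so the result is assigned back.
df = df.rename(columns={"Population": "Pop"})
print(list(df.columns))
```

Assigning to df.columns is convenient when every name changes; df.rename is safer when only a few do, because it does not depend on column order.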

Selecting multiple columns in a pandas DataFrame

Let’s split our DataFrame into two DataFrames containing:

City, Country, and Population.
City, Area, and Density.

We can do this in several ways:

By using .iloc[:, 0:3], where the first argument in the brackets selects all rows and the second argument selects column 0, column 1, and column 2.
By slicing the DataFrame with double [] and entering the column names you want to select.
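A minimal sketch of the two options, assuming the columns are ordered City, Country, Population, Area, Density (the values are made up):

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame(
    {
        "City": ["Tokyo", "Delhi"],
        "Country": ["Japan", "India"],
        "Population": ["30,000,000", "25,000,000"],
        "Area": ["2,000", "1,500"],
        "Density": ["15,000", "16,667"],
    }
)

# Positional selection: all rows, columns 0, 1, and 2.
df_population = df.iloc[:, 0:3]

# Label selection: the inner [] is a list of column names.
df_area = df[["City", "Area", "Density"]]
```

The positional form breaks silently if the column order changes, so the label form is usually the more robust choice.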

Pandas merge two tables by column

Next we’ll join the two tables that we’ve created side by side. Both tables have the same City column, so we will use the pd.merge function to combine them on that column.

The left_on and right_on parameters indicate the column name to merge on in the first and second table.
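Here is a small sketch of the merge, using two made-up tables that share a City column (names and values are assumptions):

```python
import pandas as pd

# Two illustrative tables that share the City column.
df_population = pd.DataFrame(
    {
        "City": ["Tokyo", "Delhi"],
        "Country": ["Japan", "India"],
        "Population": ["30,000,000", "25,000,000"],
    }
)
df_area = pd.DataFrame(
    {
        "City": ["Tokyo", "Delhi"],
        "Area": ["2,000", "1,500"],
        "Density": ["15,000", "16,667"],
    }
)

# Match rows whose City value is the same in both tables.
merged = pd.merge(df_population, df_area, left_on="City", right_on="City")
print(merged)
```

When the key has the same name in both tables, on="City" is an equivalent shorthand; left_on and right_on are mainly useful when the names differ.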

Change column type in pandas with pandas apply

To work further with the DataFrame we need to transform the Population, Area, and Density columns from strings into numbers.

To do this we will:

Create a function, to_int(), which will transform the string with ‘,’ symbols into integer numbers.
Use the apply function with the lambda expression.
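The two steps could be sketched as follows. The body of to_int() is an assumption about what such a helper would do, and the sample values are made up.

```python
import pandas as pd

# Illustrative string-typed numbers with thousands separators.
df = pd.DataFrame(
    {
        "City": ["Tokyo", "Delhi"],
        "Population": ["30,000,000", "25,000,000"],
        "Area": ["2,000", "1,500"],
        "Density": ["15,000", "16,667"],
    }
)


def to_int(value: str) -> int:
    """Strip the ',' separators and convert the string to an integer."""
    return int(value.replace(",", ""))


# Apply the conversion element by element to each numeric column.
for column in ["Population", "Area", "Density"]:
    df[column] = df[column].apply(lambda value: to_int(value))

print(df.dtypes)
```

For large columns, df[column].str.replace(",", "").astype(int) performs the same conversion without a Python-level function call per row.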

Groupby and turn into a DataFrame

Let’s now group the DataFrame by Country and sum the population of each country in this data sample.
The difficulty with df.groupby is that it returns a GroupBy object, not a DataFrame. To get a DataFrame back, apply an aggregation such as sum() and then reset the index.

We’ll group by Country, at the same time calculating the sums for the Population and Area columns. We’ll drop the density column as we don’t need it anymore.
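One way this could look, on a made-up sample with two cities per country (all numbers are illustrative):

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame(
    {
        "City": ["Tokyo", "Osaka", "Delhi", "Mumbai"],
        "Country": ["Japan", "Japan", "India", "India"],
        "Population": [30, 10, 25, 20],
        "Area": [2000, 200, 1500, 600],
        "Density": [15.0, 50.0, 16.7, 33.3],
    }
)

# Select only the columns to aggregate (this leaves Density out),
# sum them per country, then turn the Country index back into a column.
by_country = (
    df.groupby("Country")[["Population", "Area"]]
    .sum()
    .reset_index()
)
print(by_country)
```

Summing densities would not produce a meaningful number, which is another reason to leave that column out of the aggregation.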

How to iterate over rows in a DataFrame in pandas

Though iterating over rows is rarely the fastest solution, it can still come in handy. You can do this by looping over the .iterrows() method, which yields an (index, row) pair for each row.

Consider doing the same operation with the apply function or with a vectorized operation on the whole column. On big datasets, this will speed up the calculations considerably.

Below we’ll divide the Population column by 1000 and get the population numbers in thousands. There are 3 alternative code examples below.
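The three alternatives might be sketched like this; the new column names and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame(
    {"Country": ["India", "Japan"], "Population": [45_000_000, 40_000_000]}
)

# 1. Explicit loop over rows: the slowest option on large data.
thousands = []
for _, row in df.iterrows():
    thousands.append(row["Population"] / 1000)
df["Pop_loop"] = thousands

# 2. apply: calls the lambda once per value.
df["Pop_apply"] = df["Population"].apply(lambda p: p / 1000)

# 3. Vectorized: one operation on the whole column.
df["Pop_vectorized"] = df["Population"] / 1000

print(df)
```

The loop collects results in a list instead of writing to the DataFrame while iterating, since modifying an object during iteration over it is best avoided.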

How to select rows from a DataFrame based on column values

Let’s select countries with a population of more than 10 million people and an area of less than 2000 square kilometers.
You can do this by putting boolean conditions inside []. Wrap each condition in parentheses and combine them with & (and) or | (or).
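A short sketch of such a filter, using made-up country totals (the numbers are not real statistics):

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame(
    {
        "Country": ["India", "Japan", "Mexico"],
        "Population": [45_000_000, 40_000_000, 5_000_000],
        "Area": [2100, 1800, 500],
    }
)

# Each comparison yields a boolean Series; & keeps rows where both hold.
selected = df[(df["Population"] > 10_000_000) & (df["Area"] < 2000)]
print(selected)
```

The parentheses are required because & binds more tightly than the comparison operators in Python.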

How to change the order of your DataFrame columns

You can do this simply by selecting the columns of your existing DataFrame in the order you want.
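For example, with assumed column names and made-up values:

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame(
    {
        "Country": ["India", "Japan"],
        "Population": [45_000_000, 40_000_000],
        "Area": [2100, 2200],
    }
)

# Pass the column names in the desired order and assign the result back.
df = df[["Country", "Area", "Population"]]
print(list(df.columns))
```

Any column left out of the list is dropped from the result, so list every column you want to keep.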

Cleaning up data with pandas

To start working with data, you need to clean it up.

The first basic steps are:

Drop duplicates in a DataFrame.
Fill empty cells with meaningful values or drop columns with a lot of empty values.
Get statistics on the column values.

Let’s download the dataset with the tennis game results.

We’ll drop any duplicates with df.drop_duplicates. Passing inplace=True applies the changes to the DataFrame itself instead of returning a new one.
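A minimal sketch on a made-up stand-in for the match results; winner_name and loser_name are hypothetical column names, and the rows are invented:

```python
import pandas as pd

# Invented rows; the first two are exact duplicates.
df = pd.DataFrame(
    {
        "winner_name": ["Player A", "Player A", "Player B"],
        "loser_name": ["Player C", "Player C", "Player D"],
        "minutes": [95.0, 95.0, 120.0],
    }
)

# Remove rows that are identical in every column, modifying df directly.
df.drop_duplicates(inplace=True)
print(len(df))
```

The original index labels are kept after the drop; call df.reset_index(drop=True) if a contiguous index is needed.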

Now let’s find out whether there are NaN values in our DataFrame.

df.isna().any() returns True for every column that contains at least one NaN value.
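The check can be sketched as below, on invented values. The share of missing values per column, computed with df.isna().mean(), is an addition that shows how a percentage of NaN values can be obtained.

```python
import numpy as np
import pandas as pd

# Invented values with some gaps.
df = pd.DataFrame(
    {
        "winner_age": [25.0, np.nan, 31.0],
        "winner_rank": [1.0, 5.0, 12.0],
        "minutes": [np.nan, np.nan, 95.0],
    }
)

# True for each column that has at least one NaN.
has_nan = df.isna().any()
print(has_nan)

# Fraction of NaN values per column (multiply by 100 for a percentage).
nan_share = df.isna().mean()
print(nan_share)
```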

In the minutes column we have 91% NaN values, so we’ll drop this column because it doesn’t contain any useful information.

The winner_age, loser_age, loser_rank, and winner_rank columns don’t have many NaN values, so we’ll replace the NaN values in each of them with that column’s median.
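Dropping the sparse column and filling the rest could look like this sketch. The column names come from the text above; the values are invented.

```python
import numpy as np
import pandas as pd

# Invented values with a few gaps in each column.
df = pd.DataFrame(
    {
        "winner_age": [25.0, np.nan, 31.0],
        "loser_age": [30.0, 22.0, np.nan],
        "winner_rank": [1.0, np.nan, 12.0],
        "loser_rank": [np.nan, 40.0, 80.0],
        "minutes": [np.nan, np.nan, 95.0],
    }
)

# Drop the column that is mostly empty.
df = df.drop(columns=["minutes"])

# median() returns one value per column; fillna matches them by column name.
columns_to_fill = ["winner_age", "loser_age", "winner_rank", "loser_rank"]
df[columns_to_fill] = df[columns_to_fill].fillna(df[columns_to_fill].median())
print(df)
```

The median is less sensitive to outliers than the mean, which makes it a reasonable default for skewed values such as rankings.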

With df.describe() we can get summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns.
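A small sketch on invented data (winner_name is a hypothetical text column, included to show that it is skipped by default):

```python
import pandas as pd

# Invented values; winner_name is non-numeric and is left out of the summary.
df = pd.DataFrame(
    {
        "winner_name": ["Player A", "Player B", "Player C"],
        "winner_age": [25.0, 28.0, 31.0],
        "minutes": [95.0, 120.0, 101.0],
    }
)

# One column of statistics per numeric column.
stats = df.describe()
print(stats)
```

Passing include="all" to describe() adds the non-numeric columns, with counts of unique values in place of the numeric statistics.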

That is it for our pandas tutorial. We’ve tried to provide answers to many of the most common questions people have when they are just starting out with pandas. Tell us in the comments about any other topics you’d like us to cover in future tutorials.

Other tutorials and research

Getting Started Tutorial: Notebook, Video
Advanced Visualization Tutorial with Seaborn: Notebook
Visualization with Pyplot in Datalore: Notebook, Video
Analysis of 10,000,000 Jupyter notebooks: Blogpost, Published notebook
GPU models specification analysis
Developer ecosystem research for Python
