Effortlessly Organize Your Data: Python Pandas Sorting

Created OnApril 15, 2023

Introduction

While dealing with the data frames, programmers often need to clean the data. This may also include sorting the data in ascending or descending order.

This article will explain how to sort the pandas data frame.

Table of Contents

Introduction
What is Sorting in Python Pandas?
How To Sort Using The sort_values Function
How To Sort In Descending Order
How To Sort Multiple Columns In Pandas DataFrame
How To Deal With NaN In Pandas While Sorting
How To Sort Pandas Data Frame Using Lambda Function
Changing The sorting Algorithms
Conclusion

What is Sorting in Python Pandas?

Sorting in Python Pandas is the process of arranging data in a specific order based on values in one or more columns. Pandas provides two main functions for sorting data: sort_values() and sort_index(). Other related functions include nlargest(), nsmallest(), and rank(). Sorting data in Pandas is essential for efficient data analysis.

How To Sort Using The sort_values Function

The most common technique to sort the data frame is to use the sort_values() function of pandas. The sort_values() is a built-in function of pandas. The function takes the column name as the parameter, then sorts the data frame based on the column.

We can sort both numbers as well as strings using the sort_values() function.

Example (1)

import pandas as pd

# Creating a dictionary with data
data = {'name': ['Alex', 'Bob', 'Charlie','David','Emma','Frank','George','Harry','Igor','Jack'],
        'age': [35,26,55,48,22,32,44,39,50,40],
        'city': ['Seattle','Chicago','New York','San Francisco','Los Angeles','Houston','Phoenix','Philadelphia','San Diego','Austin'],
        'gender':['M','M','M','M','F','M','M','M','M','M'],
        'income':[50000,70000,60000,90000,80000,55000,65000,85000,100000,75000]
        }

# Creating a DataFrame from the dictionary
df = pd.DataFrame(data)

# Adding a new column
df['Occupation'] = ['Engineer','Data Analyst','Doctor','Lawyer','Designer','Teacher','Accountant','Police','Manager','Architect']

#Printing the DataFrame
print("The data frame before sorting is as follows:")
print(df)
df=df.sort_values(by=['income'])
print("The data frame after sorting is as follows:")
print(df)

Output:

The data frame before sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
0	Alex	35	Seattle	M	50000	Engineer
1	Bob	26	Chicago	M	70000	Data Analyst
2	Charlie	55	New York	M	60000	Doctor
3	David	48	San Francisco	M	90000	Lawyer
4	Emma	22	Los Angeles	F	80000	Designer
5	Frank	32	Houston	M	55000	Teacher
6	George	44	Phoenix	M	65000	Accountant
7	Harry	39	Philadelphia	M	85000	Police
8	Igor	50	San Diego	M	100000	Manager
9	Jack	40	Austin	M	75000	Architect

IDX	Name	Age	City	Gender	Income	Occupation
0	Alex	35	Seattle	M	50000	Engineer
1	Bob	26	Chicago	M	70000	Data Analyst
2	Charlie	55	New York	M	60000	Doctor
3	David	48	San Francisco	M	90000	Lawyer
4	Emma	22	Los Angeles	F	80000	Designer
5	Frank	32	Houston	M	55000	Teacher
6	George	44	Phoenix	M	65000	Accountant
7	Harry	39	Philadelphia	M	85000	Police
8	Igor	50	San Diego	M	100000	Manager
9	Jack	40	Austin	M	75000	Architect

The data frame after sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
0	Alex	35	Seattle	M	50000	Engineer
5	Frank	32	Houston	M	55000	Teacher
2	Charlie	55	New York	M	60000	Doctor
6	George	44	Phoenix	M	65000	Accountant
1	Bob	26	Chicago	M	70000	Data Analyst
9	Jack	40	Austin	M	75000	Architect
4	Emma	22	Los Angeles	F	80000	Designer
7	Harry	39	Philadelphia	M	85000	Police
3	David	48	San Francisco	M	90000	Lawyer
8	Igor	50	San Diego	M	100000	Manager

Notice the change in the index numbers in sequence.

Explanation:

First, we imported the pandas library into our code.
Next, we have created a dictionary data type, a collection of key-value pairs. We called this dictionary data.
Next, we created a data frame from the dictionary using the DataFrame function of pandas.
Next, we added new columns to the data frame Occupation. We printed the data frame before the sorting process.
Next, we used the sort_values() function to sort the data frame. We have passed the “income” as the parameter, which signifies that we need to sort the data frame based on the values of the income column.
Next, we printed the data frame after the sorting process.

How To Sort In Descending Order

Sorting the data frame in descending order is relatively simple. We only need to pass one additional parameter to the function sort_values(), i.e., ascending=False.

Example (2)

import pandas as pd

# Creating a dictionary with data
data = {'name': ['Alex', 'Bob', 'Charlie','David','Emma','Frank','George','Harry','Igor','Jack'],
        'age': [35,26,55,48,22,32,44,39,50,40],
        'city': ['Seattle','Chicago','New York','San Francisco','Los Angeles','Houston','Phoenix','Philadelphia','San Diego','Austin'],
        'gender':['M','M','M','M','F','M','M','M','M','M'],
        'income':[50000,70000,60000,90000,80000,55000,65000,85000,100000,75000]
        }

# Creating a DataFrame from the dictionary
df = pd.DataFrame(data)

# Adding a new column
df['Occupation'] = ['Engineer','Data Analyst','Doctor','Lawyer','Designer','Teacher','Accountant','Police','Manager','Architect']

#Printing the DataFrame
print("The data frame before sorting is as follows:")
print(df)
df=df.sort_values(by=['income'],ascending=False)
print("The data frame after sorting is as follows:")
print(df)

Output:

The data frame before sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
0	Alex	35	Seattle	M	50000	Engineer
1	Bob	26	Chicago	M	70000	Data Analyst
2	Charlie	55	New York	M	60000	Doctor
3	David	48	San Francisco	M	90000	Lawyer
4	Emma	22	Los Angeles	F	80000	Designer
5	Frank	32	Houston	M	55000	Teacher
6	George	44	Phoenix	M	65000	Accountant
7	Harry	39	Philadelphia	M	85000	Police
8	Igor	50	San Diego	M	100000	Manager
9	Jack	40	Austin	M	75000	Architect

The data frame after sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
8	Igor	50	San Diego	M	100000	Manager
3	David	48	San Francisco	M	90000	Lawyer
7	Harry	39	Philadelphia	M	85000	Police
4	Emma	22	Los Angeles	F	80000	Designer
9	Jack	40	Austin	M	75000	Architect
1	Bob	26	Chicago	M	70000	Data Analyst
6	George	44	Phoenix	M	65000	Accountant
2	Charlie	55	New York	M	60000	Doctor
5	Frank	32	Houston	M	55000	Teacher
0	Alex	35	Seattle	M	50000	Engineer

How To Sort Multiple Columns In Pandas DataFrame

Pandas allows us to sort the data frame based on multiple columns.

The rule to sort using multiple columns is as follows:

You need to pass multiple names of columns as the parameter for sorting the data frame.
Pandas will sort based on the order you provided to the sort_values() function. Suppose you have provided three columns named A, B, and C as the function’s parameters.
Pandas will sort based on A. If there is a tie, then sorting will be done based on column B. If there is any tie again, then pandas will sort based on the value of column C.

Example (3)

import pandas as pd
data = {'name': ['Sophia', 'Liam', 'Emma','Noah','Mia','Aiden','Isabella','Jacob','Madison','Ethan'],
'age': [6,6,6,8,4,9,3,10,2,1],
'city': ['New York','Chicago','Los Angeles','Houston','Phoenix','San Francisco','Seattle','Philadelphia','Austin','San Diego'],
'gender':['F','M','F','M','F','M','F','M','F','M'],
'income':[1000,2000,3000,4000,5000,6000,9000,9000,9000,10000]
}
df = pd.DataFrame(data)
df['Occupation'] = ['Student','Student','Student','Student','Student','Student','Student','Student','Student','Student']

#Printing the DataFrame
print("The data frame before sorting is as follows:")
print(df)
df=df.sort_values(by=['income',"age"])
print("The data frame after sorting is as follows:")
print(df)

Output:

The data frame before sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
0	Sophia	6	New York	F	1000	Student
1	Liam	6	Chicago	M	2000	Student
2	Emma	6	Los Angeles	F	3000	Student
3	Noah	8	Houston	M	4000	Student
4	Mia	4	Phoenix	F	5000	Student
5	Aiden	9	San Francisco	M	6000	Student
6	Isabella	3	Seattle	F	9000	Student
7	Jacob	10	Philadelphia	M	9000	Student
8	Madison	2	Austin	F	9000	Student
9	Ethan	1	San Diego	M	10000	Student

The data frame after sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
0	Sophia	6	New York	F	1000	Student
1	Liam	6	Chicago	M	2000	Student
2	Emma	6	Los Angeles	F	3000	Student
3	Noah	8	Houston	M	4000	Student
4	Mia	4	Phoenix	F	5000	Student
5	Aiden	9	San Francisco	M	6000	Student
8	Madison	2	Austin	F	9000	Student
6	Isabella	3	Seattle	F	9000	Student
7	Jacob	10	Philadelphia	M	9000	Student
9	Ethan	1	San Diego	M	10000	Student

Notice that pandas have sorted based on the value of their income first. So three rows have the same 9000 values. Next, padas sorted those three rows with respect to age values.

How To Deal With NaN In Pandas While Sorting

Sometimes, we do not have the values for the cells to fill with. In such a situation, pandas put the rows having the nan values at last while sorting, irrespective of whether you are sorting based on the ascending or descending order. However, pandas also allow us to have control over such situations. We can use the parameter na_position to control where we want to place the NaN values.

It takes several values as the arguments like “first”,”last” etc.

Example (4)

import pandas as pd
import numpy as np

data = {'name': ['John', 'Mary', 'Mike','Sara','Kevin','Emily','Daniel','Jessica','Brian','Ashley'],
        'age': [35,28,np.nan,32,40,25,np.nan,29,22,37],
        'city': ['Houston','Chicago','New York','San Francisco','Los Angeles','Phoenix','Seattle','Philadelphia','San Diego','Austin'],
        'gender':['M','F','M','F','M','F','M','F','M','F'],
        'income':[50000,70000,60000,90000,80000,55000,65000,85000,100000,np.nan]
        }

# Creating a DataFrame from the dictionary
df = pd.DataFrame(data)

# Adding a new column
df['Occupation'] = ['Engineer','Teacher','Doctor','Lawyer','Designer','Accountant','Police','Manager','Architect','Artist']

#Printing the DataFrame
print("The data frame before sorting is as follows:")
print(df)
df=df.sort_values(by=['income','age'],ascending=[False,True], na_position='first')
print("The data frame after sorting is as follows:")
print(df)

Output:

The data frame before sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
0	John	35	Houston	M	50000	Engineer
1	Mary	28	Chicago	F	70000	Teacher
2	Mike	NaN	New York	M	60000	Doctor
3	Sara	32	San Francisco	F	90000	Lawyer
4	Kevin	40	Los Angeles	M	80000	Designer
5	Emily	25	Phoenix	F	55000	Accountant
6	Daniel	NaN	Seattle	M	65000	Police
7	Jessica	29	Philadelphia	F	85000	Manager
8	Brian	22	San Diego	M	100000	Architect
9	Ashley	37	Austin	F	NaN	Artist

The data frame after sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
9	Ashley	37	Austin	F	NaN	Artist
8	Brian	22	San Diego	M	100000	Architect
3	Sara	32	San Francisco	F	90000	Lawyer
7	Jessica	29	Philadelphia	F	85000	Manager
4	Kevin	40	Los Angeles	M	80000	Designer
1	Mary	28	Chicago	F	70000	Teacher
6	Daniel	NaN	Seattle	M	65000	Police
2	Mike	NaN	New York	M	60000	Doctor
5	Emily	25	Phoenix	F	55000	Accountant
0	John	35	Houston	M	50000	Engineer

Notice that in the output, the NaN value came first.

How To Sort Pandas Data Frame Using Lambda Function

We can also sort the pandas data frame using the lambda function. For this, we can adopt several logic. One unique way of doing this is to use the natsort function. First, you need to install the natsort library in your system. For this, open your terminal and run the following commands:

pip install natsort

Example (5)

import pandas as pd
import numpy as np
from natsort import index_natsorted

data = {'name': ['John', 'Mary', 'Mike','Sara','Kevin','Emily','Daniel','Jessica','Brian','Ashley'],
        'age': [35,28,np.nan,32,40,25,np.nan,29,22,37],
        'city': ['Houston','Chicago','New York','San Francisco','Los Angeles','Phoenix','Seattle','Philadelphia','San Diego','Austin'],
        'gender':['M','F','M','F','M','F','M','F','M','F'],
        'income':[50000,70000,60000,90000,80000,55000,65000,85000,100000,np.nan]
        }

# Creating a DataFrame from the dictionary
df = pd.DataFrame(data)

# Adding a new column
df['Occupation'] = ['Engineer','Teacher','Doctor','Lawyer','Designer','Accountant','Police','Manager','Architect','Artist']

#Printing the DataFrame
print("The data frame before sorting is as follows:")
print(df)
df=df.sort_values(
   by="income",
   key=lambda x: np.argsort(index_natsorted(df["income"]))
)
print("The data frame after sorting is as follows:")
print(df)

Output:

The data frame before sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
0	John	35	Houston	M	50000	Engineer
1	Mary	28	Chicago	F	70000	Teacher
2	Mike	NaN	New York	M	60000	Doctor
3	Sara	32	San Francisco	F	90000	Lawyer
4	Kevin	40	Los Angeles	M	80000	Designer
5	Emily	25	Phoenix	F	55000	Accountant
6	Daniel	NaN	Seattle	M	65000	Police
7	Jessica	29	Philadelphia	F	85000	Manager
8	Brian	22	San Diego	M	100000	Architect
9	Ashley	37	Austin	F	NaN	Artist

The data frame after sorting is as follows:

IDX	Name	Age	City	Gender	Income	Occupation
9	Ashley	37	Austin	F	NaN	Artist
0	John	35	Houston	M	50000	Engineer
5	Emily	25	Phoenix	F	55000	Accountant
2	Mike	NaN	New York	M	60000	Doctor
6	Daniel	NaN	Seattle	M	65000	Police
1	Mary	28	Chicago	F	70000	Teacher
4	Kevin	40	Los Angeles	M	80000	Designer
7	Jessica	29	Philadelphia	F	85000	Manager
3	Sara	32	San Francisco	F	90000	Lawyer
8	Brian	22	San Diego	M	100000	Architect

Changing The sorting Algorithms

By default, the sort_values() function uses the quick sort algorithm. However, the users can also choose other algorithms to sort the data. Although the result will be the same, the time complexity and the time taken for execution may differ.

Suppose you want merge sort as the algorithm to sort the data, then use the following code:

df.sort_values(
   by=<name of the column>,
  ascending=False,
 kind="mergesort"
 ).

Conclusion

In this article, we have learned how to sort the pandas data frame using different techniques in Python. We have seen the usage of sort_values() functions along with their attributes. We strongly recommend the readers practice the discussed concepts and look up the Python pandas documentation to understand the topic better.

References: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

Last Updated OnApril 15, 2023

byAsif Rahaman

Effortlessly Organize Your Data: Python Pandas Sorting

How Can We Help?

Introduction

What is Sorting in Python Pandas?

How To Sort Using The sort_values Function

Example (1)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

Explanation:

How To Sort In Descending Order

Example (2)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

How To Sort Multiple Columns In Pandas DataFrame

Example (3)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

How To Deal With NaN In Pandas While Sorting

Example (4)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

How To Sort Pandas Data Frame Using Lambda Function

Example (5)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

Changing The sorting Algorithms

Conclusion

Asif Rahaman

Leave a comment
Cancel reply

Sign Up

Sign In

Forgot Password

How Can We Help?

Introduction

What is Sorting in Python Pandas?

How To Sort Using The sort_values Function

Example (1)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

Explanation:

How To Sort In Descending Order

Example (2)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

How To Sort Multiple Columns In Pandas DataFrame

Example (3)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

How To Deal With NaN In Pandas While Sorting

Example (4)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

How To Sort Pandas Data Frame Using Lambda Function

Example (5)

Output:

The data frame before sorting is as follows:

The data frame after sorting is as follows:

Changing The sorting Algorithms

Conclusion

Asif Rahaman

Related Posts

Python Pandas Date: Parsing, Arithmetic, and Resampling

Exploring Python Pandas Options: Your Data Power Tool

Python Pandas Join: Where Datasets Converge

Python Pandas Categorize Data: All You NEED to Know

Leave a commentCancel reply

Leave a comment
Cancel reply