Simplify Data Analysis with Python Pandas Aggregation

Created On April 12, 2023

Introduction

Pandas is an open-source data manipulation and analysis library built on NumPy, and one of the most widely used data analysis libraries in the Python ecosystem. One of its key features is support for aggregations, which reduce the values in a DataFrame or Series to summary statistics. In this article, we will explore the aggregations that Pandas supports and how they can be used to process and analyze data.

Table of Contents

Introduction
What is Pandas Aggregation?
Built-In Pandas Aggregation Functions
Python Pandas Aggregation Functions list
Custom Pandas Aggregation Functions
Applying Aggregations to Subsets of Data
Group and Aggregate Data In Pandas Using groupby, pivot_table, crosstab Functions
The groupby Function
The pivot_table Function
The crosstab Function
Conclusion

We cover both the built-in aggregation functions and custom, user-defined ones.

What is Pandas Aggregation?

Aggregations are operations that perform some type of computation or transformation on a data set and produce a data summary. For example, an aggregation could compute the set’s mean, sum, or count. Aggregations are an essential part of data analysis and are used to summarize and condense data into a manageable form that can be more easily understood and analyzed.

Aggregations in Pandas can be performed on both DataFrames and Series. A DataFrame is a two-dimensional data structure that stores data in rows and columns. A Series is a one-dimensional labeled array, such as a single column or row of a DataFrame.
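The difference between the two structures shows up in what an aggregation returns. The following is a minimal sketch (the variable names are illustrative, not from the original article): aggregating a Series yields a single value, while aggregating a DataFrame yields one value per column.

```python
import pandas as pd

# A Series aggregation collapses all values into a single scalar.
s = pd.Series([1, 2, 3, 4, 5])
series_total = s.sum()

# A DataFrame aggregation works column by column and returns a Series
# indexed by the column names.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
column_totals = df.sum()

print(series_total)        # 15
print(column_totals['B'])  # 15
```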

Pandas provides several built-in aggregation functions that can be used to perform operations on data. Some common aggregations include mean, sum, count, min, max, and standard deviation.

In addition to these built-in functions, Pandas also supports custom aggregation functions, allowing users to perform their own computations on data.

Built-In Pandas Aggregation Functions

Built-in Pandas Aggregation Functions are pre-defined operations that can be applied to DataFrames and Series objects in Python to compute summary statistics such as mean, sum, count, min, max, and standard deviation. These functions provide a fast and flexible way to analyze data sets.

Python Pandas Aggregation Functions list

Mean
Sum
Count
Min
Max
Standard Deviation
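All six functions in this list can also be computed in one call. As a short sketch, passing a list of function names to the agg method returns a table with one row per aggregation; the sections that follow look at each function individually.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# Each name in the list becomes one row of the result;
# each DataFrame column stays a column.
summary = df.agg(['mean', 'sum', 'count', 'min', 'max', 'std'])
print(summary)
```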

Pandas Mean Aggregation Function

The mean aggregation is used to compute the average of a set of values. It is calculated by summing up all the values in the data set and dividing by the number of values. The mean is a commonly used aggregation that provides a good indicator of the central tendency of a data set.

Example (1)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df.mean())

Output:

A    3.0
B    3.0
dtype: float64

Pandas Sum Aggregation Function

The sum aggregation computes the total of a set of values by adding them all up. It is a commonly used aggregation that gives the overall amount represented by a column or data set.

Example (2)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df.sum())

Output:

A    15
B    15
dtype: int64

Pandas Count Aggregation Function

The count aggregation computes the number of non-null values in a data set. Missing values (NaN or None) are excluded, so the count equals the number of rows only when a column has no missing data. The count is a commonly used aggregation that indicates how much usable data a column contains.

Example (3)

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df.count())

Output:

A    5
B    5
dtype: int64
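The example above has no missing data, so every column's count equals the number of rows. The sketch below uses a small hypothetical DataFrame (df_missing, not part of the original article) to show that count skips missing values.

```python
import numpy as np
import pandas as pd

# Column 'A' has one missing value; column 'B' has none.
df_missing = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})

non_null_counts = df_missing.count()  # counts non-null entries per column
row_count = len(df_missing)           # counts rows, missing or not

print(non_null_counts['A'])  # 2
print(non_null_counts['B'])  # 3
print(row_count)             # 3
```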

Pandas Min Aggregation Function

The min aggregation is used to compute the minimum value in a data set. The min is calculated by finding the smallest value in the data set. The min is a commonly used aggregation that provides a good indicator of the lower bound of a data set.

Example (4)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df.min())

Output:

A    1
B    1
dtype: int64

Pandas Max Aggregation Function

The max aggregation is used to compute the maximum value in a data set. The max is calculated by finding the largest value in the data set. The max is a commonly used aggregation that provides a good indicator of the upper bound of a data set.

Example (5)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df.max())

Output:

A    5
B    5
dtype: int64

Pandas Standard Deviation Aggregation Function

The standard deviation aggregation measures the spread of a set of values. It is the square root of the variance of the data set. By default, Pandas computes the sample standard deviation (normalized by N-1 rather than N), which is why the example below yields 1.581139 rather than the population value of about 1.414214.

Example (6)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df.std())

Output:

A    1.581139
B    1.581139
dtype: float64
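The value 1.581139 is the sample standard deviation, which Pandas computes by default (ddof=1, dividing by N-1). If the population standard deviation is needed instead, the ddof parameter of std can be set to 0, as in this brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# Default: sample standard deviation, sqrt(10 / 4).
sample_std = df['A'].std()

# ddof=0: population standard deviation, sqrt(10 / 5).
population_std = df['A'].std(ddof=0)

print(round(sample_std, 6))      # 1.581139
print(round(population_std, 6))  # 1.414214
```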

Custom Pandas Aggregation Functions

In addition to the built-in aggregation functions, Pandas also supports custom aggregation functions, which let users apply their own computations to the data.

To create a custom aggregation function, a user must define a function that takes in a Series and returns a single value. This function can then be passed to the agg method on a DataFrame or Series to perform the aggregation.

Example (7)

import pandas as pd
def custom_agg(series):
    return sum(series) / len(series)
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df.agg(custom_agg))

Output:

A    3.0
B    3.0
dtype: float64
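The custom function above reproduces the built-in mean. A custom aggregation is more useful for statistics that have no single built-in method. The sketch below defines a hypothetical value_range function (the name is illustrative) that returns the difference between the largest and smallest value in each column.

```python
import pandas as pd

def value_range(series):
    """Return the spread between the largest and smallest value."""
    return series.max() - series.min()

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# agg calls value_range once per column, passing the column as a Series.
ranges = df.agg(value_range)

print(ranges['A'])  # 4
print(ranges['B'])  # 4
```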

Applying Aggregations to Subsets of Data

In some cases, you may need to aggregate only a subset of the data. Pandas provides several options for doing so.

One option is to use the groupby method to group data based on one or more columns and then perform an aggregation on each group. The groupby method returns a DataFrameGroupBy object that can be used to perform aggregations on each group.

Example (8)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1], 'C': ['a', 'b', 'a', 'b', 'a']})
grouped = df.groupby('C')
print(grouped.mean())

Output:

     A    B
C
a  3.0  3.0
b  3.0  3.0

Another option is to use boolean indexing to select a subset of data based on a condition and then perform an aggregation on the selected data.

Example (9)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
print(df[df['A'] > 2].mean())

Output:

A    4.0
B    2.0
dtype: float64
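Boolean conditions can also be combined, and the aggregation can be limited to a single column. The following sketch, using the same DataFrame, selects rows with .loc and a combined mask (the names mask and b_total are illustrative) and then sums only column 'B'.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mask = (df['A'] > 1) & (df['B'] > 1)

# Select the matching rows and only column 'B', then aggregate.
# Matching rows have B values 4, 3, and 2.
b_total = df.loc[mask, 'B'].sum()

print(b_total)  # 9
```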

Group and Aggregate Data In Pandas Using groupby, pivot_table, crosstab Functions

The groupby(), pivot_table(), and crosstab() functions in Pandas are used to group and aggregate data in different ways.

The groupby() function splits the data into groups based on the values in one or more columns and then applies an aggregation function to each group. The aggregation can be a built-in function such as mean, sum, or count, or a custom function defined by the user.
The pivot_table() function creates a pivot table, a two-dimensional summary of a DataFrame. You specify one or more columns to group by (as the index, the columns, or both) and one or more columns to aggregate. The resulting table makes it easy to compare aggregated values across categories.
The crosstab() function creates a cross-tabulation table, which shows how often each combination of values occurs across two or more variables. It is particularly useful for comparing frequencies between two categorical variables.

The groupby Function

The groupby function groups data based on one or more columns. It returns a DataFrameGroupBy object that can be used to perform aggregations on each group. The groupby function is useful when you want to apply the same aggregation to each of several groups.

Example (10)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1], 'C': ['a', 'b', 'a', 'b', 'a']})
grouped = df.groupby('C')
print(grouped.mean())

Output:

     A    B
C
a  3.0  3.0
b  3.0  3.0

Explanation:

In this example, we first create a DataFrame with columns 'A', 'B', and 'C'. We then use the groupby() function to group the data based on column 'C'. The mean() method is then called on the DataFrameGroupBy object to perform the mean aggregation on each group. The output shows the mean of columns 'A' and 'B' for each group defined by column 'C'.
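A grouped object is not limited to one aggregation for all columns. As a sketch using the same DataFrame, the agg method accepts keyword arguments of the form output_name=(column, function), which Pandas calls named aggregation. The output names a_sum, b_max, and rows are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1], 'C': ['a', 'b', 'a', 'b', 'a']})

# Each keyword defines one output column as (source column, aggregation).
summary = df.groupby('C').agg(
    a_sum=('A', 'sum'),    # total of A per group
    b_max=('B', 'max'),    # largest B per group
    rows=('A', 'count'),   # number of non-null A values per group
)
print(summary)
```

Group 'a' contains the rows with A values 1, 3, and 5, so its a_sum is 9 and its rows value is 3; group 'b' contains A values 2 and 4, giving an a_sum of 6 across 2 rows.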

The pivot_table Function

The pivot_table() function creates a pivot table from a DataFrame. A pivot table summarizes data by grouping it along one or more columns and aggregating the values in another column. The pivot_table() function is particularly useful when comparing aggregated data across multiple categories.

Example (11)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1], 'C': ['a', 'b', 'a', 'b', 'a']})
pivot = df.pivot_table(index='C', values='A', aggfunc='mean')
print(pivot)

Output:

     A
C
a  3.0
b  3.0

Explanation:

In this example, we first create a DataFrame with columns 'A', 'B', and 'C'. We then use the pivot_table() function to create a pivot table with column 'C' as the index, column 'A' as the values, and the mean aggregation function. The output shows the mean of column 'A' for each group defined by column 'C'.
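With only an index, a pivot table gives the same result as a groupby. Its distinguishing feature is the columns parameter, which spreads a second grouping variable across the columns of the result. The sketch below assumes a small DataFrame with two categorical columns (the same data used in the crosstab example in the next section) and sums column 'A' for each combination of 'B' and 'C'.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'a', 'b', 'a'], 'C': ['x', 'x', 'y', 'y', 'y']})

# Rows are the values of 'B', columns are the values of 'C',
# and each cell holds the sum of 'A' for that combination.
pivot = df.pivot_table(index='B', columns='C', values='A', aggfunc='sum')
print(pivot)
```

For instance, the cell for B='a' and C='y' is 8, because the two matching rows have A values 3 and 5.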

The crosstab Function

The crosstab() function creates a cross-tabulation table from two or more array-like inputs, typically columns of a DataFrame. A cross-tabulation table counts how often each combination of values occurs. The crosstab() function is particularly useful when comparing the frequency of occurrences between two categorical variables.

Example (12)

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'a', 'b', 'a'], 'C': ['x', 'x', 'y', 'y', 'y']})
crosstab = pd.crosstab(index=df['B'], columns=df['C'], margins=True)
print(crosstab)

Output:

C    x  y  All
B
a    1  2    3
b    1  1    2
All  2  3    5

Explanation:

In this example, we first create a DataFrame with columns 'A', 'B', and 'C'. We then use the crosstab() function to create a cross-tabulation table with column 'B' as the index and column 'C' as the columns. The margins argument is set to True to include the row and column sums in the output. The output shows the frequency of occurrences between columns 'B' and 'C'.
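Raw counts can be hard to compare when the groups differ in size. As a sketch on the same data, the normalize parameter of crosstab converts the counts to proportions; normalize='index' makes each row sum to 1.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'a', 'b', 'a'], 'C': ['x', 'x', 'y', 'y', 'y']})

# Each cell is the share of that row's observations falling in each column.
row_shares = pd.crosstab(index=df['B'], columns=df['C'], normalize='index')
print(row_shares)
```

Here one of the three 'a' rows has C='x', so that cell is one third, while the two 'b' rows split evenly between 'x' and 'y'.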

Conclusion

Pandas aggregations are a powerful tool for summarizing data. The built-in functions cover the most common statistics, custom functions passed to agg handle everything else, and groupby, boolean indexing, pivot_table, and crosstab extend those aggregations to subsets and combinations of categories. Together, these features make Pandas a comprehensive toolkit for data analysis and manipulation.

Last Updated On April 12, 2023

by Asif Rahaman
