Introduction
Python is a popular language in the data science community, thanks to its robust libraries and frameworks that make it easy to analyze, manipulate and visualize data. One of the most potent libraries in Python for data analysis is Pandas. Pandas provides many tools for working with data, but sometimes you may need to create custom functions to perform a specific task.
In this article, we will explore how to create custom functions in Pandas.
What Is a User-Defined Function in Pandas?
In programming languages, we usually encounter two types of functions: built-in functions and user-defined (custom) functions. A user-defined function is a function that you write yourself to perform a specific task. In the context of Pandas, a custom function can be used to manipulate data in ways that are difficult or impossible with the built-in functions alone.
For example, you may need to perform a complex calculation on a column of data or extract specific information from a dataset. In such cases, you can create a custom function that will allow you to perform the task efficiently.
Creating a User-Defined Function in Pandas
To create a custom function for Pandas, you define an ordinary Python function. A common pattern is a function that accepts a Pandas object as input and returns a Pandas object as output. The input can be a DataFrame, a Series, or another Pandas object, and so can the output. A custom function can also work on a single value, in which case Pandas can apply it to each element of a column.
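As a minimal sketch of the Series-in, Series-out case, the function below converts a column of values into each value's share of the column total. The name `to_share_of_total` and the sample figures are illustrative assumptions, not part of Pandas.

```python
import pandas as pd

# hypothetical helper: accepts a Series and returns a new Series
def to_share_of_total(values):
    # divide every element by the sum of the whole Series
    return values / values.sum()

# sample figures chosen only for illustration
sales_amounts = pd.Series([50000, 60000, 55000, 45000], name='Sales')

shares = to_share_of_total(sales_amounts)
print(shares)
```

Because the function returns a new Series instead of changing its input, `sales_amounts` keeps its original values.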
Suppose we have a data frame containing information about a company’s sales in different regions.
Example (1)
import pandas as pd
# create a data frame with 'Region' and 'Sales' columns
sales = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West'],
'Sales': [50000, 60000, 55000, 45000]
})
# define a function to add a prefix to the 'Region' column
def add_prefix(df):
    df['Region'] = 'Region-' + df['Region'] # add the prefix to the 'Region' column
    return df # return the modified DataFrame
# define a function to update the 'Sales' column by doubling each value
def update_sales(df):
    df['Sales'] = df['Sales'] * 2 # double the 'Sales' column
    return df # return the modified DataFrame
# modify the 'Region' column by calling the add_prefix function
sales = add_prefix(sales)
# modify the 'Sales' column by calling the update_sales function
sales = update_sales(sales)
# print the modified DataFrame
print(sales)
Output:
Idx | Region | Sales |
---|---|---|
0 | Region-North | 100000 |
1 | Region-South | 120000 |
2 | Region-East | 110000 |
3 | Region-West | 90000 |
Explanation:
- First, we imported the Pandas library using Python's import statement. Next, we created a data frame with the pd.DataFrame constructor, defining two columns: Region and Sales.
- Next, we defined two functions, add_prefix and update_sales. These are our custom functions. The add_prefix function adds a prefix to every entry of the “Region” column. The update_sales function doubles every entry of the “Sales” column.
- Both functions return the data frame, so they are non-void functions. Note that they also modify the data frame passed to them in place.
- Next, we called the functions and printed the modified data frame.
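The functions in Example (1) modify the DataFrame they receive, so the caller's original object changes as well. If that side effect is unwanted, one option is to work on a copy and chain the calls with `DataFrame.pipe`. The following is a sketch of that alternative, not the approach used in the example; the `_copy` function names are made up for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [50000, 60000, 55000, 45000]
})

# hypothetical variant: returns a modified copy and leaves the input untouched
def add_prefix_copy(df):
    result = df.copy()
    result['Region'] = 'Region-' + result['Region']
    return result

# hypothetical variant: doubles 'Sales' on a copy of the input
def update_sales_copy(df):
    result = df.copy()
    result['Sales'] = result['Sales'] * 2
    return result

# pipe passes the DataFrame as the first argument of each function
modified = sales.pipe(add_prefix_copy).pipe(update_sales_copy)

print(modified)
print(sales)  # the original DataFrame is unchanged
```

Copying costs extra memory, so for large data frames the in-place style may still be preferable when the original values are no longer needed.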
Passing Arguments to a User-Defined Function
Sometimes you may need to pass arguments to a custom function. For example, you may want to pass a value that will be used in a calculation or a condition that will be used to filter the data. To pass arguments to a custom function, define the parameters in the function definition and supply the values when you call the function. If you apply the function with the apply method, extra arguments can be forwarded through apply as well.
Example (2)
import pandas as pd
# create a DataFrame with 'Region' and 'Sales' columns
sales = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West'],
'Sales': [50000, 60000, 55000, 45000]
})
# define a function to add a prefix to the 'Region' column
def add_prefix(df, name):
    df['Region'] = name + df['Region'] # add the prefix to the 'Region' column
    return df # return the modified DataFrame
# define a function to update the 'Sales' column by multiplying each value by n
def update_sales(df, n):
    df['Sales'] = df['Sales'] * n # multiply the 'Sales' column by n
    return df # return the modified DataFrame
# modify the 'Region' column by calling the add_prefix function
sales = add_prefix(sales, "random prefix")
# modify the 'Sales' column by calling the update_sales function
sales = update_sales(sales,3)
# print the modified DataFrame
print(sales)
Output:
Idx | Region | Sales |
---|---|---|
0 | random prefixNorth | 150000 |
1 | random prefixSouth | 180000 |
2 | random prefixEast | 165000 |
3 | random prefixWest | 135000 |
In the above example, each function takes two parameters. To add_prefix we pass the data frame and name, the string to add as a prefix to the entries of the “Region” column. To update_sales we pass the data frame and n, the number by which the entries of the “Sales” column are multiplied.
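An argument can also carry a condition used to filter the data, as mentioned at the start of this section. The sketch below assumes a hypothetical function `filter_by_min_sales` and a threshold of 50000 chosen only for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [50000, 60000, 55000, 45000]
})

# hypothetical helper: keep only the rows whose 'Sales' value is at least min_sales
def filter_by_min_sales(df, min_sales):
    # boolean indexing returns a new DataFrame; the input is not modified
    return df[df['Sales'] >= min_sales]

high_sales = filter_by_min_sales(sales, 50000)
print(high_sales)
```

With this threshold the West row is dropped, while `sales` itself still holds all four rows.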
It is also possible to use the existing columns of a data frame to create a new column.
Example (3)
import pandas as pd
# create a DataFrame with 'Region' and 'Sales' columns
sales = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West'],
'Sales': [50000, 60000, 55000, 45000]
})
# define a function to calculate the commission amount
def calculate_commission(sales_amount, commission_rate):
    commission_amount = sales_amount * commission_rate
    return commission_amount
# set the commission rate to 5%
commission_rate = 0.05
# apply the calculate_commission function to each value in the 'Sales' column,
# passing in the commission_rate as a keyword argument
sales['Commission'] = sales['Sales'].apply(calculate_commission, commission_rate=commission_rate)
# print the modified DataFrame
print(sales)
Output:
Idx | Region | Sales | Commission |
---|---|---|---|
0 | North | 50000 | 2500.0 |
1 | South | 60000 | 3000.0 |
2 | East | 55000 | 2750.0 |
3 | West | 45000 | 2250.0 |
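Extra arguments can also be forwarded positionally through the `args` parameter of `Series.apply`, which expects a tuple. The sketch below reuses the commission function from Example (3). The second column name, `Commission_vectorized`, is an assumption added to show that simple arithmetic like this can be done on the whole column without `apply`.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [50000, 60000, 55000, 45000]
})

def calculate_commission(sales_amount, commission_rate):
    return sales_amount * commission_rate

# args must be a tuple, hence the trailing comma after 0.05
sales['Commission'] = sales['Sales'].apply(calculate_commission, args=(0.05,))

# the same calculation written as a single column operation
sales['Commission_vectorized'] = sales['Sales'] * 0.05

print(sales)
```

The column operation usually runs faster on large data, so `apply` is most useful when the logic cannot be expressed with built-in column arithmetic.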
Using Lambda Functions in Pandas
Lambda functions in Python are small anonymous functions, that is, functions defined without a name. They are handy and can reduce the number of lines of code. Programmers often use them for a function that is needed only once, in place of defining a separate named function.
Suppose we have a data frame containing information about a company’s sales in different regions, and we want to multiply each sales amount by 5.
Example (4)
import pandas as pd
# create a DataFrame with 'Region' and 'Sales' columns
sales = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West'],
'Sales': [50000, 60000, 55000, 45000]
})
# multiply each value in the 'Sales' column by 5 using a lambda function
sales['Sales'] = sales['Sales'].apply(lambda x: x * 5)
# print the modified DataFrame
print(sales)
Output:
Idx | Region | Sales |
---|---|---|
0 | North | 250000 |
1 | South | 300000 |
2 | East | 275000 |
3 | West | 225000 |
Explanation:
In the above code, we applied a lambda function to the “Sales” column of the data frame. The lambda multiplies each entry by 5, and the result replaces the original column.
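A lambda can also be applied to whole rows rather than to a single column by passing `axis=1` to `DataFrame.apply`. The sketch below builds a hypothetical “Label” column from both existing columns; the column name and the label format are assumptions for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [50000, 60000, 55000, 45000]
})

# axis=1 calls the lambda once per row; each row arrives as a Series
sales['Label'] = sales.apply(
    lambda row: f"{row['Region']}: {row['Sales']}",
    axis=1
)

print(sales)
```

Row-wise `apply` is convenient when a calculation needs several columns at once, though it is typically slower than column operations on large data frames.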
Conclusion
User-defined functions are a powerful tool in Pandas for manipulating data in ways that are difficult or impossible with the built-in functions alone. In this article, we have explored how to create user-defined functions in Pandas, pass arguments to them, and use lambda functions. With custom functions you can perform complex operations on data and extract specific information from datasets. They help you get the most out of Pandas and take your data analysis skills to the next level.