Pandas is a software library for the Python programming language that provides data structures and operations for manipulating and analyzing numerical tables and time series.
- What is Python Pandas DataFrame?
- How to Create a Basic Pandas DataFrame
- How To Create DataFrame From dict, arrays, and lists
- How To Select One Or More Columns In Pandas DataFrame
- How To Select One or More Rows In Pandas Data Frame
- Handling Missing Data In Pandas DataFrame
- How to Iterate Over The Rows In Pandas Data Frame
- How To Iterate Over Columns In Pandas Data Frame
- Conclusion
What is Python Pandas DataFrame?
A DataFrame is a 2-dimensional labeled data structure whose columns can hold different types. You can think of it as a spreadsheet, a SQL table, or a dict of Series objects. It is the most commonly used pandas object. Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments. If you pass an index or columns, you guarantee the index or columns of the resulting DataFrame. DataFrames are widely used to prepare data for visualization and machine learning.
This article explains the pandas DataFrame in detail.
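As a minimal sketch of the optional index and columns arguments described above, the following builds a small DataFrame with explicit row labels and an explicit column order. The sample values and the row labels are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: prices and quantities for three products
data = {'Price': [100, 50, 300], 'Quantity': [2, 4, 1]}

# index supplies the row labels; with dict data, columns selects
# the listed keys and fixes their order in the result
df = pd.DataFrame(data,
                  index=['table', 'chair', 'sofa'],
                  columns=['Quantity', 'Price'])

print(df)
print(df.shape)
```

Because both arguments were passed, the resulting DataFrame has exactly those row labels and that column order.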
How to Create a Basic Pandas DataFrame
Creating a data frame in pandas is easy, and there are many ways to do it. They range from typing the data in by hand to passing an existing structure to the DataFrame constructor.
Example (1)
import pandas as pd
# Creating a DataFrame from a dictionary
data = {'Country': ['Belgium', 'India', 'Brazil'],
'Capital': ['Brussels', 'New Delhi', 'Brasília'],
'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data)
print(df)
Output:
| | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 11190846 |
| 1 | India | New Delhi | 1303171035 |
| 2 | Brazil | Brasília | 207847528 |
Observe that we did not provide an index for the data frame, yet the output has one. When no index is given, pandas automatically labels the rows with integers starting from 0.
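The automatic index can be inspected directly. The sketch below, which reuses part of the sample data from Example (1), prints the default index and then uses set_index to make one of the columns the row labels instead. The variable name by_country is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília']})

# With no index argument, pandas builds a RangeIndex starting at 0
print(df.index)

# set_index returns a new DataFrame that uses a column as the row labels
by_country = df.set_index('Country')
print(by_country.loc['India', 'Capital'])
```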
How To Create DataFrame From dict, arrays, and lists
You can create a Pandas DataFrame from a variety of structures in Python, including:
- Dictionaries
- NumPy arrays
- Lists
Here’s an example of creating a DataFrame from a dictionary:
Example (2)
import pandas as pd
# Creating a DataFrame from a dictionary
data = {'Country': ['Belgium', 'India', 'Brazil'],
'Capital': ['Brussels', 'New Delhi', 'Brasília'],
'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data)
print(df)
This would produce a DataFrame with 3 rows and 3 columns, with the column names ‘Country’, ‘Capital’, and ‘Population’.
Output:
| | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 11190846 |
| 1 | India | New Delhi | 1303171035 |
| 2 | Brazil | Brasília | 207847528 |
Here’s an example of creating a DataFrame from a NumPy array:
Example (3)
import numpy as np
import pandas as pd
# Creating a DataFrame from a NumPy array
a = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(a, columns=['a', 'b', 'c'])
print(df)
Output:
| | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
This would create a DataFrame with 2 rows and 3 columns, with column names ‘a’, ‘b’, and ‘c’.
And here’s an example of creating a DataFrame from a list:
Example (4)
import pandas as pd
# Creating a DataFrame from a list
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Output:
| | Name | Age |
|---|---|---|
| 0 | Alex | 10 |
| 1 | Bob | 12 |
| 2 | Clarke | 13 |
This would create a DataFrame with 3 rows and 2 columns, with column names ‘Name’ and ‘Age’.
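A closely related case is a list of dictionaries, where each dictionary is one row. The sketch below is an illustration only: the records are made up, and the last one deliberately omits 'Age' to show that pandas fills the gap with NaN.

```python
import pandas as pd

# Each dict is one row; keys become column names
records = [{'Name': 'Alex', 'Age': 10},
           {'Name': 'Bob', 'Age': 12},
           {'Name': 'Clarke'}]   # 'Age' is missing for this row

df = pd.DataFrame(records)
print(df)
```

Since the Age column now contains a missing value, pandas stores it as floating point rather than integer.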
How To Select One Or More Columns In Pandas DataFrame
To select one or multiple columns from a Pandas DataFrame, you can use the [] operator along with the column name(s).
Example (5)
import pandas as pd
# Create a sample dataframe
data = {'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
'Price': [100, 50, 300, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Select the 'Product' and 'Quantity' columns
product_and_quantity = df[['Product', 'Quantity']]
# Print the selected 'Product' and 'Quantity' columns
print(product_and_quantity)
Output:
| | Product | Quantity |
|---|---|---|
| 0 | Table | 2 |
| 1 | Chair | 4 |
| 2 | Sofa | 1 |
| 3 | Bed | 3 |
In the above example, we passed a list containing the Product and Quantity column names to select those two columns from the data frame. You can select any columns this way.
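Selecting a single column works the same way, with one difference worth knowing: a bare column name returns a Series, while a one-element list returns a DataFrame. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
                   'Price': [100, 50, 300, 200],
                   'Quantity': [2, 4, 1, 3]})

# A single column name returns a Series
prices = df['Price']
print(type(prices).__name__)

# A list with one column name returns a DataFrame with one column
price_frame = df[['Price']]
print(type(price_frame).__name__)
```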
How To Select One or More Rows In Pandas Data Frame
Selecting rows is just as easy. You can select them either by label or by integer position.
To select rows in a Pandas DataFrame, you can use one of the following methods:
df.loc[]: Selects rows by label
df.iloc[]: Selects rows by integer position
Here’s an example using the df.loc[] method to select rows in a Pandas DataFrame:
Example (6)
import pandas as pd
# Create a sample dataframe
data = {'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
'Price': [100, 50, 300, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Select rows with index labels 0 and 2
# (the default index uses integer labels, so the labels are 0 and 2, not '0' and '2')
selected_rows = df.loc[[0, 2]]
# Print the selected rows
print(selected_rows)
This would print the rows with index labels 0 and 2:
Output:
| | Product | Price | Quantity |
|---|---|---|---|
| 0 | Table | 100 | 2 |
| 2 | Sofa | 300 | 1 |
Here’s an example using the df.iloc[] method to select rows in a Pandas DataFrame:
Example (7)
import pandas as pd
# Create a sample dataframe
data = {'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
'Price': [100, 50, 300, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Select rows with indices 0 and 2
selected_rows = df.iloc[[0, 2]]
# Print the selected rows
print(selected_rows)
This would also print the rows with indices 0 and 2:
Output:
| | Product | Price | Quantity |
|---|---|---|---|
| 0 | Table | 100 | 2 |
| 2 | Sofa | 300 | 1 |
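With the default integer index, loc and iloc look interchangeable. The difference shows up once the row labels are not integers. The sketch below assumes a made-up string index to illustrate this, including the fact that label slices with loc include the end label while position slices with iloc exclude the end position.

```python
import pandas as pd

# Hypothetical string row labels, chosen for illustration
df = pd.DataFrame({'Price': [100, 50, 300, 200]},
                  index=['table', 'chair', 'sofa', 'bed'])

# loc looks rows up by label, iloc by integer position
by_label = df.loc[['table', 'sofa']]
by_position = df.iloc[[0, 2]]
print(by_label.equals(by_position))

# A label slice includes its end label; a position slice excludes its end
print(len(df.loc['table':'sofa']))
print(len(df.iloc[0:2]))
```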
You can also use boolean indexing to select rows in a DataFrame based on a condition. For example:
Example (8)
import pandas as pd
# Create a sample dataframe
data = {'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
'Price': [100, 50, 300, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Select rows where the price is greater than 100
selected_rows = df[df['Price'] > 100]
# Print the selected rows
print(selected_rows)
This would select and print the rows where the price is greater than 100:
Output:
| | Product | Price | Quantity |
|---|---|---|---|
| 2 | Sofa | 300 | 1 |
| 3 | Bed | 200 | 3 |
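Conditions can also be combined. The sketch below uses & for "and" and | for "or" on the same sample data. Each condition needs its own parentheses because these operators bind more tightly than comparisons in Python. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
                   'Price': [100, 50, 300, 200],
                   'Quantity': [2, 4, 1, 3]})

# AND: price above 100 and a quantity of at least 3
expensive_and_stocked = df[(df['Price'] > 100) & (df['Quantity'] >= 3)]
print(expensive_and_stocked)

# OR: price below 100 or a quantity of exactly 1
cheap_or_single = df[(df['Price'] < 100) | (df['Quantity'] == 1)]
print(cheap_or_single)
```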
Handling Missing Data In Pandas DataFrame
In a Pandas DataFrame, missing data is represented as NaN (not a number). There are several ways to handle missing data in a Pandas DataFrame:
Drop rows with missing data: You can use the dropna() function to drop rows that contain missing data. For example:
Example (9)
import numpy as np
import pandas as pd
# Create a sample dataframe with missing data
data = {'Product': ['Table', 'Chair', np.nan, 'Bed'],
'Price': [100, 50, np.nan, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Drop rows with missing data
df.dropna(inplace=True)
# Print the resulting dataframe
print(df)
This would drop the row with the missing data and print the resulting DataFrame:
Output:
| | Product | Price | Quantity |
|---|---|---|---|
| 0 | Table | 100.0 | 2 |
| 1 | Chair | 50.0 | 4 |
| 3 | Bed | 200.0 | 3 |
Fill missing data with a placeholder value: You can use the fillna() function to fill missing data with a placeholder value. For example:
Example (10)
import numpy as np
import pandas as pd
# Create a sample dataframe with missing data
data = {'Product': ['Table', 'Chair', np.nan, 'Bed'],
'Price': [100, 50, np.nan, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Fill missing data with 0
df.fillna(0, inplace=True)
# Print the resulting dataframe
print(df)
This would fill the missing data with 0 and print the resulting DataFrame:
Output:
| | Product | Price | Quantity |
|---|---|---|---|
| 0 | Table | 100.0 | 2 |
| 1 | Chair | 50.0 | 4 |
| 2 | 0 | 0.0 | 1 |
| 3 | Bed | 200.0 | 3 |
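A single placeholder rarely suits every column: 0 is an odd product name. fillna() also accepts a dictionary that maps column names to fill values. The sketch below is one possible choice, not a recommendation: it fills the product name with the made-up label 'Unknown' and the price with the column mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['Table', 'Chair', np.nan, 'Bed'],
                   'Price': [100, 50, np.nan, 200],
                   'Quantity': [2, 4, 1, 3]})

# Map each column to its own fill value; 'Unknown' is an arbitrary label
filled = df.fillna({'Product': 'Unknown', 'Price': df['Price'].mean()})
print(filled)
```

Without inplace=True, fillna() returns a new DataFrame and leaves df unchanged.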
Interpolate missing data: You can use the interpolate() function to estimate missing numeric values from their neighbors. By default it interpolates linearly, so it does not apply to text columns such as Product. For example:
Example (11)
import numpy as np
import pandas as pd
# Create a sample dataframe with missing data
data = {'Product': ['Table', 'Chair', np.nan, 'Bed'],
'Price': [100, 50, np.nan, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Interpolate the missing value in the numeric 'Price' column
df['Price'] = df['Price'].interpolate()
# Print the resulting dataframe
print(df)
This would fill the missing price with the midpoint of its neighbors, (50 + 200) / 2 = 125, and leave the missing product name as NaN:
Output:
| | Product | Price | Quantity |
|---|---|---|---|
| 0 | Table | 100.0 | 2 |
| 1 | Chair | 50.0 | 4 |
| 2 | NaN | 125.0 | 1 |
| 3 | Bed | 200.0 | 3 |
Observe that pandas computed the missing price from the neighboring values, while the text column was left untouched.
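Before choosing among the strategies above, it can help to see how much data is actually missing. A minimal sketch, using the same sample data, counts missing values per column with isna() and flags the rows that contain any. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['Table', 'Chair', np.nan, 'Bed'],
                   'Price': [100, 50, np.nan, 200],
                   'Quantity': [2, 4, 1, 3]})

# isna() marks each missing cell as True; sum() counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# any(axis=1) flags rows that contain at least one missing value
rows_with_missing = df.isna().any(axis=1)
print(df[rows_with_missing])
```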
How to Iterate Over The Rows In Pandas Data Frame
Iterating over the rows of a pandas data frame is straightforward using the iterrows() method, which yields each row's index label together with the row as a Series. We can then read individual cell values by column name. The example below illustrates this:
Example (12)
import pandas as pd
# Create a sample dataframe
data = {'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
'Price': [100, 50, 300, 200],
'Quantity': [2, 4, 1, 3]}
df = pd.DataFrame(data)
# Iterate over the rows
for index, row in df.iterrows():
    # Calculate the total price for the row
    total_price = row['Price'] * row['Quantity']
    # Print the product name and total price
    print(row['Product'], total_price)
This would print the product name and total price for each row:
Output:
Table 200
Chair 200
Sofa 300
Bed 600
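For a calculation like this one, a loop is often unnecessary. As a sketch of two common alternatives: multiplying the columns directly computes every row at once, and itertuples() offers a lighter-weight loop when one is really needed. The column name 'Total' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Table', 'Chair', 'Sofa', 'Bed'],
                   'Price': [100, 50, 300, 200],
                   'Quantity': [2, 4, 1, 3]})

# Vectorized: multiply whole columns at once instead of looping
df['Total'] = df['Price'] * df['Quantity']
print(df[['Product', 'Total']])

# itertuples yields one namedtuple per row; fields are read by attribute
names = [row.Product for row in df.itertuples(index=False)]
print(names)
```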
How To Iterate Over Columns In Pandas Data Frame
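Just as with rows, we can iterate over the columns of a data frame using the items() method, which yields each column name together with the column as a Series. Older tutorials use iteritems(), which was deprecated in pandas 1.5 and removed in pandas 2.0.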
Example (13)
import pandas as pd
# Create a sample dataframe
data = {'Country': ['USA', 'India', 'China'],
'Capital': ['Washington', 'New Delhi', 'Beijing'],
'Population': [ 331002651, 1303171035, 207847528]}
df = pd.DataFrame(data)
# Iterate over the columns
for column_name, column in df.items():
    # Print the column name and values
    print(column_name)
    print(column)
Output:
Country
0 USA
1 India
2 China
Name: Country, dtype: object
Capital
0 Washington
1 New Delhi
2 Beijing
Name: Capital, dtype: object
Population
0 331002651
1 1303171035
2 207847528
Name: Population, dtype: int64
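Column iteration is handy for building a per-column summary. The sketch below reuses the sample data from Example (13) and counts the distinct values in each column. It assumes pandas 2.0 or later, where items() is the available method, and the name column_summary is only illustrative.

```python
import pandas as pd

# Sample data copied from Example (13)
df = pd.DataFrame({'Country': ['USA', 'India', 'China'],
                   'Capital': ['Washington', 'New Delhi', 'Beijing'],
                   'Population': [331002651, 1303171035, 207847528]})

# items() yields (column name, column as a Series) pairs
column_summary = {name: column.nunique() for name, column in df.items()}
print(column_summary)
```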
Conclusion
In this article, we covered the pandas data frame in detail: how to create one, select rows and columns, handle missing data, and iterate over rows and columns. We recommend that readers explore the pandas library further and deepen their understanding by working through more examples. We also encourage readers to post their queries in our Oraask community.