In this article, we will explore the fundamentals and the most commonly used methods of the Pandas Series.
What Is a Pandas Series?
A Pandas Series is a one-dimensional labeled array capable of storing data of various types, such as integers, strings, or arbitrary Python objects. A Series has a single axis (the index), which associates a label with each element; elements are accessed through these labels.
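To make the idea of a labeled array concrete, here is a minimal sketch. The city names and temperatures are made-up illustration data, not part of the original article.

```python
import pandas as pd

# Hypothetical example data: three temperatures labeled by city.
temps = pd.Series([21.5, 18.0, 25.3], index=["Lima", "Oslo", "Cairo"])

values = temps.to_numpy()   # the underlying one-dimensional data
labels = list(temps.index)  # the labels on the single axis
oslo = temps["Oslo"]        # access an element through its label

print(values)
print(labels)
print(oslo)
```

The values and the index are stored separately, which is why the same data can later be relabeled or realigned without touching the values themselves.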
Data Structures in Pandas
Pandas has historically provided three data structures:
Data structure | Dimensionality | Spreadsheet analog |
---|---|---|
Series | 1D | Column |
DataFrame | 2D | Single sheet |
Panel | 3D | Multiple sheets |
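Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so current releases provide only Series and DataFrame. In this article, we will focus on the Series structure, which is one of the most widely used.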
There are several ways to create a Pandas Series:
- Creating an Empty Series
- Creating a Series from list
- Creating a Series from dict
- Creating a Series from array
- Creating a Series from scalar value
Create an Empty Series
Let's create our first Pandas Series. First, we import the pandas module, and then we call the pd.Series() constructor as follows:
import pandas as pd
ser = pd.Series()
print(ser)
Output:
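Series([], dtype: object)
In pandas 2.0 and later, an empty Series created without a dtype defaults to object. Older releases defaulted to float64 and printed Series([], dtype: float64), with pandas 1.x warning about the upcoming change.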
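Because the default dtype of an empty Series has differed between pandas releases, passing dtype explicitly makes the result predictable. A minimal sketch (the variable names are illustrative):

```python
import pandas as pd

# Request the dtype explicitly instead of relying on the version-dependent default.
empty_float = pd.Series(dtype="float64")
empty_text = pd.Series(dtype="object")

print(empty_float.dtype)
print(empty_text.dtype)
print(len(empty_float))
```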
Create Pandas Series From List
First, we create a list, and then we can create a series from that list using pd.Series().
import pandas as pd
# Use a name other than "list" so the built-in list type is not shadowed.
letters = ['O','r','a','a','s','k']
ser = pd.Series(letters)
print(ser)
Output:
0 O
1 r
2 a
3 a
4 s
5 k
dtype: object
Create a Series from dict
A Pandas Series can be created from a dict either with or without an explicit index. We will explain both methods.
Create a Series from dict Without Specifying an Index
We can pass a dict as input. When we don't specify an index, the dictionary keys become the index in the order they were inserted (pandas 0.23 and later on Python 3.6+; older versions sorted the keys).
data = {'zero' : 0., 'one' : 1., 'two' : 2.}
s = pd.Series(data)
print(s)
Output:
zero 0.0
one 1.0
two 2.0
dtype: float64
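The output above follows the order of the keys in the dict rather than alphabetical order. If a sorted index is wanted, sort_index() can be applied afterwards. A short sketch reusing the same data:

```python
import pandas as pd

data = {"zero": 0.0, "one": 1.0, "two": 2.0}
s = pd.Series(data)

insertion_order = list(s.index)            # order of the dict keys
sorted_order = list(s.sort_index().index)  # alphabetical order of the labels

print(insertion_order)  # ['zero', 'one', 'two']
print(sorted_order)     # ['one', 'two', 'zero']
```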
Create a Series from dict Indicating an Index
If we pass an index, the values in data corresponding to the index’s labels will be pulled out.
data = {'zero' : 0., 'one' : 1., 'two' : 2.}
s = pd.Series(data,index=['one','two','n','zero'])
print(s)
Output:
one 1.0
two 2.0
n NaN
zero 0.0
dtype: float64
Note that the order of the index we passed is preserved, and the label with no matching key ('n') is filled with NaN (Not a Number).
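Once a Series contains NaN, the missing entries can be located, dropped, or replaced. The following is a minimal sketch using isna(), dropna(), and fillna(); the replacement value 0.0 is an arbitrary choice for illustration.

```python
import pandas as pd

data = {"zero": 0.0, "one": 1.0, "two": 2.0}
s = pd.Series(data, index=["one", "two", "n", "zero"])

missing_labels = s[s.isna()].index.tolist()  # labels whose value is NaN
cleaned = s.dropna()                         # new Series without the NaN entries
filled = s.fillna(0.0)                       # new Series with NaN replaced by 0.0

print(missing_labels)  # ['n']
print(cleaned.index.tolist())
print(filled["n"])
```

All three calls return new objects; the original Series s is left unchanged.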
Create a Series from Array
A Pandas Series can be created from a NumPy array either with or without an explicit index. We will explain both methods.
Create a Series from Array without Specifying Index
First, import the NumPy module, then use its array() function as follows:
import numpy as np
array1 = np.array(['O','r','a','a','s','k'])
ser = pd.Series(array1)
print(ser)
Output:
0 O
1 r
2 a
3 a
4 s
5 k
dtype: object
When no index is passed, the default index is range(n), where n is the length of the array. Here the labels run from 0 to len(array1) - 1, that is, 5.
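The default index can be inspected directly. A short sketch, reusing the article's array1:

```python
import numpy as np
import pandas as pd

array1 = np.array(["O", "r", "a", "a", "s", "k"])
ser = pd.Series(array1)

print(ser.index)  # a RangeIndex covering positions 0 through 5

default_labels = list(ser.index)
last_label = default_labels[-1]
print(last_label == len(array1) - 1)  # True
```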
Create a Series from Array Indicating Index
Now, let's see how to create a Pandas Series with a custom index. To do so, we use the index parameter, which takes a list of labels. The number of labels must match the length of the array.
array1 = np.array(['O','r','a','a','s','k'])
ser = pd.Series(array1, index=[100,101,102,103,104,105])
print(ser)
Output:
100 O
101 r
102 a
103 a
104 s
105 k
dtype: object
As we passed the index values, we can now see the customized indexed values in the output.
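If the number of labels does not match the length of the array, pandas raises a ValueError. The sketch below triggers that error deliberately with a shortened, hypothetical array; the exact wording of the message varies between pandas versions, so it is printed rather than assumed.

```python
import numpy as np
import pandas as pd

array1 = np.array(["O", "r", "a"])

try:
    # Three values but only two labels: the lengths do not match.
    pd.Series(array1, index=[100, 101])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```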
Create a Series from Scalar Value
To create a Series that repeats a scalar value, we pass an index. The scalar value is repeated to match the length of the index.
ser = pd.Series(23, index=[0, 1, 2, 3])
print(ser)
Output:
0 23
1 23
2 23
3 23
dtype: int64
The output shows that the scalar value (23) is repeated 4 times.
Aggregation Operations on Series
Pandas Series lets us compute aggregations such as the sum, mean, and product of its values with the sum(), mean(), and prod() methods. Each of these reduces the Series to a single value.
We will use the following Series to demonstrate them:
s = pd.Series([1, 2, 3, 4], index=['one', 'two', 'three', 'four'])
print(s)
Output:
one 1
two 2
three 3
four 4
dtype: int64
Sum
The sum() method returns the sum of the values in the Series.
s.sum()
Output:
10
Mean
The mean() method returns the arithmetic mean of the values in the Series.
s.mean()
Output:
2.5
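By default, mean() (like sum() and prod()) skips missing values. The sketch below uses a hypothetical Series with one NaN to show the effect of the skipna parameter.

```python
import math
import pandas as pd

# Hypothetical data with one missing value.
with_gap = pd.Series([1.0, float("nan"), 3.0])

mean_skipping = with_gap.mean()            # NaN ignored: (1.0 + 3.0) / 2
mean_strict = with_gap.mean(skipna=False)  # NaN propagates to the result

print(mean_skipping)            # 2.0
print(math.isnan(mean_strict))  # True
```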
Prod
The prod() method returns the product of the values in the Series.
s.prod()
Output:
24
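The methods above reduce a Series to a single value. Element-wise binary operations between two Series, such as addition, work differently: pandas first aligns the operands by label. A minimal sketch, where other is a hypothetical second Series that shares only some labels with s:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["one", "two", "three", "four"])
other = pd.Series([10, 20, 30], index=["one", "two", "five"])

# The + operator aligns by label; labels present in only one operand give NaN.
total = s + other

# add() with fill_value treats a missing operand as 0 instead.
total_filled = s.add(other, fill_value=0)

print(total)
print(total_filled)
```

With fill_value=0, labels found in only one Series keep their own value, which is often the more useful behavior when combining partial data.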
Conversion Operation on Series
We can also convert a Pandas Series, for example by changing its data type or by turning it into a Python list. The methods used for this include .astype() and .tolist().
We will use the following series to perform the conversion operations mentioned before:
ser = pd.Series([1, 2, 3], dtype='int64')
print(ser)
Output:
0 1
1 2
2 3
dtype: int64
Converting Data type of Series
Pandas astype() is used to change the data type of a Series. In this example, we will convert the original Series data type to Float by using .astype() and passing ‘float’ as an argument.
ser_float = ser.astype('float')
print(ser_float)
Output:
0 1.0
1 2.0
2 3.0
dtype: float64
As we can see in the output, we have created a new Series ‘ser_float’ whose data type is ‘float’.
We can also convert the original Series data type to string by using .astype() and passing 'str' as an argument.
ser_str = ser.astype('str')
print(ser_str)
Output:
0 1
1 2
2 3
dtype: object
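As we can see in the output, we have created a new Series 'ser_str' whose values are strings. Pandas reports this as dtype object here; recent releases may display a dedicated string dtype instead.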
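Since the printed values look the same before and after the conversion, it can help to check the element type directly. A short sketch; the zero-padding step is only an illustration of a string operation that would not be available on integers.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], dtype="int64")
ser_str = ser.astype("str")

first = ser_str.iloc[0]
print(type(first))  # <class 'str'>

# String methods are now available through the .str accessor.
padded = ser_str.str.zfill(3)
print(padded.tolist())  # ['001', '002', '003']
```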
Converting a Series to a List
To convert a pandas Series to a list, we can use the tolist() method on the series we want to convert.
ser_list = ser.tolist()
print(ser_list)
Output:
[1, 2, 3]
As we can see in the output, we have created a List ‘ser_list’ with the values [1, 2, 3].
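Besides tolist(), a Series can be converted to a NumPy array with to_numpy() or to a dict with to_dict(). A minimal sketch of the three conversions side by side:

```python
import pandas as pd

ser = pd.Series([1, 2, 3], dtype="int64")

as_list = ser.tolist()     # plain Python list of Python ints
as_array = ser.to_numpy()  # NumPy array with the same dtype as the Series
as_dict = ser.to_dict()    # maps each index label to its value

print(as_list, type(as_list[0]))
print(as_array.dtype)
print(as_dict)
```

Note that tolist() drops the index, while to_dict() keeps it as the dictionary keys.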
Indexing Series
A Pandas Series supports three ways of indexing:
- Using []
- .loc
- .iloc
Consider the following Series:
# Use names other than "list" so the built-in list type is not shadowed.
letters = ['O','r','a','a','s','k']
labels = ['O-1','r-2','a-3','a-4','s-5','k-6']
ser = pd.Series(letters, index=labels)
print(ser)
Output:
O-1 O
r-2 r
a-3 a
a-4 a
s-5 s
k-6 k
dtype: object
Using []
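This is the most basic form of indexing: we pass the label of the element we want to retrieve. Avoid passing an integer position such as ser[5] when the index holds string labels; that positional fallback has been deprecated since pandas 2.1, so use .iloc for positions instead.
ser['k-6']
Output:
'k'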
Using .loc
Selects elements by index label.
ser.loc['k-6']
Output:
'k'
Using .iloc
Selects elements by integer position, counting from 0.
ser.iloc[5]
Output:
'k'
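The difference between .loc and .iloc is most visible when slicing: a label slice with .loc includes both endpoints, while a positional slice with .iloc excludes the stop position, as in ordinary Python slicing. A short sketch on the same data (the variable names are illustrative):

```python
import pandas as pd

letters = ["O", "r", "a", "a", "s", "k"]
labels = ["O-1", "r-2", "a-3", "a-4", "s-5", "k-6"]
ser = pd.Series(letters, index=labels)

by_label = ser.loc["r-2":"a-4"]  # label slice: both endpoints included
by_position = ser.iloc[1:3]      # positional slice: stop position excluded
matches = ser[ser == "a"]        # boolean mask: keep elements equal to "a"

print(by_label.tolist())       # ['r', 'a', 'a']
print(by_position.tolist())    # ['r', 'a']
print(matches.index.tolist())  # ['a-3', 'a-4']
```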
Conclusion
In this article, we have learned the main features of the Pandas Series and how to create one from lists, arrays, dicts, and scalar values in Python. We have also covered aggregation operations on a Series such as sum, mean, and product, and we explored conversion operations using .astype() and .tolist(). Finally, we explored how to select elements of a Series by label and by position.