Pandas Python PDF
Series
DataFrame
Panel
These data structures are built on top of NumPy arrays, which makes them fast.
Dimension & Description
The best way to think of these data structures is that the higher dimensional data
structure is a container of its lower dimensional data structure. For example, DataFrame
is a container of Series, Panel is a container of DataFrame.
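The container relationship can be checked directly. The following is a minimal sketch; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column table used only for illustration
df = pd.DataFrame({'Age': [28, 34], 'Rating': [4.2, 3.9]})

# Selecting a single column of a DataFrame yields a Series
col = df['Age']
print(type(col).__name__)           # Series

# The column Series shares the row index of its parent DataFrame
print(col.index.equals(df.index))   # True
```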
Building and handling arrays of two or more dimensions is tedious, because the user must
keep track of the orientation of the data set when writing functions. Pandas data
structures reduce that mental effort.
For example, with tabular data (DataFrame) it is more semantically helpful to think of
the index (the rows) and the columns rather than axis 0 and axis 1.
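As a small sketch of this idea, pandas reductions accept the axis by name as well as by number. The frame below is hypothetical and exists only to show that both spellings give the same result.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# axis=0 and axis='index' both reduce down the rows: one value per column
by_number = df.sum(axis=0)
by_name = df.sum(axis='index')

# axis='columns' (same as axis=1) reduces across the columns: one value per row
row_totals = df.sum(axis='columns')

print(by_name)
print(row_totals)
```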
Mutability
All Pandas data structures are value mutable (can be changed) and except Series all
are size mutable. Series is size immutable.
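Note − DataFrame is widely used and one of the most important data structures. Panel
is used much less; it was deprecated in pandas 0.20 and removed in pandas 0.25, so the
Panel examples in this document run only on older versions.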
Series
Series is a one-dimensional array like structure with homogeneous data. For example,
the following series is a collection of integers 10, 23, 56, …
10 23 56 17 52 61 73 90 26 72
Key Points
Homogeneous data
Size Immutable
Values of Data Mutable
DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For example,
The table represents the data of a sales team of an organization with their overall
performance rating. The data is represented in rows and columns. Each column
represents an attribute and each row represents a person.
Column Type
Name String
Age Integer
Gender String
Rating Float
Key Points
Heterogeneous data
Size Mutable
Data Mutable
Panel
Panel is a three-dimensional data structure with heterogeneous data. It is hard to
represent the panel in graphical representation. But a panel can be illustrated as a
container of DataFrame.
Key Points
Heterogeneous data
Size Mutable
Data Mutable
Series is a one-dimensional labeled array capable of holding data of any type (integer,
string, float, python objects, etc.). The axis labels are collectively called index.
pandas.Series
A pandas Series can be created using the following constructor −
pandas.Series( data, index, dtype, copy)
The parameters of the constructor are as follows −
1. data − takes various forms such as an ndarray, a list, or a constant.
2. index − labels for the values. They must be hashable and the same length as data
(duplicate labels are allowed). Defaults to np.arange(n) if no index is passed.
3. dtype − the data type. If None, the data type is inferred.
4. copy − whether to copy the input data. Default False.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
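The constructor parameters described above can be exercised with a few different inputs. This is a minimal sketch; the variable names and values are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# From an ndarray: the default index is 0..n-1
from_array = pd.Series(np.array([10, 20, 30]))

# From a dict: the keys become the index labels
from_dict = pd.Series({'a': 1.0, 'b': 2.0})

# From a scalar constant: the value is repeated for every index label
from_scalar = pd.Series(5, index=['x', 'y', 'z'])

# dtype forces the element type instead of inferring it
as_float = pd.Series([1, 2, 3], dtype='float64')

print(from_array.index.tolist())   # [0, 1, 2]
print(from_dict.index.tolist())    # ['a', 'b']
print(from_scalar.tolist())        # [5, 5, 5]
print(as_float.dtype)              # float64
```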
A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion
in rows and columns.
Features of DataFrame
pandas.DataFrame
A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)
The parameters of the constructor are as follows −
1. data − takes various forms such as an ndarray, Series, map, list, dict, constant, or
another DataFrame.
2. index − row labels for the resulting frame. Optional; defaults to np.arange(n) if no
index is passed.
3. columns − column labels for the resulting frame. Optional; defaults to np.arange(n) if
no column labels are passed.
4. dtype − data type of each column.
5. copy − whether to copy the input data. Default False.
Create DataFrame
A pandas DataFrame can be created using various inputs like −
Lists
dict
Series
Numpy ndarrays
Another DataFrame
In the subsequent sections of this chapter, we will see how to create a DataFrame using
these inputs.
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Its output is as follows −
0
0 1
1 2
2 3
3 4
4 5
Example 2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Its output is as follows −
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
Example 3
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
print(df)
Its output is as follows −
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0
Note − Observe, the dtype parameter changes the type of Age column to floating point.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve',
'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Its output is as follows −
Age Name
0 28 Tom
1 34 Jack
2 29 Steve
3 42 Ricky
Note − Observe the values 0,1,2,3. They are the default index assigned to each using
the function range(n).
Example 2
Let us now create an indexed DataFrame using arrays.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve',
'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
Its output is as follows −
Age Name
rank1 28 Tom
rank2 34 Jack
rank3 29 Steve
rank4 42 Ricky
Note − Observe, the index parameter assigns an index to each row.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Its output is as follows −
a b c
0 1 2 NaN
1 5 10 20.0
Note − Observe, NaN (Not a Number) is appended in missing areas.
Example 2
The following example shows how to create a DataFrame by passing a list of dictionaries
and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Its output is as follows −
a b c
first 1 2 NaN
second 5 10 20.0
Example 3
The following example shows how to create a DataFrame with a list of dictionaries, row
indices, and column indices.
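import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
# Two column labels, both matching dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# Two column labels, one of which ('b1') matches no dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)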
Its output is as follows −
#df1 output
a b
first 1 2
second 5 10
#df2 output
a b1
first 1 NaN
second 5 NaN
Note − Observe, the df2 DataFrame is created with a column label that is not a dictionary
key, so NaNs are filled in for that column. In contrast, df1 is created with column labels
that match the dictionary keys, so no NaNs appear.
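import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)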
Its output is as follows −
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Note − Observe, for the series one, there is no label ‘d’ passed, so in the result the
value for the d label is NaN.
Let us now understand column selection, addition, and deletion through examples.
Column Selection
We will understand this by selecting a column from the DataFrame.
Example
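import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])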
Its output is as follows −
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
Column Addition
We will understand this by adding a new column to an existing data frame.
Example
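import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Assign a Series as a new column; rows are aligned on the index labels
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)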
Its output is as follows −
Adding a new column by passing as Series:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
Column Deletion
Columns can be deleted or popped; let us take an example to understand how.
Example
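import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)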
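The two removal styles mentioned above, deleting and popping, can be sketched as follows. This is an illustrative example that assumes the same two-column dictionary of Series used in the earlier examples; the name popped is introduced here.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# del removes the column in place and returns nothing
del df['one']

# pop removes the column in place and returns it as a Series
popped = df.pop('two')

print(list(df.columns))   # []
print(popped.tolist())    # [1, 2, 3, 4]
```

Use pop when the removed column is still needed afterwards, and del when it is not.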
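import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])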
Its output is as follows −
one 2.0
two 2.0
Name: b, dtype: float64
The result is a series with labels as column names of the DataFrame. And, the Name of
the series is the label with which it is retrieved.
Selection by integer location
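Rows can be selected by passing an integer location to the iloc indexer.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])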
Its output is as follows −
one 3.0
two 3.0
Name: c, dtype: float64
Slice Rows
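Multiple rows can be selected using the slice operator ‘:’.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])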
Its output is as follows −
one two
c 3.0 3
d NaN 4
Addition of Rows
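Add new rows to a DataFrame by concatenating another DataFrame onto it with pd.concat;
the new rows are placed at the end. (Older pandas versions provided DataFrame.append for
this; it was removed in pandas 2.0.)
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
print(df)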
Its output is as follows −
a b
0 1 2
1 3 4
0 5 6
1 7 8
Deletion of Rows
Use index label to delete or drop rows from a DataFrame. If label is duplicated, then
multiple rows will be dropped.
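Notice that the labels in the above example are duplicated. Let us drop a label and see
how many rows get dropped.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
# Drop every row whose label is 0
df = df.drop(0)
print(df)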
Its output is as follows −
a b
1 3 4
1 7 8
A panel is a 3D container of data. The term Panel data is derived from econometrics
and is partially responsible for the name pandas − pan(el)-da(ta)-s.
The names for the 3 axes are intended to give some semantic meaning to describing
operations involving panel data. They are −
items − axis 0, each item corresponds to a DataFrame contained inside.
major_axis − axis 1, it is the index (rows) of each of the DataFrames.
minor_axis − axis 2, it is the columns of each of the DataFrames.
pandas.Panel()
A Panel can be created using the following constructor −
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
The parameters of the constructor are as follows −
Parameter Description
data Data takes various forms like ndarray, series, map, lists, dict, constants and also
another DataFrame
items axis=0
major_axis axis=1
minor_axis axis=2
Create Panel
A Panel can be created using multiple ways like −
From ndarrays
From dict of DataFrames
From 3D ndarray
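# Requires a pandas version older than 0.25; Panel was removed in that release
import pandas as pd
import numpy as np
data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)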
Its output is as follows −
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4
Note − Observe the dimensions of the empty panel and the above panel, all the objects
are different.
From dict of DataFrame Objects
Items
Major_axis
Minor_axis
Using Items
We will focus mainly on DataFrame objects because of their importance in real-world
data processing, and will also discuss a few other data structures.
1. axes − Returns a list of the row axis labels.
2. dtype − Returns the dtype of the object.
3. empty − Returns True if the Series is empty.
4. ndim − Returns the number of dimensions of the underlying data, by definition 1.
5. size − Returns the number of elements in the underlying data.
6. values − Returns the Series as an ndarray.
7. head() − Returns the first n rows.
8. tail() − Returns the last n rows.
Let us now create a Series and see how the attributes tabulated above operate.
Example
import pandas as pd
import numpy as np
# Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("Is the Object empty?")
print(s.empty)
Its output is as follows −
Is the Object empty?
False
ndim
Returns the number of dimensions of the object. By definition, a Series is a 1D data
structure, so it returns 1.
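The Series attributes listed earlier can be tried together on a small Series. This is a sketch with fixed, made-up values so that the results are reproducible.

```python
import pandas as pd

# Hypothetical labeled Series for illustration
s = pd.Series([10, 23, 56, 17], index=['a', 'b', 'c', 'd'])

print(s.ndim)               # 1
print(s.size)               # 4
print(s.empty)              # False
print(s.axes[0].tolist())   # ['a', 'b', 'c', 'd']
print(s.values.tolist())    # [10, 23, 56, 17]
print(s.head(2).tolist())   # [10, 23]
print(s.tail(2).tolist())   # [56, 17]
```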
1. T − Transposes rows and columns.
2. axes − Returns a list with the row axis labels and column axis labels as the only members.
3. dtypes − Returns the dtypes in this object.
4. empty − True if the NDFrame is entirely empty (no items), that is, if any of the axes
are of length 0.
5. ndim − Number of axes / array dimensions.
6. shape − Returns a tuple representing the dimensionality of the DataFrame.
7. size − Number of elements in the NDFrame.
8. values − NumPy representation of the NDFrame.
9. head() − Returns the first n rows.
10. tail() − Returns the last n rows.
Let us now create a DataFrame and see how the above-mentioned attributes operate.
Example
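import pandas as pd
# Dictionary of Series (values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our data series is:")
print(df)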
Its output is as follows −
Our data series is:
Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
T (Transpose)
Returns the transpose of the DataFrame. The rows and columns will interchange.
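import pandas as pd
# Dictionary of Series (values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("The transpose of the data series is:")
print(df.T)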
Its output is as follows −
The transpose of the data series is:
0 1 2 3 4 5 6
Age 25 26 25 23 30 29 23
Name Tom James Ricky Vin Steve Smith Jack
Rating 4.23 3.24 3.98 2.56 3.2 4.6 3.8
axes
Returns the list of row axis labels and column axis labels.
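import pandas as pd
# Dictionary of Series (values taken from the earlier output)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Row axis labels and column axis labels are:")
print(df.axes)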
Its output is as follows −
Row axis labels and column axis labels are:
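import pandas as pd
# Dictionary of Series (values taken from the earlier output)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("The data types of each column are:")
print(df.dtypes)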
Its output is as follows −
The data types of each column are:
Age int64
Name object
Rating float64
dtype: object
empty
Returns the Boolean value saying whether the Object is empty or not; True indicates that
the object is empty.
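import pandas as pd
# Dictionary of Series (values taken from the earlier output)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Is the object empty?")
print(df.empty)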
Its output is as follows −
Is the object empty?
False
ndim
Returns the number of dimensions of the object. By definition, DataFrame is a 2D object.
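import pandas as pd
# Dictionary of Series (values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The dimension of the object is:")
print(df.ndim)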
Its output is as follows −
Our object is:
Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
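import pandas as pd
# Dictionary of Series (values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The shape of the object is:")
print(df.shape)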
Its output is as follows −
Our object is:
Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
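import pandas as pd
# Dictionary of Series (values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The total number of elements in our object is:")
print(df.size)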
Its output is as follows −
Our object is:
Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
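import pandas as pd
# Dictionary of Series (values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our data frame is:")
print(df)
print("The first two rows of the data frame is:")
print(df.head(2))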
Its output is as follows −
Our data frame is:
Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
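import pandas as pd
# Dictionary of Series (values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our data frame is:")
print(df)
print("The last two rows of the data frame is:")
print(df.tail(2))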
Its output is as follows −
Our data frame is:
Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
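import pandas as pd
# Dictionary of Series (Age and Name values taken from the output shown below)
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
print(df)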
Its output is as follows −
Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
7 34 Lee 3.78
8 40 David 2.98
9 30 Gasper 4.80
10 51 Betina 4.10
11 46 Andres 3.65
sum()
Returns the sum of the values for the requested axis. By default, axis is index (axis=0).
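import pandas as pd
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
print(df.sum())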
Its output is as follows −
Age 382
Name TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Rating 44.92
dtype: object
Each individual column is added individually (Strings are appended).
axis=1
This syntax will give the output as shown below.
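import pandas as pd
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# numeric_only=True skips the Name column, which cannot be added to numbers
print(df.sum(axis=1, numeric_only=True))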
Its output is as follows −
0 29.23
1 29.24
2 28.98
3 25.56
4 33.20
5 33.60
6 26.80
7 37.78
8 42.98
9 34.80
10 55.10
11 49.65
dtype: float64
mean()
Returns the average value
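import pandas as pd
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# numeric_only=True restricts the mean to the numeric columns
print(df.mean(numeric_only=True))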
Its output is as follows −
Age 31.833333
Rating 3.743333
dtype: float64
std()
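Returns the Bessel-corrected (sample) standard deviation of the numerical columns.
import pandas as pd
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# numeric_only=True restricts the calculation to the numeric columns
print(df.std(numeric_only=True))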
Its output is as follows −
Age 9.232682
Rating 0.661628
dtype: float64
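To make the Bessel correction concrete, the sketch below compares the default sample standard deviation (dividing by n - 1) with the population form (dividing by n). The eight values are made up for illustration and chosen so that the population result is a round number.

```python
import numpy as np
import pandas as pd

# Hypothetical values: mean is 5 and the squared deviations sum to 32
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = s.std()                # pandas default ddof=1: sqrt(32 / 7)
population_std = s.std(ddof=0)      # ddof=0: sqrt(32 / 8)
numpy_std = np.std(s.to_numpy())    # NumPy's default is ddof=0

print(sample_std)
print(population_std)               # 2.0
print(numpy_std)                    # 2.0
```

The differing defaults are why pandas and NumPy can report different standard deviations for the same data.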
Summarizing Data
The describe() function computes a summary of statistics pertaining to the DataFrame
columns.
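import pandas as pd
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())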
Its output is as follows −
Age Rating
count 12.000000 12.000000
mean 31.833333 3.743333
std 9.232682 0.661628
min 23.000000 2.560000
25% 25.000000 3.230000
50% 29.500000 3.790000
75% 35.500000 4.132500
max 51.000000 4.800000
This function gives the count, mean, standard deviation, minimum, quartiles, and maximum.
By default it excludes the character columns and summarizes only the numeric columns.
'include' is the argument that controls which columns are considered for summarizing. It
takes a list of dtypes or the string 'all'; by default, only numeric columns are included.
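import pandas as pd
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
print(df.describe(include=['object']))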
Its output is as follows −
Name
count 12
unique 12
top Ricky
freq 1
Now, use the following statement and check the output −
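import pandas as pd
d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                          4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
print(df.describe(include='all'))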
Its output is as follows −
Age Name Rating
count 12.000000 12 12.000000
unique NaN 12 NaN
top NaN Ricky NaN
freq NaN 1 NaN
mean 31.833333 NaN 3.743333
std 9.232682 NaN 0.661628
min 23.000000 NaN 2.560000
25% 25.000000 NaN 3.230000
50% 29.500000 NaN 3.790000
75% 35.500000 NaN 4.132500
max 51.000000 NaN 4.800000
import pandas as pd
import numpy as np
def adder(ele1, ele2):
    return ele1 + ele2
df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# pipe passes the whole DataFrame to adder as ele1, with 2 as ele2
print(df.pipe(adder, 2))
Its output is as follows −
col1 col2 col3
0 2.176704 2.219691 1.509360
1 2.222378 2.422167 3.953921
2 2.241096 1.135424 2.696432
3 2.355763 0.376672 1.182570
4 2.308743 2.714767 2.130288
Row or Column Wise Function Application
Arbitrary functions can be applied along the axes of a DataFrame or Panel using
the apply() method, which, like the descriptive statistics methods, takes an optional axis
argument. By default, the operation performs column wise, taking each column as an
array-like.
Example 1
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
print(df.apply(np.mean))
Its output is as follows −
col1 -0.288022
col2 1.044839
col3 -0.187009
dtype: float64
By passing axis parameter, operations can be performed row wise.
Example 2
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# axis=1 applies the function to each row instead of each column
print(df.apply(np.mean, axis=1))
Its output is a Series of five row means, labeled 0 to 4. The values differ on every run
because the data is random.
Example 3
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# Range (max minus min) of each column
print(df.apply(lambda x: x.max() - x.min()))
Its output is a Series with one value each for col1, col2, and col3. Each value is the
range of that column, so it is never negative. The values differ on every run because the
data is random.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# My custom function; map returns a new Series and leaves df unchanged
df['col1'].map(lambda x:x*100)
print(df.apply(np.mean))
Its output is as follows −
col1 0.480742
col2 0.454185
col3 0.266563
dtype: float64
Example 2
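import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
# My custom function; applymap returns a new DataFrame and leaves df unchanged.
# applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
df.applymap(lambda x:x*100)
print(df.apply(np.mean))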
Its output is as follows −
col1 0.395263
col2 0.204418
col3 -0.795188
dtype: float64
Python Pandas - Reindexing
Reindexing changes the row labels and column labels of a DataFrame.
To reindex means to conform the data to match a given set of labels along a particular
axis.
Multiple operations can be accomplished through reindexing, like −
Reorder the existing data to match a new set of labels.
Insert missing value (NA) markers in label locations where no data for the label
existed.
Example
import pandas as pd
import numpy as np
N=20
df = pd.DataFrame({
'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
'x': np.linspace(0,stop=N-1,num=N),
'y': np.random.rand(N),
'C': np.random.choice(['Low','Medium','High'],N).tolist(),
'D': np.random.normal(100, 10, size=(N)).tolist()
})
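# Reindex the DataFrame: keep rows 0, 2, 5 and columns A, C, plus a new column B
df_reindexed = df.reindex(index=[0,2,5], columns=['A', 'C', 'B'])
print(df_reindexed)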
Its output is as follows −
A C B
0 2016-01-01 Low NaN
2 2016-01-03 High NaN
5 2016-01-06 Low NaN
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
df1 = df1.reindex_like(df2)
print(df1)
Its output is as follows −
col1 col2 col3
0 -2.467652 -1.211687 -0.391761
1 -0.287396 0.522350 0.562512
2 -0.255409 -0.483250 1.866258
3 -1.150467 -0.646493 -0.222462
4 0.152768 -2.056643 1.877233
5 -1.155997 1.528719 -1.343719
6 -1.015606 -1.245936 -0.295275
Note − Here, the df1 DataFrame is altered and reindexed like df2. The column names
must match, or else NaN is filled in for the entire unmatched column.
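The effect of a mismatched column name can be seen in a small sketch. The frames and the label colX below are hypothetical, with constant values so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# df1 has three rows; df2 has two rows and one column label df1 lacks
df1 = pd.DataFrame(np.ones((3, 2)), columns=['col1', 'col2'])
df2 = pd.DataFrame(np.zeros((2, 2)), columns=['col1', 'colX'])

# Conform df1 to the row and column labels of df2
aligned = df1.reindex_like(df2)
print(aligned)
```

Here aligned keeps only rows 0 and 1, col1 carries over the values from df1, and colX is entirely NaN because df1 has no column with that label.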
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Padding NaNs
print(df2.reindex_like(df1))
Renaming
The rename() method allows you to relabel an axis based on some mapping (a dict or
Series) or an arbitrary function.
Let us consider the following example to understand this −
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(df1)
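Both relabeling styles described above, a mapping and a function, can be sketched on a small frame. The labels and replacement names below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}, index=['a', 'b'])

# Relabel with dict mappings; labels missing from a mapping are left unchanged
renamed = df.rename(columns={'col1': 'c1'}, index={'a': 'apple'})

# Relabel with a function that is applied to every column label
upper = df.rename(columns=str.upper)

print(list(renamed.columns))   # ['c1', 'col2']
print(list(renamed.index))     # ['apple', 'b']
print(list(upper.columns))     # ['COL1', 'COL2']
```

rename returns a new object by default, so df keeps its original labels.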