Python Ques
Pandas is a powerful open-source data manipulation and analysis library for Python.
It provides easy-to-use data structures and functions to work with structured data,
such as tabular data, time series, and more. Pandas is widely used in data analysis
and manipulation tasks due to its flexibility, efficiency, and rich functionality.
You can install Pandas using pip, the Python package manager. Run the following
command in your terminal or command prompt:
pip install pandas
You can create a DataFrame from a dictionary using the pd.DataFrame() constructor.
Each key-value pair in the dictionary corresponds to a column in the DataFrame.
python
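import pandas as pd
# Hypothetical sample data: each key becomes a column name
data = {'name': ['Alice', 'Bob'], 'age': [25, 30]}
df = pd.DataFrame(data)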
The head() function returns the first n rows of a DataFrame, while the tail()
function returns the last n rows. They are useful for quickly inspecting the
beginning or end of a large DataFrame.
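As a small illustration, the sketch below builds a hypothetical ten-row DataFrame (the column name value is an assumption made for the example) and inspects both ends. When n is omitted, both functions default to five rows.

```python
import pandas as pd

# Hypothetical sample data: ten rows in a single column
df = pd.DataFrame({'value': range(10)})

first_rows = df.head()   # n defaults to 5, so this is rows 0..4
last_rows = df.tail(3)   # the last three rows

print(first_rows)
print(last_rows)
```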
You can use the isnull() function to check for missing values in a DataFrame. It
returns a DataFrame of the same shape as the input with True where NaN values are
present and False otherwise.
python
missing_values = df.isnull()
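8. Explain the purpose of the shape attribute in Pandas.
The shape attribute returns a tuple of the form (number of rows, number of
columns), which gives a quick view of the size of a DataFrame.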
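A minimal sketch with hypothetical sample data, reading shape as a (rows, columns) tuple:

```python
import pandas as pd

# Hypothetical sample data: 3 rows, 2 columns
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

n_rows, n_cols = df.shape   # shape is an attribute, not a method
print(n_rows, n_cols)
```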
You can rename columns in a DataFrame using the rename() function. Specify the
current column names as keys and the new names as values in a dictionary.
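For instance, a short sketch with made-up column names (old_name, new_name, and other are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'old_name': [1, 2], 'other': [3, 4]})

# Keys are the current names, values are the new names.
# rename() returns a new DataFrame unless inplace=True is passed.
renamed = df.rename(columns={'old_name': 'new_name'})
print(renamed.columns.tolist())
```

Columns that are not listed in the dictionary keep their names.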
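The dtype parameter in Pandas specifies the data type of the elements in a
DataFrame or Series, such as int, float, object, or datetime64. If you omit it,
Pandas infers a type for each column from the data. In the pd.DataFrame()
constructor, dtype accepts a single type that is applied to every column; to set
types per column, use astype() with a dictionary after creation. Choosing types
explicitly helps ensure data integrity and optimize memory usage.
# int is applied to every column, so all values should be convertible to int
df = pd.DataFrame(data, dtype=int)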
loc is used for label-based indexing, meaning you specify row and column labels
to select data. iloc is used for position-based indexing, meaning you specify
zero-based integer positions to select data, regardless of the labels.
# Using loc
df.loc[2, 'column_name']
# Using iloc
df.iloc[2, 0]
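The difference becomes visible when the index labels do not match the row positions. The sketch below uses a hypothetical DataFrame whose index is deliberately out of order:

```python
import pandas as pd

# Hypothetical data: index labels 2, 1, 0 sit at positions 0, 1, 2
df = pd.DataFrame({'score': [10, 20, 30]}, index=[2, 1, 0])

by_label = df.loc[2, 'score']    # row whose label is 2 (the first row)
by_position = df.iloc[2, 0]      # third row, first column

print(by_label, by_position)
```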
You can select specific columns from a DataFrame by passing a list of column names
to the indexing operator [] or by using the loc or iloc accessor methods.
# Using loc
selected_columns = df.loc[:, ['column1', 'column2']]
# Using iloc
selected_columns = df.iloc[:, [0, 1]]
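The indexing operator form can be sketched in the same way; the column names below are placeholders chosen for the example:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'column1': [1, 2], 'column2': [3, 4], 'column3': [5, 6]})

# A list of names returns a DataFrame
selected_columns = df[['column1', 'column2']]

# A single name returns a Series
single_column = df['column1']
```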
# Drop column
df.drop(columns=['column_name'], inplace=True)
# Drop row
df.drop(index=0, inplace=True)
The isin() function is used to filter rows in a DataFrame based on whether the
values in a column are present in a specified list or array. It returns a boolean
mask indicating which rows match the specified condition.
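A minimal sketch, assuming a hypothetical city column: the mask is built first and then used to select rows.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'city': ['Paris', 'Rome', 'Oslo', 'Rome'],
                   'visits': [3, 5, 2, 1]})

# True where the city is one of the listed values
mask = df['city'].isin(['Rome', 'Oslo'])

filtered = df[mask]     # keep matching rows
excluded = df[~mask]    # ~ inverts the mask to keep the rest
```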
You can set a specific column as the index in a DataFrame using the set_index()
function. Specify the column name to be used as the index.
df.set_index('column_name', inplace=True)
The at and iat accessors are used for fast scalar value access in a DataFrame. They
provide optimized methods for accessing a single value based on label (at) or
integer position (iat).
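# Using at accessor (row and column labels)
value = df.at[row_label, column_label]
# Using iat accessor (integer row and column positions)
value = df.iat[row_position, column_position]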
You can reset the index of a DataFrame using the reset_index() function. By
default, it creates a new DataFrame with the old index as a column and a new
sequential index. Use the drop parameter to avoid adding the old index as a column.
df.reset_index(inplace=True, drop=True)
You can filter rows based on multiple conditions using boolean indexing with
logical operators (& for AND, | for OR, ~ for NOT). Enclose each condition within
parentheses.
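The three operators can be sketched as follows, using hypothetical age and city columns. The parentheses matter because &, |, and ~ bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'age': [22, 35, 47, 51],
                   'city': ['Rome', 'Oslo', 'Rome', 'Paris']})

# AND: both conditions must hold
both = df[(df['age'] > 30) & (df['city'] == 'Rome')]

# OR: at least one condition must hold
either = df[(df['age'] < 25) | (df['city'] == 'Paris')]

# NOT: invert a condition
not_rome = df[~(df['city'] == 'Rome')]
```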
Missing values in a DataFrame can be handled using methods like fillna(), dropna(),
or interpolate(). fillna() is used to fill missing values with a specified value,
dropna() is used to remove rows or columns with missing values, and interpolate()
is used to fill missing values by interpolation.
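A short sketch comparing the three methods on the same hypothetical column, which has one missing value:

```python
import pandas as pd

# Hypothetical sample data with one NaN in the middle
df = pd.DataFrame({'x': [1.0, float('nan'), 3.0]})

filled = df.fillna(0)             # replace NaN with a fixed value
dropped = df.dropna()             # remove rows that contain NaN
interpolated = df.interpolate()   # linear interpolation by default

print(filled, dropped, interpolated, sep='\n\n')
```

Each call returns a new DataFrame and leaves df unchanged.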
df.drop_duplicates(inplace=True)
You can convert data types in a Pandas DataFrame using the astype() function or
specific conversion functions like to_numeric(), to_datetime(), or to_timedelta().
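As a rough illustration, the sketch below converts three hypothetical string columns. The errors='coerce' option is used here as one possible way to turn unparseable entries into NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data, all stored as strings
df = pd.DataFrame({
    'qty': ['1', '2', '3'],
    'price': ['4.5', 'n/a', '6.0'],
    'day': ['2024-01-01', '2024-01-02', '2024-01-03'],
})

df['qty'] = df['qty'].astype(int)                          # strict cast
df['price'] = pd.to_numeric(df['price'], errors='coerce')  # 'n/a' becomes NaN
df['day'] = pd.to_datetime(df['day'])                      # strings to datetimes

print(df.dtypes)
```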
The groupby() function in Pandas is used to split a DataFrame into groups based on
some criteria, such as unique values in one or more columns. It is typically
followed by an aggregation function to perform calculations within each group.
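A minimal sketch of the split-then-aggregate pattern, assuming hypothetical team and points columns:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'team': ['a', 'b', 'a', 'b'],
                   'points': [1, 2, 3, 4]})

# Split by team, then sum the points within each group
totals = df.groupby('team')['points'].sum()
print(totals)
```

The result is a Series indexed by the group keys.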
You can pivot a DataFrame using the pivot() function, which reshapes the data by
rearranging the rows and columns. It requires specifying columns to use as the new
index, columns, and values.
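For example, the sketch below reshapes hypothetical long-format temperature readings into one row per date and one column per city (all names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, city) pair
df = pd.DataFrame({
    'date': ['d1', 'd1', 'd2', 'd2'],
    'city': ['Rome', 'Oslo', 'Rome', 'Oslo'],
    'temp': [20, 5, 22, 7],
})

# Rows come from 'date', columns from 'city', cell values from 'temp'
wide = df.pivot(index='date', columns='city', values='temp')
print(wide)
```

pivot() raises an error if an index and column pair occurs more than once; pivot_table() can aggregate such duplicates instead.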
The merge() function in Pandas is used to combine two DataFrames based on one or
more common columns or on their indexes. It performs database-style joins, such as
inner, outer, left, and right joins, selected with the how parameter. To combine
more than two DataFrames, chain several merge() calls.
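A short sketch of two join types on a shared id column; both DataFrames and their contents are hypothetical:

```python
import pandas as pd

# Hypothetical sample data sharing an 'id' column
left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ann', 'Bob', 'Cy']})
right = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 70]})

# Inner join: only ids present in both DataFrames
inner = pd.merge(left, right, on='id', how='inner')

# Outer join: all ids from either DataFrame, with NaN where data is missing
outer = pd.merge(left, right, on='id', how='outer')

print(inner)
print(outer)
```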