
Python Pandas - Working with HTML Data
Pandas can read and write data in many formats. One of them is HTML (HyperText Markup Language), the standard markup for structuring web content. HTML documents often contain tabular data, which Pandas can extract for analysis.
An HTML table represents tabular data as rows and columns within a web page. The pandas.read_html() function extracts such tables from an HTML document, and the DataFrame.to_html() method writes a Pandas DataFrame back out as an HTML table.
In this tutorial, we will learn how to work with HTML data in Pandas, including reading HTML tables and writing DataFrames to HTML tables.
Reading HTML Tables from a URL
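The pandas.read_html() function reads tables from HTML files, strings, or URLs. It parses the <table> elements in the HTML and returns a list of pandas.DataFrame objects, one per table found. It relies on an HTML parser library being installed, such as lxml, or BeautifulSoup4 together with html5lib.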
Example
Here is a basic example of reading data from a URL using the pandas.read_html() function.
    import pandas as pd

    # Read HTML tables from a URL
    url = "https://www.tutorialspoint.com/sql/sql-clone-tables.htm"
    tables = pd.read_html(url)

    # Access the first table from the URL
    df = tables[0]

    # Display the resultant DataFrame
    print('Output First DataFrame:', df.head())
Following is the output of the above code:
Output First DataFrame:
|   | ID | NAME | AGE | ADDRESS | SALARY |
|---|---|---|---|---|---|
| 0 | 1 | Ramesh | 32 | Ahmedabad | 2000.0 |
| 1 | 2 | Khilan | 25 | Delhi | 1500.0 |
| 2 | 3 | Kaushik | 23 | Kota | 2000.0 |
| 3 | 4 | Chaitali | 25 | Mumbai | 6500.0 |
| 4 | 5 | Hardik | 27 | Bhopal | 8500.0 |
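Because pandas.read_html() returns a list rather than a single DataFrame, it can help to inspect that list before indexing into it. The sketch below is only an illustration: the two-table html_doc string is a made-up document, read from memory so the code runs without network access, and it assumes a parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table document, used only for illustration
html_doc = """
<table>
  <tr><th>ID</th><th>NAME</th></tr>
  <tr><td>1</td><td>Ramesh</td></tr>
  <tr><td>2</td><td>Khilan</td></tr>
</table>
<table>
  <tr><th>Field</th><th>Type</th></tr>
  <tr><td>ID</td><td>int(11)</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element found
tables = pd.read_html(StringIO(html_doc))
print(len(tables))

# Check the shape and column names of each table before picking one
for position, table in enumerate(tables):
    print(position, table.shape, list(table.columns))
```

Looking at the shapes and column names first makes it easier to choose the right list position than guessing an index such as tables[0].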
Reading HTML Data from a String
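An HTML string can be read directly by wrapping it in an io.StringIO object from Python's io module. Since pandas 2.1, passing a literal HTML string straight to pandas.read_html() is deprecated, so wrapping the string this way is the recommended approach.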
Example
The following example demonstrates how to read an HTML string using StringIO, without saving it to a file.
    import pandas as pd
    from io import StringIO

    # Create an HTML string
    html_str = """
    <table>
      <tr><th>C1</th><th>C2</th><th>C3</th></tr>
      <tr><td>a</td><td>b</td><td>c</td></tr>
      <tr><td>x</td><td>y</td><td>z</td></tr>
    </table>
    """

    # Read the HTML string
    dfs = pd.read_html(StringIO(html_str))
    print(dfs[0])
Following is the output of the above code:
|   | C1 | C2 | C3 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | x | y | z |
Example
This is an alternative way of reading the HTML string without using io.StringIO. Here we save the HTML string to a file named temp.html and read that file using the pandas.read_html() function.
    import pandas as pd

    # Create an HTML string
    html_str = """
    <table>
      <tr><th>C1</th><th>C2</th><th>C3</th></tr>
      <tr><td>a</td><td>b</td><td>c</td></tr>
      <tr><td>x</td><td>y</td><td>z</td></tr>
    </table>
    """

    # Save to a temporary file and read
    with open("temp.html", "w") as f:
        f.write(html_str)

    df = pd.read_html("temp.html")[0]
    print(df)
Following is the output of the above code:
|   | C1 | C2 | C3 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | x | y | z |
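The file-based approach above leaves temp.html behind in the working directory. As a possible variation, the sketch below uses Python's standard tempfile module so the file is removed automatically. The file name table.html is an arbitrary choice for illustration, and a parser such as lxml is assumed to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

html_str = """
<table>
  <tr><th>C1</th><th>C2</th><th>C3</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>x</td><td>y</td><td>z</td></tr>
</table>
"""

# The directory and everything in it are deleted when the with block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    html_path = Path(tmp_dir) / "table.html"  # arbitrary illustrative name
    html_path.write_text(html_str, encoding="utf-8")

    # Read the table while the file still exists
    df = pd.read_html(str(html_path))[0]

# The DataFrame stays usable after the file is gone
print(df)
```

The DataFrame is built inside the with block, so it does not depend on the file once reading has finished.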
Handling Multiple Tables from an HTML file
When an HTML file contains multiple tables, the match parameter of the pandas.read_html() function restricts the result to the tables that contain text matching a given string or regular expression.
Example
The following example uses the match parameter to read only the table containing the text "Field" from a page that has multiple tables.
    import pandas as pd

    # Read tables from a SQL tutorial
    url = "https://www.tutorialspoint.com/sql/sql-clone-tables.htm"
    tables = pd.read_html(url, match='Field')

    # Access the table
    df = tables[0]
    print(df.head())
Following is the output of the above code:
|   | Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|---|
| 1 | ID | int(11) | NO | PRI | NaN | NaN |
| 2 | NAME | varchar(20) | NO | NaN | NaN | NaN |
| 3 | AGE | int(11) | NO | NaN | NaN | NaN |
| 4 | ADDRESS | char(25) | YES | NaN | NaN | NaN |
| 5 | SALARY | decimal(18,2) | YES | NaN | NaN | NaN |
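The same filtering can be tried offline. The sketch below applies match to a small in-memory document. The html_doc contents are invented for illustration and loosely modelled on the tables above, and a parser such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical document with two tables, used only for illustration
html_doc = """
<table>
  <tr><th>ID</th><th>NAME</th></tr>
  <tr><td>1</td><td>Ramesh</td></tr>
  <tr><td>2</td><td>Khilan</td></tr>
</table>
<table>
  <tr><th>Field</th><th>Type</th></tr>
  <tr><td>ID</td><td>int(11)</td></tr>
  <tr><td>NAME</td><td>varchar(20)</td></tr>
</table>
"""

# Keep only the tables whose text matches "Field"
field_tables = pd.read_html(StringIO(html_doc), match="Field")
print(len(field_tables))
print(field_tables[0])

# The match text may sit in a data cell, not only in a header
customer_tables = pd.read_html(StringIO(html_doc), match="Ramesh")
print(customer_tables[0])
```

Note that match still returns a list, so the selected table is accessed with an index even when only one table matches.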
Writing DataFrames to HTML
Pandas DataFrame objects can be converted to HTML tables using the DataFrame.to_html() method. If the buf parameter is None (the default), the method returns the HTML as a string; otherwise it writes the HTML to buf.
Example
The following example demonstrates how to write a Pandas DataFrame to an HTML Table using the DataFrame.to_html() method.
    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

    # Convert the DataFrame to HTML table
    html = df.to_html()

    # Display the HTML string
    print(html)
Following is the output of the above code:
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>A</th>
          <th>B</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>2</td>
        </tr>
        <tr>
          <th>1</th>
          <td>3</td>
          <td>4</td>
        </tr>
      </tbody>
    </table>
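To illustrate the buf behaviour described above, here is a minimal sketch that writes the HTML into an in-memory io.StringIO buffer instead of returning a string. It also passes index=False, a to_html() option not covered earlier in this tutorial, to leave the row labels out of the markup.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# When buf is given, to_html writes into it and returns None
buffer = StringIO()
result = df.to_html(buf=buffer, index=False)
print(result)

# Retrieve the generated markup from the buffer
html = buffer.getvalue()
print(html)
```

The buf parameter also accepts a file path, which is one way to save the table directly to an .html file.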