A Python script for scraping website sitemap links and saving them to text files.

SSujitX/Sitemap-Postlink-Scraper

Website Sitemap Scraper

The Website Sitemap Scraper is a Python script that fetches a website's sitemap and extracts the links it lists. It is useful for surveying a website's structure and content.

Features

  • Fetches sitemap links from a specified website.
  • Saves the sitemap links to a text file for future reference.
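To illustrate the extraction step, here is a minimal sketch of pulling links out of sitemap XML. It is not the script's actual code: the function name `extract_sitemap_links` and the sample document are hypothetical, and it uses the standard library's `xml.etree.ElementTree` in place of selectolax so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_sitemap_links(xml_text: str) -> List[str]:
    """Return the text of every <loc> element in a sitemap document.

    Hypothetical helper for illustration; works for both <urlset>
    sitemaps and <sitemapindex> files, since both wrap URLs in <loc>.
    """
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # ElementTree reports namespaced tags as "{uri}loc";
        # keep only the local name before comparing.
        local_name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if local_name == "loc" and text:
            links.append(text)
    return links


# Made-up sample sitemap used only to demonstrate the helper.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/post-1</loc></url>
  <url><loc>https://example.com/post-2</loc></url>
</urlset>
""".strip()

for link in extract_sitemap_links(SAMPLE_SITEMAP):
    print(link)
```

Comparing on the local tag name keeps the sketch independent of the namespace prefix a given site uses.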

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Python 3.7 or higher installed on your system.
  • The following Python libraries installed:
    • httpx: Used for making asynchronous HTTP requests.
    • selectolax: Used for parsing HTML/XML content.

You can install the required libraries using pip:

pip install -r requirements.txt
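If you want to confirm the prerequisites before running the script, a small check along these lines may help. This is a sketch, not part of the repository: the function name `missing_prerequisites` is hypothetical, and it only tests whether the two libraries are importable, not which versions are installed.

```python
import importlib.util
import sys
from typing import List

# Values taken from the Prerequisites list above.
MIN_PYTHON = (3, 7)
REQUIRED_PACKAGES = ("httpx", "selectolax")


def missing_prerequisites() -> List[str]:
    """Return a description of each prerequisite that is not met."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python %d.%d or higher" % MIN_PYTHON)
    for name in REQUIRED_PACKAGES:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(name)
    return problems


problems = missing_prerequisites()
if problems:
    print("Missing prerequisites: " + ", ".join(problems))
else:
    print("All prerequisites found.")
```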

Usage

  1. Clone this repository to your local machine:
git clone https://github.com/your-username/Sitemap-Postlink-Scraper.git
  2. Navigate to the project directory:
cd Sitemap-Postlink-Scraper
  3. Run the script:
python sitemap_post_scraper.py
  4. Follow the on-screen instructions to provide the URL of the website you want to scrape.
  5. If a sitemap is found on the website, the script will fetch and save the sitemap links to a text file named _sitemap_links.txt.
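The saving step can be sketched as writing one link per line to a text file. This is an assumption about the output layout, not the script's confirmed behavior: the helper name `save_links`, the de-duplication, and the one-link-per-line format are all illustrative choices.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def save_links(links: Iterable[str], output_path: Path) -> int:
    """Write links to output_path, one per line, and return how many were written.

    Hypothetical helper: duplicates are dropped while keeping first-seen order.
    """
    unique_links = list(dict.fromkeys(links))
    text = "".join(link + "\n" for link in unique_links)
    output_path.write_text(text, encoding="utf-8")
    return len(unique_links)


# Demonstration with made-up links, written to a temporary directory.
demo_links = [
    "https://example.com/post-1",
    "https://example.com/post-2",
    "https://example.com/post-1",  # duplicate, written only once
]
demo_path = Path(tempfile.mkdtemp()) / "_sitemap_links.txt"
written = save_links(demo_links, demo_path)
print("Saved", written, "links to", demo_path)
```

A plain text file with one URL per line is easy to feed into other tools, which fits the stated goal of keeping the links for future reference.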
