Description
This class can parse a sitemap to get the URLs of the site pages.
It takes the URL of a given sitemap as a parameter.
The class loads and parses the sitemap XML file to extract the URLs of the site pages and other resources that are listed.
If the sitemap points to other sitemap files, the class also loads and parses those sitemaps, so it can return all the URLs that are listed.
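The loading and parsing steps described above can be sketched as follows. This is a minimal illustration that assumes the standard sitemap XML layout (`<urlset>` and `<sitemapindex>` root elements with `<loc>` entries) and PHP's SimpleXML extension. It is not the package's actual source code: the function name `collectSitemapUrls` and the `$load` callable are hypothetical names introduced here.

```php
<?php
// Illustrative sketch only; not the SitemapCrawler source code.

/**
 * Collect page URLs from a sitemap, following nested sitemap index files.
 *
 * $load is a hypothetical callable that returns the XML text for a URL.
 * $seen tracks visited sitemaps so that circular references end the recursion.
 */
function collectSitemapUrls(string $sitemapUrl, callable $load, array &$seen = []): array
{
    // Skip sitemaps that were already visited.
    if (isset($seen[$sitemapUrl])) {
        return [];
    }
    $seen[$sitemapUrl] = true;

    $xml = simplexml_load_string($load($sitemapUrl));
    if ($xml === false) {
        return []; // The XML could not be parsed, so there is nothing to collect.
    }

    $urls = [];
    if ($xml->getName() === 'sitemapindex') {
        // A sitemap index lists other sitemaps in <sitemap><loc> elements.
        foreach ($xml->sitemap as $entry) {
            $nested = collectSitemapUrls(trim((string) $entry->loc), $load, $seen);
            $urls = array_merge($urls, $nested);
        }
    } else {
        // A regular sitemap lists pages in <url><loc> elements.
        foreach ($xml->url as $entry) {
            $urls[] = trim((string) $entry->loc);
        }
    }

    return $urls;
}
```

One possible way to call it is with a loader built on `file_get_contents()`, for example `collectSitemapUrls($url, function ($u) { return (string) file_get_contents($u); })`. Passing the loader as a parameter keeps the parsing logic independent of how the XML is fetched.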
Innovation Award
August 2021
Number 7
Sitemaps are special resources that many sites provide to list the URLs of their pages and other relevant site resources.
Sitemaps may be helpful to share the list of site pages with search engines like Google.
Search engines can use a sitemap to get the list of all the site's pages, which may help a site notify Google faster about newly published pages.
Sitemaps may also be useful for tools that can crawl the site pages to verify any errors.
This package can crawl a sitemap to retrieve the list of all the pages of a site. It can be helpful for developing tools that need to crawl the site pages.
Manuel Lemos
Innovation award
Nominee: 6x
Example
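<?php
// Load the Composer autoloader so the crawler class can be found.
require_once __DIR__ . '/../vendor/autoload.php';

if ($argc === 2) {
    $crawler = new \BABA\Utilities\SitemapCrawler();
    // Crawl the sitemap given as the first command line argument.
    $crawler->crawleit($argv[1]);
    // Print each URL collected from the sitemap on its own line.
    foreach ($crawler->getUrls() as $url) {
        echo "$url\n";
    }
} else {
    // Wrong number of arguments: show the expected usage.
    echo 'Usage: php ' . basename($argv[0]) . " <url of your sitemap>\n";
}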
Details
PHP Sitemap Crawler
Scrape a list of URLs from a sitemap
The main purpose of this library is to scrape the list of URLs from a sitemap file.
Install
git clone https://github.com/sjurajpuchky/php-sitemap-crawler.git
cd php-sitemap-crawler
composer install
Examples
The samples folder contains some basic usage examples of the library.
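As a further illustration, a tool that checks the listed pages for errors could be built on top of the URL list. The sketch below is not part of the package: `findBrokenUrls()`, `httpStatus()` and the `$fetchStatus` callable are hypothetical names, and it assumes the URLs can be used as plain strings, as the Example section suggests for the values returned by `getUrls()`.

```php
<?php
// Illustrative sketch only; these helper functions are hypothetical.

/**
 * Return the URLs whose HTTP status signals an error (400 or above),
 * or for which no status could be obtained (0), keyed by URL.
 *
 * $fetchStatus is a callable that maps a URL to an integer status code.
 */
function findBrokenUrls(iterable $urls, callable $fetchStatus): array
{
    $broken = [];
    foreach ($urls as $url) {
        $url = (string) $url;
        $status = (int) $fetchStatus($url);
        if ($status === 0 || $status >= 400) {
            $broken[$url] = $status;
        }
    }

    return $broken;
}

/**
 * One possible status fetcher based on PHP's get_headers().
 * Returns 0 when the request fails or no status line is found.
 */
function httpStatus(string $url): int
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return 0;
    }

    $status = 0;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK"; keep the last one,
        // which belongs to the final response after any redirects.
        if (preg_match('{^HTTP/\S+\s+(\d{3})}', $line, $m)) {
            $status = (int) $m[1];
        }
    }

    return $status;
}
```

Under those assumptions, the list from the crawler could be checked with `findBrokenUrls($crawler->getUrls(), 'httpStatus')`. Passing the status fetcher as a callable makes it easy to swap in a different HTTP client.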
# License
GPL-2.0-only
# Authors
Juraj Puchký - BABA Tumise s.r.o. <info@baba.bj>
https://www.seoihned.cz - SEO optimization
https://www.baba.bj - Website development
https://www.webtrace.cz - Custom development of portals and B2B/B2C e-commerce (e-shops)
# Log
1.0.0 - first release
# Copyright
© 2021 BABA Tumise s.r.o.