thesitemapper can create an HTML Site Map, which displays page titles, descriptions, and URLs and can be used as an index page for a web site, and an XML Site Map, which search engines use to identify new and updated pages.
The application can be set up to automatically crawl a number of web sites, create an XML site map and HTML site map for each web site, and then FTP them to the required location on the appropriate web server. A built-in scheduler enables the crawler to start at predefined times for each site.
Each site is set up independently, and you can set up as many sites as you wish. Each site can then be marked as active, so that crawling the site automatically generates an HTML site map and XML site map when the crawler has completed. You may also manually generate the site maps at any time without re-crawling.
Clicking on the 'Crawl All sites ...' button will crawl all sites that have been identified as being active. When the site is first created, the site is always identified as active.
Clicking on the 'Start crawl' button associated with a particular site will only crawl that site.
Once crawling has completed, the results can be automatically ftp'd to the destination server and search engines pinged to indicate a new XML map.
There are a number of formats to choose from for the HTML page displays. These include single-column, 2-column, multi-page A to Z and various combinations of those. You may also create your own template web page to match the look of your site - the results can then be automatically inserted into the template web page.
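As an illustration of the A to Z format mentioned above, the following is a minimal Python sketch of grouping page titles by their first letter. The function name `group_a_to_z` and the "#" bucket for titles that do not start with a letter are assumptions made for this example; they do not describe how thesitemapper builds its pages.

```python
from collections import defaultdict


def group_a_to_z(titles):
    """Group page titles under their upper-cased first letter.

    Titles that do not start with a letter go into a hypothetical '#' group.
    """
    groups = defaultdict(list)
    # Sort case-insensitively so each group is listed alphabetically.
    for title in sorted(titles, key=str.casefold):
        first = title[:1].upper()
        groups[first if first.isalpha() else "#"].append(title)
    return dict(groups)


pages = group_a_to_z(["contact", "About us", "Catalog", "404 page"])
```

Each key could then become one page (or one section) of a multi-page A to Z site map.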
When you tick the box 'Check Google Analytics' on the Web Settings Page, the page report listing identifies if you have Google Analytics installed on the pages.
This is designed to help you configure Google Analytics, whether you are using the older urchin.js code or have recently upgraded to the new ga.js tracking code. This diagnostic tool identifies pages on your web site that have GA tracking code properly installed. This makes it easy for you to isolate the pages with tracking problems, fix them, and effectively manage your Google Analytics installation.
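The kind of check described above can be sketched in Python as follows. This is only an illustration: it looks for a reference to urchin.js or ga.js in a page's HTML, and the function name and regular expression are assumptions. The application's own check for "properly installed" code may be stricter.

```python
import re
from typing import Optional

# The two script names come from the text above; the pattern is an assumption.
TRACKER_PATTERN = re.compile(r"\b(urchin|ga)\.js\b", re.IGNORECASE)


def detect_ga_tracker(html: str) -> Optional[str]:
    """Return 'urchin.js' or 'ga.js' if the page references one, else None."""
    match = TRACKER_PATTERN.search(html)
    return f"{match.group(1).lower()}.js" if match else None
```

Pages for which the function returns None would be the ones to inspect for missing tracking code.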
This page sets up various crawling parameters.
For a complete description of the meaning of these settings, refer to http://www.sitemaps.org/protocol.php
You may set the Change Frequency, File last modified and Priority.
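To show where these three settings end up, here is a minimal Python sketch that writes one entry in the sitemaps.org format. The element names (`loc`, `lastmod`, `changefreq`, `priority`) and the namespace come from the protocol; the function name, the example URL and the values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from dicts with 'loc' and optional other fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Write the child elements in the order used by the protocol examples.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page with all three optional settings filled in.
xml_text = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2009-01-01",
     "changefreq": "weekly", "priority": 0.8},
])
```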
You may select the formatting of each page element by clicking on the Fonts, Colors etc button. You may either enter a css style name - which will need to exist in a style sheet for it to render correctly - or you may select fixed fonts, size and colors for the elements.
This button displays a set of options which may be used to alter other formatting definitions such as table cell padding, table cell spacing and so on.
When you create a new site, the format settings for fonts, colors etc are pre-defined to give a standard looking display.
When you crawl the web site, the folder names are extracted and stored with the url. The folder names may then be displayed on the html site map to categorize the displays. However, the folder names are not always appropriate and the Folder Alias button allows you to enter a different folder name which will appear on the html site map.
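The idea of a folder alias can be sketched in Python like this. The helper names and the alias table are hypothetical; the sketch only shows one way a folder could be taken from a URL and swapped for a friendlier display name.

```python
import posixpath
from urllib.parse import urlparse


def folder_of(url):
    """Return the folder part of a URL path, or '' for pages in the site root."""
    return posixpath.dirname(urlparse(url).path).strip("/")


def display_name(url, aliases):
    """Return the alias for the URL's folder, falling back to the folder name."""
    folder = folder_of(url)
    return aliases.get(folder, folder)


# Hypothetical alias table: show "Products" instead of the raw folder name.
aliases = {"prod_01": "Products"}
heading = display_name("http://www.example.com/prod_01/widgets.html", aliases)
```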
If you wish to use your own web page layout in the form of an HTML page, enter the following at the point in the template file where you want the HTML site map to be displayed:
<!-- THESITEMAPPER -->
Then enter the file name into the "Template file" text box. When the html site map is created, it will place the site map at that point in the template file.
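The substitution described above amounts to replacing the marker comment with the generated markup. A minimal Python sketch, assuming the template is held in memory as a string (the function name is hypothetical):

```python
MARKER = "<!-- THESITEMAPPER -->"


def insert_site_map(template_html, site_map_html):
    """Replace the first marker comment in the template with the site map."""
    if MARKER not in template_html:
        raise ValueError("template does not contain " + MARKER)
    return template_html.replace(MARKER, site_map_html, 1)


template = "<html><body><h1>Site index</h1><!-- THESITEMAPPER --></body></html>"
page = insert_site_map(template, "<ul><li>Home</li></ul>")
```

Raising an error when the marker is missing is a design choice of this sketch; it makes a mistyped marker easy to spot.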
When a site is crawled, you can set the application to automatically ftp the results to your web server on completion.
First enter your FTP settings: FTPHost, Username and Password. The XML Site Map will be ftp'd to the root of the web site.
Automatically FTP site maps when created - Tick this box to automatically ftp all the site maps when they are created.
Notify (ping) search URLs on completion - Tick this box so that the search sites are automatically pinged when the site maps are created and after they have been ftp'd to your site.
More and more search engines are using the XML site map method, and this form enables you to add new search URLs yourself: just add the root URL to the list.
Enter the remote path for HTML site map on the server - This will be a folder name where you want the site map to be ftp'd to.
FTP XML Site Map and FTP HTML Site Map - These allow you to manually ftp the generated site maps to your web server - useful when you want to test the ftp system.
Ping Search URLs - This allows you to manually ping the search engines.
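For readers who want to script the same two steps outside the application, the following Python sketch uploads a file with the standard `ftplib` module and builds a ping URL in the form given by the sitemaps.org protocol (`<searchengine_URL>/ping?sitemap=<encoded sitemap URL>`). The function names, host names and paths are assumptions, and whether a particular search engine still accepts such a ping request is not something this sketch can confirm.

```python
from ftplib import FTP
from urllib.parse import quote


def upload_file(ftp, local_path, remote_dir, remote_name):
    """Upload one file over an already logged-in ftplib.FTP connection."""
    if remote_dir:
        ftp.cwd(remote_dir)  # change to the remote folder first
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)


def build_ping_url(search_root, sitemap_url):
    """Build a ping URL from a search engine root URL and a sitemap URL."""
    return f"{search_root.rstrip('/')}/ping?sitemap={quote(sitemap_url, safe='')}"


# Example usage (needs a real server, so it is left commented out):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("username", "password")
#     upload_file(ftp, "sitemap.xml", "", "sitemap.xml")
```

Passing the connection in as a parameter keeps the upload step easy to try against a test server before pointing it at a live site.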
Ticking the 'Enable for this site' box enables the scheduler. Choose the days on which it should run and the time at which it should start.
You may also use the Windows Scheduler to schedule the crawl. This is done using the command line, as described below.
The results page is a simple display of the XML Site Map and also provides a validation of the XML Site Map.
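A basic validation of this kind can be sketched in Python as follows. It checks only that the file is well-formed XML, that the root element is `urlset` in the sitemaps.org namespace, and that every `url` entry has a `loc`. The function name is hypothetical, and the checks performed by the application itself may differ.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def validate_sitemap(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != f"{NS}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    for index, url in enumerate(root.findall(f"{NS}url"), start=1):
        loc = url.find(f"{NS}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> entry {index} has no <loc>")
    return problems
```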
If you wish to exclude text from the crawler, such as menus, footers or other irrelevant information, then use the following comments:
<!-- exclude_start -->
text to be excluded
<!-- exclude_end -->
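The effect of these comments can be pictured with a short Python sketch that removes everything between each pair of markers before the page text is processed. The function name and the regular expression are assumptions for illustration; they are not the crawler's actual implementation.

```python
import re

# Match each exclude_start ... exclude_end pair, including text spanning lines.
EXCLUDE_BLOCK = re.compile(
    r"<!--\s*exclude_start\s*-->.*?<!--\s*exclude_end\s*-->",
    re.DOTALL,
)


def strip_excluded(html):
    """Remove every excluded block, markers included, from the page HTML."""
    return EXCLUDE_BLOCK.sub("", html)


sample = "Intro<!-- exclude_start -->menu<!-- exclude_end --> body"
cleaned = strip_excluded(sample)
```

The non-greedy `.*?` keeps two separate excluded blocks on the same page from being merged into one large removal.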
You may run the application from the Windows Command line using :
To start the crawl use
which will cause all sites to be crawled and all indexes to be created.
Putting this into the Windows Scheduler will allow you to run the application at defined times without using the built-in scheduler.