Fubra Blog

Alex Buell

Sitemap Validation

Posted 4:31 PM Wednesday September 17, 2008 by Alex Buell

Introduction

Hello, my name is Alex Buell. I am profoundly deaf and work as a Linux system administrator within the Fubra infosphere. I spend most of my time working on open source projects, giving extra value back to the community in the form of tools that help us do our jobs.

What are sitemaps?

Sitemaps give webmasters (people who run websites) a way to publish information about the content on their websites. Search engines (e.g. www.google.co.uk) crawl through websites to build up indexes that let people search for the things they are interested in.

Essentially, a sitemap is just an XML file listing URLs, along with additional metadata about each URL (when it was last updated, how often it changes, and how important it is relative to other URLs on the site). This helps search engines make more intelligent decisions about crawling the pages on the website.
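To make the format concrete, here is a minimal sketch in Python that reads a small sitemap and pulls out each URL with its metadata. The element names and namespace come from the sitemaps.org protocol; the example.com URLs, the EXAMPLE_SITEMAP constant and the parse_sitemap function are made up for illustration and are not part of any Fubra tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol (version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# A made-up two-entry sitemap; the URLs and values are illustrative only.
EXAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-09-17</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry; optional fields are None when absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": url.findtext("sm:priority", namespaces=NS),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(EXAMPLE_SITEMAP):
        print(entry)
```

Only loc is required for each entry; lastmod, changefreq and priority are optional, which is why the second entry comes back with None for all three.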

Sitemaps have to be updated regularly, so we have a tool that automatically generates them and stores them within the website. Search engines can then download these files and read through all the URLs.

A new tool for validating sitemaps

We at Fubra have developed a web tool to help with validating the sitemaps on our websites. The tool does two things: it reads the sitemap files from the website and presents them in human-readable form, and, if asked, it checks each URL and displays the HTTP status code (e.g. 404, 301 or 200) in the status icons on the right-hand side.

Sitemap Validator

How to use the Validator webtool

To use the tool, point your browser at the Sitemap Validator site, type in the URL of a website (e.g. www.talkfootball.co.uk), and see what the tool does with it.

As it stands, the validator checks each URL and its associated metadata for validity and colours the icon on the right accordingly. A red icon means that the URL is a duplicate, orange means that the date and time associated with the URL is invalid, and green means that the URL and its metadata are valid. The tool sorts the URLs so that all invalid URLs come first and all valid URLs come last, which makes it easy to see what is wrong with the sitemap.
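The colouring and sorting rules can be sketched roughly as follows. This is not the validator's actual code, just one way the described behaviour might look in Python. Several things here are assumptions: the function names, the choice to flag only the second and later occurrences of a URL as duplicates, the ordering of red before orange, and the simplification that a date is valid only if it looks like YYYY-MM-DD or YYYY-MM-DDThh:mm:ss plus a timezone offset (the sitemap protocol allows a few more W3C Datetime variants).

```python
from datetime import datetime

# Assumption: only these two date layouts are accepted in this sketch.
LASTMOD_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S%z")

# Assumption: duplicates are listed before bad dates; valid entries come last.
RANK = {"red": 0, "orange": 1, "green": 2}


def lastmod_is_valid(value):
    """Return True if the date is missing (it is optional) or parses cleanly."""
    if value is None:
        return True
    for fmt in LASTMOD_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def classify(entries):
    """Colour each (loc, lastmod) pair and sort so invalid entries come first."""
    seen = set()
    results = []
    for loc, lastmod in entries:
        if loc in seen:
            colour = "red"       # URL already appeared earlier in the sitemap
        elif not lastmod_is_valid(lastmod):
            colour = "orange"    # date and time could not be parsed
        else:
            colour = "green"     # URL and metadata look fine
        seen.add(loc)
        results.append((colour, loc))
    # sorted() is stable, so entries of the same colour keep their original order.
    return sorted(results, key=lambda item: RANK[item[0]])


if __name__ == "__main__":
    sample = [
        ("http://www.example.com/", "2008-09-17"),
        ("http://www.example.com/a", "17/09/2008"),
        ("http://www.example.com/", None),
        ("http://www.example.com/b", "2008-09-17T16:31:00+01:00"),
    ]
    for colour, loc in classify(sample):
        print(colour, loc)
```

In the sample, the repeated home page URL is marked red, the entry dated 17/09/2008 is marked orange, and the other two are green.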

Additionally, there is a 'Check' button in the top row which, when clicked, runs a check on all the URLs listed and displays the HTTP status code for each.
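For the curious, a check like this could be done along the following lines in Python. This is only a sketch under assumptions, not how the validator itself is implemented: the http_status and check_all names are hypothetical, it sends a HEAD request rather than a full GET, and it uses the standard library's http.client, which does not follow redirects, so a 301 is reported as 301 rather than as the status of the page it points to.

```python
import http.client
from urllib.parse import urlsplit


def http_status(url, timeout=10):
    """Send a HEAD request and return the numeric HTTP status code."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def check_all(urls, fetch=http_status):
    """Map each URL to its status code, or to None if the request failed."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except (OSError, http.client.HTTPException):
            results[url] = None  # e.g. DNS failure, refused connection, timeout
    return results


if __name__ == "__main__":
    # Illustrative URL only; replace with URLs taken from a real sitemap.
    for url, status in check_all(["http://www.example.com/"]).items():
        print(status, url)
```

The fetch parameter is there so the checking loop can be exercised with a stand-in function, without touching the network.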

Any other business?

All comments, flames and feedback are welcome. Please do drop me an email at alex at fubra dot com.