Mambo is a capable open source content management system used by many webmasters. The package itself tries to support search engine optimisation (SEO) efforts and is able to produce search engine friendly URLs.

Unfortunately there is a hidden problem which many people may not be aware of.

One of the article layout options displays some helpful icons to print the article being read, create a PDF of it or email it. This appears to be a very user friendly option, designed to enhance the user experience on the site.

When you look behind the scenes, however, a problem appears.

I have a Mambo website and, when reviewing my web server logs, noticed Googlebot indexing some strange looking URLs. Further investigation revealed that it was following the printer, PDF and email links to find additional pages.

This creates two issues:

1. The new URLs are not search engine friendly but more importantly …

2. The page content is the same as the original page and hence can lead to a duplicate content penalty by Google. This could reduce the ranking of your pages or possibly lead to them being left out of the index altogether.

What is the solution?

1. The rapid solution is to turn off these features in the Mambo configuration options.

2. The more complex but better option is to stop Googlebot from crawling these URLs by use of the robots.txt file.

3. Insert a meta tag in the head section of the duplicate page such as <meta name="robots" content="noindex,nofollow">
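
As a rough sketch of option 2, the Python snippet below holds a candidate robots.txt rule and uses the standard library's urllib.robotparser to confirm that the rule blocks the duplicate pages while leaving normal pages crawlable. It assumes that the print, PDF and email links are all served through index2.php, which is common on Mambo sites but should be checked against your own logs. The example.com address and the sample query strings are placeholders, not URLs taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content.
# ASSUMPTION: the print, PDF and email pop-ups are all served through
# /index2.php on this site. Check your logs before relying on this.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index2.php
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser, url, agent="Googlebot"):
    """Return True if the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)


# Hypothetical URLs for illustration only.
SAMPLE_URLS = [
    "http://example.com/index.php?option=com_content&task=view&id=5",
    "http://example.com/index2.php?option=com_content&task=view&id=5&pop=1",
    "http://example.com/index2.php?option=com_content&do_pdf=1&id=5",
    "http://example.com/index2.php?option=com_content&task=emailform&id=5",
]

parser = build_parser(ROBOTS_TXT)
for url in SAMPLE_URLS:
    verdict = "allowed" if is_crawlable(parser, url) else "blocked"
    print(verdict, url)
```

Copy the two lines of ROBOTS_TXT into the robots.txt file at the root of the site once the check behaves as expected. Bear in mind that a crawler which is blocked by robots.txt never fetches the page, so it will not see a meta tag placed on that page.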

Once you have done this, one further problem will remain. When the search engine returns and requests these URLs, they will no longer be found. In these circumstances most Mambo installations will not return a 404 page not found error but will show the site home page instead. That is nice for the customer experience, but it again flags duplicate content to the search engine.
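
To see whether your own installation behaves this way, a small check along the following lines may help. It is only a sketch: the function names are made up for this example, and comparing the response byte for byte with the home page is a crude test that will miss home pages whose content changes on every request.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Return (status_code, body) for a URL, including HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code and body are still available.
        return err.code, err.read()


def classify(status, body, home_body):
    """Classify the response to a request for a page that should no longer exist."""
    if status in (404, 410):
        return "not found"   # what a search engine expects for a missing page
    if status == 200 and body == home_body:
        return "soft 404"    # home page served in place of an error
    return "other"
```

For example, fetch the home page once to get home_body, then call classify(*fetch(old_url), home_body) for each removed printer, PDF or email URL. Any result of "soft 404" marks a URL that still looks like duplicate content to a crawler.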

Now I need to sit down and write an article on how to properly produce a 404 error response.