What is Duplicate Content?
In web site terms, "duplicate content" is content that is the same, or where large parts of it are the same, as content published elsewhere. This can be content that is duplicated on the same site, or content that is duplicated by being published on other sites. Most duplicate content occurs by accident or through people not being aware of the implications of having it. Some content duplication is deliberate and is used by spammers to try to influence search engine results.
Duplicate Content in Mambo
Mambo can produce duplicate content. The key culprits are as follows:
- "print this page" link
- PDF link
- Mambo's URL structure.
We have already looked at the anatomy of the Mambo URL and at why you should use SEF URLs with Mambo, so let's consider the other two issues.
If you don't need the "print this page", PDF and "tell a friend" features, disable them from within your global configuration options.
The greatest contributor to duplicate content in Mambo, outside of the URL structure, is the PDF generator. Mambo's PDF generator is fairly basic: it does not include images and its support for character sets is limited. If you want to provide PDFs of your content, you would be better off creating your own PDF files and making these available through links on your site. To discourage searchbots from indexing the PDF downloads, add rel="noindex, nofollow" to the link. However, if you wish to utilise the in-built PDF generator function of Mambo, you can still tell searchbots not to crawl the generated output.
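As a minimal sketch of the hand-made PDF approach described above, a link in your content could look like the following. The path and file name are hypothetical, so point the href at wherever you upload your own file. The rel value is copied from the paragraph above; nofollow is the part that search engines document for links.

```html
<!-- Hypothetical path and file name: replace with the location of your own PDF. -->
<a href="/downloads/duplicate-content-guide.pdf" rel="noindex, nofollow" title="PDF">Download this article as a PDF</a>
```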
How to Tell Search Engines Not to Index Mambo PDFs
In Mambo versions earlier than 4.7, the PDF generator link uses JavaScript. It builds a dynamic link behind the PDF icon that appears on each page of your Mambo site. Note: the icon only appears if you have enabled PDFs through the options within your site's global configuration content screen. Because of the way the link is constructed, blocking it through a robots.txt directive is not practical. However, the following small core hack tells search engines not to index or follow the link.
In Mambo <=4.6.4, go to /components/com_content/content.html.php
Look for the following:
/** * Writes PDF icon */
At around line 620 (depending on your version of Mambo), look for the following code:
<a href="javascript:void window.open('<?php echo $link; ?>', 'win2', '<?php echo $status; ?>');" title="<?php echo T_('PDF');?>">
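Let's add rel="noindex, nofollow" to this code to tell search robots not to index the output of that link and not to follow the link. Note that nofollow is the value search engines document for the rel attribute; noindex is normally given in a robots meta tag or an X-Robots-Tag header, so it may be ignored on a link. The result will look like this: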
<a href="javascript:void window.open('<?php echo $link; ?>', 'win2', '<?php echo $status; ?>');" rel="noindex, nofollow" title="<?php echo T_('PDF');?>">
You can do the same for the "print this page" function or, alternatively, disable it altogether. Most visitors will know how to use their browser's built-in print function, and providing a "print.css" for your template will give them a much better printed page than the default print feature does.
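To illustrate the print stylesheet idea, here is a rough sketch of the kind of rules a "print.css" might hold, shown inline in the template's head for brevity. The element ids are hypothetical and need to be swapped for the ones your own template uses. The same rules could equally live in a separate print.css file loaded with a link element that has media="print".

```html
<!-- Applied only when the page is printed; ignored on screen. -->
<style type="text/css" media="print">
  /* Hypothetical ids: hide navigation and other screen-only blocks. */
  #header, #sidebar, #footer { display: none; }
  /* Plain black text on white, sized in points for paper. */
  body { font-size: 12pt; color: #000; background: #fff; }
</style>
```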
In the next tutorial we will look at the common causes of duplicate content. Till next time…







