How to Avoid Duplicate Content by UnReal Web Marketing
Search engines dislike duplicate content. One reason is that major search engines aim to provide searchers with a diverse cross-section of unique content. Duplicate content often results in duplicate listings that impair the searcher's experience.
MOUNT SINAI, NY, May 15, 2010 /24-7PressRelease/ -- Search engines dislike duplicate content for several reasons. One is that major search engines such as Google, Yahoo and Bing aim to provide searchers with a diverse cross-section of unique content, and duplicate content often produces duplicate listings that degrade the searcher's experience. Another is that search engines don't want to spend resources (such as bandwidth) crawling and indexing pages that are nearly identical.
In some instances, pages containing duplicate content are filtered out when search engine results are sorted, so there is no guarantee which version of a page will appear in the results and which won't. Duplicate content may even keep some sites and pages from being indexed at all. In some cases, a search engine crawler will stop indexing a site's pages altogether because it finds too many copies of the same pages under different URLs.
While content duplication is sometimes used deliberately to manipulate search engine rankings and attract more website traffic, in most cases it occurs without any ill intent on the part of the site owner or webmaster. The following scenarios are common sources of duplicate content that could be burdening your site.
Scenario #1: E-commerce sites that include product descriptions from manufacturers, producers, and publishers
Product distribution websites often use the manufacturer's or producer's text as the description for the item on their own pages. Because the product name and the creator, manufacturer, writer, or recording artist also appear on each of these pages, pages on entirely different websites end up sharing a considerable amount of identical content. Here are some examples:
http://www.amazon.com/Sony-VGN-TXN15P-B-Notebook-Processor/dp/B000J43MR0
http://www.crowdstorm.com/Sony_VAIO_11_1_Widescreen_Notebook_PC_VGN_T ... +2973.html
http://www.clearanceclub.com/products/6495-VAIO-VGN-TXN15P-B
http://www.provantage.com/sony-vgntxn15p-b~7SONN0UX.htm
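To get a rough sense of how much text two product pages share, one common technique is to split each description into overlapping word sequences ("shingles") and compute the Jaccard similarity of the two sets. The sketch below is illustrative only: the descriptions and the store name are invented, the shingle length of 3 is an arbitrary choice, and this is not a claim about how any particular search engine measures duplication.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity (0.0 to 1.0) of the two texts' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical descriptions: a manufacturer blurb copied by a retailer,
# with only the (invented) store name prepended.
manufacturer = ("Ultra-portable notebook with an 11.1-inch widescreen display "
                "and long battery life.")
retailer = ("Shop Example Store: Ultra-portable notebook with an 11.1-inch "
            "widescreen display and long battery life.")

print(f"{jaccard(manufacturer, retailer):.2f}")  # prints 0.80
```

Even with a few words of store branding added, most of the retailer's shingles match the manufacturer's, which is the kind of overlap this scenario describes.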
Scenario #2: Printer-friendly pages
Many sites offer "printer-friendly" versions of their content on separate pages. Unless these pages are blocked with robots.txt disallow statements or marked with a meta "noindex" tag (<meta name="robots" content="noindex">) to keep search engines from indexing them, they may be indexed as duplicate content. See these samples:
http://www.constructionbook.com/xq/ASP/productid.5395/qx/printable_view_product.htm
http://www.tigerdirect.com/applications/searchtools/item-details-prin ... PX849%20SB
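One way to confirm that a robots.txt disallow statement actually covers your printer-friendly pages is to test URLs against it with Python's standard urllib.robotparser module. In this minimal sketch, the /print/ directory and the example.com URLs are hypothetical; it assumes your site keeps its printer-friendly versions under a single directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes printer-friendly pages live under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching

for url in (
    "http://www.example.com/products/widget.htm",
    "http://www.example.com/print/products/widget.htm",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawlable" if allowed else "blocked")
```

The regular product page stays crawlable while its printer-friendly copy is blocked. Keep in mind that a disallow rule only stops crawling; the meta "noindex" tag is the page-level alternative.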
Scenario #3: Websites that create session IDs
A session ID lets you build customized applications that offer a more personalized user experience, which can make your website more appealing. Each visitor to your site is assigned a unique session ID, which is either stored in a cookie on the visitor's computer or propagated in the URL.
Websites that use session IDs often embed them in their URLs to track visitors as they move through the site's pages. When search engine crawlers encounter this tracking information, they may index the same page several times under different URLs. A good example of this is www.staples.com.
Search engine guidelines advise letting bots or spiders crawl your site without session IDs that track their path through it. While session IDs are useful for tracking individual user behavior, bots access a site in an entirely different pattern. Because bots cannot always tell that URLs that look different point to the same page, session IDs may result in incomplete indexing of your site.
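To see why session IDs create duplicates, compare two URLs for the same page that differ only in their session parameter. The minimal sketch below removes that parameter with Python's standard urllib.parse functions, so both URLs collapse to one address. The parameter names in SESSION_PARAMS and the example.com URLs are hypothetical; real sites use many different names. This only illustrates the duplication; the guideline above is to avoid giving bots session IDs in the first place.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust for your own site.
SESSION_PARAMS = {"sessionid", "sid"}


def strip_session_id(url: str) -> str:
    """Return url with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two visits to the same product page, each with its own session ID.
urls = [
    "http://www.example.com/product.htm?id=42&sessionid=A1B2C3",
    "http://www.example.com/product.htm?id=42&sessionid=Z9Y8X7",
]
print({strip_session_id(u) for u in urls})
# prints {'http://www.example.com/product.htm?id=42'}
```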
Scenario #4: URLs that include multiple data variables
When a URL contains multiple data variables, bots may crawl and index the same page under several different URLs. Here are some examples of sites that include multiple data variables in their URLs:
http://www.homedepot.com/webapp/wcs/stores/servlet/ProductDisplay?sto ... reId=10051
Data variables: catalogId=10053, productId=100022126, categoryID=5028
http://www1.macys.com/catalog/index.ognc?CategoryID=30977&PageID=3097 ... ryID=30977
Data variables: PageID=30977, LinkType=EverGreen
URLs like these can be difficult for a search engine bot or spider to crawl. If this scenario applies to your website, consider using server-side URL rewriting (for example, Apache's mod_rewrite module) to serve these pages under simpler, static-looking URLs.
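URL rewriting works by matching a simple public path against a pattern and internally mapping it to the dynamic script that actually builds the page. On Apache this is configured with mod_rewrite directives such as RewriteRule; the Python below only imitates the idea with a regular expression so the mapping is easy to see. The /product/ path pattern and the /webapp/ProductDisplay target are hypothetical, and only the productId value comes from the Home Depot example above.

```python
import re

# Hypothetical rule: a static-looking path such as /product/100022126
# is mapped internally to the dynamic script that builds the page.
RULE = re.compile(r"^/product/(\d+)/?$")
TARGET = "/webapp/ProductDisplay?productId={id}"


def rewrite(path: str) -> str:
    """Return the internal URL for a public path, or the path unchanged."""
    match = RULE.match(path)
    if match is None:
        return path  # no rule applies; serve the path as requested
    return TARGET.format(id=match.group(1))


print(rewrite("/product/100022126"))  # prints /webapp/ProductDisplay?productId=100022126
print(rewrite("/about.htm"))          # prints /about.htm
```

Because visitors and bots only ever see one short URL per product, the site no longer exposes the same page under many combinations of data variables.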
Scenario #5: Pages sharing similar elements
Some websites repeat the same elements from page to page, such as titles, meta descriptions, headings, navigation, and site-wide text. This can be a problem because bots might treat these pages as duplicate content. Be especially wary of this scenario if you run an e-commerce site that includes your brand name and information about that brand in the title of every page. In addition, content management systems that do not allow a distinct meta description tag on each page can cause the same problem.
Here are two well-known websites that use their brand names on every page:
http://www.barnesandnoble.com
http://www.officemax.com
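A quick way to check your own site for this scenario is to extract each page's title and meta description and look for values shared by more than one page. The sketch below uses Python's standard html.parser module on a few hypothetical in-memory pages (the URLs and "Example Store" text are invented); a real audit would fetch each page first.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each title or description used on 2+ pages to those pages."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = HeadExtractor()
        parser.feed(html)
        if parser.title.strip():
            seen["title: " + parser.title.strip()].append(url)
        if parser.description:
            seen["description: " + parser.description].append(url)
    return {key: urls for key, urls in seen.items() if len(urls) > 1}


# Hypothetical pages: two product pages share the same branded title and description.
shared_head = ("<title>Example Store</title>"
               '<meta name="description" content="Example Store sells everything.">')
pages = {
    "/shoes.htm": "<html><head>" + shared_head + "</head></html>",
    "/hats.htm": "<html><head>" + shared_head + "</head></html>",
    "/about.htm": "<html><head><title>About Example Store</title></head></html>",
}
for key, urls in find_duplicates(pages).items():
    print(key, urls)
```

Pages that show up together in the output are candidates for more distinctive titles and meta descriptions.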
Press Release Contact Information:
David Montalvo
UnReal Web Marketing
Senior SEO
P.O. BOX 590
Mount Sinai, NY
United States 11784
Voice: 631-891-8536
Fax: 775-261-0944


