Oct 26, 2020

How to Check for Duplicate Content: Tools and Tips

You probably know that your website should always contain original content. Publishing duplicate content is a serious mistake that can hurt your site’s ranking and your reputation. Plagiarism, or passing someone else’s work off as your own without permission, is unacceptable both online and offline. Google may respond to duplicate content by lowering your page’s ranking or by removing the page from search results altogether. Either outcome defeats the purpose of publishing content at all.

Another possibility that you have to consider is that others may duplicate the content on your site and try to use it without your permission. These unscrupulous marketers may blatantly use the content you created on their websites without ever asking you or letting you know, and they may end up outranking you in the search engines.

How is Duplicate Content Defined?

Duplicate content is content that appears in more than one online location, whether on different pages of the same site or on different websites. If you publish your own content in more than one place, you have duplicate content. If you copy someone else’s content onto your site or if they publish yours on their site, that’s duplicate content.

When pieces of content are too similar, search engines can have a difficult time determining which one is more relevant to a query. The goal of search engines is to give users the best results possible when they search for a particular term, so Google and other search engines may choose to exclude duplicate content from their search results.

Some Causes of Duplicate Content

In many cases, the use of duplicate content is not intentional or intended to be malicious. Google refers to duplicate content as blocks of text that are identical or “appreciably similar” within or across domains. Examples of non-malicious duplicate content include store item descriptions and printer-only versions of web pages.

Deliberate duplication of content is another matter. When the same content is used on multiple domains in an attempt to increase traffic or manipulate search engine rankings, it can be frustrating for people who are attempting to search for information and end up getting the same content in multiple places. This is why search engines do their best to discourage this practice.

Using Google to Check for Duplicate Content

One quick way to check whether a page may be considered duplicate is to copy around ten words from the start of a sentence and paste them, enclosed in quotation marks, into Google. This is actually Google’s recommended way to check.

If you test this for a page on your website, you would expect only your webpage to show up, ideally with no other results.

If other websites show up alongside yours, the result Google lists first is the one it treats as the original source. If this isn’t your website, you may have a duplicate content issue.

Repeat this process with a few short sentences chosen at random from your webpage.
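
The quoted-search check above can be prepared in bulk with a short script. The following is a minimal sketch, not an official Google tool: the function name `exact_match_queries` and its parameters are hypothetical, and it takes the first few sufficiently long sentences rather than random ones. It only builds search URLs that you can open in a browser; it does not send any requests.

```python
import re
from urllib.parse import quote_plus


def exact_match_queries(text, words_per_query=10, max_queries=3):
    """Build Google search URLs for exact-phrase checks of `text`."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    urls = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) < words_per_query:
            continue  # too short to be a distinctive phrase
        # Take the first N words and drop trailing punctuation.
        phrase = " ".join(words[:words_per_query]).rstrip(".,;:!?")
        # Quotation marks ask the search engine for an exact match.
        query = quote_plus('"' + phrase + '"')
        urls.append("https://www.google.com/search?q=" + query)
        if len(urls) == max_queries:
            break
    return urls
```

Calling `exact_match_queries(page_text)` with the visible text of one of your pages returns up to three URLs, one per phrase, ready to paste into a browser.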

Free Tools to Check for Duplicate Content

When you are writing your content, you may unintentionally make your content too similar to already-published content. It’s always a good idea to double check everything you write using plagiarism checkers to make sure your content is viewed as unique. Several of these tools are available at no cost.

Here are some good free tools that can be used to check for duplicate content:

Copyscape – This tool can quickly check the content that you have written against already published content in a matter of seconds. The comparison tool will highlight content that shows up as duplicate, and it will let you know what percentage of your content matches already-published content.

Plagspotter – This tool can identify duplicate pages of content across the web. It’s a great tool for finding plagiarists who have stolen your content. It also allows you to automatically monitor your URLs on a weekly basis to identify duplicate content.

Duplichecker – This tool quickly checks the originality of the content you are planning to post on your site. Registered users can do up to 50 searches per day.

Siteliner – This is a great tool that can check your entire site once a month for duplicate content. It can also check for broken links and identify the pages that are most prominent to search engines.

Smallseotools – A variety of SEO tools are available, including a plagiarism checker that identifies fragments of identical content.

If you want to dig deeper, several of these services also offer additional tools at an affordable cost.
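
To give a feel for what a "percentage of matching content" can mean, here is a minimal sketch that compares a draft with one already-published text using overlapping word sequences (often called shingles). This is an illustrative technique chosen for this example, not a description of how Copyscape or any other tool listed above works; the function names and the default shingle size of five words are assumptions.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def percent_matched(draft, published, size=5):
    """Percentage of the draft's shingles that also occur in the published text."""
    draft_shingles = shingles(draft, size)
    if not draft_shingles:
        return 0.0  # draft is shorter than one shingle
    shared = draft_shingles & shingles(published, size)
    return 100.0 * len(shared) / len(draft_shingles)
```

A result near 100 means almost every five-word run in the draft also appears in the published text, while a result near 0 means the wording is largely distinct. Real checkers compare against a very large index of pages rather than a single document.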

Premium Tools to Check for Plagiarism

Premium plagiarism checkers come with the ability to check for duplicate content using advanced algorithms. They give you the peace of mind of knowing your work won’t be attributed to someone who didn’t write it.

Premium plagiarism tools usually offer reports that serve as proof of originality. If someone later claims that your work is not original, you can rebut the claim with these reports, which can be saved in PDF format.

Examples of premium tools to check for duplicate content include:

Grammarly – Their premium tool offers both a plagiarism checker and a check for grammar, word choice, and sentence structure.

Plagium – Offers a free quick search or a premium deep search.

Plagiarismcheck.org – Detects exact matches and paraphrased text.

Has Your Content Been Scraped?

The content on your website should be completely original, and the above tools can help you to make sure you have not inadvertently made your content too similar to content that appears on someone else’s website.

The other reason to continually check for duplicate content is that there are websites that intentionally steal content from other people’s blogs to use as their own. This is typically done using automated software. If you are in the habit of auditing the content on your own site, you may discover that some of it has been scraped. How can you catch content scrapers? What should you do if you discover your content published verbatim on someone else’s site?

Ways to Catch Content Scrapers

Using premium plagiarism tools on a regular basis can help you locate content that you have written on someone else’s site. There are a few other options to catch content that has been scraped.

Trackbacks in WordPress may show up in spam if you use Akismet. When your content always includes links to some of your other posts, a scraped copy carries those links with it and can generate trackbacks, so you may be able to find content scrapers this way.

Use Google Search Console (formerly Webmaster Tools) and check the links to your site. When you have a large number of links from a particular site, you may find that some of your content has been scraped onto theirs. The only way to be sure is to visit their site and check which pages are linking to your site. You may find your own exact content appearing on their site.

Use Google Alerts to be notified if any of your post titles appear on the web after your content has already been published.
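
The link check described above is easier when the linking pages are grouped by site. The sketch below assumes you have already exported the URLs of pages that link to you into a plain Python list; the export format, the function name `domains_with_many_links`, and the threshold of ten links are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def domains_with_many_links(linking_urls, threshold=10):
    """Return (domain, count) pairs for domains linking at least `threshold` times."""
    counts = Counter()
    for url in linking_urls:
        host = urlparse(url).hostname  # None when the string is not a full URL
        if not host:
            continue
        # Treat www.example.com and example.com as the same site.
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return [(domain, n) for domain, n in counts.most_common() if n >= threshold]
```

A domain that appears near the top of the returned list is only a candidate. As noted above, you still need to visit the site to confirm whether your content has actually been copied.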

The more you establish yourself as an authority in your niche, the more you may find that those who have not yet established their own voice or authority want to borrow yours. It allows them to provide authoritative information on their blog without having to put forth the effort to create quality content themselves.

What to Do About Content Scrapers

Scraping content is unethical. Once you have discovered that your content has been scraped, you have a couple of options.

Contact the owner of the website that published your content and let them know that you have found your content on their site. The site owner may not be aware that stolen content has been added to their site, so give them the benefit of the doubt. You can reach them through their contact form or through any of the social media platforms they participate in.

If it is a high-quality site, give them the option of keeping the content up while giving you credit as the author and a link to your site. Another option is to offer to write a revised article in exchange for a link. If it’s a low-quality site, let them know you want your content removed immediately.

If there is no apparent way to contact the owner of the website, do a Whois lookup. This will probably let you know who they are unless it is privately registered. If you are still unable to find out who the owner of the site is, you will be able to find out who is hosting it using the free tool Whoishostingthis.com. Contact the hosting company and let them know that the owner of the website is publishing copyrighted content. Web hosting companies take this kind of complaint seriously and will offer assistance in a timely manner.

Protecting Content with DMCA

You have the copyright to any original content that you publish on your site. One way of protecting yourself is to place a DMCA.com badge on your site. DMCA.com, a private service named after the Digital Millennium Copyright Act (the U.S. law that governs takedown notices), states that it will do a takedown at no charge if your content is stolen while protected with one of its badges.

The badge helps to deter thieves, and the service offers tools to help you locate unauthorized copies of your content on someone else’s site. It can also pursue the quick takedown of plagiarized content, including pictures and videos.

Final Thoughts on Duplicate Content

People who go online to obtain information expect to find original and helpful content, and that’s what they should be able to find. Duplicate content should be avoided whenever possible. Content should be well-written and unique so that readers can have the best online experience possible.
