Duplicate content is content that appears, essentially unchanged, on more than one website. Search engines such as Google may penalize such pages or leave them out of relevant search results; typically only the site that was indexed first is considered, and the other copies are discounted.
Substantial blocks of content
According to Google, duplicate content refers to substantial blocks of content, either within one domain or across domains, that are completely identical or appreciably similar. That covers everything from exact copies to pages containing large sections of copied text. Taken narrowly, the term means content that is very similar, or exactly the same, across multiple pages, whether on your own website or on other websites.
One issue you might run into with duplicate content is that even if your site published a piece first, other sites that blindly copy it may be the ones that appear for relevant search queries. Copying is especially problematic when your website has low domain authority and the site copying your content has high domain authority: websites with higher domain authority are typically crawled more frequently, so the copied content can be scanned by crawlers before the original site.
Duplicate content means that similar content appears at several locations (URLs) across the internet, so search engines are unsure which URLs should appear in the search results. When there are several pieces of what Google calls appreciably similar content in more than one place, search engines can have trouble deciding which version is most relevant for a given search term. They also struggle to aggregate the linking metrics (authority, relevance, and trust) for a piece of content, particularly when other websites link to more than one version of it.
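The link-equity split described above can be sketched in a few lines. This is a toy illustration with invented URLs and link counts, not a model of any real search engine: the same inbound links counted against raw URL variants look like three weakly linked pages, while normalized to one canonical URL they look like a single well-linked page.

```python
from collections import Counter

# Hypothetical inbound links landing on three URL variants of the same
# article; the URLs and counts are invented for illustration.
inbound_links = [
    "https://example.com/post",
    "https://example.com/post/",
    "https://example.com/post?ref=feed",
    "https://example.com/post",
]

def normalize(url: str) -> str:
    # Minimal normalization: drop the query string and any trailing slash.
    return url.split("?")[0].rstrip("/")

split_equity = Counter(inbound_links)                          # spread over 3 URLs
merged_equity = Counter(normalize(u) for u in inbound_links)   # all 4 links on 1 URL

print(len(split_equity), len(merged_equity))  # 3 1
```

The normalization rules here (strip query, strip trailing slash) are assumptions; real sites may need some query parameters preserved.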
The pain occurs mostly because search engines are confused by the many versions of the same content and display only one, reducing the visibility of each duplicate. This hurts a web page's rankings, and the problem gets worse when people start linking to the various versions of the same content. Google is well aware of content being duplicated across multiple pages and often tries to filter pages out in order to display the most appropriate one. Content that is continued on a second page may likewise be seen by Google as significantly similar.
Depending on the content management system (CMS), pagination may be used to distribute comments over several pages. That pagination produces duplicate content on both the article's URL and on the article URL plus /comments-page-1/, /comments-page-2/, and so on, leaving multiple copies of the same article across the website as comment page after comment page is appended.
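The comment-pagination pattern above can be made concrete with a short sketch. The /comments-page-N/ URL scheme follows the text; real CMSs vary, and the `rel=canonical` tag shown is one common way to point each paginated variant back at the original article.

```python
# Sketch of how paginated comments multiply URLs for one article, and the
# rel=canonical tag commonly used to point the copies back at the original.

def comment_page_urls(article_url: str, n_pages: int) -> list[str]:
    """Hypothetical CMS behavior: each comment page repeats the article body."""
    base = article_url.rstrip("/")
    return [article_url] + [f"{base}/comments-page-{i}/" for i in range(1, n_pages + 1)]

def canonical_tag(article_url: str) -> str:
    # Placed in each paginated variant's <head> to name the preferred URL.
    return f'<link rel="canonical" href="{article_url}">'

urls = comment_page_urls("https://example.com/article/", 2)
print(urls)
# ['https://example.com/article/',
#  'https://example.com/article/comments-page-1/',
#  'https://example.com/article/comments-page-2/']
```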
Combine similar pages to avoid duplicates
For example, if a travel site has individual pages about two cities but the information on both pages is identical, you can either combine them into one page that covers both cities or expand each page so it contains unique content for its city. More generally, when you have many similar pages, make each one unique, add valuable content, or consolidate them into a single page where possible. As mentioned earlier, you can take the duplicate content, combine and enhance it, and create one higher-quality, unique page that tells search engines which page you want to rank.
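Deciding which pages are similar enough to merge can be approximated mechanically. A minimal sketch, using word-set Jaccard similarity as a crude stand-in for the "appreciably similar" judgment search engines make; the page texts and the 0.8 threshold are invented for illustration.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two page texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Invented example: two city pages with near-identical copy, one unique.
pages = {
    "/travel/paris": "visa tips hotels weather getting around",
    "/travel/lyon":  "visa tips hotels weather getting around",
    "/travel/tokyo": "rail passes etiquette neighborhoods street food",
}

MERGE_THRESHOLD = 0.8  # assumed cutoff; tune for your own content
merge_candidates = [
    (u1, u2) for (u1, t1), (u2, t2) in combinations(pages.items(), 2)
    if jaccard(t1, t2) >= MERGE_THRESHOLD
]
print(merge_candidates)  # [('/travel/paris', '/travel/lyon')]
```

Pairs flagged this way are candidates for consolidation into one expanded page.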
Crawling of pages
If search engines cannot crawl pages with duplicate content, they cannot determine that these URLs point to the same content, and so must treat them as distinct, unique pages. External duplicate content, also known as cross-domain duplication, occurs when two or more different domains have identical copies of a page indexed by the search engine. In some cases, content is intentionally duplicated between domains in an attempt to manipulate search engine rankings or gain more traffic.
Same items sold
If many different websites sell the same items and all use the manufacturer's descriptions for those items, identical content ends up in many places across the internet. Sometimes an eCommerce website creates a new URL for each different version of a T-shirt, which results in thousands of pages with duplicate content. "More than one place" here means a location with a unique web address (URL): if the same content appears at more than one web address, you have duplicate content.
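The T-shirt example above can be sketched as a canonicalization step that strips variant and tracking parameters so every version maps to one product URL. The parameter names and URLs are illustrative assumptions, not a standard.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Assumed parameters that only select a product variant or carry tracking
# data; the names are illustrative, not a standard.
VARIANT_PARAMS = {"color", "size", "utm_source"}

def product_canonical(url: str) -> str:
    """Collapse variant URLs of one product onto a single canonical URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VARIANT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

variants = [
    "https://shop.example.com/tshirt?color=red&size=m",
    "https://shop.example.com/tshirt?color=blue&utm_source=ad",
    "https://shop.example.com/tshirt",
]
print({product_canonical(u) for u in variants})  # one canonical URL survives
```

In practice this logic usually lives in the site's rel=canonical tags or redirect rules rather than in application code.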
Where you have several categories, such as product categories, you may not describe each product fully enough to generate a page with sufficient content to be indexable on its own. Duplicate content also confuses search engines about whether to funnel link metrics, such as authority and link equity, into one page or to spread them across several versions. Internal duplicate content does not incur any penalty either, but it can greatly hinder your ability to control which of your pages appear in search results, as we discuss in the next section. Even with malicious duplication, such as plagiarism, there is no real SEO penalty; but in many cases the website unethically republishing your content will have broken more important search engine guidelines that have nothing to do with content duplication.