Is there a way for Google to tell whether a text was generated by OpenAI's ChatGPT? And if so, would that detection lead to the text being penalized in Google's search rankings?
Let's start with the basic question about detection.
Can Google detect and tell if a text is from ChatGPT?
Yes and no. In theory, Google could most likely detect whether a text was generated by AI such as ChatGPT if it built a system for that purpose.
However, there is currently no indication that Google does this when indexing and ranking content.
In a much-cited interview from April 2022, Google's John Mueller was asked whether Google can tell if content was written by a human or generated by an AI algorithm. His answer: "I can't claim that."
The interview sparked a great deal of interest and debate, largely because Mueller also equated AI-generated content with automatically generated content, and thereby with spam, which was Google's previous stance in its Google Search Essentials (formerly Webmaster Guidelines).
This is no longer Google's stance on AI content.
Why would Google, as a search engine, be interested in detecting content coming from ChatGPT?
In short, Google wants to show users the best possible search results and is continuously working on improving how it does this. A big part of this work is keeping spammy content out of the results.
Spammy automatically generated (or "auto-generated") content is content that's been generated programmatically without producing anything original or adding sufficient value; instead, it's been generated for the primary purpose of manipulating search rankings and not helping users.
The reason is that auto-generated content produced by various kinds of scripts has historically been of very low quality: it mainly consisted of keywords and was aimed at manipulating Google's search results.
That's why Google has been trying to detect this type of content, so it can remove it from the SERPs and protect the integrity of its search results.
The thinking was that by focusing on high-quality, human-generated content, Google could provide a better experience for its users and maintain the credibility of its search engine.
But with the rise of new types of AI (machine learning, GANs, or whatever we choose to call them), marketers and writers no longer just use these systems to generate endless amounts of spam; they use them to create more content, and more helpful content, for users.
This is why it is much less clear whether Google even wants to differentiate between human-generated and AI-generated content.
How could Google detect ChatGPT content?
To understand how Google would spot a text coming from an AI system like ChatGPT, one has to know how these language models work.
A language model works with probabilities: it predicts the next word in a sentence based on the words that came before it. It does this by analyzing the patterns and statistics of language in a large corpus of text and using that information to estimate which words are likely to follow a given sequence of words. This lets a language model generate text that sounds natural and coherent, even if it is not always 100% accurate.
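To make the idea concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT works internally (ChatGPT uses a large neural network over sub-word tokens rather than simple word counts); it only illustrates how observed patterns in a corpus can be turned into next-word probabilities. The function names and the toy corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus: str) -> dict[str, Counter]:
    """For every word in the corpus, count which words follow it."""
    words = corpus.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def next_word_probabilities(model: dict[str, Counter], prev: str) -> dict[str, float]:
    """Turn the follow-up counts for `prev` into a probability distribution, most likely first."""
    counts = model.get(prev.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word never appeared (or never had a successor) in the corpus
    return {word: n / total for word, n in counts.most_common()}

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram_model(corpus)
print(next_word_probabilities(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

A model like ChatGPT does the same kind of thing at a vastly larger scale, using the whole preceding context rather than just the previous word.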
Several demos show how such checks on whether a text was generated by artificial intelligence can work.
One of the more advanced ones I have seen is GLTR (Giant Language Model Test Room), which you can try for free here: http://gltr.io/dist/index.html
GLTR is a tool that enables forensic inspection of a text to estimate whether it was written by a person or by an AI.
It was developed by researchers from the MIT-IBM Watson AI Lab and Harvard NLP, and it lets users analyze the visual footprint a language model leaves on an input text.
GLTR analyzes each word of a text based on how likely the model considers it, given the context to its left. If the actual word is among the model's top 10 predicted words, its background is colored green; if it is in the top 100, yellow; in the top 1,000, red; otherwise, violet.
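The coloring rule itself is simple enough to sketch. The Python below assumes you already have, for each word in a text, its rank among the model's predictions (rank 1 being the model's top guess); getting those ranks requires running a real language model, which is left out here. The function names and the example ranks are hypothetical, and the code only mirrors the thresholds described above, not GLTR's actual source code.

```python
from collections import Counter

def gltr_color(rank: int) -> str:
    """Map a word's rank among the model's predictions to a GLTR-style color.

    Assumes rank is 1-based: rank 1 means the word was the model's top prediction.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or higher")
    if rank <= 10:
        return "green"   # among the top 10 predictions
    if rank <= 100:
        return "yellow"  # among the top 100
    if rank <= 1000:
        return "red"     # among the top 1,000
    return "violet"      # less likely than that

def color_footprint(ranks: list[int]) -> Counter:
    """Count how many words of a text fall into each color bucket."""
    return Counter(gltr_color(rank) for rank in ranks)

# Hypothetical ranks for an 8-word text, as a language model might report them
ranks = [1, 3, 2, 57, 1, 640, 5, 4200]
print(color_footprint(ranks))
```

The reasoning behind GLTR is that text sampled from a language model tends to stick to the model's top predictions, so a footprint dominated by green hints at machine-generated text, while human writing usually contains more yellow, red and violet words.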
GLTR offers the following features:
- The ability to try sample texts and see if you can spot the difference between machine-generated and human-generated text
- Histograms that show statistics about the text, including the probability of the actual word divided by the maximum probability of any word at that position, and the entropy of the top 10 predictions for each word
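The two histogram statistics can be sketched the same way. This sketch assumes the model's predictions at one position are given as a dictionary mapping words to probabilities; for the entropy, it renormalizes the top 10 probabilities so they sum to 1, which is our assumption and may differ from GLTR's exact calculation. The function names and example numbers are hypothetical.

```python
import math

def probability_ratio(predictions: dict[str, float], actual: str) -> float:
    """Probability of the word actually used, divided by the highest probability at that position."""
    return predictions.get(actual, 0.0) / max(predictions.values())

def top_k_entropy(predictions: dict[str, float], k: int = 10) -> float:
    """Entropy, in bits, of the k most likely predictions.

    The top-k probabilities are renormalized to sum to 1 (our assumption).
    """
    top = sorted(predictions.values(), reverse=True)[:k]
    total = sum(top)
    return -sum((p / total) * math.log2(p / total) for p in top if p > 0)

# Hypothetical model output for a single position in a text
predictions = {"cat": 0.5, "dog": 0.25, "fish": 0.125, "bird": 0.125}
print(probability_ratio(predictions, "dog"))  # 0.5
print(top_k_entropy(predictions))             # 1.75
```

A low ratio means the writer chose a word the model found unlikely, while a low entropy means the model was confident about what should come next.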
Another tool for detecting AI-written content is the "GPT-2 Output Detector Demo" hosted on Hugging Face.
You can try it here: https://huggingface.co/openai-detector/
It is worth noting that these examples and detectors are built on the GPT-2 model. Since ChatGPT is built on the more advanced GPT-3.5 series of models, its output will likely be more difficult to detect (although the detectors still seem to catch GPT-3 text to some extent).
And as these models develop, they will become more and more advanced, and their output will increasingly resemble human writing. In the GLTR demo, this would mean histograms that look the same whether a human or an AI wrote the text.
This could also be why OpenAI and similar platforms are working on ways to let Google and other search engines identify whether a text comes from AI, by implementing a sort of watermark in the text.
This secret signal, embedded in the generated text, would then reveal its source.
The main purpose would not simply be to let anyone check whether a text was generated for SEO purposes (for example, when someone uses ChatGPT for SEO), but to prevent plagiarism through content rewriting and impersonation through imitating someone else's writing style.
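To illustrate the general idea of such a secret signal, here is a toy sketch. It is not OpenAI's scheme, whose details have not been published as far as we know. It simply assumes that a provider holds a secret key, uses it to pseudo-randomly mark roughly half of all possible next words as "green" depending on the previous word, and steers its generator toward green words. Anyone holding the key can then count green words: ordinary text should land near 50%, while watermarked text lands far above it. The key, the stand-in vocabulary and all function names are hypothetical.

```python
import hashlib
import math

SECRET_KEY = b"hypothetical-secret-key"  # hypothetical key known only to the AI provider

def is_green(prev_word: str, word: str, key: bytes = SECRET_KEY) -> bool:
    """Pseudo-randomly put roughly half of all words on a 'green list' that depends on the previous word."""
    digest = hashlib.sha256(key + b"|" + prev_word.encode() + b"|" + word.encode()).digest()
    return digest[0] % 2 == 0

def generate_watermarked(start: str, vocabulary: list[str], length: int,
                         key: bytes = SECRET_KEY) -> str:
    """Toy generator: at each step, pick the first vocabulary word that is green given the previous word.

    A more realistic scheme would only nudge a real model's probabilities toward green words.
    """
    words = [start]
    for _ in range(length):
        prev = words[-1]
        green = [w for w in vocabulary if is_green(prev, w, key)]
        words.append(green[0] if green else vocabulary[0])
    return " ".join(words)

def watermark_z_score(text: str, key: bytes = SECRET_KEY) -> float:
    """Standard deviations by which the share of green words exceeds the ~50% expected for unwatermarked text."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    green = sum(is_green(prev, word, key) for prev, word in pairs)
    return (green - 0.5 * n) / math.sqrt(0.25 * n)

vocabulary = [f"word{i}" for i in range(50)]  # stand-in vocabulary
text = generate_watermarked("word0", vocabulary, length=40)
print(round(watermark_z_score(text), 1))
```

A high score is strong statistical evidence of the watermark, while someone without the key sees nothing unusual in the text itself.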
What Does Google Say?
Google has not officially announced that it detects whether content is written by AI or by a real person.
But in the discussions around AI-generated content, even before ChatGPT was introduced, Google has stated that it is not per se against AI content; instead, it focuses on whether the content is helpful.
This is well in line with the search giant's long history of combating spam and content generated purely with SEO in mind rather than the user.
In general, Google is one of the most AI-positive companies in the world and has for many years been among the companies investing most in AI research.
“It (AI) can make humans more productive than we have ever imagined”
Sundar Pichai, CEO of Google’s owner Alphabet
What Do SEO Experts Say?
There is considerable debate among SEO marketers and experts about whether Google can detect ChatGPT-generated content and whether it affects rankings.
There are three different viewpoints:
Viewpoint 1) It can be detected and it will have a negative impact
Some experts believe that Google can detect ChatGPT content and that it will have a negative effect on rankings. They argue that Google can detect the unnatural language patterns that are generated by ChatGPT and that this will be seen as spammy content.
Viewpoint 2) It can't be detected and therefore will have no impact
On the other hand, some believe that Google is not able to detect ChatGPT content and that it will not have any effect on rankings. They argue that the content generated by ChatGPT is indistinguishable from content written by a real person and that Google will not be able to tell the difference.
Viewpoint 3) It can be detected but won't have an impact as long as it's helpful content
The third perspective, and the one we subscribe to here at SEO.ai, is that it does not matter whether Google can, or does, detect AI-generated content.
What matters is whether the content, generated by a human or by an artificial intelligence algorithm, is helpful to users. This is the content that Google wants to promote in its search results (SERPs), as it best answers the user's query.
At this point, it is impossible to know which side is right and how it affects SEO. Google has not officially announced that it is detecting whether AI or a real person writes the content. Until they do, the debate will continue.
To sum up the above:
- Google may be able to detect a text generated by ChatGPT
- Currently, there is no indication that Google does this
- Google wants to present the best search results to users and prevent spammy content
- Google's stance on AI content has changed
- Google could potentially detect ChatGPT content by analyzing patterns and statistics of language in a text
- GLTR is an example of a tool for detecting whether a text was written by a person or an AI