Since ChatGPT’s launch, there has been a heated debate over the “fair use” of public website content for AI training, and whether this is plagiarism.
The debate has escalated since OpenAI announced ChatGPT plugins on March 23rd.
One of OpenAI’s own plugins is an official web browser, hosted by OpenAI. It allows the model to read information directly from the internet.
It’s worth repeating here, because the daily posts, tweets, and example prompts suggest otherwise:
The current version of ChatGPT cannot access anything on the internet.
It does not use a database or store website content the way search engines do with their indexes.
What this means is that without the plugin, ChatGPT is still stuck in 2021, predicting the next word based on old training data.
Even the current Bing implementation, in brief, takes keywords from a prompt, performs a Bing search, feeds the results for those keywords to the AI, and asks it to “summarize” them.
And that’s how plugins change everything.
ChatGPT will soon be able to feed content from third-party websites to the AI to summarize or manipulate, much like Bing does.
Many third-party plugins and tools can already scrape content from websites, feed it into prompts to OpenAI APIs, and summarize or manipulate that text.
However, an official web browser plugin will greatly increase this kind of usage.
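To make that flow concrete, here is a minimal Python sketch of the retrieve-then-summarize pattern: text that a tool has already fetched from web pages is packed into a prompt for a language model. Everything named here (the Page type, build_summary_prompt, the example.com URLs, the prompt wording) is hypothetical. This is not how Bing or OpenAI actually implement it, and the call to a model is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A fetched web page (hypothetical type, for illustration only)."""
    url: str
    text: str


def build_summary_prompt(question, pages, max_chars_per_page=500):
    """Pack page excerpts into one prompt that asks for a cited answer."""
    sources = []
    for number, page in enumerate(pages, start=1):
        # Truncate each page so the prompt stays a manageable size.
        excerpt = page.text[:max_chars_per_page]
        sources.append(f"[{number}] {page.url}\n{excerpt}")
    return (
        "Answer the question using only the sources below, "
        "and cite them by number.\n\n"
        f"Question: {question}\n\n"
        "Sources:\n" + "\n\n".join(sources)
    )


# Stand-in pages; a real tool would fetch these from search results.
pages = [
    Page("https://example.com/guide", "Bots read robots.txt before fetching."),
    Page("https://example.com/faq", "A Disallow rule tells a bot to stay out."),
]
prompt = build_summary_prompt("How do I block a bot?", pages)
print(prompt)
```

The numbered sources are what make citation possible: if a site is blocked, it never makes it into that list, and the answer cites whichever pages did.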
You can block OpenAI’s ChatGPT-User bot
OpenAI has provided more details about the bot, including how to block it.
Note that OpenAI follows the robots.txt protocol and behaves like any other bot: it assumes content can be accessed unless the robots.txt file says otherwise.
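In practice, blocking comes down to a robots.txt rule for the ChatGPT-User user agent. As a rough sketch, the Python snippet below uses the standard library’s urllib.robotparser to check what such a rule would do before you deploy it. The robots.txt contents, the is_allowed helper, and the example.com URL are illustrative assumptions, not copied from OpenAI’s documentation.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt (an assumption for illustration): block the
# ChatGPT-User bot from the whole site, leave every other bot alone.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/blog/post"
print(is_allowed(ROBOTS_TXT, "ChatGPT-User", url))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", url))     # True
```

Narrowing the Disallow line to a single directory would keep the bot out of only that part of the site.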
OpenAI and ChatGPT don’t crawl the web like search engines do. And as far as we know, they haven’t used this data for training (yet?). Every fetch is the result of a direct request from a user.
Another interesting fact: the plugin uses the Bing search API to do this. This probably means that if Bing can’t display a website’s content, neither can ChatGPT.
This leads me to a question I see a lot these days.
Should I block access to my website by OpenAI bots?
The citation/plagiarism/source/copyright debate has been raging for some time and could easily take 20,000 words to delve into.
My short answer: no.
Most websites should not block AI from accessing their content. Let’s dig deeper into why.
Take a wait-and-see approach
New technologies should not be blocked until we have enough data to make informed decisions.
Sure, there could be copyright issues, but AI plugins could also be a new source of discovery and traffic.
OpenAI says the plugin will cite sources when retrieving data from third-party websites. This means that if a user pulls in your content, there is a real chance you will get clicks from ChatGPT.
Blocking access only means that ChatGPT (or the user) will cite someone else’s website instead.
Many people in this discussion start from the erroneous assumption that if users can’t get your content through ChatGPT, they will have to visit your website.
I don’t think so. The reality is that they will get the content from your competitors.
Given the number of people using ChatGPT to create content these days, if someone uses this tool to pull content from a website, there’s a good chance they will link back to that website wherever they post the output. If you block the bot, you miss that chance.
Think long term
I remember having a similar conversation about iPhone apps and the app store, which first appeared in 2008.
App stores changed the interface of mobile phones. Sure, websites could (and still can) do most of what an app can do, but the app store is where people go to find and discover apps.
AI is having a similar effect on the user interface of the internet.
This is not going to kill search engines.
But AI will be a new starting point for many web users. A plugin could be your only chance to reach these users.
Just like search, social, retail platforms and app stores, we need to start thinking of AI as a new acquisition channel.
It was last week that I started thinking about strategies for AI and AI plugins. Most marketers are already behind, but it’s never too late.
The opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.