“What’s surprising about these large language models is how well they know how the world works just by reading whatever they can find,” says Chris Manning, a Stanford professor who specializes in AI and language.
But GPT-3 and its ilk are essentially very talented statistical parrots. They learn to re-create the patterns of words and grammar found in language. That means they can blurt out nonsense, wildly inaccurate facts, and hateful language scraped from the darker corners of the web.
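To see the “statistical parrot” idea in miniature, consider a toy bigram model: it counts which word follows which in a training text, then generates new text by sampling from those counts. This is a deliberately simplified sketch, not the transformer architecture GPT-3 actually uses, and the tiny corpus here is invented for illustration; but the principle of learning and replaying observed word patterns is the same.

```python
import random
from collections import defaultdict

# Toy "statistical parrot": learn which word tends to follow which.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat slept on the rug").split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

# Generate text by repeatedly sampling a statistically plausible next word.
random.seed(0)
word = "the"
output = [word]
for _ in range(10):
    word = random.choice(follows.get(word, corpus))
    output.append(word)
print(" ".join(output))
```

The output is locally fluent but has no grounding in meaning, which is exactly why such models can produce confident-sounding falsehoods.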
Amnon Shashua, a professor of computer science at the Hebrew University of Jerusalem, is the co-founder of another startup building a bilingual AI model based on this approach. He knows a thing or two about commercializing AI, having sold his last company, Mobileye, which pioneered the use of AI to help cars spot things on the road, to Intel in 2017 for $15.3 billion.
Shashua’s new company, AI21, which came out of stealth last week, has developed an AI algorithm called Jurassic-1 that demonstrates remarkable language skills in both English and Hebrew.
In demos, Jurassic-1 can generate paragraphs of text on a given topic, come up with catchy headlines for blog posts, write simple pieces of computer code, and more. Shashua says the model is more sophisticated than GPT-3, and he believes future versions of Jurassic may be able to build a kind of commonsense understanding of the world from the information they ingest.
Other efforts to re-create GPT-3 reflect the diversity of languages around the world and on the internet. In April, researchers at Huawei, the Chinese tech giant, published details of a GPT-like Chinese language model called PanGu-alpha (written PanGu-α). In May, Naver, the South Korean search giant, said it had developed its own language model, called HyperCLOVA, that “speaks” Korean.
Jie Tang, a professor at Tsinghua University, leads a team at the Beijing Academy of Artificial Intelligence that developed another Chinese language model, called Wudao (meaning “enlightenment”), with help from government and industry.
The Wudao model is considerably larger than any other, meaning its simulated neural network is spread across more cloud computers. Increasing the size of the neural network was key to making GPT-2 and -3 more capable. Wudao can also work with images as well as text, and Tang has founded a company to commercialize it. “We believe this can be the cornerstone of any AI,” Tang said.
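Some rough arithmetic shows why models at this scale must be spread across many machines. Taking GPT-3’s published figure of roughly 175 billion parameters and Wudao’s reported figure of roughly 1.75 trillion, and assuming 2 bytes per weight (an assumption; real deployments vary), the weights alone dwarf the memory of any single accelerator:

```python
# Rough arithmetic on why the biggest language models must be sharded.
# Assumes 16-bit (2-byte) weights; production setups vary.
BYTES_PER_PARAM = 2

models = {
    "GPT-3": 175e9,        # ~175 billion parameters (published)
    "Wudao 2.0": 1.75e12,  # ~1.75 trillion parameters (reported)
}

for name, n_params in models.items():
    gib = n_params * BYTES_PER_PARAM / 2**30
    print(f"{name}: ~{gib:,.0f} GiB of weights alone")

# Even a top-end 80 GiB GPU holds only a fraction of either model, so the
# network is split ("model parallelism") across a cloud cluster.
```

That works out to roughly 326 GiB for GPT-3 and about ten times that for Wudao, before counting the extra memory training itself requires.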
Such enthusiasm seems justified by the capabilities of these new AI programs, but the race to commercialize such language models may also move faster than efforts to add guardrails or limit misuse.
Perhaps the most pressing concern about AI language models is how they might be misused. Because the models can produce convincing text on a given subject, some people worry that they could easily be used to generate fake reviews, spam, or fake news.
“I would be surprised if disinformation operators didn’t invest at least some serious energy in experimenting with these models,” says Micah Musser, a research analyst at Georgetown University who has studied the potential of language models to spread disinformation.
Musser says research suggests it won’t be possible to use AI to detect AI-generated misinformation. There is unlikely to be enough information in a tweet for a machine to judge whether it was written by a machine.
More problematic kinds of bias may also be lurking within these giant language models. Research has shown that language models trained on Chinese internet content will reflect the censorship that shaped that content. The programs also inevitably capture and reproduce subtle and overt biases around race, gender, and age in the language they ingest, including hateful statements and ideas.
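One common way researchers surface such biases is to ask a model to fill in a blank and inspect which words it ranks most likely. The sketch below uses the Hugging Face transformers library and the public bert-base-uncased checkpoint, neither of which is one of the models discussed in this article; both are assumptions chosen for illustration. The pronoun completions such probes return for different professions tend to skew along gender lines.

```python
# Probing a language model's biases by inspecting fill-in-the-blank guesses.
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint (illustrative; not a model from the article).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for sentence in ["The nurse said [MASK] was tired.",
                 "The engineer said [MASK] was tired."]:
    top = fill(sentence)[:3]  # the three most likely fillers
    print(sentence, "->",
          [(r["token_str"], round(r["score"], 2)) for r in top])
```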
Likewise, these large language models can fail in surprising or unexpected ways, adds Percy Liang, another computer science professor at Stanford and the principal investigator at a new center dedicated to studying the potential of powerful, general-purpose AI models like GPT-3.