Just four months after the release of ChatGPT, OpenAI has announced its next-generation artificial intelligence (AI) model — and it’s bigger and better than ever. GPT-4 not only blows ChatGPT out of the water in a range of standardized tests like the SATs and the bar exam, it also adds a key new feature: it can see.
GPT-4 is a “large multimodal model,” meaning it can analyze not only text, but images as well. The exciting addition of “computer vision” allows the AI’s users to input photos and drawings, which the model can analyze and, seemingly, understand.
In a promotional video, OpenAI claims that, given a photo of helium balloons tied to an anvil and a question such as “What would happen if the strings were cut?”, GPT-4 would be able to logically deduce that the balloons would fly away.
Though GPT-4 can’t generate images (OpenAI’s DALL-E has that covered), the applications of its computer vision are stunning.
In a live demonstration of the AI’s capabilities, OpenAI president and co-founder Greg Brockman showed that the AI could create an entire website based solely on a hand-drawn note.
GPT-4 can handle about 25,000 words of text, an eight-fold improvement over ChatGPT. The company says it can be used to assist in “composing songs, writing screenplays, or learning a user’s writing style.”
In a research paper released with the launch of GPT-4, OpenAI shared GPT-4’s scores on a variety of academic tests, including AP exams, the bar, the Graduate Record Examination (GRE), and even sommelier certification exams.
The results show just how far the AI has come since its predecessor GPT-3.5, which powered ChatGPT. While GPT-3.5 scored in the 10th percentile on the Uniform Bar Exam, eking out just over 50 per cent, GPT-4 could land itself squarely in the courtroom with a 90th percentile score.
Similar large jumps in test scores were seen for the LSAT, the quantitative and verbal GREs, and the Medical Knowledge Self-Assessment Program.
GPT-4’s test scores suggest it excels at basic reasoning and comprehension but still struggles with creative thought, as shown by its poor performance on the AP English Literature and AP English Language exams and the GRE Writing exam. Those scores were unchanged from GPT-3.5’s, suggesting little improvement in this area.
OpenAI claims that its new model “exhibits human-level performance” on various professional and academic benchmarks, while still being “less capable than humans in many real-world scenarios.” The company added that it finished training GPT-4 last August but held it back to make it safer for users.
“GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations,” the company said.
Tempering expectations, the company added that, like its predecessors, GPT-4 is still prone to “hallucinations,” the term for when an AI produces sentences that are coherent but factually inaccurate or not grounded in reality. OpenAI CEO Sam Altman says the model hallucinates significantly less than GPT-3.5, but the company is looking forward to “feedback on its shortcomings.”
GPT-4 also improves on its predecessor in languages other than English. In testing, GPT-4 answered multiple-choice questions in languages such as Latvian, Welsh and Swahili more accurately than GPT-3.5 answered them in English.
GPT-4 uses a transformer-style neural network architecture. Transformers rely on an attention mechanism, loosely inspired by human cognition, that lets the network weigh which pieces of data are more relevant than others, improving the model’s accuracy and cutting down on training time.
Transformers were introduced and popularized by Google Brain researchers in the 2017 paper “Attention Is All You Need.” One of the paper’s co-authors, Aidan Gomez, is now the CEO of Cohere, a Toronto-based natural language processing company that operates in the same space as OpenAI.
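OpenAI has not published the details of GPT-4’s architecture, but the core operation from that 2017 paper, scaled dot-product attention, is simple enough to sketch. The Python example below is a minimal textbook illustration, not GPT-4’s code: each query vector is scored against every key vector, the scores are divided by the square root of the key dimension and turned into weights with a softmax, and the output is the weighted average of the value vectors. The function names and the toy three-token input are hypothetical, and real models add learned projection matrices, multiple attention heads and far larger vectors.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors (hypothetical helper).

    Each query is compared with every key; the softmax weights express how
    relevant each position is, and the output is the weighted mix of values.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Dot product of the query with each key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: three "tokens" with 2-dimensional embeddings.
# Queries, keys and values are the same vectors here (self-attention
# without the learned projection matrices a real model would apply).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(tokens, tokens, tokens)
for row in w:
    print([round(x, 3) for x in row])
```

Each printed row shows how strongly one token attends to each of the three tokens. The weights in a row always sum to 1, which is how the mechanism expresses relative relevance: the first token, for instance, puts more weight on itself and on the third token (which shares its first component) than on the second.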
Many of the world’s most important advancements in machine learning, which laid the bedrock for ChatGPT, were pioneered by Canadian scientists.
Three men are lauded as the “godfathers of AI,” and two of them are Canadian: Yoshua Bengio of the Université de Montréal and Geoffrey Hinton of the University of Toronto (U of T). The third, Yann LeCun, is French, but some of his most groundbreaking research was done at Bell Labs and U of T.
In fact, OpenAI co-founder and chief scientist Ilya Sutskever was educated at U of T, where he was a PhD student of Hinton’s.
As for Bengio, he’s the most cited computer scientist in the world. When asked if he could draw a direct line from his work to ChatGPT, he said, point-blank, “Yeah, definitely.”
Bengio warns of AI’s potential to disrupt the social and economic fabric of the world through job losses, misinformation campaigns and AI-equipped weapons. He and other scientists have called for greater regulation of AI to ensure its benefits are enjoyed by all.
He also points out that ChatGPT is far from being able to reason like a human, and that such technology is still a ways away. But he is certain that a day will come when humans are able to create an artificial general intelligence with human-level cognition.
“What’s inevitable is that the scientific progress will get there. What is not is what we decide to do with it.”
OpenAI acknowledged the potential for its tool to be used with malicious intent in its GPT-4 research paper, writing, “GPT-4 and successor models have the potential to significantly influence society in both beneficial and harmful ways. We are collaborating with external researchers to improve how we understand and assess potential impacts, as well as to build evaluations for dangerous capabilities that may emerge in future systems.”
The company added that it will publish a follow-up paper with “recommendations on steps society can take to prepare for AI’s effects and initial ideas for projecting AI’s possible economic impacts.”