Wednesday, April 05, 2023

Large Language Models Are Rapidly Approaching an Important Threshold

In a previous post from a few years ago, I presented the graph below, titled "ImageNet Large Scale Visual Recognition Challenge Results." The graph illustrates the progress in accuracy from 2010 to 2017 on a task now considered relatively simple in machine vision: analyzing an image and identifying the objects within it (e.g., car, gazelle, daisy, spoon). The contest participants were various Artificial Intelligence/Machine Vision groups, including numerous universities and companies like Google. As the graph shows, the results in 2010 were quite disappointing, with even the best team frequently misidentifying the objects in the images. I recall thinking to myself, with some amusement, when I first saw those results in 2010, "they still have a long way to go before they have something useful."

However, by 2017 these systems had surpassed human capabilities (after all, humans make mistakes too), and by 2020, at my company, we had transitioned all of our products, both deployed and under development, to these AI systems. They performed remarkably well and were significantly simpler than our previous code.
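To give a sense of how streamlined these systems are compared to hand-written vision code, here is a minimal sketch of the kind of ImageNet-style classification described above, using a pretrained model from torchvision. The model choice and file name are illustrative assumptions, not what we actually shipped:

```python
import torch
from torchvision import models
from PIL import Image

# A pretrained ResNet-50 as a stand-in for the classifiers discussed
# above; any modern pretrained ImageNet model would work similarly.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()       # standard resize/crop/normalize
image = Image.open("photo.jpg")         # placeholder input image
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)

top_prob, top_idx = probs.max(dim=1)
label = weights.meta["categories"][top_idx.item()]  # e.g., "gazelle"
print(f"{label}: {top_prob.item():.1%}")
```

A couple of dozen lines like these replace what used to be thousands of lines of hand-tuned feature extraction and matching code, which is the "streamlined" advantage mentioned above.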

[Graph: ImageNet Large Scale Visual Recognition Challenge Results, 2010-2017]

In a closely related domain, that of AI-based reading and writing, a system called GPT-4 was introduced. Accompanying its release was a research paper, which featured the following chart:

[Chart from the GPT-4 research paper]
GPT is following a trajectory similar to that of the ImageNet challenge described above. It currently makes too many mistakes to be relied upon. However, assuming its current trajectory persists, I expect GPT to cross an accuracy threshold like the one the ImageNet systems crossed in the not-too-distant future, reaching performance comparable to humans.

My guess is that threshold will be crossed by the end of this decade.