To me, one of the most interesting things about modern AI systems is that we really don't have a firm understanding of how they work. As usual in my recent posts, I'm going to let ChatGPT-4 explain in more detail. Here's my query:
Although researchers have made significant advancements in understanding deep learning and large language models, there is still a substantial amount of trial and error involved in finding the optimal model architectures. While certain architectural patterns and structures have been shown to work well in practice, a complete understanding of why they work better than others is still an active area of research.
Several factors contribute to the current state of understanding:
Intuition and heuristics: Researchers often rely on intuition and heuristics when designing model architectures. They build upon successful structures from prior research and make modifications to address specific challenges or improve performance. However, this process does not always result in a deep understanding of why a particular structure works better than another.
Complexity and scale: Deep learning models, especially large language models, can be incredibly complex, with millions or even billions of trainable parameters. The sheer scale of these models makes it challenging to develop a comprehensive understanding of their inner workings.
Empirical results: Researchers often evaluate model architectures based on their empirical performance on specific tasks or benchmarks. While this approach can identify architectures that work well in practice, it may not provide a clear theoretical understanding of the underlying reasons for their success.
Lack of a unified theory: Deep learning currently lacks a unified theoretical framework that can comprehensively explain the behavior of different model architectures. Researchers are still working to develop such a framework, which would provide a more systematic understanding of why certain structures work better than others.
Despite these challenges, researchers are actively working to improve their understanding of deep learning models and their architectures. New techniques, such as model visualization, interpretability research, and information-theoretic analysis, are being developed to shed light on the inner workings of these models. As the field of deep learning continues to mature, it is likely that researchers will gain a deeper understanding of why certain model structures work better than others, resulting in more systematic approaches to architecture design.
"They build upon successful structures..." Sounds rather like evolution by trial and error to me.
We have created useful, immensely complex entities guided mostly by guesses and "intuition" about why they work. I once believed that until we understood how the human brain worked, we wouldn't be able to make much progress creating AI systems. But "nature" didn't need to know how brains work in order to produce a human brain. And it turns out that we don't need to know how the internals of AI systems work in order to create them.
Trial and error gets you a long way with sufficient computing power.
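For a toy picture of what I mean, here is a minimal Python sketch of trial and error as a search strategy. Everything in it is an assumption made up for illustration: the `score` function is a stand-in "benchmark" with an arbitrary peak, and the two design knobs (`layers`, `width`) are hypothetical. It is not how any real lab searches for architectures. The point is only that the search loop never inspects why a configuration scores well; it just keeps whatever works.

```python
import random


def score(config):
    """Hypothetical black-box benchmark. The search below never looks inside it."""
    layers, width = config
    # Made-up formula whose best value (0) sits at 12 layers and width 256.
    return -((layers - 12) ** 2) - ((width - 256) / 32) ** 2


def mutate(config, rng):
    """Randomly tweak one design choice of an existing configuration."""
    layers, width = config
    if rng.random() < 0.5:
        layers = max(1, layers + rng.choice([-1, 1]))
    else:
        width = max(32, width + rng.choice([-32, 32]))
    return (layers, width)


def trial_and_error(start, trials, seed=0):
    """Keep a mutation only if it scores at least as well as the current best."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(trials):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


if __name__ == "__main__":
    start = (2, 64)
    # More trials stand in for more computing power.
    for trials in (10, 100, 1000):
        print(trials, trial_and_error(start, trials))
```

Running it with a larger `trials` budget is the toy equivalent of throwing more compute at the problem: the loop builds on whatever succeeded before, with no theory of why it succeeded.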