4 Comments
Brian:

The author correctly identifies a real limitation: we have strong formal understanding of training procedures (matrix operations, gradient descent, objective functions), but limited mechanistic understanding of the internal representations that emerge at scale.

However, the conclusion that neural networks are therefore largely opaque, or “just statistics,” is incomplete.

From a biological and historical perspective, this framing overlooks the continuity between modern neural networks and earlier connectionist models derived from empirical neurophysiology. Work on retinal signal processing and optic nerve transmission demonstrated that perception is not a simple feedforward encoding problem, but an iterative refinement process shaped by local interactions, context propagation, and state transitions. These dynamics were formally captured using convolutional operators and Markov transition frameworks long before their rebranding as “deep learning.”

In that context, modern architectures are not inexplicable artifacts of gradient descent; they are scaled instantiations of known principles:

• Local receptive fields → convolutional structure

• Context accumulation → attention mechanisms

• State evolution → implicit Markovian dynamics in latent space
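The first of these mappings can be made concrete with a minimal sketch. This is illustrative only (the kernel values are hand-picked, not learned); it shows how a local receptive field amounts to a convolutional operator, where each output position depends only on a small neighborhood of the input:

```python
# Minimal sketch: a local receptive field as a 1-D convolution.
# Each output position sees only a short window of the input signal --
# the "local receptive field" that the bullet above maps onto
# convolutional structure. Kernel values are illustrative, not learned.

def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (strictly, cross-correlation,
    following the deep-learning convention)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]  # responds to local change: a crude edge detector
print(conv1d(signal, edge_kernel))  # [0, 1, 0, 0, -1, 0]
```

The output is nonzero exactly where the signal changes, which is the sense in which a fixed local operator already "detects" structure; learning only adjusts which local structure the kernel responds to.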

Where the author is correct is at the level of global interpretability: the full system state is combinatorially large, and no human can exhaustively trace all interactions. But this is a complexity constraint, not a failure of understanding. We encounter the same limitation in genomics, climate systems, and neural circuitry.

The more precise statement is:

We understand the governing principles and constraints of these systems, but lack full observability of their emergent internal state trajectories at scale.

Finally, describing LLMs as “glorified autocomplete” is technically accurate at the objective-function level, but operationally reductive. Predictive sequence modeling, when embedded in high-dimensional latent spaces, produces structured world models, not mere surface statistics, as evidenced by arithmetic behaviors, latent feature circuits, and simulations of agent-like reasoning noted even in the author’s own examples.

In short:

Neural networks are not mysterious in principle; they are biologically inspired dynamical systems whose emergent representations exceed current interpretability tooling.

Robert Shepherd:

I am revealing myself to be Very Stupid, but—

I don’t understand what people mean when they say we don’t understand how neural networks work. If you have a vector space with a fantastically large number of dimensions, and use the dimensions to correlate loads of associations in a confusing way… then of course you’ll get something that works like this.

I can see that actually saying very much about how the model does its correlations would be very hard indeed, because there are an unfathomable number of them in a very high number of dimensions. But the basic principle seems intuitive to me?

Maybe this is just because I’m obsessed with evolution— my understanding is that it’s not exactly correct to say LLMs are evolved, but it’s also… not that far away from being correct? There’s an iterative process in the absurdly large space of “all possible correlations” which fits whatever the training data is, and the model converges into it over time. It’s not surprising we don’t understand the details of how it works; neither do the blind forces of nature. But the landscape it’s working in seems easier to grasp.
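The "iterative process converging into a fit" picture above can be sketched in a few lines. This is a toy, not a model of LLM training: the 1-D quadratic loss stands in for a training objective, and the point is that each step only follows a local gradient, yet the process still converges on the solution:

```python
# Minimal sketch of blind iterative search: gradient descent nudges a
# parameter toward whatever minimizes the loss, with no single step
# "understanding" the solution it converges to. The quadratic loss is a
# toy stand-in for a real training objective.

def grad_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # each update uses only local slope information
    return x

# Loss (x - 3)^2 has gradient 2*(x - 3); its minimum is at x = 3.
x_star = grad_descent(lambda x: 2 * (x - 3), x0=10.0)
print(round(x_star, 4))  # 3.0
```

The analogy to evolution is loose but real: both are local, feedback-driven searches over a huge space, and in neither case does the search process itself contain a description of why the result works.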

SE Gyges:

People with the right background seem to find this relatively intuitive; the challenge is mostly conveying it to a general audience.

Robert Shepherd:

oh good, I was worried I’d missed something vital