The Importance of Forgetting in Artificial and Animal Intelligence

Like biological neural networks, deep neural networks (DNNs) exhibit critical learning periods: if the training data are compromised during this early stage, a network can end up in a state from which it cannot recover. This is an excellent post from Amazon Science on how “forgetting” appears to be an essential part of the learning process in both artificial and biological systems.
How Do Deep Neural Networks Learn?
DNNs have approached human-level performance in specific tasks ranging from recognizing speech to finding objects in images. But how does a DNN learn? What “information” does it contain? How is that information represented, and where is it stored?
To frame these questions mathematically, researchers needed a workable definition of “information” in deep networks. Traditional information theory is built around Claude Shannon’s idea of quantifying how many bits are needed to send a message. But, as Shannon himself noted, this is a measure of information for communication. When it is used to measure how much information a DNN’s weights contain about the task the network is trying to solve, it tends to yield degenerate, meaningless values.
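One standard way to see the degeneracy (an illustration, not necessarily the researchers’ exact argument): if training is a deterministic algorithm A applied to the training set D, then the weights w = A(D) carry no residual uncertainty once D is known. For discrete weights:

```latex
I(w;\mathcal{D}) = H(w) - H(w \mid \mathcal{D}),
\qquad
w = A(\mathcal{D}) \;\Longrightarrow\; H(w \mid \mathcal{D}) = 0
\;\Longrightarrow\; I(w;\mathcal{D}) = H(w).
```

If A maps distinct datasets to distinct weights, then H(w) = H(D), so this “information” counts every detail of the dataset, useful or not. If weights and data are instead treated as continuous, the conditional differential entropy diverges to minus infinity and the mutual information is infinite. Neither value says how much the network actually knows about the task.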
This paradox led to a more general notion of information, the information Lagrangian, which defines information as the trade-off between how much noise can be added to the network’s weights and the resulting accuracy of its input-output behavior. Intuitively, even if a network is very large, if most of its computations can be replaced with random noise without changing the output, then the DNN does not actually contain much information.
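A minimal sketch of this trade-off, assuming a toy linear classifier whose weights are perturbed by independent Gaussian noise N(w, σ²) and compared against a Gaussian prior N(0, λ²). The score has the form “expected error under noisy weights + β · KL(noisy weights ‖ prior)”, which mirrors the information-Lagrangian idea; the model, data, and the names `kl_gaussian`, `noisy_error`, `info_lagrangian`, `beta`, and `lam` are hypothetical illustrations, not the researchers’ code.

```python
import math
import random


def kl_gaussian(mu: float, sigma: float, lam: float) -> float:
    """KL( N(mu, sigma^2) || N(0, lam^2) ) for a single scalar weight, in nats."""
    return math.log(lam / sigma) + (sigma**2 + mu**2) / (2 * lam**2) - 0.5


def predict(weights, x):
    """Toy linear classifier (hypothetical): label 1 if w.x >= 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0


def noisy_error(weights, sigma, data, trials=200, seed=0):
    """Monte Carlo error rate when every weight gets independent
    Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        noisy = [w + rng.gauss(0.0, sigma) for w in weights]
        wrong += sum(predict(noisy, x) != y for x, y in data)
    return wrong / (trials * len(data))


def info_lagrangian(weights, sigma, data, beta=0.01, lam=1.0):
    """Return (score, info): score = noisy error + beta * info, where info is
    the KL from the noisy weights to the prior. Intended for sigma <= lam,
    where more tolerable noise means less information."""
    info = sum(kl_gaussian(w, sigma, lam) for w in weights)
    return noisy_error(weights, sigma, data) + beta * info, info


if __name__ == "__main__":
    weights = [2.0, -1.0]
    data = [((1.0, 0.5), 1), ((0.2, 1.0), 0), ((-1.0, -0.3), 0), ((0.5, -0.5), 1)]
    for sigma in (0.01, 0.1, 0.5, 1.0):
        score, info = info_lagrangian(weights, sigma, data)
        print(f"sigma={sigma}: info={info:.2f} nats, score={score:.3f}")
```

The design choice to illustrate: the KL term falls as σ grows, so a network whose outputs survive large weight noise is charged for little information, while the error term penalizes noise that actually changes its predictions.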
The Surprising Learning Curve
As learning progresses, one would expect the amount of information stored in the weights of the network to increase monotonically: the more you train, the more you learn. However, the information in the weights follows a completely different path:
- First, the information contained in the weights increases sharply, as if the network were trying to acquire information about the data set.
- Then the information in the weights drops, almost as though the network were “forgetting,” or shedding, information about the training data.
Remarkably, this forgetting occurs while performance on the learning task continues to improve!
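Tracking this curve requires measuring information in the weights at every step of training; one proxy used in related critical-period work is the trace of the Fisher information matrix. The sketch below assumes plain logistic regression on a tiny hypothetical dataset rather than a DNN, and records that trace after every epoch. Because this toy is convex and starts from zero weights, where every prediction is maximally uncertain, its Fisher trace begins at its maximum and cannot show the initial rise; the rise-then-fall shape described above is what is reported for deep networks. The names `fisher_trace` and `train_and_track` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fisher_trace(w, xs):
    """Trace of the logistic-regression Fisher information, averaged over
    inputs: mean of p * (1 - p) * ||x||^2 with p = sigmoid(w.x)."""
    total = 0.0
    for x in xs:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total += p * (1.0 - p) * sum(xi * xi for xi in x)
    return total / len(xs)


def train_and_track(xs, ys, epochs=50, lr=0.5):
    """Full-batch gradient descent on the logistic loss from zero weights.
    Returns the final weights and the Fisher trace measured at
    initialization (history[0]) and after each epoch."""
    w = [0.0] * len(xs[0])
    history = [fisher_trace(w, xs)]
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / len(xs)
        w = [wi - lr * g for wi, g in zip(w, grad)]
        history.append(fisher_trace(w, xs))
    return w, history


if __name__ == "__main__":
    xs = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
    ys = [1, 1, 0, 0]
    w, history = train_and_track(xs, ys)
    for epoch in range(0, len(history), 10):
        print(f"epoch {epoch}: Fisher trace = {history[epoch]:.4f}")
```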
When the researchers shared these results with biologists, the biologists were not surprised. In biological systems, forgetting is an important aspect of learning: animal brains have a bounded capacity, so there is an ongoing need to forget useless information and consolidate useful information. DNNs, however, are not biological, and there is no apparent reason why memorizing first and then forgetting should benefit them.
Critical Learning Periods
Biological networks have another fundamental property: they lose their plasticity over time. If people do not learn a skill (say, seeing or speaking) during a critical period of development, their ability to learn that skill is permanently impaired. In humans, for example, failure to correct visual defects early enough in childhood can result in lifelong amblyopia, impaired vision in one eye that persists even if the defect is later corrected. Critical learning periods are also pronounced elsewhere in the animal kingdom; birds, for instance, depend on one to develop the ability to sing.
Researchers repeated a classic experiment by neuroscience pioneers Hubel and Wiesel, who in the 1950s and 1960s studied the effect of a temporary visual deficit in cats shortly after birth. The researchers “blindfolded” the DNNs by blurring the training images at the beginning of training, then let the networks continue training on clear images. The deficit introduced in this initial period caused a permanent loss in classification accuracy, no matter how much additional training the network received.
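The “blindfolding” protocol amounts to a data schedule: inputs are degraded for a window of epochs and left clean afterward. A minimal sketch, assuming grayscale images stored as nested lists and a simple box blur standing in for whatever blur the researchers actually used; `box_blur`, `in_deficit`, `epoch_batch`, and the window parameters `start`/`end` are hypothetical.

```python
from typing import List

Image = List[List[float]]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Replace each pixel with the mean of its (2*radius+1)^2 neighborhood,
    clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - radius), min(h, i + radius + 1))
                    for b in range(max(0, j - radius), min(w, j + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def in_deficit(epoch: int, start: int, end: int) -> bool:
    """True while the simulated deficit is active (start <= epoch < end)."""
    return start <= epoch < end


def epoch_batch(images: List[Image], epoch: int, start: int = 0,
                end: int = 40, radius: int = 2) -> List[Image]:
    """Images the network sees at this epoch: blurred inside the
    deficit window, returned unmodified outside it."""
    if in_deficit(epoch, start, end):
        return [box_blur(img, radius) for img in images]
    return images
```

Sweeping `end` (removing the deficit at different epochs) and `start` (introducing it late) while keeping the total training budget fixed is one way to test whether early deficits are permanent and late ones harmless.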
In other words, DNNs exhibit critical learning periods just like biological systems. If the data are corrupted during the “information acquisition” phase, the network ends up in a state from which it cannot recover, whereas altering the data after this critical period has no such effect.
Information Plasticity
Through “artificial neural recording,” which measures the information flowing among different neurons, researchers found that during the critical period the pathways along which information flows between layers are fluid, whereas after the critical period they become fixed.
Where animals exhibit neural plasticity, a DNN exhibits a form of “information plasticity”: its ability to process information is lost during learning. But rather than being a consequence of aging or some complex biochemical phenomenon, this “forgetting” appears to be an essential part of learning, in both artificial and biological systems.
Practical Applications: Task2Vec
These findings have practical implications. Task2Vec is a method for transforming learning tasks into vectors so that they can be compared, clustered, and selected based on neighborhood criteria. The amount of information needed to fine-tune one model starting from another serves as a distance between the tasks the two models represent, which makes it possible to measure how difficult it would be to fine-tune a given model for a given task.
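In the Task2Vec paper, each task is embedded as the diagonal of the Fisher information of a fixed “probe” network’s weights computed on that task’s data, and tasks are compared with a cosine distance after normalizing each coordinate by the sum of the two embeddings. The sketch below assumes those embeddings have already been computed (positive vectors; the example numbers and task names are made up) and implements only that symmetric distance plus a nearest-task lookup; `task2vec_sym_distance` and `nearest_task` are hypothetical names.

```python
import math
from typing import Dict, List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def task2vec_sym_distance(fa: List[float], fb: List[float]) -> float:
    """Divide each coordinate by fa + fb so parameters with large Fisher
    values in both tasks do not dominate, then take the cosine distance."""
    sums = [x + y for x, y in zip(fa, fb)]
    return cosine_distance([x / s for x, s in zip(fa, sums)],
                           [y / s for y, s in zip(fb, sums)])


def nearest_task(query: List[float], library: Dict[str, List[float]]) -> str:
    """Name of the library task whose embedding is closest to the query."""
    return min(library, key=lambda name: task2vec_sym_distance(query, library[name]))


if __name__ == "__main__":
    library = {"birds": [4.0, 1.0, 0.5], "cars": [0.5, 1.0, 4.0]}
    print(nearest_task([3.0, 1.0, 0.6], library))
```

Selecting the nearest task in this way is one plausible use of the “neighborhood criteria” mentioned above, for example to pick which pre-trained model to fine-tune.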
This research is already making its way into products like Amazon Rekognition Custom Labels, where customers can provide a few sample images of objects, and the system learns a model to detect and classify them in never-before-seen images.
AI is truly in its infancy, and the depth of the intellectual questions the field raises is invigorating. For now, there is consolation for those of us who are aging and beginning to forget things: we can take comfort in knowing that we are still learning.