2023 Seminar Series
We have exciting changes to the ContinualAI seminars for 2023! Join us live each month for 60 minutes of discussion with 3 different speakers.

Continual evaluation for lifelong learning: Identifying the stability gap (ICLR 2023 Spotlight Paper): Time-dependent data-generating distributions have proven to be difficult for gradient-based training of neural networks, as the greedy updates result in catastrophic forgetting of previously learned knowledge. Despite the progress in the field of continual learning to overcome this forgetting, we show that a set of common state-of-the-art methods still suffers from substantial forgetting upon starting to learn new tasks, except that this forgetting is temporary and followed by a phase of performance recovery. We refer to this intriguing but potentially problematic phenomenon as the stability gap. The stability gap had likely remained under the radar due to standard practice in the field of evaluating continual learning models only after each task. Instead, we establish a framework for continual evaluation that uses per-iteration evaluation and we define a new set of metrics to quantify worst-case performance. Empirically we show that experience replay, constraint-based replay, knowledge-distillation, and parameter regularization methods are all prone to the stability gap; and that the stability gap can be observed in class-, task-, and domain-incremental learning benchmarks. Additionally, a controlled experiment shows that the stability gap increases when tasks are more dissimilar. Finally, by disentangling gradients into plasticity and stability components, we propose a conceptual explanation for the stability gap.
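As a rough illustration of the per-iteration evaluation protocol described above, the sketch below evaluates the model on every task's held-out set after each training iteration and reads a worst-case (minimum) accuracy off the resulting curves. The function names and the exact metric are illustrative assumptions, not the paper's code.

```python
import numpy as np

def accuracy(predict, eval_set):
    """Accuracy of a model's predict function on a held-out set of (x, y) pairs."""
    return float(np.mean([predict(x) == y for x, y in eval_set]))

def continual_evaluation(train_stream, eval_sets, train_step, predict):
    """Evaluate after every training iteration instead of only at task boundaries.

    Returns per-task accuracy curves; worst-case metrics such as the minimum
    accuracy reached on an already-learned task can be read directly off a curve.
    """
    curves = {task: [] for task in eval_sets}
    for batch in train_stream:                 # one gradient update per incoming batch
        train_step(batch)
        for task, eval_set in eval_sets.items():
            curves[task].append(accuracy(predict, eval_set))
    return curves

def min_acc(curve, learned_at):
    """Worst-case (minimum) accuracy on a task from the iteration it was learned."""
    return min(curve[learned_at:])
```

Evaluating only at task boundaries would miss the transient performance drop that constitutes the stability gap; per-iteration curves make it visible.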
Lifelong Learning in the Clinical Open World: Deep learning algorithms for medical use cases are mainly evaluated in static settings. Yet real clinics are dynamic environments where an array of factors, from disease patterns to acquisition practices, change over time. This disparity causes unexpected performance degradation during deployment. In this talk, I outline common causes of data drift for computed tomography and magnetic resonance images and introduce continual learning solutions that are effective for semantic segmentation. In addition, I give an overview of the factors currently limiting model adaptation in diagnostic radiology, including the regulatory landscape and the lack of prospective validation, and summarize my recommendations for safely approving and monitoring lifelong learning systems.
Emerging topics in continual learning for computational biology: Computational Biology is the science of using mathematics, statistics and computer science to model biological data. Single cell genomics is a field of Computational Biology that concerns modelling and studying diseases and biological systems at single-cell resolution, to unravel changes in cellular states during disease progression and to empower targeted therapies and drug discovery. Biological data produced by single-cell technologies are increasing in complexity, sample sizes are often small or limited, and the data frequently suffer from distributional shifts. The community, however, has been curating large collections of cells in healthy and diseased states. A promising direction is therefore to learn models of the biological system of interest from an existing data-rich or related domain. Because of its promise for learning on non-i.i.d. data arriving sequentially, continual learning offers many opportunities in single-cell computational biology. In this talk, I will describe some of the emerging biological problems in single cell genomics that can be formulated as continual learning problems and provide examples where continual learning can be used to make biomedical discoveries.
S-Prompts Learning with Pre-trained Transformers: An Occam’s Razor for Domain Incremental Learning: State-of-the-art deep neural networks still struggle to address the catastrophic forgetting problem in continual learning. In this paper, we propose one simple paradigm (named S-Prompting) and two concrete approaches to greatly reduce forgetting in one of the most typical continual learning scenarios, i.e., domain incremental learning (DIL). The key idea of the paradigm is to learn prompts independently across domains with pre-trained transformers, avoiding the use of exemplars that commonly appear in conventional methods. This results in a win-win game where the prompting can achieve the best performance for each domain. The independent prompting across domains requires only a single cross-entropy loss for training and one simple K-NN operation as a domain identifier for inference. The learning paradigm derives an image prompt learning approach and a novel language-image prompt learning approach. With excellent scalability (a 0.03% parameter increase per domain), the best of our approaches achieves a remarkable relative improvement (about 30% on average) over the best state-of-the-art exemplar-free methods on three standard DIL tasks, and even surpasses the best of them by about 6% relative on average when they use exemplars.
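A toy sketch of the inference-time domain identification described above: a frozen pre-trained backbone provides features, each domain stores its own independently learned prompt, and a simple K-NN vote over stored features selects which prompt to apply to a test sample. All names and details below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class SPromptsSelector:
    """Toy domain identifier in the spirit of S-Prompts: at inference, pick the
    prompt of the nearest domain via K-NN over frozen-backbone features.
    (Illustrative only; feature extraction and prompt tuning are elided.)"""

    def __init__(self, k: int = 5):
        self.k = k
        self.feats = []       # stored training features (from a frozen pre-trained transformer)
        self.domains = []     # domain id of each stored feature
        self.prompts = {}     # domain id -> independently learned prompt parameters

    def add_domain(self, domain_id, domain_feats, prompt):
        """Store features and the independently learned prompt for one domain."""
        self.feats.extend(domain_feats)
        self.domains.extend([domain_id] * len(domain_feats))
        self.prompts[domain_id] = prompt

    def select_prompt(self, query_feat):
        """K-NN vote over stored features decides which domain's prompt to use."""
        feats = np.stack(self.feats)
        dists = np.linalg.norm(feats - query_feat, axis=1)
        nearest = np.argsort(dists)[: self.k]
        votes = [self.domains[i] for i in nearest]
        winner = max(set(votes), key=votes.count)
        return self.prompts[winner]
```

Because each domain's prompt is trained in isolation, adding a new domain never overwrites previously learned prompts, which is what keeps forgetting low.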
Few-Shot Continual Active Learning by a Robot: Most continual learning methods proposed in the literature focus on task-based continual learning setups. In this setup, a CL model learns a sequence of tasks, one at a time, with all data of the current task labeled and available in an increment, but no data from previous or future tasks. This setup, however, is rarely encountered in real-world robotics applications, where a robot might get only limited supervision from its users to learn new tasks. Therefore, in this paper, we consider a challenging but realistic continual learning problem, Few-Shot Continual Active Learning (FoCAL), where a CL agent is provided with unlabeled data for a new or a previously learned task in each increment and the agent has only a limited labeling budget available. Towards this, we build on the continual learning and active learning literature and develop a framework that allows a CL agent to continually learn new object classes from a few labeled training examples. Our framework represents each object class using a uniform Gaussian mixture model (GMM) and uses pseudo-rehearsal to mitigate catastrophic forgetting. The framework also uses uncertainty measures on the Gaussian representations of the previously learned classes to find the most informative samples to be labeled in an increment. We evaluate our approach on the CORe-50 dataset and on a real humanoid robot for the object classification task. The results show that our approach not only produces state-of-the-art results on the dataset but also allows a real robot to continually learn unseen objects in a real environment with limited labeling supervision provided by its user.
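The sketch below illustrates, under assumed details, two of the ingredients mentioned above: per-class Gaussian mixture models in feature space and an uncertainty score that ranks unlabeled samples so only the most informative ones consume the labeling budget. Here, uncertainty is simply low likelihood under every known class; the paper's actual measures and GMM parameterization may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmms(features_by_class, n_components=2):
    """Represent each known object class by a GMM over its feature vectors."""
    return {c: GaussianMixture(n_components=n_components).fit(f)
            for c, f in features_by_class.items()}

def select_for_labeling(gmms, unlabeled_feats, budget):
    """Pick the `budget` most uncertain unlabeled samples.

    Uncertainty here is the negative maximum log-likelihood under any known
    class's GMM: samples that fit no known class well are queried first.
    """
    log_liks = np.stack([g.score_samples(unlabeled_feats) for g in gmms.values()])
    uncertainty = -log_liks.max(axis=0)
    return np.argsort(-uncertainty)[:budget]   # indices of samples to send for labeling

# Toy usage with random features standing in for a robot's perception backbone
rng = np.random.default_rng(0)
known = {0: rng.normal(0, 1, (50, 8)), 1: rng.normal(3, 1, (50, 8))}
gmms = fit_class_gmms(known)
pool = rng.normal(1.5, 2, (20, 8))
print(select_for_labeling(gmms, pool, budget=5))
```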
SparCL: Sparse Continual Learning on the Edge: Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems in resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), which is the first study to leverage sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods, requiring up to 23× fewer training FLOPs, and, surprisingly, further improves SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained by adapting SOTA sparse training methods to the CL setting, in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
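As a minimal illustration of the gradient-sparsity idea (one of the three components above), the sketch below keeps only the largest-magnitude fraction of each parameter's gradient before the optimizer step; SparCL's actual TDM/DDR/DGM criteria are more involved and task-aware, so this is a hedged sketch rather than the authors' method.

```python
import torch

def mask_gradients(model: torch.nn.Module, keep_ratio: float = 0.2) -> None:
    """Zero out all but the top `keep_ratio` fraction of gradient entries
    (by magnitude) in each parameter tensor. Call between backward() and step()."""
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad
        k = max(1, int(keep_ratio * g.numel()))
        # k-th largest magnitude = (numel - k + 1)-th smallest magnitude
        threshold = g.abs().flatten().kthvalue(g.numel() - k + 1).values
        g.mul_((g.abs() >= threshold).to(g.dtype))

# Toy usage: one sparse update step on a small linear model
model = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
mask_gradients(model, keep_ratio=0.2)
opt.step()
```

Sparsifying the update touches fewer weights per step, which both cuts compute and limits how much previously learned parameters are disturbed.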