Iterative local optimization of normalizing flows

Localized Learning

While global end-to-end learning has become the de facto training algorithm, it requires centralized computation and is thus only feasible on a single device or a carefully synchronized cluster. This limits learning on unreliable or resource-constrained devices with limited connectivity, such as heterogeneous hardware clusters or wireless sensor networks. For example, global learning cannot natively handle hardware or communication faults and may not fit on memory-constrained devices, which could range from a GPU to a tiny sensor. To address these limitations, this project will study the fundamentals of localized learning, broadly defined as any training method that updates model parts via non-global objectives.

Topics include but are not limited to decoupled or early-exit training (e.g., [Belilovsky et al., 2020; Xiong et al., 2020; Gomez et al., 2022]), greedy training (e.g., [Löwe et al., 2019; Belilovsky et al., 2019]), iterative layer-wise learning (e.g., [Inouye & Ravikumar, 2018; Zhou et al., 2022; Elkady et al., 2022]), self-learning or data-dependent functions (e.g., batch normalization [Ioffe & Szegedy, 2015]), and non-global training on edge devices (e.g., [Baccarelli et al., 2020]). Out-of-scope topics include data-parallel training, including standard federated learning (because the whole model is updated at the same time); alternating or cyclic optimization (because the algorithm uses a global objective); and algorithms that reduce memory requirements but still optimize a global objective (e.g., checkpointing, synthetic gradients, or global model-pipelined training).
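To make the distinction concrete, a minimal sketch of greedy layer-wise training in the spirit of [Belilovsky et al., 2019] is shown below: each layer is trained against its own local auxiliary objective (here, a small logistic head on a toy binary task), then frozen before the next layer is trained on its output. No gradient ever flows through a global end-to-end loss. All names, widths, and hyperparameters here are illustrative assumptions, not the project's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable binary task
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_local_layer(h, y, width, steps=200, lr=0.5):
    """Train one tanh layer W with a local logistic head v on its own loss.

    The auxiliary head is discarded after training, as in greedy
    layer-wise schemes; only the layer weights are kept."""
    n, d = h.shape
    W = rng.normal(scale=0.1, size=(d, width))
    v = np.zeros(width)
    for _ in range(steps):
        z = np.tanh(h @ W)                      # layer activation
        g = (sigmoid(z @ v) - y) / n            # grad of local logistic loss
        v -= lr * z.T @ g                       # update local head
        W -= lr * h.T @ (np.outer(g, v) * (1 - z**2))  # update layer only
    return W

# Greedy stack: each layer sees only the frozen output of the previous one.
h = X
for width in (16, 16):
    W = train_local_layer(h, y, width)
    h = np.tanh(h @ W)                          # freeze layer, forward pass

# Final linear readout trained on the last (fixed) representation
v = np.zeros(h.shape[1])
for _ in range(200):
    v -= 0.5 * h.T @ ((sigmoid(h @ v) - y) / len(y))
acc = np.mean((sigmoid(h @ v) > 0.5) == y)
```

Because each `train_local_layer` call touches only one layer's parameters, the layers could in principle live on separate devices and be updated without global synchronization, which is exactly the property the out-of-scope methods above (checkpointing, synthetic gradients, pipelining) lack.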

References

D. I. Inouye and P. Ravikumar. Deep density destructors. In ICML, 2018.

Z. Zhou, Z. Gong, P. Ravikumar, and D. I. Inouye. Iterative alignment flows. In AISTATS, 2022.

M. Elkady, J. Lim, and D. I. Inouye. Discrete tree flows via tree-structured permutations. In ICML, 2022.

S. Löwe, P. O’Connor, and B. Veeling. Putting an end to end-to-end: Gradient-isolated learning of representations. In NeurIPS, 2019.

E. Belilovsky, M. Eickenberg, and E. Oyallon. Greedy layerwise learning can scale to ImageNet. In ICML, 2019.

E. Belilovsky, M. Eickenberg, and E. Oyallon. Decoupled greedy learning of CNNs. In ICML, 2020.

Y. Xiong, M. Ren, and R. Urtasun. LoCo: Local contrastive representation learning. In NeurIPS, 2020.

A. N. Gomez, O. Key, K. Perlin, S. Gou, N. Frosst, J. Dean, and Y. Gal. Interlocking backpropagation: Improving depthwise model-parallelism. Journal of Machine Learning Research, 23(171):1–28, 2022.

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

E. Baccarelli, S. Scardapane, M. Scarpiniti, A. Momenzadeh, and A. Uncini. Optimized training and scalable implementation of conditional deep neural networks with early exits for fog-supported IoT applications. Information Sciences, 521:107–143, 2020.

David I. Inouye
Assistant Professor

My research interests include distribution alignment, localized learning, and explainable AI.

Publications

While normalizing flows for continuous data have been extensively researched, flows for discrete data have only recently been explored. …

The unsupervised task of aligning two or more distributions in a shared latent space has many applications including fair …

We propose a unified framework for deep density models by formally defining density destructors. A density destructor is an invertible …