Volume 6, Issue 2
Understanding the Initial Condensation of Convolutional Neural Networks

Zhangchen Zhou, Hanxu Zhou, Yuqing Li & Zhi-Qin John Xu

CSIAM Trans. Appl. Math., 6 (2025), pp. 272-319.

Published online: 2025-05

  • Abstract

Previous research has shown that fully-connected neural networks with small initialization and gradient-based training exhibit a phenomenon known as condensation [T. Luo et al., J. Mach. Learn. Res., 22(1), 2021]. Condensation refers to the weight vectors of a neural network concentrating on isolated orientations during training; it is a feature of the non-linear learning process that helps neural networks generalize better. However, how network architecture affects this phenomenon remains an open question. In this study, we turn our focus to convolutional neural networks (CNNs) to investigate how their structural characteristics, in contrast to those of fully-connected networks, influence condensation. We first show theoretically that, under gradient descent with small initialization, the convolutional kernels of a two-layer CNN condense toward a specific direction determined by the training samples within a given time period. We then conduct systematic empirical investigations to substantiate our theory. Moreover, our experiments show that condensation persists under conditions broader than those imposed by our theory. These insights advance our understanding of the non-linear training dynamics of CNNs.
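To make the condensation phenomenon described above concrete, the following is a minimal illustrative sketch (not the authors' code) of how one might observe it empirically: a two-layer CNN with small initialization is trained by gradient descent on toy data, and condensation would show up as the pairwise cosine similarities between flattened convolutional kernels clustering near ±1 during early training. The architecture, init_scale, learning rate, step count, and the toy data are assumptions chosen for demonstration only and may need tuning to exhibit the effect clearly.

# Illustrative sketch (assumptions only): measuring condensation of
# convolutional kernels in a two-layer CNN trained with small initialization.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: n samples of a 1-channel 8x8 input with scalar targets.
n = 64
X = torch.randn(n, 1, 8, 8)
y = torch.randn(n, 1)

class TwoLayerCNN(nn.Module):
    def __init__(self, num_kernels=16, init_scale=1e-4):
        super().__init__()
        self.conv = nn.Conv2d(1, num_kernels, kernel_size=3, bias=False)
        self.fc = nn.Linear(num_kernels * 6 * 6, 1, bias=False)
        # Small initialization: scale all weights down so training starts
        # near zero, the regime in which condensation is reported to occur.
        with torch.no_grad():
            self.conv.weight.mul_(init_scale)
            self.fc.weight.mul_(init_scale)

    def forward(self, x):
        h = torch.tanh(self.conv(x))
        return self.fc(h.flatten(1))

def kernel_cosine_similarity(conv):
    # Pairwise cosine similarity between flattened convolutional kernels.
    # Most pairs close to +/-1 indicates the kernels have condensed onto a
    # small number of directions.
    w = conv.weight.detach().flatten(1)      # shape: (num_kernels, 9)
    w = F.normalize(w, dim=1)
    return w @ w.T

model = TwoLayerCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(2000):
    loss = F.mse_loss(model(X), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        sim = kernel_cosine_similarity(model.conv)
        print(f"step {step:4d}  loss {loss.item():.4f}  "
              f"mean |cos| between kernels {sim.abs().mean().item():.3f}")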

  • AMS Subject Headings

68U99, 90C26, 34A45

  • Copyright

© Global Science Press

  • BibTex

@Article{CSIAM-AM-6-272,
  author  = {Zhou, Zhangchen and Zhou, Hanxu and Li, Yuqing and Xu, Zhi-Qin John},
  title   = {Understanding the Initial Condensation of Convolutional Neural Networks},
  journal = {CSIAM Transactions on Applied Mathematics},
  year    = {2025},
  volume  = {6},
  number  = {2},
  pages   = {272--319},
  issn    = {2708-0579},
  doi     = {10.4208/csiam-am.SO-2024-0011},
  url     = {http://global-sci.org/intro/article_detail/csiam-am/24087.html}
}

  • RIS

TY  - JOUR
T1  - Understanding the Initial Condensation of Convolutional Neural Networks
AU  - Zhou, Zhangchen
AU  - Zhou, Hanxu
AU  - Li, Yuqing
AU  - Xu, Zhi-Qin John
JO  - CSIAM Transactions on Applied Mathematics
VL  - 6
IS  - 2
SP  - 272
EP  - 319
PY  - 2025
DA  - 2025/05
SN  - 2708-0579
DO  - 10.4208/csiam-am.SO-2024-0011
UR  - https://global-sci.org/intro/article_detail/csiam-am/24087.html
KW  - Convolutional neural network
KW  - dynamical regime
KW  - condensation
ER  -

  • TXT

Zhou, Zhangchen, Zhou, Hanxu, Li, Yuqing and Xu, Zhi-Qin John. (2025). Understanding the Initial Condensation of Convolutional Neural Networks. CSIAM Transactions on Applied Mathematics. 6 (2). 272-319. doi:10.4208/csiam-am.SO-2024-0011