Abstract
Compared with the global average pooling used in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features and thus has the potential to improve the representation and generalization abilities of deep CNNs. However, integrating global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given deep features of high dimension and small sample size; (2) appropriate use of the geometry of covariances. To address these challenges, we propose global Matrix Power Normalized COVariance (MPN-COV) pooling. MPN-COV conforms to a robust covariance estimator that is well suited to the high-dimension, small-sample scenario. It can also be regarded as a power-Euclidean metric between covariances, effectively exploiting their geometry. Furthermore, a global Gaussian embedding method is proposed to incorporate first-order statistics into MPN-COV. For fast training of MPN-COV networks, we propose an iterative matrix square root normalization that avoids the GPU-unfriendly eigen-decomposition inherent in MPN-COV. Additionally, progressive 1x1 and group convolutions are introduced to compact covariance representations. MPN-COV and its variants are highly modular and can be readily plugged into existing deep CNNs. Extensive experiments on large-scale object classification, scene categorization, fine-grained visual recognition and texture classification show that our methods are superior to their counterparts and achieve state-of-the-art performance.
References
[MPN-COV_ICCV17] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? IEEE Int. Conf. on Computer Vision (ICCV), pp. 2070-2078, 2017.
[Fast_MPN-COV_CVPR18] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 947-955, 2018.
[RAID-G_CVPR16] Qilong Wang, Peihua Li, Wangmeng Zuo, Lei Zhang. RAID-G: Robust Estimation of Approximate Infinite Dimensional Gaussian with Application to Material Recognition. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 4433-4441, 2016.
[G2DeNet_CVPR17] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2730-2739, 2017. (Oral presentation)
[L2EMG_PAMI17] Peihua Li, Qilong Wang, Hui Zeng, Lei Zhang. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 39(4): 803-817, 2017.
Implementation
Implementation of [MPN-COV_ICCV17]
We compute the matrix power via eigen-decomposition (EIG), with gradients obtained through the methodology of matrix back-propagation. Through EIG, the matrix power is transformed into a power of the eigenvalues. We implement MPN-COV with the MatConvNet package, and the source code is available at https://github.com/jiangtaoxie/MPN-COV-ConvNet.
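As a rough illustration of how EIG turns a matrix power into a power of eigenvalues, the following NumPy sketch covers the forward pass only. It is not the released MatConvNet code: the function name `mpn_cov_eig`, the (d, n) feature layout, the division by n in the covariance, and the default `alpha=0.5` (the matrix square root) are assumptions made for this example, and the backward pass is omitted.

```python
import numpy as np

def mpn_cov_eig(features, alpha=0.5):
    """Forward pass of covariance pooling followed by a matrix power.

    features: (d, n) array holding n feature vectors of dimension d
              (layout assumed for this sketch).
    alpha:    exponent of the matrix power; 0.5 gives the matrix square root.
    """
    d, n = features.shape
    # Sample covariance of the n feature vectors (d x d).
    centered = features - features.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / n
    # Symmetric eigen-decomposition: cov = U diag(lam) U^T.
    lam, U = np.linalg.eigh(cov)
    # Clip tiny negative eigenvalues caused by round-off before taking the power.
    lam = np.clip(lam, 0.0, None)
    # Matrix power = power of the eigenvalues: U diag(lam^alpha) U^T.
    return (U * lam**alpha) @ U.T

# Usage with random stand-in features (8 channels, 100 spatial positions).
rng = np.random.default_rng(0)
pooled = mpn_cov_eig(rng.standard_normal((8, 100)))
```

The result is a symmetric d x d matrix. With `alpha=0.5`, multiplying it by itself recovers the sample covariance up to numerical error.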
Implementation of [G2DeNet_CVPR17]
We propose to perform both global covariance pooling and global average pooling in deep CNNs, in the form of Gaussian modeling. Based on Lie group theory on the manifold of Gaussian distributions [L2EMG_PAMI17], a Gaussian can be uniquely identified with the square root of a symmetric positive definite matrix formed from its covariance matrix and mean vector, and this representation can be inserted into a CNN.
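A minimal NumPy sketch of this idea follows. It assumes the embedding maps a Gaussian with mean mu and covariance Sigma to the square root of the block matrix [[Sigma + mu mu^T, mu], [mu^T, 1]]. That block form, the function name `gaussian_embedding`, and the (d, n) feature layout are assumptions for illustration and should be checked against [G2DeNet_CVPR17]; only the forward computation is shown.

```python
import numpy as np

def gaussian_embedding(features):
    """Embed the Gaussian fitted to the features as an SPD matrix square root.

    features: (d, n) array holding n feature vectors of dimension d
              (layout assumed for this sketch).
    Returns a (d + 1, d + 1) symmetric matrix.
    """
    d, n = features.shape
    mu = features.mean(axis=1, keepdims=True)        # first-order statistics, (d, 1)
    centered = features - mu
    cov = centered @ centered.T / n                  # second-order statistics, (d, d)
    # Assumed block form combining covariance and mean into one SPD matrix.
    A = np.block([[cov + mu @ mu.T, mu],
                  [mu.T, np.ones((1, 1))]])
    # Matrix square root via symmetric eigen-decomposition.
    lam, U = np.linalg.eigh(A)
    lam = np.clip(lam, 0.0, None)                    # guard against round-off
    return (U * np.sqrt(lam)) @ U.T

# Usage with random stand-in features (6 channels, 64 spatial positions).
rng = np.random.default_rng(0)
embedded = gaussian_embedding(rng.standard_normal((6, 64)))
```

Squaring the returned matrix recovers the block matrix, so the mean vector can be read from its last column and the second-order moment from its upper-left block.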
Implementation of [Fast_MPN-COV_CVPR18]
We propose a fast MPN-COV method that computes the matrix square root iteratively. The key is a directed acyclic graph containing an iterative loop: pre-normalization guarantees convergence of the subsequent Newton-Schulz iteration, while post-compensation restores the data scale changed by pre-normalization. We implement fast MPN-COV in several deep learning frameworks, e.g., PyTorch, TensorFlow and MatConvNet. The source code is available at https://github.com/jiangtaoxie/fast-MPN-COV.
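The three stages (pre-normalization, Newton-Schulz iteration, post-compensation) can be sketched in NumPy as below. This is an illustrative forward pass, not the released code: the function name `newton_schulz_sqrt`, the choice of trace normalization, and the default of 5 iterations are assumptions for this example.

```python
import numpy as np

def newton_schulz_sqrt(cov, num_iters=5):
    """Approximate the square root of an SPD matrix using only matrix products.

    cov:       (d, d) symmetric positive definite matrix.
    num_iters: number of coupled Newton-Schulz iterations (assumed default).
    """
    d = cov.shape[0]
    I = np.eye(d)
    # Pre-normalization: divide by the trace so that all eigenvalues lie in
    # (0, 1], which places the matrix inside the convergence region.
    trace = np.trace(cov)
    A = cov / trace
    # Coupled iteration: Y approaches A^(1/2) and Z approaches A^(-1/2).
    Y, Z = A, I
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z
    # Post-compensation: undo the scaling introduced by pre-normalization.
    return np.sqrt(trace) * Y

# Usage on a small, well-conditioned SPD matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = B @ B.T + np.eye(4)
root = newton_schulz_sqrt(S, num_iters=20)
```

Because the loop uses only matrix multiplications, it avoids eigen-decomposition entirely. Accuracy depends on the iteration count and on the conditioning of the input, so more iterations are needed when the smallest normalized eigenvalue is close to zero.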