A curated list of papers on network compression
1. Parameter pruning and sharing
1.1 Quantization and Binarization
Compressing deep convolutional networks using vector quantization
Quantized convolutional neural networks for mobile devices
Improving the speed of neural networks on cpus
Deep learning with limited numerical precision
Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding
Towards the limit of network quantization
Binaryconnect: Training deep neural networks with binary weights during propagations
Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1
Xnor-net: Imagenet classification using binary convolutional neural networks
Deep neural networks are robust to weight binarization and other non-linear distortions
Loss-aware binarization of deep networks
Neural networks with few multiplications
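The binarization line of work above (BinaryConnect, BinaryNet, XNOR-Net) replaces full-precision weights with values in {-1, +1}, usually rescaled by a learned or closed-form factor. Below is a minimal sketch of XNOR-Net-style weight binarization, where the optimal per-tensor scale has the closed form alpha = mean(|W|); during training the real-valued weights are kept and gradients flow through sign() via the straight-through estimator, which is not shown here.

```python
import torch

def binarize(w: torch.Tensor) -> torch.Tensor:
    """XNOR-Net-style binarization: W is approximated by alpha * sign(W).

    alpha = mean(|W|) is the closed-form optimal per-tensor scale
    (Rastegari et al.). torch.sign maps 0 to 0, so we build the sign
    explicitly to keep outputs strictly in {-alpha, +alpha}.
    """
    alpha = w.abs().mean()
    sign = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
    return alpha * sign
```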
1.2 Pruning and Sharing
Comparing biases for minimal network construction with back-propagation
Optimal brain damage
Second order derivatives for network pruning: Optimal brain surgeon
Data-free parameter pruning for deep neural networks
Learning both weights and connections for efficient neural networks
Compressing neural networks with the hashing trick
Soft weight-sharing for neural network compression
Fast convnets using group-wise brain damage
Less is more: Towards compact cnns
Learning structured sparsity in deep neural networks
Pruning filters for efficient convnets
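Most of the pruning papers above follow the recipe popularized by "Learning both weights and connections for efficient neural networks": rank weights by magnitude, zero out the smallest fraction, and keep a binary mask so pruned connections stay dead during fine-tuning. A minimal sketch (the function name and sparsity argument are mine, for illustration only):

```python
import torch

def magnitude_prune(w: torch.Tensor, sparsity: float):
    """Zero out the smallest-|w| fraction of entries; return (pruned, mask).

    Reapply the mask after every optimizer step during fine-tuning so
    that pruned connections cannot regrow.
    """
    k = int(sparsity * w.numel())
    if k == 0:
        return w, torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values
    mask = (w.abs() > threshold).to(w.dtype)
    return w * mask, mask
```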
1.3 Designing Structural Matrix
An exploration of parameter redundancy in deep networks with circulant projections
Fast neural networks with circulant projections
Deep fried convnets
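The circulant-projection papers above exploit the fact that a circulant matrix is fully determined by its first column, and that multiplying by it is a circular convolution, so a fully connected layer drops from O(n^2) to O(n) storage and O(n log n) compute via the FFT. A NumPy sketch with a sanity check against the dense product:

```python
import numpy as np

def circulant_matvec(c: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute C @ x where C is the circulant matrix with first column c.

    C[i, j] = c[(i - j) mod n], so C @ x is a circular convolution,
    which the convolution theorem turns into an elementwise product
    in the Fourier domain.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Sanity check against the explicit O(n^2) matrix-vector product.
n = 8
c, x = np.random.randn(n), np.random.randn(n)
C = np.stack([np.roll(c, j) for j in range(n)], axis=1)  # column j = roll(c, j)
assert np.allclose(C @ x, circulant_matvec(c, x))
```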
2. Low rank factorization and sparsity
Learning separable filters
Exploiting linear structure within convolutional networks for efficient evaluation
Speeding up convolutional neural networks with low rank expansions
Speeding-up convolutional neural networks using fine-tuned CP-decomposition
Convolutional neural networks with low-rank regularization
Predicting parameters in deep learning
Low-rank matrix factorization for deep neural network training with high-dimensional output targets
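A common baseline running through the low-rank papers above: factor a trained weight matrix with a truncated SVD, replacing one m x n layer by two layers of sizes m x r and r x n, then fine-tune to recover accuracy. A minimal NumPy sketch:

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Split W (m x n) into A (m x r) and B (r x n) with W ~ A @ B.

    Truncated SVD gives the best rank-r approximation in the Frobenius
    norm; parameter count drops from m*n to r*(m + n).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into A
    B = Vt[:rank, :]
    return A, B
```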
3. Transferred/compact convolution filters
Group equivariant convolutional networks
Doubly convolutional neural networks
Understanding and improving convolutional neural networks via concatenated rectified linear units
Multi-bias non-linear activation in deep neural networks
Exploiting cyclic symmetry in convolutional neural networks
Inception-v4, Inception-ResNet and the impact of residual connections on learning
Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
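MobileNets, the last entry above, builds its compact filters from depthwise separable convolutions: a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution, cutting multiply-adds by roughly 8-9x versus a standard 3x3 layer. A simplified PyTorch sketch (the real MobileNet block also inserts BatchNorm and ReLU after each convolution):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (one filter per channel) + 1x1 pointwise conv."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```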
4. Knowledge distillation
Model compression
Do deep nets really need to be deep?
Distilling the knowledge in a neural network
Fitnets: Hints for thin deep nets
Bayesian dark knowledge
Face model compression by distilling knowledge from neurons
Net2net: Accelerating learning via knowledge transfer
Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer
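The distillation papers above train a small student to match a large teacher's softened outputs. A sketch of the loss from "Distilling the knowledge in a neural network" (temperature T and mixing weight alpha are the usual hyperparameters; the T^2 factor keeps soft-target gradients on the same scale as the hard-label term):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.9):
    """alpha * soft-target KL (at temperature T) + (1 - alpha) * hard CE."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```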
5. Other
Dynamic capacity networks
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer
Deep dynamic neural networks for multimodal gesture segmentation and recognition
Deep Networks with Stochastic Depth
Deep pyramidal residual networks with separated stochastic depth
Fast training of convolutional networks through FFTs
Fast algorithms for convolutional neural networks
S3pool: Pooling with stochastic spatial sampling
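As a concrete example from this grab-bag, "Deep Networks with Stochastic Depth" randomly drops whole residual blocks during training and rescales each branch by its survival probability at test time. A minimal PyTorch sketch (the wrapper class name is mine, for illustration):

```python
import torch
import torch.nn as nn

class StochasticDepth(nn.Module):
    """Wrap a residual branch so it is skipped with probability 1 - p
    during training and scaled by p at inference (Huang et al.)."""

    def __init__(self, branch: nn.Module, survival_prob: float = 0.8):
        super().__init__()
        self.branch = branch
        self.p = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(()) < self.p:
                return x + self.branch(x)
            return x                         # drop the whole block
        return x + self.p * self.branch(x)   # expected output at inference
```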
6. Survey
A Survey of Model Compression and Acceleration for Deep Neural Networks