Maintained by Difan Deng and Marius Lindauer.
The following list considers papers related to neural architecture search. It is by no means complete. If you miss a paper on the list, please let us know.
Please note that although NAS methods steadily improve, the quality of empirical evaluations in this field is still lagging behind compared to other areas in machine learning, AI and optimization. We would therefore like to share some best practices for empirical evaluations of NAS methods, which we believe will facilitate sustained and measurable progress in the field. If you are interested in a teaser, please read our blog post or directly jump to our checklist.
Transformers have gained increasing popularity in different domains. For a comprehensive list of papers focusing on Neural Architecture Search for Transformer-Based spaces, the awesome-transformer-search repo is all you need.
2021
Dimanov, Daniel; Balaguer-Ballester, Emili; Singleton, Colin; Rostami, Shahin
Moncae: Multi-Objective Neuroevolution of Convolutional Autoencoders Proceedings Article
In: 2nd Workshop on Neural Architecture Search at ICLR 2021, 2021.
@inproceedings{Dimanov2021,
title = {{MONCAE}: Multi-Objective Neuroevolution of Convolutional Autoencoders},
author = {Daniel Dimanov and Emili Balaguer-Ballester and Colin Singleton and Shahin Rostami},
url = {https://www.researchgate.net/profile/Daniel-Dimanov-2/publication/352248797_MONCAE_MultiI-Objective_Neuroevolution_of_Convolutional_Autoencoders/links/60c0a43592851ca6f8d2d98b/MONCAE-MultiI-Objective-Neuroevolution-of-Convolutional-Autoencoders.pdf},
year = {2021},
date = {2021-07-01},
booktitle = {2nd Workshop on Neural Architecture Search at {ICLR} 2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Lin, Yi; Liu, Yanfei; Liu, Jingguang; Liu, Guocai; Ma, Kai; Zheng, Yefeng
LE-NAS: Learning-based Ensemble with Neural Architecture Search for 3D Radiotherapy Dose Prediction Journal Article
In: 2021.
@article{Lin2106,
title = {{LE-NAS}: Learning-based Ensemble with Neural Architecture Search for {3D} Radiotherapy Dose Prediction},
author = {Yi Lin and Yanfei Liu and Jingguang Liu and Guocai Liu and Kai Ma and Yefeng Zheng},
url = {https://arxiv.org/pdf/2106.06733.pdf},
eprint = {2106.06733},
archiveprefix = {arXiv},
year = {2021},
date = {2021-07-01},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
White, Colin; Nolen, Sam; Savani, Yash
Exploring the Loss Landscape in Neural Architecture Search Proceedings Article
In: 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021), 2021.
@inproceedings{white2021,
  author    = {Colin White and Sam Nolen and Yash Savani},
  title     = {Exploring the Loss Landscape in Neural Architecture Search},
  booktitle = {37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)},
  url       = {https://auai.org/uai2021/pdf/uai2021.242.pdf},
  year      = {2021},
  date      = {2021-07-01},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Sääw, Daniel; Nijweide, Frederik
Exploring the feasibility of Weight Agnostic Neural Networks for low-end hardware Technical Report
2021.
@techreport{Sääw2021,
title = {Exploring the feasibility of Weight Agnostic Neural Networks for low-end hardware},
author = {Daniel Sääw and Frederik Nijweide},
url = {https://www.researchgate.net/profile/Fpj_Nijweide/publication/352738002_Exploring_the_feasibility_of_Weight_Agnostic_Neural_Networks_for_low-end_hardware/links/60d630d5a6fdccb745e418ea/Exploring-the-feasibility-of-Weight-Agnostic-Neural-Networks-for-low-end-hardware.pdf},
year = {2021},
date = {2021-07-01},
keywords = {},
pubstate = {published},
tppubtype = {techreport},
internal-note = {TODO: @techreport requires an institution field (not determinable from the PDF link); note also the non-ASCII citation key, which classic BibTeX toolchains may reject}
}
Tian, Ye; Peng, Shichen; Yang, Shangshang; Zhang, Xingyi; Tan, Kay Chen; Jin, Yaochu
Action Command Encoding for Surrogate-Assisted Neural Architecture Search Technical Report
2021.
@techreport{Tian2921,
title = {Action Command Encoding for Surrogate-Assisted Neural Architecture Search},
author = {Ye Tian and Shichen Peng and Shangshang Yang and Xingyi Zhang and Kay Chen Tan and Yaochu Jin},
url = {https://www.researchgate.net/profile/Ye-Tian-84/publication/353166978_Action_Command_Encoding_for_Surrogate_Assisted_Neural_Architecture_Search/links/60eb072b1c28af34586047a1/Action-Command-Encoding-for-Surrogate-Assisted-Neural-Architecture-Search.pdf},
year = {2021},
date = {2021-07-01},
keywords = {},
pubstate = {published},
tppubtype = {techreport},
internal-note = {TODO: @techreport requires an institution field}
}
Siems, Julien; Klein, Aaron; Archambeau, Cedric; Mahsereci, Maren
Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio Proceedings Article
In: 8th ICML Workshop on Automated Machine Learning (2021), 2021.
@inproceedings{Siems2021icml,
title = {Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio},
author = {Julien Siems and Aaron Klein and Cedric Archambeau and Maren Mahsereci},
url = {https://openreview.net/pdf?id=34awaeWZgya},
year = {2021},
date = {2021-07-01},
booktitle = {8th ICML Workshop on Automated Machine Learning (2021)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Yu, Kaicheng
From Human-Designed Convolutional Neural Networks Towards Robust Neural Architecture Search PhD Thesis
2021.
@phdthesis{YuPhD2021,
title = {From Human-Designed Convolutional Neural Networks Towards Robust Neural Architecture Search},
author = {Yu, Kaicheng},
url = {https://infoscience.epfl.ch/record/287432},
doi = {10.5075/epfl-thesis-8035},
school = {EPFL},
year = {2021},
date = {2021-07-01},
abstract = {Artificial intelligence has been an ultimate design goal since the inception of computers decades ago. Among the many attempts towards general artificial intelligence, modern machine learning successfully tackles many complex problems thanks to the progress in deep learning, and in particular in convolutional neural networks (CNN). To design a CNN for a specific task, one common approach consists of adapting the heuristics from the pre-deep-learning era to the CNN domain. In the first part of this thesis, we introduce two methods that follow this approach: i) We build a covariance descriptor, i.e., a local descriptor that is suitable for texture recognition, to replace the first-order fully connected layers in an ordinary CNN, showing that such a descriptor yields state-of-the-art performance on many fine-grained image classification tasks with orders of magnitude fewer feature dimensions; ii) we develop a light-weight recurrent U-Net for image semantic segmentation, inspired by the biological eye saccadic movements, that yields real-time predictions on devices with limited computational resources. As most methods pre-dating automatic machine learning (AutoML), the two above-mentioned CNNs were human-designed. In the past few years, however, neural architecture search~(NAS), which aims to facilitate the design of deep networks for new tasks, has drawn an increasing attention. In this context, the weight-sharing approach, which consists of utilizing a super-net to encompass all possible architectures within a search space, has become a de facto standard in NAS because it enables the search to be done on commodity hardware. In the second part of this thesis, we then provide an in-depth study of recent weight-sharing NAS algorithms. First, we discover a phenomenon in the weight-sharing NAS training pipeline, which we dub multi-model forgetting, that negatively impacts the super-net quality, and propose a statistically motivated approach to address it. 
Subsequently, we find that (i) on average, many popular NAS algorithms perform similarly to a random architecture sampling policy; (ii) the widely-adopted weight sharing strategy degrades the ranking of the NAS candidates to the point of not reflecting their true performance, thus reducing the effectiveness of the search process. We then further decouple weight sharing from the NAS sampling policy, and isolate 14 factors that play a key role in the success of super-net training. Finally, to improve the super-net quality, we propose a regularization term that aims to maximize the correlation between the performance rankings of the super-net and of the stand-alone architectures using a small set of landmark architectures.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
Coquelin, Daniel; Sedona, Rocco; Riedel, Morris; Götz, Markus
Evolutionary Optimization of Neural Architectures in Remote Sensing Classification Problems Technical Report
2021.
@techreport{Coquelin2021,
title = {Evolutionary Optimization of Neural Architectures in Remote Sensing Classification Problems},
author = {Daniel Coquelin and Rocco Sedona and Morris Riedel and Markus Götz},
url = {https://www.researchgate.net/profile/Morris-Riedel/publication/353352507_EVOLUTIONARY_OPTIMIZATION_OF_NEURAL_ARCHITECTURES_IN_REMOTE_SENSING_CLASSIFICATION_PROBLEMS/links/60f7462b0c2bfa282aeef5cf/EVOLUTIONARY-OPTIMIZATION-OF-NEURAL-ARCHITECTURES-IN-REMOTE-SENSING-CLASSIFICATION-PROBLEMS.pdf},
year = {2021},
date = {2021-07-01},
keywords = {},
pubstate = {published},
tppubtype = {techreport},
internal-note = {TODO: @techreport requires an institution field}
}
Jin, Yaochu; Wang, Handing; Sun, Chaoli
Surrogate-Assisted Evolutionary Neural Architecture Search Proceedings Article
In: Data-Driven Evolutionary Optimization, pp. 373-387, 2021, ISBN: 978-3-030-74640-7.
@inproceedings{Jin2021,
title = {Surrogate-Assisted Evolutionary Neural Architecture Search},
author = {Yaochu Jin and Handing Wang and Chaoli Sun},
url = {https://link.springer.com/chapter/10.1007/978-3-030-74640-7_12},
isbn = {978-3-030-74640-7},
year = {2021},
date = {2021-06-29},
booktitle = {Data-Driven Evolutionary Optimization},
volume = {975},
pages = {373--387},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Zhu, Wei; Ni, Yuan; Wang, Xiaoling; Xie, Guotong; Zhang, Fang
Discovering Better Model Architectures for Medical Query Understanding Proceedings Article
In: Proceedings of NAACL HLT 2021: Industry Track Papers, pp. 230-237, 2021.
@inproceedings{ZHU2021,
title = {Discovering Better Model Architectures for Medical Query Understanding},
author = {Wei Zhu and Yuan Ni and Xiaoling Wang and Guotong Xie and Fang Zhang},
url = {https://www.aclweb.org/anthology/2021.naacl-industry.29.pdf},
year = {2021},
date = {2021-06-06},
booktitle = {Proceedings of NAACL HLT 2021: Industry Track Papers},
pages = {230--237},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tsokov, Stefan; Lazarova, Milena; Aleksieva-Petrova, Adelina
An Evolutionary Approach to the Design of Convolutional Neural Networks for Human Activity Recognition Journal Article
In: Indian Journal of Computer Science and Engineering, 2021.
@article{Tsokov2021,
  author    = {Tsokov, Stefan and Lazarova, Milena and Aleksieva-Petrova, Adelina},
  title     = {An Evolutionary Approach to the Design of Convolutional Neural Networks for Human Activity Recognition},
  journal   = {Indian Journal of Computer Science and Engineering},
  url       = {http://www.ijcse.com/docs/INDJCSE21-12-02-145.pdf},
  year      = {2021},
  date      = {2021-06-01},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
Geraeinejad, V.; Seenan, S.; Modarressi, M.; Daneshtalab, M.
RoCo-NAS: Robust and Compact Neural Architecture Search Proceedings Article
In: The international joint conference on neural networks IJCNN, 2021.
@inproceedings{Geraeinejad2021,
title = {{RoCo-NAS}: Robust and Compact Neural Architecture Search},
author = {V. Geraeinejad and S. Seenan and M. Modarressi and M. Daneshtalab},
url = {https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1558970&dswid=-1593},
year = {2021},
date = {2021-06-01},
booktitle = {The international joint conference on neural networks IJCNN},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Liu, Shaoli; Zheng, Chengjian; Lu, Kaidi; Gao, Si; Wang, Ning; Wang, Bofei; Zhang, Diankai; Zhang, Xiaofeng; Xu, Tianyu
EVSRNet: Efficient Video Super-Resolution With Neural Architecture Search Proceedings Article
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2480-2485, 2021.
@inproceedings{Liu_2021_CVPR,
title = {{EVSRNet}: Efficient Video Super-Resolution With Neural Architecture Search},
author = {Shaoli Liu and Chengjian Zheng and Kaidi Lu and Si Gao and Ning Wang and Bofei Wang and Diankai Zhang and Xiaofeng Zhang and Tianyu Xu},
url = {https://openaccess.thecvf.com/content/CVPR2021W/MAI/html/Liu_EVSRNet_Efficient_Video_Super-Resolution_With_Neural_Architecture_Search_CVPRW_2021_paper.html},
year = {2021},
date = {2021-06-01},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
pages = {2480--2485},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Ma, Xiaohan; Si, Chang; Wang, Ying; Liu, Cheng; Zhang, Lei
NASA: Accelerating Neural Network Design with a NAS Processor Proceedings Article
In: 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021.
@inproceedings{MaISCA2021,
title = {{NASA}: Accelerating Neural Network Design with a {NAS} Processor},
author = {Xiaohan Ma and Chang Si and Ying Wang and Cheng Liu and Lei Zhang},
url = {https://conferences.computer.org/iscapub/pdfs/ISCA2021-4ghucdBnCWYB7ES2Pe4YdT/333300a790/333300a790.pdf},
year = {2021},
date = {2021-06-01},
booktitle = {2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Dai, Xiaoliang; Wan, Alvin; Zhang, Peizhao; Wu, Bichen; He, Zijian; Wei, Zhen; Chen, Kan; Tian, Yuandong; Yu, Matthew; Vajda, Peter; Gonzalez, Joseph E.
FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining Book Section
In: CVPR2021, 2021.
@inproceedings{DaiCVPR2021,
title = {{FBNetV3}: Joint Architecture-Recipe Search using Predictor Pretraining},
author = {Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Bichen Wu and Zijian He and Zhen Wei and Kan Chen and Yuandong Tian and Matthew Yu and Peter Vajda and Joseph E. Gonzalez},
url = {https://openaccess.thecvf.com/content/CVPR2021/papers/Dai_FBNetV3_Joint_Architecture-Recipe_Search_Using_Predictor_Pretraining_CVPR_2021_paper.pdf},
year = {2021},
date = {2021-06-01},
booktitle = {CVPR2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Huang, Shenyang; Francois-Lavet, Vincent; Rabusseau, Guillaume
Understanding Capacity Saturation in Incremental Learning Proceedings Article
In: The 34th Canadian Conference on Artificial Intelligence, 2021.
@inproceedings{Huang2021,
  author    = {Shenyang Huang and Vincent Francois-Lavet and Guillaume Rabusseau},
  title     = {Understanding Capacity Saturation in Incremental Learning},
  booktitle = {The 34th Canadian Conference on Artificial Intelligence},
  url       = {https://assets.pubpub.org/xabw479b/51621564316955.pdf},
  year      = {2021},
  date      = {2021-05-25},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Chen, Wuyang; Gong, Xinyu; Wang, Zhangyang
Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective Proceedings Article
In: ICLR 2021, 2021.
@inproceedings{chen2021neural,
  author    = {Wuyang Chen and Xinyu Gong and Zhangyang Wang},
  title     = {Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective},
  booktitle = {ICLR 2021},
  url       = {https://arxiv.org/abs/2102.11535},
  year      = {2021},
  date      = {2021-05-04},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Yang, Yuxuan; Gao, Zhongke; Li, Yanli; Wang, He
A CNN identified by reinforcement learning-based optimization framework for EEG-based state evaluation Journal Article
In: Journal of Neural Engineering, vol. 18, no. 4, pp. 046059, 2021.
@article{Yang_2021,
title = {A {CNN} identified by reinforcement learning-based optimization framework for {EEG}-based state evaluation},
author = {Yuxuan Yang and Zhongke Gao and Yanli Li and He Wang},
url = {https://doi.org/10.1088/1741-2552/abfa71},
doi = {10.1088/1741-2552/abfa71},
year = {2021},
date = {2021-05-01},
journal = {Journal of Neural Engineering},
volume = {18},
number = {4},
pages = {046059},
publisher = {IOP Publishing},
abstract = {Objective. Electroencephalogram (EEG) data, as a kind of complex time-series, is one of the most widely-used information measurements for evaluating human psychophysiological states. Recently, numerous works applied deep learning techniques, especially the convolutional neural network (CNN), into EEG-based research. The design of the hyper-parameters of the CNN model has a great influence on the performance of the model. Therefore, automatically designing these hyper-parameters can save the time and labor of experts. This leads to the appearance of the neural architecture search technique. In this paper, we propose a reinforcement learning (RL)-based step-by-step framework to efficiently search for CNN models. Approach. Specifically, the deep Q network in RL is first used to determine the depth of convolutional layers and the connection modes among layers. Then particle swarm optimization algorithm is used to fine-tune the number and size of convolution kernels. Through this step-by-step strategy, the search space can be narrowed in each step for saving the overall time cost. This framework is employed for both EEG-based sleep stage classification and driver drowsiness evaluation tasks. Main results. The results show that compared with state-of-the-art methods, the high-performance CNN models identified by the proposed optimization framework, can achieve high overall accuracy and better root mean squared error in the two tasks. Significance. Therefore, the proposed optimization framework has a great potential to provide high-performance results for other kinds of classification and prediction tasks. In this way, it can greatly save researchers’ time cost and promote broader applications of CNNs.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Schoenherr, Georg P.
The Nonlinearity Coefficient - A Practical Guide to Neural Architecture Design PhD Thesis
2021.
@phdthesis{Schoenherr2021,
title = {The Nonlinearity Coefficient - A Practical Guide to Neural Architecture Design},
author = {Georg P. Schoenherr},
url = {http://reports-archive.adm.cs.cmu.edu/anon/anon/home/ftp/usr/ftp/2021/abstracts/21-110.html},
school = {Carnegie Mellon University},
year = {2021},
date = {2021-05-01},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
Zhang, Yi; Liu, Yang; Liu, X. Shirley
Neural network architecture search with AMBER Journal Article
In: Nature Machine Intelligence, pp. 372-373, 2021.
@article{Zhang2021,
title = {Neural network architecture search with {AMBER}},
author = {Zhang, Yi and Liu, Yang and Liu, X. Shirley},
url = {https://doi.org/10.1038/s42256-021-00350-x},
year = {2021},
date = {2021-05-01},
journal = {Nature Machine Intelligence},
pages = {372--373},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Yuan, Zhihang; Liu, Jingze; Li, Xingchen; Yan, Longhao; Chen, Haoxiang; Wu, Bingzhe; Yang, Yuchao; Sun, Guangyu
NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators Journal Article
In: Science China Information Sciences, 2021.
@article{yuan2021,
title = {{NAS4RRAM}: neural network architecture search for inference on {RRAM}-based accelerators},
author = {Yuan, Zhihang and Liu, Jingze and Li, Xingchen and Yan, Longhao and Chen, Haoxiang and Wu, Bingzhe and Yang, Yuchao and Sun, Guangyu},
url = {https://doi.org/10.1007/s11432-020-3245-7},
doi = {10.1007/s11432-020-3245-7},
year = {2021},
date = {2021-05-01},
journal = {Science China Information Sciences},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Perenda, Erma; Rajendran, Sreeraj; Bovet, Gerome; Pollin, Sofie; Zheleva, Mariya
Evolutionary Optimization of Residual Neural Network Architectures for Modulation Classification Miscellaneous
2021.
@misc{perenda_rajendran_bovet_pollin_zheleva_2021,
  author    = {Erma Perenda and Sreeraj Rajendran and Gerome Bovet and Sofie Pollin and Mariya Zheleva},
  title     = {Evolutionary Optimization of Residual Neural Network Architectures for Modulation Classification},
  publisher = {TechRxiv},
  url       = {https://www.techrxiv.org/articles/preprint/Evolutionary_Optimization_of_Residual_Neural_Network_Architectures_for_Modulation_Classification/14528778/1},
  doi       = {10.36227/techrxiv.14528778.v1},
  year      = {2021},
  date      = {2021-05-01},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {misc}
}
Wang, Xiaobo
Teacher Guided Neural Architecture Search for Face Recognition Journal Article
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 4, pp. 2817-2825, 2021.
@article{Wang_2021,
title = {Teacher Guided Neural Architecture Search for Face Recognition},
author = {Xiaobo Wang},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/16387},
year = {2021},
date = {2021-05-01},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {35},
number = {4},
pages = {2817--2825},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Anwar, Abrar
Evolving Spiking Circuit Motifs Using Weight Agnostic Neural Networks Journal Article
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 18, pp. 15956-15957, 2021.
@article{Anwar_2021,
title = {Evolving Spiking Circuit Motifs Using Weight Agnostic Neural Networks},
author = {Abrar Anwar},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/17974},
year = {2021},
date = {2021-05-01},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {35},
number = {18},
pages = {15956--15957},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Pan, Zheyi; Ke, Songyu; Yang, Xiaodu; Liang, Yuxuan; Yu, Yong; Zhang, Junbo; Zheng, Yu
AutoSTG: Neural Architecture Search for Predictions of Spatio-Temporal Graphs Proceedings Article
In: WWW 2021, 2021.
@inproceedings{PanWWW2021,
title = {{AutoSTG}: Neural Architecture Search for Predictions of Spatio-Temporal Graphs},
author = {Zheyi Pan and Songyu Ke and Xiaodu Yang and Yuxuan Liang and Yong Yu and Junbo Zhang and Yu Zheng},
url = {http://panzheyi.cc/publication/pan2021autostg/paper.pdf},
year = {2021},
date = {2021-04-19},
booktitle = {WWW 2021},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Kyriakides, George; Margaritis, Konstantinos
Evolving graph convolutional networks for neural architecture search Journal Article
In: Neural Computing and Applications, 2021.
@article{Kriakides2021Evolving,
  author    = {Kyriakides, George and Margaritis, Konstantinos},
  title     = {Evolving graph convolutional networks for neural architecture search},
  journal   = {Neural Computing and Applications},
  url       = {https://doi.org/10.1007/s00521-021-05979-8},
  doi       = {10.1007/s00521-021-05979-8},
  year      = {2021},
  date      = {2021-04-15},
  abstract  = {As neural architecture search (NAS) becomes an increasingly adopted method to design network architectures, various methods have been proposed to speedup the process. Besides proxy evaluation tasks, weight sharing, and scaling down the evaluated architectures, performance-predicting models exhibit multiple advantages. Eliminating the need to train candidate architectures and enabling transfer learning between datasets, researchers can also utilize them as a surrogate function for Bayesian optimization. On the other hand, graph convolutional networks (GCNs) have also been increasingly adopted for various tasks, enabling deep learning techniques on graphs without feature engineering. In this paper, we employ an evolutionary-based NAS method to evolve GCNs for the problem of predicting the relative performance of various architectures included in the NAS-Bench-101 dataset. By fine-tuning the architecture generated by our methodology, we manage to achieve a Kendall’s tau correlation coefficient of 0.907 between 1050 completely unseen architectures, utilizing only 450 samples, while also outperforming a strong baseline on the same task. Furthermore, we validate our method on custom global search space architectures, generated for the Fashion-MNIST dataset.},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
Harikrishnan, V. K.; Gambhir, MeenuAshima
Neural AutoML with Convolutional Networks for Diabetic Retinopathy Diagnosis Journal Article
In: Machine Intelligence and Smart Systems, pp. 145-157, 2021.
@article{Harikrishnan2021,
title = {Neural {AutoML} with Convolutional Networks for Diabetic Retinopathy Diagnosis},
author = {V. K. Harikrishnan and MeenuAshima Gambhir},
url = {https://link.springer.com/chapter/10.1007/978-981-33-4893-6_14},
year = {2021},
date = {2021-04-09},
journal = {Machine Intelligence and Smart Systems},
pages = {145--157},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Wang, Linnan; Xie, Saining; Li, Teng; Fonseca, Rodrigo; Tian, Yuandong
Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search Journal Article
In: IEEE transactions on pattern analysis and machine intelligence, vol. PP, 2021, ISSN: 0162-8828.
@article{PMID:33826511,
title = {Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search},
author = {Linnan Wang and Saining Xie and Teng Li and Rodrigo Fonseca and Yuandong Tian},
url = {https://doi.org/10.1109/TPAMI.2021.3071343},
doi = {10.1109/tpami.2021.3071343},
issn = {0162-8828},
year = {2021},
date = {2021-04-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume = {PP},
abstract = {Neural Architecture Search (NAS) has emerged as a promising technique for automatic neural network design. However, existing MCTS based NAS approaches often utilize manually designed action space, which is not directly related to the performance metric to be optimized (e.g., accuracy), leading to sample-inefficient explorations of architectures. To improve the sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that contain networks with similar performance metrics. During the search phase, as different action sequences lead to regions with different performance, the search efficiency can be significantly improved by biasing towards the good regions. On three NAS tasks, empirical results demonstrate that LaNAS is at least an order more sample efficient than baseline methods including evolutionary algorithms, Bayesian optimizations, and random search. When applied in practice, both one-shot and regular LaNAS consistently outperform existing results. Particularly, LaNAS achieves 99.0% accuracy on CIFAR-10 and 80.8% top1 accuracy at 600 MFLOPS on ImageNet in only 800 samples, significantly outperforming AmoebaNet with 33x fewer samples. Our code is publicly available at https://github.com/facebookresearch/LaMCTS.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Zhang, Zhentong; Shan, Yugang; Yuan, Jie
Multi-Level Cell Progressive Differentiable Architecture Search to Improve Image Classification Accuracy Journal Article
In: Journal of Signal Processing Systems, 2021.
@article{ZhangJSPS2021,
title = {Multi-Level Cell Progressive Differentiable Architecture Search to Improve Image Classification Accuracy},
author = {Zhang, Zhentong and Shan, Yugang and Yuan, Jie},
url = {https://doi.org/10.1007/s11265-021-01647-1},
year = {2021},
date = {2021-03-08},
journal = {Journal of Signal Processing Systems},
abstract = {In recent years, the neural architecture search has continuously made significant progress in the field of image recognition. Among them, the differentiable method has obvious advantages compared with other search methods in terms of computational cost and accuracy to deal with image classification. However, the differentiable method is usually composed of single cell, which cannot efficiently extract the features of the network. In response to this problem, we propose a multi-level cell progressive differentiable method which allows cells to have different types according to the levels of the network. In differentiable method, the gap between the search network and the evaluation one is large, and the correlation is low. We design an algorithm to improve the distribution of architecture parameters. We also optimize the loss function and use the regularization method of additional action to improve deep network performance. The method achieves good search and classification results on CIFAR10 and ImageNet (mobile setting).},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Zheng, X; Ji, R; Chen, Y; Wang, Q; Zhang, B; Ye, Q; Chen, J; Huang, F; Tian, Y
MIGO-NAS: Towards Fast and Generalizable Neural Architecture Search Journal Article
In: IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 01, pp. 1-1, 2021, ISSN: 1939-3539.
@article{9377468,
title = {{MIGO-NAS}: Towards Fast and Generalizable Neural Architecture Search},
author = {Zheng, X. and Ji, R. and Chen, Y. and Wang, Q. and Zhang, B. and Ye, Q. and Chen, J. and Huang, F. and Tian, Y.},
url = {https://www.computer.org/csdl/journal/tp/5555/01/09377468/1rUNdbz4LQY},
doi = {10.1109/TPAMI.2021.3065138},
issn = {1939-3539},
year = {2021},
date = {2021-03-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
number = {01},
pages = {1-1},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Zimmer, Lucas; Lindauer, Marius; Hutter, Frank
Auto-Pytorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL Journal Article
In: IEEE transactions on pattern analysis and machine intelligence, vol. PP, 2021, ISSN: 0162-8828.
@article{PMID:33750687,
title = {{Auto-Pytorch}: Multi-Fidelity MetaLearning for Efficient and Robust {AutoDL}},
author = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
url = {https://doi.org/10.1109/TPAMI.2021.3067763},
doi = {10.1109/tpami.2021.3067763},
issn = {0162-8828},
year = {2021},
date = {2021-03-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume = {PP},
abstract = {While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this paper, we introduce Auto-PyTorch, which brings the best of these two worlds together by jointly and robustly optimizing the architecture of networks and the training hyperparameters to enable fully automated deep learning (AutoDL). Auto-PyTorch achieves state-of-the-art performance on several tabular benchmarks by combining multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) and common baselines for tabular data. To thoroughly study our assumptions on how to design such an AutoDL system, we additionally introduce a new benchmark on learning curves for DNNs, dubbed LCBench, and run extensive ablation studies of the full Auto-PyTorch on typical AutoML benchmarks, eventually showing that Auto-PyTorch performs better than several state-of-the-art competitors on average.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Liu, Lanlan; Zhang, Yuting; Deng, Jia; Soatto, Stefano
Dynamically Grown Generative Adversarial Networks Proceedings Article
In: AAAI 2021, 2021.
@inproceedings{LiuAAAI2021,
  author    = {Lanlan Liu and Yuting Zhang and Jia Deng and Stefano Soatto},
  title     = {Dynamically Grown Generative Adversarial Networks},
  booktitle = {AAAI 2021},
  url       = {https://www.aaai.org/AAAI21Papers/AAAI-1376.LiuL.pdf},
  year      = {2021},
  date      = {2021-02-02},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Xu, Y; Xie, L; Dai, W; Zhang, X; Chen, X; Qi, G; Xiong, H; Tian, Q
Partially-Connected Neural Architecture Search for Reduced Computational Redundancy Journal Article
In: IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 01, pp. 1-1, 2021, ISSN: 1939-3539.
@article{9354953,
  title     = {Partially-Connected Neural Architecture Search for Reduced Computational Redundancy},
  author    = {Y Xu and L Xie and W Dai and X Zhang and X Chen and G Qi and H Xiong and Q Tian},
  url       = {https://www.computer.org/csdl/journal/tp/5555/01/09354953/1rgCccYlOaQ},
  doi       = {10.1109/TPAMI.2021.3059510},
  issn      = {1939-3539},
  year      = {2021},
  date      = {2021-02-01},
  journal   = {IEEE Transactions on Pattern Analysis \& Machine Intelligence},
  number    = {01},
  pages     = {1--1},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
  abstract  = {Differentiable architecture search (DARTS) enables effective neural architecture search (NAS) using gradient descent, but suffers from high memory and computational costs. In this paper, we propose a novel approach, namely Partially-Connected DARTS (PC-DARTS), to achieve efficient and stable neural architecture search by reducing the channel and spatial redundancies of the super-network. In the channel level, partial channel connection is presented to randomly sample a small subset of channels for operation selection to accelerate the search process and suppress the over-fitting of the super-network. Side operation is introduced for bypassing (non-sampled) channels to guarantee the performance of searched architectures under extremely low sampling rates. In the spatial level, input features are down-sampled to eliminate spatial redundancy and enhance the efficiency of the mixed computation for operation selection. Furthermore, edge normalization is developed to maintain the consistency of edge selection based on channel sampling with the architectural parameters for edges. Experimental results demonstrate that the proposed approach achieves higher search speed and training stability than DARTS. PC-DARTS obtains a top-1 error rate of 2.55% on CIFAR-10 with 0.07 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.1% on ImageNet (under the mobile setting) within 2.8 GPU-day.},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
Hao, Jie; Zhu, William
Architecture self-attention mechanism: nonlinear optimization for neural architecture search Journal Article
In: Journal of Nonlinear and Variational Analysis, vol. 5, pp. 119-140, 2021.
@article{Hao2021,
  title     = {Architecture self-attention mechanism: nonlinear optimization for neural architecture search},
  author    = {Jie Hao and William Zhu},
  url       = {http://jnva.biemdas.com/issues/JNVA2021-1-8.pdf},
  doi       = {10.23952/jnva.5.2021.1.08},
  year      = {2021},
  date      = {2021-02-01},
  journal   = {Journal of Nonlinear and Variational Analysis},
  volume    = {5},
  pages     = {119--140},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
Gao, Yanjie; Gu, Xianyu; Zhang, Hongyu; Lin, Haoxiang; Yang, Mao
Runtime Performance Prediction for Deep Learning Models with Graph Neural Network Technical Report
Microsoft no. MSR-TR-2021-3, 2021.
@techreport{gao2021runtime,
  title       = {Runtime Performance Prediction for Deep Learning Models with Graph Neural Network},
  author      = {Yanjie Gao and Xianyu Gu and Hongyu Zhang and Haoxiang Lin and Mao Yang},
  url         = {https://www.microsoft.com/en-us/research/publication/runtime-performance-prediction-for-deep-learning-models-with-graph-neural-network/},
  year        = {2021},
  date        = {2021-02-01},
  urldate     = {2021-02-01},
  number      = {MSR-TR-2021-3},
  institution = {Microsoft},
  abstract    = {Recently, deep learning (DL) has been widely adopted in many application domains. Predicting the runtime performance of DL models such as GPU memory consumption and training time is important to boost development productivity and reduce resource waste because improper configurations of hyperparameters and neural architectures can result in many failed training jobs or inappropriate models. However, general runtime performance prediction for DL models is challenging due to the hybrid DL programming paradigm, complicated hidden factors within the framework runtime, fairly huge model configuration space, and wide differences among models. In this paper, we propose DNNPerf, a novel and general machine learning approach to predict the runtime performance of DL models using Graph Neural Network. DNNPerf represents a DL model as a directed acyclic computation graph and designs a rich set of effective performance-related features based on the computational semantics of both nodes and edges. We also propose a new Attention-based Node-Edge Encoder to better encode the node and edge features. DNNPerf is extensively evaluated on thousands of configurations of real-world and synthetic DL models to predict their GPU memory consumption and training time. The experimental results demonstrate that DNNPerf achieves an overall error of 13.684% for the GPU memory consumption prediction and an overall error of 7.443% for the training time prediction, outperforming all the compared methods.},
  keywords    = {},
  pubstate    = {published},
  tppubtype   = {techreport}
}
Pham, Hieu; Le, Quoc V
AutoDropout - Learning Dropout Patterns to Regularize Deep Networks Technical Report
2021.
@techreport{Pham2021_okt,
  title     = {AutoDropout: Learning Dropout Patterns to Regularize Deep Networks},
  author    = {Hieu Pham and Quoc V Le},
  url       = {https://arxiv.org/abs/2101.01761},
  year      = {2021},
  date      = {2021-01-01},
  volume    = {abs/2101.01761},
  key       = {journals/corr/abs-2101-01761},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Huang, Yufang; Axsom, Kelly M; Lee, John; Subramanian, Lakshminarayanan; Zhang, Yiye
DICE: Deep Significance Clustering for Outcome-Aware Stratification Technical Report
2021.
@techreport{YufangHuang2021_xxi,
  title     = {DICE: Deep Significance Clustering for Outcome-Aware Stratification},
  author    = {Yufang Huang and Kelly M Axsom and John Lee and Lakshminarayanan Subramanian and Yiye Zhang},
  url       = {https://arxiv.org/abs/2101.02344},
  year      = {2021},
  date      = {2021-01-01},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Ma, Ailong; Wan, Yuting; Zhong, Yanfei; Wang, Junjue; Zhang, Liangpei
SceneNet: Remote sensing scene classification deep learning network using multi-objective neural evolution architecture search Technical Report
2021, ISSN: 0924-2716.
@techreport{AilongMa2021_voj,
  title     = {SceneNet: Remote sensing scene classification deep learning network using multi-objective neural evolution architecture search},
  author    = {Ailong Ma and Yuting Wan and Yanfei Zhong and Junjue Wang and Liangpei Zhang},
  url       = {https://www.sciencedirect.com/science/article/pii/S0924271620303361},
  doi       = {10.1016/j.isprsjprs.2020.11.025},
  issn      = {0924-2716},
  year      = {2021},
  date      = {2021-01-01},
  journal   = {ISPRS Journal of Photogrammetry and Remote Sensing},
  volume    = {172},
  pages     = {171--188},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
}
Yang, Hansi; Yao, Quanming; Kwok, James T
Tensorizing Subgraph Search in the Supernet Technical Report
2021.
@techreport{Yang2021_atf,
  title     = {Tensorizing Subgraph Search in the Supernet},
  author    = {Hansi Yang and Quanming Yao and James T Kwok},
  url       = {https://arxiv.org/abs/2101.01078},
  year      = {2021},
  date      = {2021-01-01},
  volume    = {abs/2101.01078},
  key       = {journals/corr/abs-2101-01078},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Syed, Muhtadyuzzaman; Srinivasan, Arvind Akpuram
Generalized Latency Performance Estimation for Once-For-All Neural Architecture Search Technical Report
2021.
@techreport{Syed2021_kud,
  title     = {Generalized Latency Performance Estimation for Once-For-All Neural Architecture Search},
  author    = {Muhtadyuzzaman Syed and Arvind Akpuram Srinivasan},
  url       = {https://arxiv.org/abs/2101.00732},
  year      = {2021},
  date      = {2021-01-01},
  volume    = {abs/2101.00732},
  key       = {journals/corr/abs-2101-00732},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Huang, Hanxun; Ma, Xingjun; Erfani, Sarah M; Bailey, James
Neural Architecture Search via Combinatorial Multi-Armed Bandit Technical Report
2021.
@techreport{Huang2021_aks,
  title     = {Neural Architecture Search via Combinatorial Multi-Armed Bandit},
  author    = {Hanxun Huang and Xingjun Ma and Sarah M Erfani and James Bailey},
  url       = {https://arxiv.org/abs/2101.00336},
  year      = {2021},
  date      = {2021-01-01},
  volume    = {abs/2101.00336},
  key       = {journals/corr/abs-2101-00336},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Tang, Tianqi; Yu, Xin; Dong, Xuanyi; Yang, Yi
Auto-Navigator: Decoupled Neural Architecture Search for Visual Navigation Technical Report
2021.
@techreport{Tang2021_szq,
  title     = {Auto-Navigator: Decoupled Neural Architecture Search for Visual Navigation},
  author    = {Tianqi Tang and Xin Yu and Xuanyi Dong and Yi Yang},
  url       = {https://openaccess.thecvf.com/content/WACV2021/html/Tang_Auto-Navigator_Decoupled_Neural_Architecture_Search_for_Visual_Navigation_WACV_2021_paper.html},
  year      = {2021},
  date      = {2021-01-01},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  pages     = {3743--3752},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Zheng, Shuai; Wang, Yabin; Li, Baotong; Li, Xin
A Hardware-adaptive Deep Feature Matching Pipeline for Real-time 3D Reconstruction Journal Article
In: vol. 132, pp. 102984, 2021.
@article{Zheng2021_mdc,
  title     = {A Hardware-adaptive Deep Feature Matching Pipeline for Real-time 3D Reconstruction},
  author    = {Shuai Zheng and Yabin Wang and Baotong Li and Xin Li},
  url       = {https://www.sciencedirect.com/science/article/abs/pii/S0010448520301779},
  doi       = {10.1016/J.CAD.2020.102984},
  year      = {2021},
  date      = {2021-01-01},
  journal   = {Computer-Aided Design},
  volume    = {132},
  pages     = {102984},
  key       = {journals/cad/ZhengWLL21},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
Hosseini, Ramtin; Yang, Xingyi; Xie, Pengtao
DSRNA - Differentiable Search of Robust Neural Architectures Proceedings Article
In: CVPR 2021, 2021.
@inproceedings{Hosseini2020_zbg,
  title     = {DSRNA: Differentiable Search of Robust Neural Architectures},
  author    = {Ramtin Hosseini and Xingyi Yang and Pengtao Xie},
  url       = {https://arxiv.org/abs/2012.06122},
  year      = {2021},
  date      = {2021-01-01},
  booktitle = {CVPR 2021},
  volume    = {abs/2012.06122},
  key       = {journals/corr/abs-2012-06122},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Ruchte, Michael; Zela, Arber; Siems, Julien Niklas; Grabocka, Josif; Hutter, Frank
NASLib: A Modular and Flexible Neural Architecture Search Library Technical Report
2021.
@techreport{MichaelRuchte2021_kjn,
  title     = {NASLib: A Modular and Flexible Neural Architecture Search Library},
  author    = {Michael Ruchte and Arber Zela and Julien Niklas Siems and Josif Grabocka and Frank Hutter},
  url       = {https://openreview.net/forum?id=EohGx2HgNsA},
  year      = {2021},
  date      = {2021-01-01},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Shaikh, Azhar; Sinha, Nishant
Learn to Bind and Grow Neural Structures Proceedings Article
In: pp. 119-126, 2021.
@inproceedings{Shaikh2021_sss,
  title     = {Learn to Bind and Grow Neural Structures},
  author    = {Azhar Shaikh and Nishant Sinha},
  url       = {https://arxiv.org/abs/2011.10568},
  doi       = {10.1145/3430984.3431019},
  year      = {2021},
  date      = {2021-01-01},
  pages     = {119--126},
  key       = {conf/comad/ShaikhS21},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Lu, Hao; Han, Hu
NAS-HR: search of neural architecture for heart-rate estimation from face videos Technical Report
2021.
@techreport{Lu2021_pdu,
  title     = {NAS-HR: search of neural architecture for heart-rate estimation from face videos},
  author    = {Hao Lu and Hu Han},
  url       = {http://vr-ih.com/vrih/resource/latest_accept/323112704838656.pdf},
  doi       = {10.1016/j.vrih.2020.10.002},
  year      = {2021},
  date      = {2021-01-01},
  journal   = {Virtual Reality \& Intelligent Hardware},
  pages     = {33--42},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
}
Peng, Daiyi; Dong, Xuanyi; Real, Esteban; Tan, Mingxing; Lu, Yifeng; Liu, Hanxiao; Bender, Gabriel; Kraft, Adam; Liang, Chen; Le, Quoc V
PyGlove - Symbolic Programming for Automated Machine Learning Technical Report
2021.
@techreport{Peng2021_jau,
  title     = {PyGlove: Symbolic Programming for Automated Machine Learning},
  author    = {Daiyi Peng and Xuanyi Dong and Esteban Real and Mingxing Tan and Yifeng Lu and Hanxiao Liu and Gabriel Bender and Adam Kraft and Chen Liang and Quoc V Le},
  url       = {https://papers.nips.cc/paper/2020/file/012a91467f210472fab4e11359bbfef6-Paper.pdf},
  year      = {2021},
  date      = {2021-01-01},
  volume    = {abs/2101.08809},
  key       = {journals/corr/abs-2101-08809},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}
Liu, Jiaheng; Zhou, Shunfeng; Wu, Yichao; Chen, Ken; Ouyang, Wanli; Xu, Dong
Block Proposal Neural Architecture Search Journal Article
In: vol. 30, pp. 15-25, 2021.
@article{Liu2021_gru,
  title     = {Block Proposal Neural Architecture Search},
  author    = {Jiaheng Liu and Shunfeng Zhou and Yichao Wu and Ken Chen and Wanli Ouyang and Dong Xu},
  url       = {https://pubmed.ncbi.nlm.nih.gov/33035163/},
  doi       = {10.1109/TIP.2020.3028288},
  year      = {2021},
  date      = {2021-01-01},
  journal   = {IEEE Transactions on Image Processing},
  volume    = {30},
  pages     = {15--25},
  key       = {journals/tip/LiuZWCOX21},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
He, Xin; Wang, Shihao; Ying, Guohao; Zhang, Jiyong; Chu, Xiaowen
Efficient Multi-objective Evolutionary 3D Neural Architecture Search for COVID-19 Detection with Chest CT Scans Technical Report
2021.
@techreport{He2021_lri,
  title     = {Efficient Multi-objective Evolutionary 3D Neural Architecture Search for COVID-19 Detection with Chest CT Scans},
  author    = {Xin He and Shihao Wang and Guohao Ying and Jiyong Zhang and Xiaowen Chu},
  url       = {https://ieeexplore.ieee.org/abstract/document/9207545},
  year      = {2021},
  date      = {2021-01-01},
  volume    = {abs/2101.10667},
  key       = {journals/corr/abs-2101-10667},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {techreport}
}