Batch normalized Deep Boltzmann Machines

Vu, Hung; Nguyen, Tu Dinh; Le, Trung; Luo, Wei; Phung, Dinh

conference contribution

posted on 2018-01-01, 00:00 authored by Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo, Dinh Phung

Training Deep Boltzmann Machines (DBMs) is a challenging task in deep generative model studies. Careless training usually leads to divergence or a useless model. We discover that this phenomenon is due to the change of DBM layers’ input signals during model parameter updates, similar to what happens in deterministic deep networks such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs). The change of layers’ input distributions not only complicates the learning process but also causes redundant neurons that simply imitate the behavior of others. Although this phenomenon can be handled with batch normalization in deep learning, integrating this technique into the probabilistic network of DBMs is a challenging problem, since it has to satisfy two conditions, one on the energy function and one on the conditional probabilities. In this paper, we introduce Batch Normalized Deep Boltzmann Machines (BNDBMs) that meet both of these conditions and successfully combine batch normalization and DBMs in the same framework. However, unlike in CNNs, the probabilistic nature of DBMs means that training DBMs with batch normalization differs in three ways: i) fixing the shift parameters $\beta$ but learning the scale parameters $\gamma$; ii) not normalizing the first hidden layer; and iii) maintaining multiple pairs of population means and variances per neuron rather than the single pair used in CNNs. We observe that our proposed BNDBMs can stabilize the input signals of network layers, facilitate the training process and improve the model quality. More interestingly, BNDBMs can be trained successfully without pretraining, which is usually a mandatory step in most existing DBMs. The experimental results on the MNIST, Fashion-MNIST and Caltech 101 Silhouette datasets show that our BNDBMs outperform DBMs and centered DBMs in terms of feature representation and classification accuracy ($3.98\%$ and $5.84\%$ average improvement with and without pretraining, respectively).
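For readers unfamiliar with the technique the abstract builds on, the following is a minimal sketch of the standard batch normalization transform as used in deterministic networks. It is not the BNDBM formulation from the paper, which must also satisfy the energy-function and conditional-probability conditions described above. The function name `batch_normalize`, the sample inputs and the default `eps` are assumptions made for illustration; the shift `beta` is simply held at a constant here, which only loosely mirrors point (i).

```python
import math

def batch_normalize(batch, gamma, beta=0.0, eps=1e-5):
    """Normalize one neuron's inputs over a mini-batch, then scale and shift.

    Generic batch normalization, not the paper's BNDBM update:
    gamma is the (learnable) scale, beta the shift, kept constant here.
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance of the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in batch]

# Hypothetical pre-activations of a single hidden unit for a mini-batch of four.
inputs = [0.5, 2.0, -1.0, 3.5]
normalized = batch_normalize(inputs, gamma=1.0)
```

With `gamma=1.0` and `beta=0.0`, the normalized values have approximately zero mean and unit variance across the mini-batch, which is the sense in which batch normalization stabilizes a layer's input signal.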

History

Volume

95

Pagination

359-374

Location

Beijing, China

Start date

2018-11-14

End date

2018-11-16

ISSN

2640-3498

Language

eng

Notes

pdf: http://proceedings.mlr.press/v95/vu18a/vu18a.pdf

Publication classification

E1 Full written paper - refereed

Copyright notice

2018, H. Vu, T.D. Nguyen, T. Le, W. Luo & D. Phung

Editor/Contributor(s)

Zhu J, Takeuchi I

Title of proceedings

ACML 2018 : Proceedings of the 10th Asian Conference on Machine Learning

Event

Machine Learning. Conference (10th : 2018 : Beijing, China)

Publisher

JMLR

Place of publication

Cambridge, Mass.

Series

Machine Learning Conference

Publication URL

http://proceedings.mlr.press/v95/vu18a.html

Keywords

Deep Boltzmann Machines; Deep generative model studies; DBM layers; Convolutional Neural Networks (CNNs); Recurrent Neural Networks (RNNs); 4603 Computer vision and multimedia computation
