A Boltzmann machine is a type of stochastic recurrent neural network that can learn a probability distribution over its set of inputs. It is particularly used for modeling complex distributions and solving combinatorial optimization problems.
Mathematical Formulation
Energy Function
A Boltzmann machine is defined by an energy function \( E(\mathbf{v}, \mathbf{h}) \) which assigns a scalar energy to each configuration of visible units \( \mathbf{v} \) and hidden units \( \mathbf{h} \). In the general model, any pair of units may be connected, and the energy function is given by:
$$ E(\mathbf{v}, \mathbf{h}) = -\sum_{i < k} L_{ik} v_i v_k - \sum_{j < l} J_{jl} h_j h_l - \sum_{i} \sum_{j} W_{ij} v_i h_j - \sum_{i} b_i v_i - \sum_{j} c_j h_j $$
where:
- \( v_i \) and \( h_j \) are the (binary) states of the visible and hidden units, respectively.
- \( W_{ij} \) is the weight between visible unit \( i \) and hidden unit \( j \).
- \( L_{ik} \) and \( J_{jl} \) are the weights between pairs of visible units and pairs of hidden units, respectively.
- \( b_i \) and \( c_j \) are the biases of the visible and hidden units, respectively.
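To make the definition concrete, here is a minimal NumPy sketch of the energy computation for a small, randomly initialized machine; the function name and toy dimensions are illustrative assumptions for this example, not part of any particular library.

```python
import numpy as np

def boltzmann_energy(v, h, W, L, J, b, c):
    """Energy of a joint configuration (v, h) of a general Boltzmann machine.

    v, h : binary state vectors of the visible and hidden units
    W    : visible-hidden weights, shape (n_visible, n_hidden)
    L, J : symmetric intra-layer weights with zero diagonal
    b, c : visible and hidden biases
    """
    # Each intra-layer pair should be counted once; summing over the full
    # symmetric matrices counts it twice, hence the factor 1/2.
    visible_pairs = 0.5 * v @ L @ v
    hidden_pairs = 0.5 * h @ J @ h
    cross = v @ W @ h
    return -(visible_pairs + hidden_pairs + cross + b @ v + c @ h)

# Toy example with 3 visible and 2 hidden units and random weights.
rng = np.random.default_rng(0)
n_v, n_h = 3, 2
W = rng.normal(scale=0.1, size=(n_v, n_h))
L = np.triu(rng.normal(scale=0.1, size=(n_v, n_v)), k=1)
L = L + L.T
J = np.triu(rng.normal(scale=0.1, size=(n_h, n_h)), k=1)
J = J + J.T
b, c = np.zeros(n_v), np.zeros(n_h)

v = np.array([1.0, 0.0, 1.0])
h = np.array([0.0, 1.0])
print(boltzmann_energy(v, h, W, L, J, b, c))
```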
Probability Distribution
The joint probability distribution over the visible and hidden units is defined by the Boltzmann distribution:
$$ P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp(-E(\mathbf{v}, \mathbf{h})) $$
where \( Z \) is the partition function, given by:
$$ Z = \sum_{\mathbf{v}} \sum_{\mathbf{h}} \exp(-E(\mathbf{v}, \mathbf{h})) $$
Marginal Probability
The probability of a visible vector \( \mathbf{v} \) is obtained by marginalizing over the hidden units:
$$ P(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp(-E(\mathbf{v}, \mathbf{h})) $$
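For a model small enough to enumerate, the partition function, the joint probability, and the marginal can all be computed by brute force. The sketch below assumes a tiny bipartite model (intra-layer weights set to zero) purely to keep the enumeration short; for realistic sizes the sum over \( 2^{n} \) configurations is intractable.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    # Bipartite (RBM-style) energy; intra-layer weights are taken to be zero
    # here purely to keep the enumeration short.
    return -(v @ W @ h + b @ v + c @ h)

def states(n):
    """All 2^n binary vectors of length n."""
    return [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n)]

rng = np.random.default_rng(1)
n_v, n_h = 3, 2
W = rng.normal(scale=0.1, size=(n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)

# Partition function: sum over all 2^(n_v + n_h) joint configurations.
Z = sum(np.exp(-energy(v, h, W, b, c)) for v in states(n_v) for h in states(n_h))

# Joint probability of one particular configuration.
v0, h0 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0])
p_joint = np.exp(-energy(v0, h0, W, b, c)) / Z

# Marginal probability of the visible vector: sum out the hidden units.
p_v = sum(np.exp(-energy(v0, h, W, b, c)) for h in states(n_h)) / Z
print(Z, p_joint, p_v)
```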
Training Boltzmann Machines
Objective Function
The training objective for a Boltzmann machine is to maximize the likelihood of the observed data. This can be achieved by minimizing the negative log-likelihood:
$$ \mathcal{L} = -\sum_{\mathbf{v}} P_{\text{data}}(\mathbf{v}) \log P(\mathbf{v}) $$
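Written as an empirical average over a finite training set, the objective can be evaluated exactly only for tiny models. The sketch below reuses the brute-force enumeration idea on an assumed toy dataset; the helper names are made up for this example.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    return -(v @ W @ h + b @ v + c @ h)

def states(n):
    return [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n)]

def log_p_visible(v, W, b, c):
    # log P(v) by exhaustive enumeration -- only feasible for tiny models.
    n_h = W.shape[1]
    Z = sum(np.exp(-energy(vv, hh, W, b, c)) for vv in states(len(v)) for hh in states(n_h))
    unnorm = sum(np.exp(-energy(v, hh, W, b, c)) for hh in states(n_h))
    return np.log(unnorm) - np.log(Z)

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(3, 2))
b, c = np.zeros(3), np.zeros(2)

# Empirical negative log-likelihood over an assumed toy dataset.
data = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1]], dtype=float)
nll = -np.mean([log_p_visible(v, W, b, c) for v in data])
print(nll)
```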
Gradient of the Log-Likelihood
The gradient of the log-likelihood with respect to a weight \( W_{ij} \) is given by:
$$ \frac{\partial \log P(\mathbf{v})}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} $$
where:
- \( \langle \cdot \rangle_{\text{data}} \) denotes the expectation with respect to the data distribution.
- \( \langle \cdot \rangle_{\text{model}} \) denotes the expectation with respect to the model distribution.
Contrastive Divergence
In practice, exact computation of the gradient is intractable. Contrastive Divergence (CD) is a common approximation method used to train Boltzmann machines. The CD algorithm involves the following steps:
- Start with a training example \( \mathbf{v}^{(0)} \).
- Perform Gibbs sampling to obtain a sample \( \mathbf{v}^{(k)} \) after \( k \) steps.
- Update the weights using the difference between the data-dependent statistics (computed from \( \mathbf{v}^{(0)} \)) and the model-dependent statistics (approximated from \( \mathbf{v}^{(k)} \)):
$$ \Delta W_{ij} = \epsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right) $$
where \( \epsilon \) is the learning rate.
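As an illustration, here is a minimal NumPy sketch of a single CD-1 update for a binary RBM (the restricted variant introduced in the next section), since that is the setting where CD is most commonly applied. The function and variable names are assumptions made for this example, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM on a single training vector v0."""
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + c)                 # P(h_j = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one step of Gibbs sampling back to a reconstruction.
    pv1 = sigmoid(h0 @ W.T + b)               # P(v_i = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)                 # P(h_j = 1 | v1)

    # Update: difference between data-dependent and model-dependent statistics
    # (hidden probabilities are used in place of samples, a common
    # variance-reduction choice).
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c

# Example usage on a toy RBM.
rng = np.random.default_rng(3)
W = rng.normal(scale=0.01, size=(6, 3))
b, c = np.zeros(6), np.zeros(3)
v0 = np.array([1, 0, 1, 1, 0, 0], dtype=float)
W, b, c = cd1_update(v0, W, b, c, lr=0.05, rng=rng)
```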
Restricted Boltzmann Machines (RBMs)
A Restricted Boltzmann Machine (RBM) is a special type of Boltzmann machine in which there are no connections within a layer: the visible and hidden units form a bipartite graph. Because of this restriction, the hidden units are conditionally independent given the visible units (and vice versa), which greatly simplifies inference and training and makes RBMs practical for real applications.
Energy Function for RBMs
With the intra-layer weights removed, the energy function of an RBM reduces to:
$$ E(\mathbf{v}, \mathbf{h}) = -\sum_{i} \sum_{j} W_{ij} v_i h_j - \sum_{i} b_i v_i - \sum_{j} c_j h_j $$
Conditional Probabilities
The conditional probabilities for the hidden and visible units are:
$$ P(h_j = 1 | \mathbf{v}) = \sigma \left( \sum_{i} W_{ij} v_i + c_j \right) $$
$$ P(v_i = 1 | \mathbf{h}) = \sigma \left( \sum_{j} W_{ij} h_j + b_i \right) $$
where \( \sigma(x) = \frac{1}{1 + e^{-x}} \) is the sigmoid (logistic) function.
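Because both conditionals factorize over units, an entire layer can be resampled in one vectorized step, which is what makes block Gibbs sampling in an RBM cheap. A minimal sketch, assuming a toy RBM with made-up dimensions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(v, W, b, c, steps=100, rng=None):
    """Alternate h ~ P(h | v) and v ~ P(v | h) using the RBM conditionals."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(steps):
        ph = sigmoid(v @ W + c)                    # P(h_j = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + b)                  # P(v_i = 1 | h)
        v = (rng.random(pv.shape) < pv).astype(float)
    return v

# Example: draw an approximate sample from a toy RBM.
rng = np.random.default_rng(4)
W = rng.normal(scale=0.1, size=(6, 3))
b, c = np.zeros(6), np.zeros(3)
v = gibbs_sample(np.zeros(6), W, b, c, steps=200, rng=rng)
print(v)
```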
Applications
Boltzmann machines and RBMs are used in various applications, including:
- Dimensionality Reduction: RBMs can be used to reduce the dimensionality of data while preserving important features.
- Collaborative Filtering: RBMs are used in recommendation systems to predict user preferences.
- Feature Learning: RBMs can learn useful features from unlabeled data, making them useful for unsupervised learning tasks.
Conclusion
Boltzmann machines provide a powerful framework for modeling complex probability distributions and solving optimization problems. Restricted Boltzmann Machines (RBMs) are particularly practical and have been successfully applied in various real-world applications.