Boltzmann Machine

A Boltzmann machine is a type of stochastic recurrent neural network that can learn a probability distribution over its set of inputs. It is particularly useful for modeling complex distributions and for combinatorial optimization problems.

Mathematical Formulation

Energy Function

A Boltzmann machine is defined by an energy function \( E(\mathbf{v}, \mathbf{h}) \) which assigns a scalar energy to each configuration of visible units \( \mathbf{v} \) and hidden units \( \mathbf{h} \). In the general model, units may also be connected within each layer, and the energy function is given by:

$$ E(\mathbf{v}, \mathbf{h}) = -\sum_{i} \sum_{j} W_{ij} v_i h_j - \sum_{i < i'} L_{i i'} v_i v_{i'} - \sum_{j < j'} J_{j j'} h_j h_{j'} - \sum_{i} b_i v_i - \sum_{j} c_j h_j $$

where:

  • \( v_i \) and \( h_j \) are the states of the visible and hidden units, respectively.
  • \( W_{ij} \) is the weight between visible unit \( i \) and hidden unit \( j \); \( L_{i i'} \) and \( J_{j j'} \) are the weights of the visible-visible and hidden-hidden connections.
  • \( b_i \) and \( c_j \) are the biases of the visible and hidden units, respectively.
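As a concrete sketch, the visible-hidden part of this energy (the \( W \), \( b \), \( c \) terms, ignoring any within-layer couplings) can be evaluated directly with NumPy. The toy dimensions and parameter values below are illustrative, not taken from the text:

```python
import numpy as np

def energy(v, h, W, b, c):
    """E(v, h) = -v^T W h - b^T v - c^T h for binary unit states v, h."""
    return -(v @ W @ h) - (b @ v) - (c @ h)

# Toy example: 2 visible units, 3 hidden units (illustrative values).
W = np.array([[0.5, -0.2, 0.1],
              [0.3, 0.8, -0.4]])
b = np.array([0.1, -0.1])
c = np.array([0.0, 0.2, -0.3])
v = np.array([1.0, 0.0])
h = np.array([0.0, 1.0, 1.0])
print(energy(v, h, W, b, c))
```

Lower-energy configurations are the ones the model assigns higher probability, as the next section makes precise.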

Probability Distribution

The joint probability distribution over the visible and hidden units is defined by the Boltzmann distribution:

$$ P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp(-E(\mathbf{v}, \mathbf{h})) $$

where \( Z \) is the partition function, given by:

$$ Z = \sum_{\mathbf{v}} \sum_{\mathbf{h}} \exp(-E(\mathbf{v}, \mathbf{h})) $$
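For intuition, \( Z \) can be computed by brute force on a tiny model by enumerating every binary configuration; the exponential cost of this sum is exactly why exact computation is intractable at realistic sizes. The helper below is a sketch using only the visible-hidden energy terms, with hypothetical dimensions:

```python
import itertools
import numpy as np

def partition_function(W, b, c):
    """Z = sum over all binary (v, h) of exp(-E(v, h)); tractable only for tiny models."""
    n_v, n_h = W.shape
    Z = 0.0
    for v_bits in itertools.product([0, 1], repeat=n_v):
        for h_bits in itertools.product([0, 1], repeat=n_h):
            v = np.array(v_bits, dtype=float)
            h = np.array(h_bits, dtype=float)
            E = -(v @ W @ h) - (b @ v) - (c @ h)
            Z += np.exp(-E)
    return Z

def joint_probability(v, h, W, b, c):
    """P(v, h) = exp(-E(v, h)) / Z."""
    E = -(v @ W @ h) - (b @ v) - (c @ h)
    return np.exp(-E) / partition_function(W, b, c)
```

A quick sanity check: with all parameters zero, every configuration has energy 0, so each of the \( 2^{n_v + n_h} \) configurations gets equal probability.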

Marginal Probability

The probability of a visible vector \( \mathbf{v} \) is obtained by marginalizing over the hidden units:

$$ P(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp(-E(\mathbf{v}, \mathbf{h})) $$
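The marginal can likewise be sketched by brute-force enumeration over the hidden states, again only viable for toy sizes; function names and dimensions here are illustrative:

```python
import itertools
import numpy as np

def unnormalized_marginal(v, W, b, c):
    """Sum over all binary h of exp(-E(v, h)) for a fixed visible vector v."""
    n_h = W.shape[1]
    total = 0.0
    for h_bits in itertools.product([0, 1], repeat=n_h):
        h = np.array(h_bits, dtype=float)
        E = -(v @ W @ h) - (b @ v) - (c @ h)
        total += np.exp(-E)
    return total

def marginal_probability(v, W, b, c):
    """P(v) = (1/Z) * sum_h exp(-E(v, h)), with Z obtained by also enumerating v."""
    n_v = W.shape[0]
    Z = sum(unnormalized_marginal(np.array(bits, dtype=float), W, b, c)
            for bits in itertools.product([0, 1], repeat=n_v))
    return unnormalized_marginal(v, W, b, c) / Z
```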

Training Boltzmann Machines

Objective Function

The training objective for a Boltzmann machine is to maximize the likelihood of the observed data. This can be achieved by minimizing the negative log-likelihood:

$$ \mathcal{L} = -\sum_{\mathbf{v}} P_{\text{data}}(\mathbf{v}) \log P(\mathbf{v}) $$

Gradient of the Log-Likelihood

The gradient of the log-likelihood with respect to a weight \( W_{ij} \) is given by:

$$ \frac{\partial \log P(\mathbf{v})}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} $$

where:

  • \( \langle \cdot \rangle_{\text{data}} \) denotes the expectation with respect to the data distribution.
  • \( \langle \cdot \rangle_{\text{model}} \) denotes the expectation with respect to the model distribution.

Contrastive Divergence

In practice, exact computation of the gradient is intractable. Contrastive Divergence (CD) is a common approximation method used to train Boltzmann machines. The CD algorithm involves the following steps:

  1. Start with a training example \( \mathbf{v}^{(0)} \).
  2. Perform Gibbs sampling to obtain a sample \( \mathbf{v}^{(k)} \) after \( k \) steps.
  3. Update the weights using the difference between the data-dependent expectation and the model expectation estimated from the sample \( \mathbf{v}^{(k)} \):

$$ \Delta W_{ij} = \epsilon (\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}) $$

where \( \epsilon \) is the learning rate.
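The steps above can be sketched for the common case of a binary RBM, where the conditional distributions factorize (see the RBM section below). The learning rate, the choice \( k = 1 \) (CD-1), and the use of the one-step reconstruction to estimate the model expectation are standard but illustrative choices, not taken verbatim from the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 step: positive phase from the data, negative phase
    from a single Gibbs reconstruction. Returns updated (W, b, c)."""
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: hidden activation probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample h ~ P(h | v0)
    # Negative phase: one Gibbs step v0 -> h0 -> v1 -> ph1.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)  # sample v ~ P(v | h0)
    ph1 = sigmoid(v1 @ W + c)
    # Update: data statistics minus reconstruction statistics.
    W_new = W + lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_new = b + lr * (v0 - v1)
    c_new = c + lr * (ph0 - ph1)
    return W_new, b_new, c_new
```

In practice this update is applied over many epochs and mini-batches; using probabilities rather than samples for the final hidden statistics (as above) is a common variance-reduction choice.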

Restricted Boltzmann Machines (RBMs)

A Restricted Boltzmann Machine (RBM) is a Boltzmann machine with no visible-visible or hidden-hidden connections, so the connectivity graph between visible and hidden units is bipartite. This restriction makes the conditional distributions factorize, which simplifies training and makes RBMs useful for practical applications.

Energy Function for RBMs

The energy function for an RBM is given by:

$$ E(\mathbf{v}, \mathbf{h}) = -\sum_{i} \sum_{j} W_{ij} v_i h_j - \sum_{i} b_i v_i - \sum_{j} c_j h_j $$

Conditional Probabilities

The conditional probabilities for the hidden and visible units are:

$$ P(h_j = 1 | \mathbf{v}) = \sigma \left( \sum_{i} W_{ij} v_i + c_j \right) $$

$$ P(v_i = 1 | \mathbf{h}) = \sigma \left( \sum_{j} W_{ij} h_j + b_i \right) $$

where \( \sigma(x) = 1 / (1 + e^{-x}) \) is the sigmoid (logistic) function.
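These factorized conditionals are what make RBM inference cheap: given one layer, every unit of the other layer can be computed in a single vectorized step. A minimal sketch, with illustrative dimensions and names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, c):
    """P(h_j = 1 | v) = sigma(sum_i W_ij v_i + c_j), for all j at once."""
    return sigmoid(v @ W + c)

def visible_probs(h, W, b):
    """P(v_i = 1 | h) = sigma(sum_j W_ij h_j + b_i), for all i at once."""
    return sigmoid(W @ h + b)

# With zero weights and biases, every unit is on with probability 0.5.
W = np.zeros((2, 3)); b = np.zeros(2); c = np.zeros(3)
print(hidden_probs(np.array([1.0, 0.0]), W, c))  # → [0.5 0.5 0.5]
```

Alternating between these two conditionals is exactly the Gibbs sampling used in the Contrastive Divergence procedure above.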

Applications

Boltzmann machines and RBMs are used in various applications, including:

  • Dimensionality Reduction: RBMs can be used to reduce the dimensionality of data while preserving important features.
  • Collaborative Filtering: RBMs are used in recommendation systems to predict user preferences.
  • Feature Learning: RBMs can learn useful features from unlabeled data, making them useful for unsupervised learning tasks.

Conclusion

Boltzmann machines provide a powerful framework for modeling complex probability distributions and solving optimization problems. Restricted Boltzmann Machines (RBMs) are particularly practical and have been successfully applied in various real-world applications.
