{"id":290,"date":"2024-03-26T18:51:42","date_gmt":"2024-03-26T09:51:42","guid":{"rendered":"https:\/\/mp-superkler.com\/?p=290"},"modified":"2024-07-25T01:03:15","modified_gmt":"2024-07-24T16:03:15","slug":"boltzmann-machine","status":"publish","type":"post","link":"https:\/\/mp-superkler.com\/?p=290","title":{"rendered":"Boltzmann Machine"},"content":{"rendered":"\n<p>A Boltzmann machine is a type of stochastic recurrent neural network that can learn a probability distribution over its set of inputs. It is particularly used for modeling complex distributions and solving combinatorial optimization problems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Mathematical Formulation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Energy Function<\/h3>\n\n\n\n<p>A Boltzmann machine is defined by an energy function \\( E(\\mathbf{v}, \\mathbf{h}) \\) which assigns a scalar energy to each configuration of visible units \\( \\mathbf{v} \\) and hidden units \\( \\mathbf{h} \\). The energy function is given by:<\/p>\n\n\n\n<p>$$ E(\\mathbf{v}, \\mathbf{h}) = -\\sum_{i} \\sum_{j} W_{ij} v_i h_j &#8211; \\sum_{i} b_i v_i &#8211; \\sum_{j} c_j h_j $$<\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\\( v_i \\) and \\( h_j \\) are the states of the visible and hidden units, respectively.<\/li>\n\n\n\n<li>\\( W_{ij} \\) is the weight between visible unit \\( i \\) and hidden unit \\( j \\).<\/li>\n\n\n\n<li>\\( b_i \\) and \\( c_j \\) are the biases of the visible and hidden units, respectively.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Probability Distribution<\/h3>\n\n\n\n<p>The joint probability distribution over the visible and hidden units is defined by the Boltzmann distribution:<\/p>\n\n\n\n<p>$$ P(\\mathbf{v}, \\mathbf{h}) = \frac{1}{Z} \\exp(-E(\\mathbf{v}, \\mathbf{h})) $$<\/p>\n\n\n\n<p>where \\( Z \\) is the partition function, given by:<\/p>\n\n\n\n<p>$$ Z = \\sum_{\\mathbf{v}} \\sum_{\\mathbf{h}} \\exp(-E(\\mathbf{v}, \\mathbf{h})) 
$$<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Marginal Probability<\/h3>\n\n\n\n<p>The probability of a visible vector \\( \\mathbf{v} \\) is obtained by marginalizing over the hidden units:<\/p>\n\n\n\n<p>$$ P(\\mathbf{v}) = \\frac{1}{Z} \\sum_{\\mathbf{h}} \\exp(-E(\\mathbf{v}, \\mathbf{h})) $$<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Training Boltzmann Machines<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Objective Function<\/h3>\n\n\n\n<p>The training objective for a Boltzmann machine is to maximize the likelihood of the observed data. This can be achieved by minimizing the negative log-likelihood:<\/p>\n\n\n\n<p>$$ \\mathcal{L} = -\\sum_{\\mathbf{v}} P_{\\text{data}}(\\mathbf{v}) \\log P(\\mathbf{v}) $$<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gradient of the Log-Likelihood<\/h3>\n\n\n\n<p>The gradient of the log-likelihood with respect to a weight \\( W_{ij} \\) is given by:<\/p>\n\n\n\n<p>$$ \\frac{\\partial \\log P(\\mathbf{v})}{\\partial W_{ij}} = \\langle v_i h_j \\rangle_{\\text{data}} - \\langle v_i h_j \\rangle_{\\text{model}} $$<\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\\( \\langle \\cdot \\rangle_{\\text{data}} \\) denotes the expectation with respect to the data distribution.<\/li>\n\n\n\n<li>\\( \\langle \\cdot \\rangle_{\\text{model}} \\) denotes the expectation with respect to the model distribution.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Contrastive Divergence<\/h3>\n\n\n\n<p>In practice, exact computation of the gradient is intractable because the model expectation requires a sum over all configurations. Contrastive Divergence (CD) is a common approximation method used to train Boltzmann machines. 
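As a concrete illustration, the CD weight update can be sketched in code. The following is a minimal NumPy sketch of CD-1 (a single Gibbs step) for a restricted Boltzmann machine with binary units; all names, sizes, and hyperparameters are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Minimal CD-1 sketch for a Bernoulli RBM; illustrative names throughout.
rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 4
W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))  # weights W_ij
b = np.zeros(n_visible)                               # visible biases b_i
c = np.zeros(n_hidden)                                # hidden biases c_j
epsilon = 0.1                                         # learning rate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_delta_w(v0):
    # Positive phase: clamp the data vector, compute P(h_j = 1 | v).
    ph0 = sigmoid(v0 @ W + c)
    h0 = rng.binomial(1, ph0).astype(float)
    # One Gibbs step: reconstruct the visibles, then re-infer the hiddens.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = rng.binomial(1, pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Difference between the data-dependent and sample-based expectations.
    return epsilon * (np.outer(v0, ph0) - np.outer(v1, ph1))

v0 = rng.binomial(1, 0.5, size=n_visible).astype(float)
delta_w = cd1_delta_w(v0)  # same shape as W
```

Adding the returned matrix to the weights performs one approximate gradient step; the analogous bias updates use the differences between the data vector and its reconstruction.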
The CD algorithm involves the following steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Start with a training example \\( \\mathbf{v}^{(0)} \\).<\/li>\n\n\n\n<li>Perform Gibbs sampling to obtain a sample \\( \\mathbf{v}^{(k)} \\) after \\( k \\) steps.<\/li>\n\n\n\n<li>Update the weights using the difference between the data-dependent and model-dependent expectations:<\/li>\n<\/ol>\n\n\n\n<p>$$ \\Delta W_{ij} = \\epsilon (\\langle v_i h_j \\rangle_{\\text{data}} - \\langle v_i h_j \\rangle_{\\text{model}}) $$<\/p>\n\n\n\n<p>where \\( \\epsilon \\) is the learning rate and the model expectation is approximated using the \\( k \\)-step Gibbs sample \\( \\mathbf{v}^{(k)} \\).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Restricted Boltzmann Machines (RBMs)<\/h2>\n\n\n\n<p>A Restricted Boltzmann Machine (RBM) is a special type of Boltzmann machine in which connections exist only between visible and hidden units, so the units form a bipartite graph. This restriction makes the conditional distributions factorize, which simplifies the training process and makes RBMs useful for practical applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Energy Function for RBMs<\/h3>\n\n\n\n<p>The energy function for an RBM is given by:<\/p>\n\n\n\n<p>$$ E(\\mathbf{v}, \\mathbf{h}) = -\\sum_{i} \\sum_{j} W_{ij} v_i h_j - \\sum_{i} b_i v_i - \\sum_{j} c_j h_j $$<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conditional Probabilities<\/h3>\n\n\n\n<p>Because the graph is bipartite, the conditional probabilities for the hidden and visible units factorize:<\/p>\n\n\n\n<p>$$ P(h_j = 1 | \\mathbf{v}) = \\sigma \\left( \\sum_{i} W_{ij} v_i + c_j \\right) $$<\/p>\n\n\n\n<p>$$ P(v_i = 1 | \\mathbf{h}) = \\sigma \\left( \\sum_{j} W_{ij} h_j + b_i \\right) $$<\/p>\n\n\n\n<p>where \\( \\sigma(x) = \\frac{1}{1 + e^{-x}} \\) is the logistic sigmoid function.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Applications<\/h2>\n\n\n\n<p>Boltzmann machines and RBMs are used in various applications, including:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dimensionality Reduction:<\/strong> RBMs can be used to reduce the dimensionality of data while preserving important features.<\/li>\n\n\n\n<li><strong>Collaborative Filtering:<\/strong> RBMs are 
used in recommendation systems to predict user preferences.<\/li>\n\n\n\n<li><strong>Feature Learning:<\/strong> RBMs can learn useful features from unlabeled data, making them well suited to unsupervised learning tasks.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Boltzmann machines provide a powerful framework for modeling complex probability distributions and solving optimization problems. Restricted Boltzmann Machines (RBMs) are particularly practical and have been applied successfully in many real-world settings.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Boltzmann machine is a type of stochastic recurrent neural network that can learn a probability distribution<\/p>\n","protected":false},"author":1,"featured_media":293,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[],"class_list":["post-290","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-probability"],"_links":{"self":[{"href":"https:\/\/mp-superkler.com\/index.php?rest_route=\/wp\/v2\/posts\/290","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mp-superkler.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mp-superkler.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mp-superkler.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mp-superkler.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=290"}],"version-history":[{"count":3,"href":"https:\/\/mp-superkler.com\/index.php?rest_route=\/wp\/v2\/posts\/290\/revisions"}],"predecessor-version":[{"id":486,"href":"https:\/\/mp-superkler.com\/index.php?rest_route=\/wp\/v2\/posts\/290\/revisions\/486"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mp-superkler.com\/index.php?rest_route=\/wp\/v2\/media\/293"}],"wp:attachment":[{"hr
ef":"https:\/\/mp-superkler.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=290"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mp-superkler.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=290"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mp-superkler.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=290"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}