
A Brief Introduction to Graphical Models and Bayesian Networks

By Kevin Murphy, 1998.

"Graphical models are a marriage between probability theory andgraph theory. They provide a natural tool for dealing with two problemsthat occur throughout applied mathematics and engineering --uncertainty and complexity -- and in particular they are playing anincreasingly important role in the design and analysis of machinelearning algorithms. Fundamental to the idea of a graphical model isthe notion of modularity -- a complex system is built by combiningsimpler parts. Probability theory provides the glue whereby the partsare combined, ensuring that the system as a whole is consistent, andproviding ways to interface models to data. The graph theoretic sideof graphical models provides both an intuitively appealing interfaceby which humans can model highly-interacting sets of variables as wellas a data structure that lends itself naturally to the design ofefficient general-purpose algorithms.

Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism -- examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages -- in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems." --- Michael Jordan, 1998.

This tutorial

We will briefly discuss the following topics.
  • Representation, or, what exactly is a graphical model?
  • Inference, or, how can we use these models to efficiently answer probabilistic queries?
  • Learning, or, what do we do if we don't know what the model is?
  • Decision theory, or, what happens when it is time to convert beliefs into actions?
  • Applications, or, what's this all good for, anyway?

Note: (a version of) this page is available in pdf format here. Also, Marie Stefanova has made a Swedish translation here.

Articles in the popular press

The following articles provide less technical introductions.

Other sources of technical information

Directed arcs can often be given a causal interpretation; this can be used as a guide to construct the graph structure. In addition, directed models can encode deterministic relationships, and are easier to learn (fit to data). In the rest of this tutorial, we will only discuss directed graphical models, i.e., Bayesian networks.

In addition to the graph structure, it is necessary to specify the parameters of the model. For a directed model, we must specify the Conditional Probability Distribution (CPD) at each node. If the variables are discrete, this can be represented as a table (CPT), which lists the probability that the child node takes on each of its different values for each combination of values of its parents. Consider the following example, in which all nodes are binary, i.e., have two possible values, which we will denote by T (true) and F (false).
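[Figure omitted: the water-sprinkler Bayes net, with arcs C -> S, C -> R, S -> W, R -> W. The CPT values assumed in this example are restated below; the 0.5 prior and the 0.9 entry are quoted in the text, and the remaining entries are the ones consistent with the inference numbers computed later.]

P(C=T) = 0.5

C | P(S=T|C)      C | P(R=T|C)
F | 0.5           F | 0.2
T | 0.1           T | 0.8

S R | P(W=T|S,R)
F F | 0.0
T F | 0.9
F T | 0.9
T T | 0.99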

We see that the event "grass is wet" (W=true) has two possible causes: either the water sprinkler is on (S=true) or it is raining (R=true). The strength of this relationship is shown in the table. For example, we see that Pr(W=true | S=true, R=false) = 0.9 (second row), and hence, Pr(W=false | S=true, R=false) = 1 - 0.9 = 0.1, since each row must sum to one. Since the C node has no parents, its CPT specifies the prior probability that it is cloudy (in this case, 0.5). (Think of C as representing the season: if it is a cloudy season, it is less likely that the sprinkler is on and more likely that it will rain.)

The simplest conditional independence relationship encoded in a Bayesian network can be stated as follows: a node is independent of its ancestors given its parents, where the ancestor/parent relationship is with respect to some fixed topological ordering of the nodes.
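As an aside, this property can be checked numerically. Below is a minimal Python sketch (the CPT values are those assumed in the example above; bern and joint are illustrative helper names, not from any library). In this network, W's only non-parent ancestor is C, so P(W=1 | C, S, R) should not depend on C:

from itertools import product
# CPT values: P(C=1), P(S=1|C), P(R=1|C), P(W=1|S,R).
PC = 0.5
PS = {0: 0.5, 1: 0.1}
PR = {0: 0.2, 1: 0.8}
PW = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}
def bern(p1, x):
    # P(X=x) for a binary variable with P(X=1) = p1.
    return p1 if x == 1 else 1.0 - p1
def joint(c, s, r, w):
    # P(C,S,R,W) = P(C) * P(S|C) * P(R|C) * P(W|S,R), derived just below.
    return bern(PC, c) * bern(PS[c], s) * bern(PR[c], r) * bern(PW[s, r], w)
for s, r in product([0, 1], repeat=2):
    # P(W=1 | C=c, S=s, R=r) computed from the joint, for c = 0 and c = 1.
    conds = [joint(c, s, r, 1) / (joint(c, s, r, 0) + joint(c, s, r, 1))
             for c in (0, 1)]
    print(s, r, conds)  # the two entries agree for every (s, r)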

By the chain rule of probability, the joint probability of all the nodes in the graph above is

P(C, S, R, W) = P(C) * P(S|C) * P(R|C,S) * P(W|C,S,R)
By using conditional independence relationships, we can rewrite this as
P(C, S, R, W) = P(C) * P(S|C) * P(R|C)   * P(W|S,R)
where we were allowed to simplify the third term because R is independent of S given its parent C, and the last term because W is independent of C given its parents S and R.

We can see that the conditional independence relationships allow us to represent the joint more compactly. Here the savings are minimal, but in general, if we had n binary nodes, the full joint would require O(2^n) space to represent, but the factored form would require O(n 2^k) space, where k is the maximum fan-in of a node. And fewer parameters make learning easier.
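The counting can be sketched directly. A short Python example (the fan-in values are those of the water-sprinkler graph; the function names are illustrative, not standard):

def full_joint_params(n):
    # The 2^n joint probabilities must sum to 1, so 2^n - 1 free parameters.
    return 2 ** n - 1
def factored_params(fan_ins):
    # A binary node with k binary parents needs 2^k free parameters:
    # one P(X=1 | parents) entry per combination of parent values.
    return sum(2 ** k for k in fan_ins)
print(full_joint_params(4))            # 15
print(factored_params([0, 1, 1, 2]))   # 1 + 2 + 2 + 4 = 9: modest savings
print(full_joint_params(20))           # 1048575
print(factored_params([2] * 20))       # 80: O(n 2^k) instead of O(2^n)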

Are "Bayesian networks" Bayesian?

Despite the name, Bayesian networks do not necessarily imply a commitment to Bayesian statistics. Indeed, it is common to use frequentist methods to estimate the parameters of the CPDs. Rather, they are so called because they use Bayes' rule for probabilistic inference, as we explain below. (The term "directed graphical model" is perhaps more appropriate.) Nevertheless, Bayes nets are a useful representation for hierarchical Bayesian models, which form the foundation of applied Bayesian statistics (see e.g., the BUGS project). In such a model, the parameters are treated like any other random variable, and become nodes in the graph.

Inference

The most common task we wish to solve using Bayesian networks is probabilistic inference. For example, consider the water sprinkler network, and suppose we observe the fact that the grass is wet. There are two possible causes for this: either it is raining, or the sprinkler is on. Which is more likely? We can use Bayes' rule to compute the posterior probability of each explanation (where 0==false and 1==true).

P(S=1 | W=1) = P(S=1, W=1) / P(W=1) = 0.2781 / 0.6471 = 0.4298
P(R=1 | W=1) = P(R=1, W=1) / P(W=1) = 0.4581 / 0.6471 = 0.7079

where

P(W=1) = sum_{c,s,r} P(C=c, S=s, R=r, W=1) = 0.6471

is a normalizing constant, equal to the probability (likelihood) of the data. So we see that it is more likely that the grass is wet because it is raining: the likelihood ratio is 0.7079/0.4298 = 1.647.
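These numbers can be reproduced by brute-force enumeration of the joint. A minimal Python sketch (same assumed CPT values as above; prob is an illustrative helper, not part of any library):

from itertools import product
PC = 0.5
PS = {0: 0.5, 1: 0.1}
PR = {0: 0.2, 1: 0.8}
PW = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}
def bern(p1, x):
    # P(X=x) for a binary variable with P(X=1) = p1.
    return p1 if x == 1 else 1.0 - p1
def joint(c, s, r, w):
    # Factored joint: P(C) * P(S|C) * P(R|C) * P(W|S,R).
    return bern(PC, c) * bern(PS[c], s) * bern(PR[c], r) * bern(PW[s, r], w)
def prob(**fixed):
    # Marginal probability of the given settings, summing out the rest.
    total = 0.0
    for c, s, r, w in product([0, 1], repeat=4):
        values = {'C': c, 'S': s, 'R': r, 'W': w}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(c, s, r, w)
    return total
p_w = prob(W=1)                 # 0.6471, the normalizing constant
print(prob(S=1, W=1) / p_w)     # P(S=1|W=1) = 0.4298
print(prob(R=1, W=1) / p_w)     # P(R=1|W=1) = 0.7079

Enumerating all 2^n assignments is of course exponential in the number of nodes; practical inference algorithms exploit the factorization encoded by the graph to avoid this.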

(To see how these computations can be done in software, try this little BNT demo!)

Top-down and bottom-up reasoning

In the water sprinkler example, we had evidence of an effect (wet grass), and inferred the most likely cause. This is called diagnostic, or "bottom up", reasoning, since it goes from effects to causes; it is a common task in expert systems. Bayes nets can also be used for causal, or "top down", reasoning. For example, we can compute the probability that the grass will be wet given that it is cloudy. Hence Bayes nets are often called "generative" models, because they specify how causes generate effects.
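As a sketch of such a top-down query (same assumed CPT values as before), P(W=1 | C=1) needs only the CPDs downstream of C, since S and R are independent given C:

from itertools import product
# Condition on C=1, then sum out S and R:
# P(W=1 | C=1) = sum_{s,r} P(S=s|C=1) P(R=r|C=1) P(W=1|s,r).
PS1 = 0.1   # P(S=1 | C=1)
PR1 = 0.8   # P(R=1 | C=1)
PW1 = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}  # P(W=1 | S, R)
p = sum((PS1 if s else 1 - PS1) * (PR1 if r else 1 - PR1) * PW1[s, r]
        for s, r in product([0, 1], repeat=2))
print(p)   # 0.7452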

Conditional independence in Bayes Nets

In general, the conditional independence relatio