"Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity -- and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity -- a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms.
Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism -- examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages -- in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems."
--- Michael Jordan, 1998.
Note: a version of this page is available in PDF format. Also, Marie Stefanova has made a Swedish translation.
This can be used as a guide to construct the graph structure. In addition, directed models can encode deterministic relationships, and are easier to learn (fit to data). In the rest of this tutorial, we will only discuss directed graphical models, i.e., Bayesian networks.
In addition to the graph structure, it is necessary to specify the parameters of the model. For a directed model, we must specify the Conditional Probability Distribution (CPD) at each node. If the variables are discrete, this can be represented as a table (CPT), which lists the probability that the child node takes on each of its different values for each combination of values of its parents. Consider the following example, in which all nodes are binary, i.e., have two possible values, which we will denote by T (true) and F (false).
We see that the event "grass is wet" (W=true) has two possible causes: either the water sprinkler is on (S=true) or it is raining (R=true). The strength of this relationship is shown in the table. For example, we see that Pr(W=true | S=true, R=false) = 0.9 (second row), and hence, Pr(W=false | S=true, R=false) = 1 - 0.9 = 0.1, since each row must sum to one. Since the C node has no parents, its CPT specifies the prior probability that it is cloudy (in this case, 0.5). (Think of C as representing the season: if it is a cloudy season, it is less likely that the sprinkler is on and more likely that it is raining.)
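As a sketch, the CPTs of this network can be written down as Python dictionaries. Only Pr(C=true) = 0.5 and Pr(W=true | S=true, R=false) = 0.9 are stated above; the remaining entries below are illustrative assumptions, not values from the text.

```python
# CPTs for the cloudy/sprinkler/rain/wet-grass network.
# Each entry gives P(variable = True) for one setting of its parents.
# Only P(C=True) = 0.5 and P(W=True | S=True, R=False) = 0.9 come from
# the text; all other numbers are illustrative assumptions.

P_C = {True: 0.5, False: 0.5}          # prior on Cloudy

P_S_given_C = {True: 0.1, False: 0.5}  # P(S=True | C)  (assumed)

P_R_given_C = {True: 0.8, False: 0.2}  # P(R=True | C)  (assumed)

P_W_given_SR = {                       # P(W=True | S, R)
    (True,  True):  0.99,              # assumed
    (True,  False): 0.9,               # from the text
    (False, True):  0.9,               # assumed
    (False, False): 0.0,               # assumed
}

# Because each CPT row must sum to one, the complementary entry is implicit:
p_w_false = 1 - P_W_given_SR[(True, False)]
print(p_w_false)  # approximately 0.1
```

Storing only P(variable = True) per parent configuration exploits the row-sums-to-one constraint, halving the table size for binary nodes.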
The simplest conditional independence relationship encoded in a Bayesian network can be stated as follows: a node is independent of its ancestors given its parents, where the ancestor/parent relationship is with respect to some fixed topological ordering of the nodes.
By the chain rule of probability, the joint probability of all the nodes in the graph above is
P(C, S, R, W) = P(C) * P(S|C) * P(R|C,S) * P(W|C,S,R)

By using conditional independence relationships, we can rewrite this as
P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)

where we were allowed to simplify the third term because R is independent of S given its parent C, and the last term because W is independent of C given its parents S and R.
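The factored form can be evaluated directly from the CPTs. The sketch below computes the joint as the product P(C) P(S|C) P(R|C) P(W|S,R) and checks, by brute force over all 16 configurations, that it is a proper distribution; apart from P(C)=0.5 and P(W=T|S=T,R=F)=0.9, the CPT entries are illustrative assumptions.

```python
from itertools import product

# CPT entries give P(variable = True) for each parent setting.
# Only P(C=True)=0.5 and P(W=True|S=True,R=False)=0.9 come from the text;
# the other numbers are illustrative assumptions.
P_C = {True: 0.5, False: 0.5}
P_S = {True: 0.1, False: 0.5}                    # P(S=True | C)
P_R = {True: 0.8, False: 0.2}                    # P(R=True | C)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}  # P(W=True | S, R)

def bern(p, value):
    """Probability that a binary variable takes `value` when P(True) = p."""
    return p if value else 1.0 - p

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)."""
    return (bern(P_C[True], c) * bern(P_S[c], s) *
            bern(P_R[c], r) * bern(P_W[(s, r)], w))

# Summing the factored joint over all 2^4 configurations gives 1,
# because each factor is a properly normalized conditional.
total = sum(joint(*vals) for vals in product([True, False], repeat=4))
print(total)  # approximately 1.0
```

Note that we never had to build the full 16-entry joint table explicitly; every query is answered by multiplying four local CPT entries.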
We can see that the conditional independence relationships allow us to represent the joint more compactly. Here the savings are minimal, but in general, if we had n binary nodes, the full joint would require O(2^n) space to represent, but the factored form would require O(n 2^k) space to represent, where k is the maximum fan-in of a node. And fewer parameters make learning easier.
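The gap between O(2^n) and O(n 2^k) can be made concrete with a quick count, a sketch assuming binary nodes and a fixed maximum fan-in k:

```python
# Space needed to represent a joint over n binary variables:
# the full table has 2**n entries, while a factored Bayesian network
# needs at most n * 2**k entries, where k is the maximum fan-in.

def full_joint_size(n):
    return 2 ** n

def factored_size(n, k):
    return n * 2 ** k

k = 2  # e.g. the sprinkler network above has maximum fan-in 2
for n in (4, 10, 20):
    print(n, full_joint_size(n), factored_size(n, k))
# 4  16      16
# 10 1024    40
# 20 1048576 80
```

For the 4-node sprinkler network the two counts coincide, but by n = 20 the factored form is already four orders of magnitude smaller.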