Intro to AI - Uncertainty
ai knowledge cs50
Definitions
probability | Probability is the likelihood that a certain event \$a\$ happens.
Every possible outcome of the event can be thought of as a world
\$omega\$ with its probability \$P(omega)\$. |
unconditional probability; marginal probability | Unconditional probability or marginal probability is the degree of belief
in a proposition in the absence of any other evidence. |
conditional probability | Conditional probability is the degree of belief in a proposition
given that some evidence has already been revealed. |
joint probability | Joint probability is the probability of two events \$a\$ and \$b\$ both occurring. |
random variable | A random variable in probability theory is a variable with a domain of possible (discrete) values it can take on, each possible value might have a different probability. |
probability distribution | Probability distribution is a function that gives the probability for each possible value of a random variable, in the case of discrete values this can be just a vector of probabilities for each possible value of the variable. |
independence | Independence is the knowledge that the outcome of one event doesn’t affect the
outcome of another event.
For two independent events the following holds:
\$P(a wedge b) = P(a) \ P(b)\$ |
Bayes' Rule | Using the joint probability formulas
\$P(a wedge b) = P(a|b) \ P(b) = P(b|a) \ P(a)\$,
Bayes' rule follows:
\$P(b|a) = (P(a|b) \ P(b)) / (P(a))\$ |
Bayesian Networks | A Bayesian Network is a directed graph in which each node represents a random variable. Each edge from node \$X\$ to \$Y\$ represents a parent-child relationship: the probability distribution of \$Y\$ depends on the value of \$X\$. Each node \$Y\$ has probability distribution \$P(Y | Parents(Y))\$. |
negation | \$P(not a) = 1 - P(a)\$ |
inclusion-exclusion | \$P(a vee b) = P(a) + P(b) - P(a wedge b)\$ |
marginalization | for a simple boolean value b:
\$P(a) = P(a, b) + P(a, not \ b)\$ |
conditioning | for a single boolean value b:
\$P(a) = P(a|b) \ P(b) + P(a| not \ b) \ P(not \ b)\$ |
sample | To sample means randomly picking a value for a variable according to its probability distribution. |
markov assumption | The Markov assumption is an assumption that the current state depends on only a finite fixed number of previous states. |
markov chain | A Markov chain is a sequence of random variables where the distribution of each variable follows the Markov assumption. That is, each event in the chain occurs based on the probability of the event before it. |
transition model | The transition model of a Markov chain gives the probability distribution of the next event based on the possible values of the current event. |
hidden markov model | A hidden Markov model is a type of Markov model for a system with hidden states that generate some observed event. This means that sometimes, the AI has some measurement of the world but no access to the precise state of the world. |
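The Markov chain and transition model definitions above can be sketched in Python. The two-state weather model and its transition probabilities below are made-up values for illustration, not from the lecture:

```python
import random

# Hypothetical transition model: P(next state | current state).
transition = {
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sample_chain(start, steps, rng=random):
    """Sample a Markov chain: each state depends only on the previous one."""
    state, chain = start, [start]
    for _ in range(steps):
        dist = transition[state]
        # Sample the next state according to the transition distribution.
        state = rng.choices(list(dist), weights=list(dist.values()))[0]
        chain.append(state)
    return chain

print(sample_chain("sun", 5))  # e.g. ['sun', 'sun', 'rain', 'rain', 'rain', 'sun']
```

Each call returns a different chain; only the start state is fixed, matching the Markov assumption that the next state depends only on the current one.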
Example calculations
Using joint probabilities to deduce a conditional probability distribution
given:
C | R | probability |
\$text{clouds}\$ | \$text{rain}\$ | \$P(C = text{clouds}, R = text{rain}) = 0.08\$ |
\$text{clouds}\$ | \$not \ text{rain}\$ | \$P(C = text{clouds}, R = not text{rain}) = 0.32 \$ |
\$not \ text{clouds}\$ | \$text{rain}\$ | \$P(C = not text{clouds}, R = text{rain}) = 0.02 \$ |
\$not \ text{clouds}\$ | \$not \ text{rain}\$ | \$P(C = not text{clouds}, R = not text{rain}) = 0.58 \$ |
We can deduce the probability distribution for clouds given that we have rain:
\$P(C | text{rain})
= (P(C , text{rain})) / (P(text{rain})) \$
with the normalization constant \$alpha = 1 / P(text{rain})\$:
\$ = alpha \ P(C , text{rain}) \$
with the cloud distribution for \$R = text{rain}\$:
\$ = alpha ((0.08), (0.02)) \$
with \$sum_(c in C) P(c | text{rain}) = 1\$, i.e. \$alpha = 1 / (0.08 + 0.02) = 10\$:
\$ = ((0.8), (0.2)) \$
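The normalization step above can be sketched in a few lines of Python, using the joint probabilities from the table:

```python
# Joint probabilities P(C, rain) from the table above.
joint = {
    ("clouds", "rain"): 0.08,
    ("not clouds", "rain"): 0.02,
}

p_rain = sum(joint.values())   # marginalize: P(rain) = 0.10
alpha = 1 / p_rain             # normalization constant, here 10
dist = {c: alpha * p for (c, _), p in joint.items()}
print(dist)  # ≈ {'clouds': 0.8, 'not clouds': 0.2}
```

Dividing by \$P(text{rain})\$ is exactly what makes the resulting values sum to 1, i.e. a valid probability distribution over \$C\$.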
Inference in Bayesian networks
inference by enumeration
Inference by enumeration is the process of finding the probability distribution
of variable \$X\$ given observed evidence \$e\$ and some hidden variables \$Y\$:
\$P(X \| e) = alpha P(X , e) = alpha sum_y P(X, e, y)\$.
We sum the joint probability of \$X\$, \$e\$, and \$y\$, where \$y\$ ranges over
every possible value of the hidden variables \$Y\$, and normalize the result with \$alpha\$.
The drawback is the high computational cost: the number of terms to sum grows
exponentially with the number of hidden variables.
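A minimal sketch of the enumeration formula, using a made-up full joint over three Boolean variables (a real implementation would multiply CPT entries from the network instead of storing the whole joint):

```python
# Hypothetical full joint P(X, E, Y); the eight values sum to 1.
joint = {
    # (x, e, y): probability
    (True,  True,  True):  0.10,
    (True,  True,  False): 0.05,
    (True,  False, True):  0.15,
    (True,  False, False): 0.10,
    (False, True,  True):  0.20,
    (False, True,  False): 0.05,
    (False, False, True):  0.25,
    (False, False, False): 0.10,
}

def enumerate_query(evidence_e=True):
    """P(X | e) = alpha * sum_y P(X, e, y): sum out Y, then normalize."""
    unnormalized = {
        x: sum(joint[(x, evidence_e, y)] for y in (True, False))
        for x in (True, False)
    }
    alpha = 1 / sum(unnormalized.values())   # normalize over the values of X
    return {x: alpha * p for x, p in unnormalized.items()}

print(enumerate_query())  # ≈ {True: 0.375, False: 0.625}
```

With one hidden Boolean variable there are only two terms per value of \$X\$, but each additional hidden variable doubles the number of terms, which is the exponential blow-up noted above.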
approximate inference by sampling
We sample the variables in the nodes of the Bayesian network, starting at the top-level nodes and working downwards, using each node's probability distribution conditioned on the values already sampled for its parent nodes. This yields a collection of samples that can be used for inference.
- \$P(A = a)\$ can be estimated by counting the samples with \$A = a\$ and dividing by the total number of collected samples.
- \$P(A = a \| B = b)\$ can be estimated by counting the samples with \$A = a\$ and \$B = b\$ and dividing by the number of samples with \$B = b\$.
The drawback is that for a low probability of \$B = b\$ most samples are rejected, so a lot of the sampling work is wasted.
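A sketch of this rejection-sampling estimate on an assumed two-node toy network (Rain → WetGrass, with made-up probabilities):

```python
import random

def sample_once(rng):
    """Sample the toy network top-down: Rain first, then WetGrass given Rain."""
    rain = rng.random() < 0.2            # assumed prior P(rain) = 0.2
    p_wet = 0.9 if rain else 0.1         # assumed CPT P(wet | rain)
    wet = rng.random() < p_wet
    return rain, wet

def estimate(n=100_000, seed=0):
    """Estimate P(rain | wet) by keeping only samples consistent with wet."""
    rng = random.Random(seed)
    kept = [rain for rain, wet in (sample_once(rng) for _ in range(n)) if wet]
    # Samples with wet = False are discarded: wasteful when P(wet) is small.
    return sum(kept) / len(kept)

print(estimate())  # roughly 0.69 (exact value: 0.18 / 0.26 ≈ 0.692)
```

Here only about 26% of samples survive the evidence check; with rarer evidence the surviving fraction, and thus the estimate's accuracy per unit of work, drops further.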
likelihood weighting
- Start by fixing the values for evidence variables.
- Sample the non-evidence variables using conditional probabilities in the Bayesian network.
- Weight each sample by its likelihood: the probability of all the evidence occurring.
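The three steps above can be sketched on the same assumed toy network (Rain → WetGrass, made-up probabilities), with the evidence WetGrass = true fixed rather than sampled:

```python
import random

def likelihood_weighting(n=100_000, seed=0):
    """Estimate P(rain | wet) without rejecting any samples."""
    rng = random.Random(seed)
    total = weighted_rain = 0.0
    for _ in range(n):
        # Step 1: the evidence variable (wet = True) is fixed, never sampled.
        # Step 2: sample the non-evidence variable from its distribution.
        rain = rng.random() < 0.2        # assumed prior P(rain) = 0.2
        # Step 3: weight the sample by the likelihood of the evidence.
        weight = 0.9 if rain else 0.1    # assumed CPT P(wet = true | rain)
        total += weight
        if rain:
            weighted_rain += weight
    return weighted_rain / total

print(likelihood_weighting())  # roughly 0.69, matching rejection sampling
```

Unlike rejection sampling, every sample contributes here; improbable evidence just gives samples small weights instead of throwing them away.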