According to Wikipedia:

M&M’s are multi-colored button-shaped chocolates, each of which has the letter “m” printed in lower case in white on one side, consisting of a candy shell surrounding a filling which varies depending upon the variety of M&M’s. The original candy has a semi-sweet chocolate filling which, upon introduction of other variations, was branded as the “plain, normal” variety.

The candy originated in the United States in 1941. They are produced in different colors, some of which have changed over the years. The candy-coated chocolate concept was inspired by a method used to allow soldiers in the Spanish Civil War (1936–1939) to carry chocolate in warm climates without it melting. A traditional milk chocolate M&M weighs about 0.91 grams and has about 4.7 calories of food energy’’.

Let’s start this practical session by making inference about the proportion of red M$\&$M’s in a very huge box. If you are following this practice in person, the instructor will give you a bag of M$\&$M’s to make the practice more fun even if we have to be a bit flexible about the theoretical conditions of the experiment. If you are working on line it will be a bit more boring but we will comply better with the theoretical conditions of the experiment.

For those of us who physically have the bag of M$\&$M’s, the most appropriate experiment would be to take candies out of the bag one at a time and replace them, recording whether their colour is red or not, until the number $n$ of candies defined in advance is reached. We consider $n$=20.

In case you are on line or you don’t want to carry out the experiment physically, you can use the following result of the experiment:

`mm_sample <- data.frame(n = 20, r = 4) `

Let $Y$ be the random variable that describes the number of red M$\&$M’s (successes) out of the $n=20$ sampled. Given the proportion $\theta$ of red candies in the bag, $Y$ is a binomial random variable with parameters $n=20$ and probability $\theta$ associated with drawing a red candy.

$Y \mid \theta \sim \mbox{Bi}(n, \theta)$

whose conditional probability function, expectation and variance are:

- $P(Y=r \mid \theta)=\binom{n}{r}\,\theta^r\, (1-\theta)^{n-r}, \,\,r=0,1,\ldots, n$
- $E(Y\mid \theta)=n\,\theta$,
- $Var(Y\mid \theta)=n\, \theta\, (1-\theta)$

Recall that the beta distribution is conjugate with respect to the binomial probability model. If we elicit a prior beta distribution Be$(\alpha_0, \beta_0)$ for $\theta$, its density is

$\pi(\theta \mid \alpha_0, \beta_0)=\frac{\Gamma(\alpha_0+\beta_0)}{\Gamma(\alpha_0)\,\Gamma(\beta_0)}\, \theta^{\alpha_0-1}\,(1-\theta)^{\beta_0-1}, \quad 0<\theta<1$ with expectation and variance

- $\mathbb{E}(\theta \mid \alpha_0, \beta_0)=\alpha_0 / (\alpha_0+\beta_0)$,
- Var$(\theta \mid \alpha_0, \beta_0)= \alpha_0\,\beta_0 / [(\alpha_0+\beta_0)^2\,(\alpha_0+\beta_0+1)]$

We will start by working with the non-informative prior distribution Be$(\alpha_0=0.5, \beta_0=0.5)$ whose prior expectation and standard deviation is $\mathbb{E}(\theta)=0.5$ and $\mathbb{SD}(\theta)=0.3535$, respectively. Its graphic is obtained via

```
curve(
dbeta(x, 0.5, 0.5),
col = "dodgerblue", lwd = 4,
ylim = c(0, 4),
xlab = expression(paste("Proportion of red candies ", theta)),
ylab = 'prior'
)
```

The likelihood function of $\theta$ is a function of $\theta$ whose expression requires the observed data $\mathcal D = \{n, r\}$. It is defined as follows

$L(\theta \mid \mathcal D) = P(Y=r \mid \theta, n) = \binom{n}{r}\, \theta^r\,(1- \theta)^{n-r},$

where
$r$
is the *observed* number of successes in the sample (here, the
number of red candies) of size
$n$.
In our first experiment, the likelihood function of
$\theta$
for the data
$\mathcal D_1 = \{r_1 = 4 \text{ red candies in } n_1 = 20 \text{ sampled}\}$
is

$L(\theta \mid \mathcal D_1) = \binom{20}{4}\, \theta^4\,(1- \theta)^{16},$

```
r <- mm_sample$r
n <- mm_sample$n
curve(
choose(n, r) * x^r * (1 - x)^(n - r),
col = "darkorange", lwd = 4,
xlab = expression(paste("Proportion of red candies ", theta)),
ylab = 'likelihood'
)
```

The posterior distribution of $\theta$ is also a beta distribution with parameters

$\pi(\theta \mid \mathcal{D})= \mbox{Be}(\alpha=\alpha_0+r, \beta= \beta_0+n-r).$

In our case, the posterior distribution of the proportion of red candies is

$\pi(\theta \mid \mathcal{D_1})= \mbox{Be}(\alpha=4.5, \beta= 16.5)$ whose graphic is

```
curve(
dbeta(x, mm_sample$r + 0.5, mm_sample$n - mm_sample$r + 0.5),
col = "darkgreen", lwd = 4,
xlim = c(0, 1), ylim = c(0, 5),
xlab = expression(paste("Proportion of red candies ", theta)),
ylab = 'posterior'
)
```

The posterior mean and posterior standard deviation of the proportion of red candies is

$\mathbb{E}(\theta \mid \mathcal D)=\frac{4.5}{4.5+16.5}=0.2143, \quad\quad \mathbb{SD}(\theta \mid \mathcal D)=\sqrt{\frac{4.5 \cdot 16.5}{(4.5+16.5)^2 \cdot (4.5+16.5+1)}}=0.0875.$ A 95$\%$ credible interval for $\theta$ is

and the posterior probability that the proportion of red candies is between 0.1 and 0.3 is

You can see these posterior summaries in the Figure below

Recall that the posterior predictive distribution of the number of successes $Y^{\prime}$ in a new sample of size $n^{\prime}$ is a beta-binomial distribution $\text{Bb}(\alpha,\, \beta,\, n^{\prime})$ with parameters $\alpha$ and $\beta$ from the posterior distribution $\pi(\theta \mid \mathcal D)$ and the size $n^{\prime}$ of the new experiment. Its probability function is

$P(Y^{\prime} = r^{\prime} \mid \mathcal{D}) = \binom{n^\prime}{r^\prime}\, \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \,\Gamma(\beta)} \,\frac{\Gamma(\alpha+r^{\prime})\,\Gamma(\beta+n^{\prime}-r^{\prime})}{\Gamma(\alpha+\beta+n^{\prime})}, \,\, r^{\prime}=0,1,\ldots, n^{\prime}$

with expectation and variance,

- $\mathbb{E}(Y^{\prime}\mid \mathcal{D})=n^{\prime}\,\frac{\alpha}{(\alpha+\beta)},$
- $\mathbb{V}(Y^{\prime}\mid \mathcal{D})=n^{\prime}\,\frac{\alpha\,\beta\,(\alpha+\beta+n^{\prime})}{(\alpha+\beta)^{2}\,(\alpha+\beta+1)}.$

Suppose now that we are going to randomly draw $n^{\prime}=10$ new M$\&$M’s from the bag. Before performing the experiment, the posterior predictive distributions quantifies the relative plausibility of each possible outcome. It is a beta-binomial distribution with parameters $\alpha=4.5$ and $\beta=16.5$ from the posterior distribution $\pi(\theta \mid \mathcal D)$, and $n^{\prime}=10$, which is the size of the new experiment.

```
library(extraDistr)
curve(
dbbinom(x, size = 10, mm_sample$r + 0.5, mm_sample$n - mm_sample$r + 0.5),
col = "purple", lwd = 4, type = 'h', n = 11,
xlim = c(0, 10), ylim = c(0, 0.4),
xlab = expression(paste("Number of red candies")),
ylab = 'probability'
)
```

We observe that the most likely numbers of red candies in the new sample are 1 and 2. The posterior mean and posterior standard deviation of the number of red candies is this new sample is

$\mathbb{E}(Y^{\prime} \mid \mathcal D)= 2.14, \quad\quad \mathbb{SD}(Y^{\prime} \mid \mathcal D)= 1.54.$

We propose below an individual exercise that pursues that we can
consolidate the basic concepts that we have learned in the previous
theoretical session and that we have been practising in this session.
You can code the exercise yourself in R using the examples above, or use
`vibass_app(1)`

to launch the interactive application
provided in the `vibass`

R-package.

**EXERCISE**

You have two friends, Arnau and Mary, who know a little bit about Bayesian inference and a lot about M$\&$M’s. Their opinions about the proportion $\theta$ of red candies are expressed in terms of the following prior distributions: $\begin{align*} \mbox{Arnau's prior distribution: } & \pi^{(A)}(\theta) = \mbox{Be}(4,8) \\ \mbox{Mary's prior distribution: } & \pi^{(M)}(\theta) = \mbox{Be}(5,95) \end{align*}$

How different are your two friends’ opinions on $\theta$? A good idea to answer this question would be to plot both densities and calculate the mean and standard deviation of the subsequent prior distributions.

From the results of the previous experiment ($r= 4$ red candies out of a total of $n=20$ sampled), compare the posterior distribution for $\theta$ that Mary and Arnau would obtain.

Compare both posterior distributions with the one we have obtained in Subsection 2.5 of the practical through their density curves, posterior means, standard deviations and 95% credibility intervals.

Compute the posterior predictive distribution for the new experiment described in Subsection 2.6 (to randomly draw $n^{\prime}=10$ new M$\&$M’s from the bag) and compare both posterior predictive distributions with the one we have obtained in Subsection 2.6 of the practical through their density curves, posterior means and standard deviations.