Introducing M&M’s

According to Wikipedia:

M&M’s are multi-colored button-shaped chocolates, each of which has the letter “m” printed in lower case in white on one side, consisting of a candy shell surrounding a filling which varies depending upon the variety of M&M’s. The original candy has a semi-sweet chocolate filling which, upon introduction of other variations, was branded as the “plain, normal” variety.

The candy originated in the United States in 1941. They are produced in different colors, some of which have changed over the years. The candy-coated chocolate concept was inspired by a method used to allow soldiers in the Spanish Civil War (1936–1939) to carry chocolate in warm climates without it melting. A traditional milk chocolate M&M weighs about 0.91 grams and has about 4.7 calories of food energy.

A pile of peanut M&M’s candies. Photo by Victor Roda.

Bayesian inference for the proportion of red M&M’s

The experiment: Counting red M&M’s

Let’s start this practical session by making inference about the proportion of red M&M’s in a very large box. If you are following this practical in person, the instructor will give you a bag of M&M’s to make the session more fun, even if we have to be a bit flexible about the theoretical conditions of the experiment. If you are working online it will be a bit more boring, but we will comply better with the theoretical conditions of the experiment.

Representation of the sampling experiment.

For those of us who physically have the bag of M&M’s, the most appropriate experiment would be to take candies out of the bag one at a time and replace them, recording whether their colour is red or not, until the number $n$ of candies defined in advance is reached. We consider $n=20$.

In case you are online or you don’t want to carry out the experiment physically, you can use the following result of the experiment:

# Observed data: r = 4 red candies out of n = 20 sampled
mm_sample <- data.frame(n = 20, r = 4)

The sampling model is Binomial

Let $Y$ be the random variable that describes the number of red M&M’s (successes) out of the $n=20$ sampled. Given the proportion $\theta$ of red candies in the bag, $Y$ is a binomial random variable with parameters $n=20$ and probability $\theta$ associated with drawing a red candy,

$$Y \mid \theta \sim \mbox{Bi}(n, \theta)$$

whose conditional probability function, expectation and variance are:

  • $P(Y=r \mid \theta)=\binom{n}{r}\,\theta^r\,(1-\theta)^{n-r}, \quad r=0,1,\ldots,n$
  • $E(Y \mid \theta)=n\,\theta$,
  • $Var(Y \mid \theta)=n\,\theta\,(1-\theta)$
Binomial probability function for two values of $\theta$ and $n=20$ observations.
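A figure like this can be reproduced in base R. Here is a minimal sketch; the two values $\theta = 0.2$ and $\theta = 0.5$ are our choice for illustration, not necessarily the ones in the original figure:

# Binomial pmfs for n = 20 and two illustrative values of theta,
# drawn as slightly offset spikes so both are visible
r_vals <- 0:20
plot(r_vals - 0.1, dbinom(r_vals, size = 20, prob = 0.2),
     type = "h", lwd = 4, col = "dodgerblue",
     xlab = "Number of red candies", ylab = "probability")
points(r_vals + 0.1, dbinom(r_vals, size = 20, prob = 0.5),
       type = "h", lwd = 4, col = "darkorange")
legend("topright",
       legend = c(expression(theta == 0.2), expression(theta == 0.5)),
       col = c("dodgerblue", "darkorange"), lwd = 4)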

A prior distribution for $\theta$

Recall that the beta distribution is conjugate with respect to the binomial probability model. If we elicit a prior beta distribution $\mbox{Be}(\alpha_0, \beta_0)$ for $\theta$, its density is

$$\pi(\theta \mid \alpha_0, \beta_0)=\frac{\Gamma(\alpha_0+\beta_0)}{\Gamma(\alpha_0)\,\Gamma(\beta_0)}\, \theta^{\alpha_0-1}\,(1-\theta)^{\beta_0-1}, \quad 0<\theta<1$$

with expectation and variance

  • $\mathbb{E}(\theta \mid \alpha_0, \beta_0)=\alpha_0 / (\alpha_0+\beta_0)$,
  • $Var(\theta \mid \alpha_0, \beta_0)= \alpha_0\,\beta_0 / [(\alpha_0+\beta_0)^2\,(\alpha_0+\beta_0+1)]$

We will start by working with the non-informative prior distribution $\mbox{Be}(\alpha_0=0.5, \beta_0=0.5)$, whose prior expectation and standard deviation are $\mathbb{E}(\theta)=0.5$ and $\mathbb{SD}(\theta)=0.3535$, respectively. Its graph is obtained via

# Density of the non-informative Be(0.5, 0.5) prior
curve(
  dbeta(x, 0.5, 0.5),
  col = "dodgerblue", lwd = 4,
  ylim = c(0, 4),
  xlab = expression(paste("Proportion of red candies ", theta)),
  ylab = 'prior'
)
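As a quick numerical check of the prior mean and standard deviation quoted above, using the beta moment formulas:

a0 <- 0.5; b0 <- 0.5
a0 / (a0 + b0)                                # prior mean: 0.5
sqrt(a0 * b0 / ((a0 + b0)^2 * (a0 + b0 + 1))) # prior SD: 0.3536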

The likelihood function of $\theta$

The likelihood function of $\theta$ is a function of $\theta$ whose expression requires the observed data $\mathcal D = \{n, r\}$. It is defined as follows

$$L(\theta \mid \mathcal D) = P(Y=r \mid \theta, n) = \binom{n}{r}\, \theta^r\,(1- \theta)^{n-r},$$

where $r$ is the observed number of successes in the sample (here, the number of red candies) of size $n$. In our first experiment, the likelihood function of $\theta$ for the data $\mathcal D_1 = \{r_1 = 4 \text{ red candies in } n_1 = 20 \text{ sampled}\}$ is

$$L(\theta \mid \mathcal D_1) = \binom{20}{4}\, \theta^4\,(1- \theta)^{16}.$$

r <- mm_sample$r
n <- mm_sample$n
# Likelihood of theta for r = 4 red candies in n = 20 draws
curve(
  choose(n, r) * x^r * (1 - x)^(n - r),
  col = "darkorange", lwd = 4,
  xlab = expression(paste("Proportion of red candies ", theta)),
  ylab = 'likelihood'
)
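As a side check, this likelihood is maximised at the maximum-likelihood estimate $\hat{\theta} = r/n = 4/20 = 0.2$; a quick numerical sketch:

# Numerical maximisation of the likelihood over (0, 1);
# the maximiser should be (very close to) r/n = 0.2
optimize(
  function(theta) choose(n, r) * theta^r * (1 - theta)^(n - r),
  interval = c(0, 1), maximum = TRUE
)$maximum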

The posterior distribution of $\theta$

The posterior distribution of $\theta$ is also a beta distribution, with parameters

$$\pi(\theta \mid \mathcal{D})= \mbox{Be}(\alpha=\alpha_0+r, \beta= \beta_0+n-r).$$

In our case, the posterior distribution of the proportion of red candies is

$$\pi(\theta \mid \mathcal{D}_1)= \mbox{Be}(\alpha=4.5, \beta= 16.5),$$

whose graph is obtained via

# Posterior density Be(r + 0.5, n - r + 0.5) = Be(4.5, 16.5)
curve(
  dbeta(x, mm_sample$r + 0.5, mm_sample$n - mm_sample$r + 0.5),
  col = "darkgreen", lwd = 4,
  xlim = c(0, 1), ylim = c(0, 5), 
  xlab = expression(paste("Proportion of red candies ", theta)),
  ylab = 'posterior'
)

The posterior mean and posterior standard deviation of the proportion of red candies are

$$\mathbb{E}(\theta \mid \mathcal D)=\frac{4.5}{4.5+16.5}=0.2143, \quad\quad \mathbb{SD}(\theta \mid \mathcal D)=\sqrt{\frac{4.5 \cdot 16.5}{(4.5+16.5)^2 \cdot (4.5+16.5+1)}}=0.0875.$$

A 95% credible interval for $\theta$ is

qbeta(c(0.025, 0.975), 4.5, 16.5)
#> [1] 0.07152005 0.40822576

and the posterior probability that the proportion of red candies is between 0.1 and 0.3 is

pbeta(0.3, 4.5, 16.5) - pbeta(0.1, 4.5, 16.5)
#> [1] 0.7571733
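These closed-form summaries can also be checked numerically with the beta moment formulas:

a <- 4.5; b <- 16.5
a / (a + b)                             # posterior mean: 0.2143
sqrt(a * b / ((a + b)^2 * (a + b + 1))) # posterior SD: 0.0875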

You can see these posterior summaries in the figure below.

Posterior summaries for $\theta$.

The posterior predictive distribution for the results of a new experiment

Recall that the posterior predictive distribution of the number of successes $Y^{\prime}$ in a new sample of size $n^{\prime}$ is a beta-binomial distribution $\text{Bb}(\alpha,\, \beta,\, n^{\prime})$, with parameters $\alpha$ and $\beta$ from the posterior distribution $\pi(\theta \mid \mathcal D)$ and the size $n^{\prime}$ of the new experiment. Its probability function is

$$P(Y^{\prime} = r^{\prime} \mid \mathcal{D}) = \binom{n^{\prime}}{r^{\prime}}\, \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \frac{\Gamma(\alpha+r^{\prime})\,\Gamma(\beta+n^{\prime}-r^{\prime})}{\Gamma(\alpha+\beta+n^{\prime})}, \quad r^{\prime}=0,1,\ldots,n^{\prime}$$

with expectation and variance,

  • $\mathbb{E}(Y^{\prime}\mid \mathcal{D})=n^{\prime}\,\frac{\alpha}{\alpha+\beta}$,
  • $\mathbb{V}(Y^{\prime}\mid \mathcal{D})=n^{\prime}\,\frac{\alpha\,\beta\,(\alpha+\beta+n^{\prime})}{(\alpha+\beta)^{2}\,(\alpha+\beta+1)}$.
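
For readers who want to see this probability function in code, here is a direct implementation (a sketch in base R, computed on the log scale for numerical stability; the helper name dbetabinom_manual is ours, not from any package):

# Beta-binomial pmf coded from the formula above
# (hypothetical helper, equivalent to extraDistr::dbbinom used below)
dbetabinom_manual <- function(r, n, alpha, beta) {
  exp(lchoose(n, r) +
        lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta) +
        lgamma(alpha + r) + lgamma(beta + n - r) -
        lgamma(alpha + beta + n))
}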

Suppose now that we are going to randomly draw $n^{\prime}=10$ new M&M’s from the bag. Before performing the experiment, the posterior predictive distribution quantifies the relative plausibility of each possible outcome. It is a beta-binomial distribution with parameters $\alpha=4.5$ and $\beta=16.5$ from the posterior distribution $\pi(\theta \mid \mathcal D)$, and $n^{\prime}=10$, which is the size of the new experiment.

library(extraDistr)
# Posterior predictive probabilities for a new sample of size 10
curve(
  dbbinom(x, size = 10, mm_sample$r + 0.5, mm_sample$n - mm_sample$r + 0.5),
  col = "purple", lwd = 4, type = 'h', n = 11,
  xlim = c(0, 10), ylim = c(0, 0.4), 
  xlab = expression(paste("Number of red candies")),
  ylab = 'probability'
)

We observe that the most likely numbers of red candies in the new sample are 1 and 2. The posterior mean and posterior standard deviation of the number of red candies in this new sample are

$$\mathbb{E}(Y^{\prime} \mid \mathcal D)= 2.14, \quad\quad \mathbb{SD}(Y^{\prime} \mid \mathcal D)= 1.54.$$
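These values follow directly from the beta-binomial moment formulas above; a quick numerical check:

a <- 4.5; b <- 16.5; n_new <- 10
n_new * a / (a + b)                 # predictive mean: 2.14
sqrt(n_new * a * b * (a + b + n_new) /
       ((a + b)^2 * (a + b + 1)))   # predictive SD: 1.54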

Time for individual work

We propose below an individual exercise aimed at consolidating the basic concepts learned in the previous theoretical session and practised in this one. You can code the exercise yourself in R using the examples above, or use vibass_app(1) to launch the interactive application provided in the vibass R-package.

EXERCISE

You have two friends, Arnau and Mary, who know a little bit about Bayesian inference and a lot about M&M’s. Their opinions about the proportion $\theta$ of red candies are expressed in terms of the following prior distributions:

$$\begin{align*} \mbox{Arnau's prior distribution: } & \pi^{(A)}(\theta) = \mbox{Be}(4, 8) \\ \mbox{Mary's prior distribution: } & \pi^{(M)}(\theta) = \mbox{Be}(5, 95) \end{align*}$$

  1. How different are your two friends’ opinions on $\theta$? A good idea to answer this question would be to plot both densities and calculate the mean and standard deviation of each prior distribution.

  2. From the results of the previous experiment ($r = 4$ red candies out of a total of $n = 20$ sampled), compare the posterior distributions for $\theta$ that Mary and Arnau would obtain.

  3. Compare both posterior distributions with the one we have obtained in Subsection 2.5 of the practical through their density curves, posterior means, standard deviations and 95% credible intervals.

  4. Compute the posterior predictive distributions for the new experiment described in Subsection 2.6 (randomly drawing $n^{\prime}=10$ new M&M’s from the bag) and compare them with the one we have obtained in Subsection 2.6 of the practical through their probability functions, posterior means and standard deviations.