


Mathematica Binomial Pdf Evaluate At X

WILD 502 schedule

The binomial distribution is a finite discrete distribution. It arises in situations where one is observing a sequence of what are known as Bernoulli trials. A Bernoulli trial is an experiment with exactly two possible outcomes: success and failure. Further, the probability of success is a fixed number \(p\) that does not change no matter how many times we conduct the experiment. A binomially distributed variable counts the number of successes in a sequence of \(N\) independent Bernoulli trials. The probability of a success (head) is denoted by \(p\). For \(N\) trials we can obtain between \(0\) and \(N\) successes.
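To make the link between Bernoulli trials and the binomial count concrete, here is a small R sketch (the seed is arbitrary and used only for reproducibility):

```r
# Simulate N = 12 individual Bernoulli trials with success probability p = 0.3,
# then count the successes; that count is a single binomial outcome.
set.seed(1)  # arbitrary seed, for reproducibility only
p <- 0.3
N <- 12
trials <- rbinom(N, size = 1, prob = p)  # N separate 0/1 Bernoulli outcomes
sum(trials)                              # one draw from Binomial(N, p)

# Equivalently, rbinom can draw the count directly:
rbinom(1, size = N, prob = p)
```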

Binomial probability density function

A representative example of a binomial probability density function (pdf) is plotted below for a case with \(p = 0.3\) and \(N = 12\), and provides the probability of observing 0, 1, …, 11, or 12 heads. Note, as expected, there is 0 probability of obtaining fewer than 0 heads or more than 12 heads. The probability density is at a peak for the case where we observe 3 or 4 heads in 12 trials. For this discrete distribution, the heights of the bars represent the probabilities of observing each of the outcomes and sum to 1.

    library(ggplot2)

    x1 <- -1:13
    df <- data.frame(x = x1, y = dbinom(x1, 12, 0.3))
    ggplot(df, aes(x = x, y = y)) +
      geom_bar(stat = "identity", col = "gray", fill = "gray") +
      scale_y_continuous(expand = c(0.01, 0)) +
      ylab("Probability Density Function") +
      scale_x_continuous("", limits = c(-1, 13), breaks = seq(-1, 13, by = 1)) +
      theme_bw()

The probability density function for a quantity having a binomial distribution is given by:

\[f(y|N,p) = {N \choose y} p^y(1-p)^{N-y}\] Here \(y\) and \(N\) are non-negative integers with \(y \le N\), and \(p\) is a probability, i.e., \(0<p<1\). The numbers \(N\) and \(p\) determine the distribution: the number of trials and the probability of success in each trial, respectively, and \(f(y|N,p)\) represents the probability of observing exactly \(y\) successes out of \(N\) trials. Here, I've used \(y\) for the number of successes, but you might see others use \(k\) or \(n\) instead. The binomial coefficient, \({N \choose y}\), counts how many distinct ways \(y\) successes can be arranged among \(N\) trials, and \(p^y(1-p)^{N-y}\) calculates how probable each such outcome is.
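To check that the formula matches R's built-in dbinom, one can evaluate both for an illustrative case (the values \(y = 3\), \(N = 12\), \(p = 0.3\) here are just examples):

```r
# Evaluate the binomial pdf two ways: directly from the formula, and with dbinom().
N <- 12
y <- 3
p <- 0.3
f_manual <- choose(N, y) * p^y * (1 - p)^(N - y)  # the formula above
f_dbinom <- dbinom(y, N, p)                       # R's built-in binomial pdf
all.equal(f_manual, f_dbinom)  # TRUE: the two calculations agree
```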

So, if we know that adult female red foxes in the Northern Range of Yellowstone National Park have a true underlying annual survival rate of 0.65, we can calculate the probability that different numbers of females will survive 1 year. Imagine that there are 25 adult females in the population at the start of the year. What's the probability that \(0, 1, 2, \ldots, 25\) will survive? In R, we can calculate this easily using the dbinom function. Notice that \(25 \cdot 0.65 = 16.25\) and that the maximum probabilities in the table are associated with either 16 or 17 survivors. Also, the probabilities do sum to 1.

    library(knitr)
    library(kableExtra)

    survivors <- seq(0, 25)
    p.d.f <- dbinom(survivors, 25, 0.65)  # dbinom(y, N, p)
    out <- cbind(survivors, p.d.f)
    kable(out, digits = 3) %>%
      kable_styling(bootstrap_options = c("striped", "condensed"))
survivors p.d.f
0 0.000
1 0.000
2 0.000
3 0.000
4 0.000
5 0.000
6 0.000
7 0.000
8 0.001
9 0.002
10 0.006
11 0.016
12 0.035
13 0.065
14 0.103
15 0.141
16 0.163
17 0.161
18 0.133
19 0.091
20 0.051
21 0.022
22 0.008
23 0.002
24 0.000
25 0.000

Generate random values from the binomial distribution

In lecture 2, the notes pointed out that when working with probability distributions, one can work with either of 2 different questions: the probability question or the estimation question. Here, we are working with the probability question: given a probability distribution with known parameters, how likely are various outcomes. Specifically, we'll declare the parameter values for the binomial pdf to generate data and evaluate properties of those data.

We can readily simulate outcomes for this population in R using the rbinom function, providing it the number of random draws to make (e.g., the number of fox studies to replicate), the size of each draw (e.g., the number of foxes in each study), and the probability of success (\(p\) or, in the fox example, the annual survival rate): rbinom(1, 25, 0.65) yields the outcome of 1 random trial of size 25 where all 25 have a probability of success of 0.65. We expect to observe 16.25 survivors on average but, due to sampling error, results will vary and be distributed in accordance with the binomial pdf shown above. Here, when executed, we obtained the following result: 18 survivors. Executing the command a 2nd time yielded 18 survivors. Executing the command a 3rd time yielded 19 survivors.

We can efficiently obtain the results of many trials, i.e., replicate samples, by issuing a command like rbinom(100, 25, 0.65), which provides the results of 100 different studies of size 25 where each outcome has a probability of success of 0.65. We can calculate summary statistics of interest and/or plot a histogram of the results. Here, the code simply outputs the results of the 100 studies, which lets us see that most of the 100 studies had results centered around 16 successes but that there was variation in the number of survivors. Here, when executed, the code produced the following results:

19, 20, 14, 19, 16, 15, 16, 17, 10, 21, 18, 17, 17, 12, 18, 16, 15, 17, 15, 19, 18, 15, 14, 16, 18, 16, 21, 18, 15, 17, 15, 15, 15, 15, 18, 15, 15, 18, 16, 15, 18, 13, 16, 14, 16, 19, 19, 13, 15, 21, 18, 17, 14, 22, 16, 19, 19, 10, 14, 15, 18, 17, 18, 13, 18, 20, 17, 16, 17, 15, 17, 12, 18, 16, 21, 11, 19, 16, 19, 13, 18, 14, 13, 19, 14, 17, 18, 12, 19, 20, 18, 16, 15, 17, 14, 19, 18, 15, 20, 19.

For those 100 trials, the mean was 16.5.
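A sketch of how such summaries could be produced follows (the seed is arbitrary; each run with a different seed gives somewhat different results than those reported above):

```r
# Replicate 100 studies, each following 25 foxes with survival probability 0.65.
set.seed(502)  # arbitrary seed so this sketch is reproducible
n.survived <- rbinom(100, 25, 0.65)
mean(n.survived)   # sample mean; should fall near 25 * 0.65 = 16.25
table(n.survived)  # frequency of each observed number of survivors
hist(n.survived, breaks = seq(-0.5, 25.5, by = 1), main = "", xlab = "Survivors")
```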

Estimate the parameters of the binomial distribution

Next, we'll address the estimation question: given observed data, what are the most likely parameters for the distribution that we think (hypothesize) is a good approximation of the process that generated the data. Now, rather than generating data from the distribution when its parameter values are known, we are estimating the parameters from data using Maximum Likelihood Estimation. For this simple example (model) with a single underlying probability of success that's common to all individuals in the study, we're estimating how likely various values of \(p\) are given the observed data. To do this, we use the Binomial likelihood, which is:

\[\mathcal{L}(p|N,y) = {N \choose y} p^y(1-p)^{N-y}\] Compare this to the Binomial pdf from above: \[f(y|N,p) = {N \choose y} p^y(1-p)^{N-y}\]

Notice that we've changed the left-hand side of the equation in 2 ways: \(f\) became \(\mathcal{L}\), and \(p\) and \(y\) swapped sides of the conditioning bar. It now reads 'the likelihood (\(\mathcal{L}\)) of survival probability \(p\) given that \(N\) individuals were released and that \(y\) survived'. Note: the vertical bar in the left-hand side of the equation is read as "given". Also, note that the right-hand side is identical to what we had in the binomial pdf.

Imagine that you radio-collared 20 female foxes, studied their survival for 1 year, and found that 6 of 20 survived, i.e., 6/20 or 0.3. Let's work with the likelihood equation to see if it produces 0.3 as the Maximum Likelihood Estimate (MLE) of survival given the model (a common survival rate for all foxes) and the data (\(N=20, y=6\)). The code block below works out the \(\mathcal{L}\) for a sequence of values of \(p\) and plots the relationship between the 2 and indicates that the most likely estimate of \(p\), i.e., \(\hat{p}\), is 0.30.

    p <- seq(from = 0, to = 1, by = 0.01)
    N <- 20
    y <- 6
    # the function 'choose' is used to calculate the binomial coefficient
    L <- choose(N, y) * p^y * (1 - p)^(N - y)
    plot(p, L)
    abline(v = 0.3, col = "blue", lty = 3)

The peakedness of the likelihood surface also gives us information on how likely other values of \(\hat{p}\) are. The steeper the curve is as it approaches the MLE, the more certain we are of the MLE, i.e., the smaller the variance. Sample size affects that steepness as shown below for an example where we studied 80 foxes and had 24 survive (4x as much data). Notice that more of the area under the curve is concentrated around the MLE, e.g., between 0.2 and 0.4, than it was for the smaller sample. As expected, as sample size increases, our uncertainty decreases, which will result in smaller values for \(SE(\hat{p})\) and narrower confidence intervals for \(\hat{p}\).

    p <- seq(from = 0, to = 1, by = 0.01)
    N <- 80
    y <- 24
    # the function 'choose' is used to calculate the binomial coefficient
    L <- choose(N, y) * p^y * (1 - p)^(N - y)
    plot(p, L)
    abline(v = 0.3, col = "blue", lty = 3)

In almost all of the estimation work we'll do this semester, we'll rely on software to find the MLEs for us. This is because most of the likelihoods we'll be using are more complex than this one and won't have simple, closed-form solutions for estimating the parameters like those presented in your reading for the binomial distribution this week, i.e., \(\hat{p} = y/N\) and \(var(\hat{p}) = \frac{\hat{p}\cdot(1-\hat{p})}{N}\). Also, we will typically work with log-likelihoods (\(ln\mathcal{L}\)), largely because of analytic advantages that come about when we take logs of these equations. For the binomial likelihood, if we work on the natural logarithm scale, we obtain: \[ln\mathcal{L}(p|y,N)= ln(C) + y\cdot ln(p) + (N-y)\cdot ln(1-p),\] where \(C\) is the binomial coefficient, a constant that does not depend on \(p\) and so is often dropped to simplify calculations.
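As a quick numerical check of the closed-form result, one could maximize the log-likelihood with R's optimize function (a sketch: the constant term is dropped, and the interval endpoints are chosen simply to avoid taking log(0)):

```r
# Numerically maximize the binomial log-likelihood (constant term dropped)
# and compare the result with the closed-form MLE, y / N.
N <- 20
y <- 6
logL <- function(p) y * log(p) + (N - y) * log(1 - p)
fit <- optimize(logL, interval = c(0.001, 0.999), maximum = TRUE)
fit$maximum  # numerical MLE, essentially y / N = 0.30
y / N        # closed-form MLE
```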

The method of maximum likelihood also extends to problems where we're trying to estimate >1 parameter at a time. For example, when estimating survival from resightings of marked animals in a situation where some marked animals might go undetected on some surveys, we have to estimate both detection probability (\(p\)) and apparent survival rate (\(\phi\)). For a model that constrains \(p\) and \(\phi\) to be constant through time, we estimate those 2 parameters by finding the pair of values for \(p\) and \(\phi\) that are most likely. Visually, the likelihood surface can be presented as a contour map with likelihood values represented by the contour lines and colors (warmer colors represent higher values) and various values of detection probability and survival rate appearing along the x and y axes as shown below in a copy of a figure in your assigned reading for this week of chapter 1 of Cooch and White.

    include_graphics("MLE-phi-p_Cooch-White-ch1.png")

The plot reveals another interesting feature of the estimation. Notice that the likelihood contours have an elliptical shape that's tilted downwards. This indicates a negative covariance between the estimates of the 2 parameters. This makes sense: for a given set of resightings, if detection of survivors is very high in the study, then it's very unlikely that you missed any survivors, and so the estimated survival rate is lower. In contrast, if detection is quite low, then animals that you didn't observe again may well have been alive but missed, and so the estimated survival rate is higher. Much more on this in the weeks ahead!


Source: https://www.montana.edu/rotella/documents/502/BinomDist.html
