If we have a sequence of random variables \(X_1, X_2, ..., X_n\), we can define the sample mean \(\bar X\) as their average:

$$\bar X = {1 \over n}(X_1 + X_2 + ... + X_n)$$
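This definition translates directly into code. A minimal sketch in plain Python (the helper name `sample_mean` and the example values are illustrative):

```python
def sample_mean(xs):
    """Average of a sequence of observations: (x1 + ... + xn) / n."""
    return sum(xs) / len(xs)

print(sample_mean([2.0, 4.0, 6.0, 8.0]))  # 5.0
```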

The mean and variance of \(\bar X\) can be calculated easily using basic theorems.

So, let \(X_1, X_2, ..., X_n\) be a random sample from a distribution with mean μ and variance \(σ^2\), and let \(\bar X\) be their sample mean. Then \(E(\bar X) = μ\) and \(Var(\bar X) = σ^2 / n\).

**Proof:** Since for \(Y = aX + b\) we have \(E(Y) = aE(X) + b\), and \(E(X_1 + ... + X_n) = E(X_1) + ... + E(X_n)\), it follows that:

$$E(\bar X) = {1 \over n}\sum_{i=1}^n E(X_i) = {1 \over n} \cdot nμ = μ.$$

Furthermore, since the \(X_i\) are independent, and we know that for \(Y = aX + b\) we have \(Var(Y) = a^2Var(X)\) and that for independent random variables \(Var(X_1 + ... + X_n) = Var(X_1) + ... + Var(X_n)\), it follows that:

$$Var(\bar X) = {1 \over n^2}Var(\sum_{i=1}^n X_i)$$

$$= {1 \over n^2}\sum_{i=1}^n Var(X_i) = {1 \over n^2} \cdot nσ^2 = {σ^2 \over n}.$$
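The result \(E(\bar X) = μ\), \(Var(\bar X) = σ^2/n\) is easy to check by simulation. A minimal sketch, assuming a normal distribution and arbitrary parameter values (the distribution choice does not matter for the theorem, only its mean and variance do):

```python
import random

random.seed(0)

n = 25          # sample size (arbitrary choice)
trials = 20000  # number of simulated samples
mu, sigma = 3.0, 2.0

# Draw many samples of size n and record the sample mean of each.
means = []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(xs) / n)

est_mean = sum(means) / trials
est_var = sum((m - est_mean) ** 2 for m in means) / trials

print(est_mean)  # close to mu = 3.0
print(est_var)   # close to sigma**2 / n = 0.16
```

The empirical variance of the sample means comes out near \(σ^2/n = 4/25 = 0.16\), not near \(σ^2 = 4\).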

In words, the mean of \(\bar X\) is equal to the mean of the distribution from which the random sample was drawn, but the variance of \(\bar X\) is only \({1 \over n}\) times the variance of that distribution. It follows that the probability distribution of \(\bar X\) is more concentrated around the mean than the original distribution. In other words, the sample mean \(\bar X\) is more likely to be close to the mean than a **single** observation \(X_i\) from the given distribution.

By applying Chebyshev's inequality to \(\bar X\), we get:

$$P(|\bar X - μ| \ge t) \le {σ^2 \over nt^2}.$$
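We can see the bound in action with a small simulation. A sketch assuming a standard normal distribution and arbitrary values of \(n\) and \(t\); the empirical frequency of large deviations should never exceed \(σ^2 / (nt^2)\):

```python
import random

random.seed(1)

n = 25          # sample size (arbitrary choice)
trials = 20000  # number of simulated samples
mu, sigma, t = 0.0, 1.0, 0.5

# Count how often the sample mean deviates from mu by at least t.
count = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if abs(xbar - mu) >= t:
        count += 1

print(count / trials)         # empirical P(|X̄ - μ| >= t)
print(sigma**2 / (n * t**2))  # Chebyshev bound: 0.16
```

Note that Chebyshev's inequality is usually loose: here the true probability is far below the bound, but the bound holds for *any* distribution with finite variance.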