These are some of my notes on statistics from the Udemy Data Science Bootcamp. The Python code associated with this section is available here.

# Data Distributions

## Distribution

• Shows possible values a random variable can take and how frequently they occur.

### Frequency distribution

• Discrete data values repeated with various frequencies.
• Pre-established intervals of possible values with frequencies corresponding to the numbers of values in the intervals.

### Relative frequency distribution

• Each frequency of the frequency distribution is divided by the total number of data points in the distribution.
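
The two definitions above can be illustrated with a minimal sketch. The die rolls below are made-up sample data, and `Counter` from the standard library is just one convenient way to tally them.

```python
from collections import Counter

# Hypothetical sample: outcomes of ten die rolls
rolls = [1, 3, 3, 6, 2, 3, 6, 1, 5, 3]

# Frequency distribution: how often each distinct value occurs
frequency = Counter(rolls)

# Relative frequency distribution: each frequency divided by the total count
total = len(rolls)
relative_frequency = {
    value: count / total for value, count in sorted(frequency.items())
}

for value, share in relative_frequency.items():
    print(value, frequency[value], share)
```

The relative frequencies always sum to 1, which is what makes them comparable across datasets of different sizes.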

# Random or Probability experiments

• These have a well-defined set of possible outcomes, called the sample space.
• A particular set of outcomes is called an event.
• Every event has a probability.
• Intersection and union of two or more events is possible.
• Two events are mutually exclusive if their intersection is empty, so the probability of both occurring together is 0.
• A random variable is used to describe the outcome of a random experiment in numerical form.
• A discrete random variable has a finite or countably infinite number of possible values because the random experiment has a countable number of possible outcomes.
• A continuous random variable has values that form a continuous interval of real numbers because the random experiment has an uncountable number of possible outcomes.
• A probability distribution shows all the probabilities of all the values of a random variable.
• The probability that a continuous random variable falls within an interval is the area of the region under the curve of the probability density function, bounded by the X-axis on one side and by the minimum and maximum values of that interval on the other sides.
• The area of the region under the entire probability distribution curve is 1.
• The mean of a continuous random variable is the point on the X-axis at which the region under the distribution curve would balance perfectly on a fulcrum.
• The median of a continuous random variable is the point on the X-axis at which the area under the distribution curve is split into two regions of equal area.
• In probability theory, the expected value of a random variable is the probability-weighted average.
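
As a small illustration of the probability-weighted average, here is a sketch for a fair six-sided die. The die is an assumed example, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Probability distribution of a fair six-sided die (an assumed example)
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all values must sum to 1
total_probability = sum(distribution.values())

# Expected value: every value weighted by its probability
expected_value = sum(face * p for face, p in distribution.items())

print(total_probability, expected_value)
```

The expected value here is 7/2, which is not a face of the die: an expected value does not have to be an attainable outcome.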

# Probability distributions

• Probability function: It is a function that assigns a probability to each distinct outcome in the sample space.
• Probability distribution: It is a collection of probabilities for each possible outcome. \begin{alignedat}{1} Y &= Actual \space outcome \\ y &= One \space of \space the \space possible \space outcomes \\ P(Y = y) &= P(y) \end{alignedat}
• Population and sample data:

| | Population | Sample |
|---|---|---|
| Mean | $\mu$ | $\bar{x}$ |
| Variance | $\sigma^2$ | $s^2$ |
| Standard Deviation | $\sigma$ | $s$ |

• Certain distributions share characteristics. So, they are separated into types:
• Discrete distributions
• Continuous distributions

## Discrete distributions

• These have a finite or countably infinite number of outcomes.
• Can add individual values to determine probability of an interval.
• Can be expressed with a table, graph, or a piece-wise function.
• Expected values might be unattainable.
• For integer-valued outcomes: $P(Y \leq y) = P(Y < y + 1)$

### Uniform Distribution

• Notation: $Y \text{\textasciitilde} U(a, b) \\ Y \text{\textasciitilde} U(a)$
• All outcomes are equally likely.
• The expected value and variance have no predictive power.
• Example and uses:
• Outcomes of rolling a single die.
• Often used in shuffling algorithms due to its fairness.

### Bernoulli Distribution

• Notation: $Y \text{\textasciitilde} Bern(p) \\ Y \text{\textasciitilde} B(1, p)$
• Consists of a single trial.
• The trial has two possible outcomes.
• Expected value and variance: \begin{alignedat}{1} E(Y) &= \sum_y y. P(Y = y) \\ &= 1 . p + 0 . (1 - p) \\ &\boxed{E(Y) = p} \\ Var(Y) &= E\{[Y - E(Y)]^2\} \\ &= (1 - p)^2 . p + (0 - p)^2 . (1 - p) \\ &= p - 2p^2 + p^3 + p^2 - p^3 \\ &= p - p^2 \\ &\boxed{Var(Y) = p.(1 - p)} \end{alignedat}
• Example and uses:
• Guessing a single True / False question.
• Often used when trying to determine what we expect to get out of a single trial of an experiment.

### Binomial Distribution

• Notation: $Y \text{\textasciitilde} B(n, p)$
• Sequence of identical Bernoulli events.
• Measures the frequency of occurrence of one of the possible outcomes over $n$ trials.
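• Probability: $P(Y = y) = C(n, y).p^y.(1 - p)^{n-y}$, where $C(n, y) = \frac{n!}{y!.(n - y)!}$ is the number of ways to choose $y$ successes out of $n$ trials.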
• Expected value and variance: \begin{alignedat}{1} E(Y) &= n.p \\ Var(Y) &= n.p.(1 - p) \end{alignedat}
• Example and uses:
• Used in determining how many times to expect a heads if a coin is flipped 10 times.
• Used to predict how likely an event is to occur over a series of trials.
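
The binomial formulas can be checked numerically. This is a minimal sketch, writing the binomial coefficient as "n choose y" via `math.comb`. The helper name `binomial_pmf` and the ten-coin-flip parameters are illustrative assumptions.

```python
from math import comb

def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ B(n, p): y successes out of n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 10, 0.5  # ten flips of a fair coin
pmf = [binomial_pmf(y, n, p) for y in range(n + 1)]

# Mean and variance computed directly from the probabilities
mean = sum(y * prob for y, prob in enumerate(pmf))
variance = sum((y - mean) ** 2 * prob for y, prob in enumerate(pmf))

print(pmf[5])                      # probability of exactly 5 heads
print(mean, n * p)                 # should agree with n.p
print(variance, n * p * (1 - p))   # should agree with n.p.(1 - p)
```

Setting `n = 1` reduces this to the Bernoulli case, with mean `p` and variance `p.(1 - p)`.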

### Poisson Distribution

• Notation: $Y \text{\textasciitilde} Po(\lambda)$
• Used to determine the likelihood of a certain event occurring over a given interval of time or distance.
• Measures the frequency over an interval of time or distance (only non-negative values).
• Probability: $P(Y = y) = \frac{\lambda^y.e^{-\lambda}}{y!}$
• Expected value and variance: \begin{alignedat}{1} E(Y) &= \displaystyle\sum_{y = 0}^\infty y.\frac{e^{-\lambda}.\lambda^y}{y!} \\ &= \displaystyle\sum_{y = 1}^\infty \frac{e^{-\lambda}.\lambda^y}{(y - 1)!} \space \because \space \text{the } y = 0 \text{ term is } 0 \text{ and } \frac{y}{y!} = \frac{1}{(y - 1)!} \\ &= \lambda.e^{-\lambda}.\displaystyle\sum_{y = 1}^\infty \frac{\lambda^{y-1}}{(y - 1)!} \\ &= \lambda.e^{-\lambda}.\displaystyle\sum_{k = 0}^\infty \frac{\lambda^k}{k!} \space \because \space \text{substituting } k = y - 1 \\ &= \lambda.e^{-\lambda}.e^\lambda \space \because \space \text{for any constant } c, \space \sum_{x = 0}^\infty\frac{c^x}{x!} = e^c \\ &\boxed{E(Y) = \lambda} \\ \\ Var(Y) &= E(Y^2) - [E(Y)]^2 \\ &= E(Y.(Y - 1) + Y) - [E(Y)]^2 \\ &= E(Y.(Y - 1)) + E(Y) - [E(Y)]^2 \\ &= E(Y.(Y - 1)) + (\lambda - \lambda^2) \\ E(Y.(Y - 1)) &= \displaystyle\sum_{y = 0}^\infty y.(y - 1).\frac{e^{-\lambda}.\lambda^y}{y!} \\ &= \displaystyle\sum_{y = 2}^\infty \frac{e^{-\lambda}.\lambda^y}{(y - 2)!} \space \because \space \text{the } y = 0 \text{ and } y = 1 \text{ terms are } 0 \text{ and } \frac{y.(y - 1)}{y!} = \frac{1}{(y - 2)!} \\ &= \lambda^2.e^{-\lambda}.\displaystyle\sum_{y = 2}^\infty \frac{\lambda^{y - 2}}{(y - 2)!} \\ &= \lambda^2.e^{-\lambda}.\displaystyle\sum_{k = 0}^\infty \frac{\lambda^k}{k!} \space \because \space \text{substituting } k = y - 2 \\ &= \lambda^2.e^{-\lambda}.e^\lambda \\ &= \lambda^2 \\ \therefore \space Var(Y) &= \lambda^2 + (\lambda - \lambda^2) \\ &\boxed{Var(Y) = \lambda} \end{alignedat}
• Example and uses:
• Used to determine how likely a specific outcome is, knowing how often the event usually occurs.
• Often incorporated in marketing analysis to determine whether above average visits are out of the ordinary or not.
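
The result that both the expected value and the variance equal $\lambda$ can be sanity-checked numerically. This is only a sketch: the infinite sums are truncated at an assumed cutoff, and the value of `lam` is arbitrary.

```python
from math import exp, factorial

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for Y ~ Po(lam)."""
    return lam**y * exp(-lam) / factorial(y)

lam = 4.0      # assumed average number of events per interval
cutoff = 100   # truncation point; the remaining tail is negligible for this lam

ys = range(cutoff)
mean = sum(y * poisson_pmf(y, lam) for y in ys)
second_moment = sum(y * y * poisson_pmf(y, lam) for y in ys)
variance = second_moment - mean**2   # Var(Y) = E(Y^2) - [E(Y)]^2

print(mean, variance)  # both should be very close to lam
```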

## Continuous distributions

• These have infinitely many consecutive possible values.
• Cannot add the individual values that make up the interval because there are infinitely many of them.
• Can be expressed with a graph or a continuous function. Cannot be expressed with a table.
• Integrals are used to calculate the likelihood of an interval.
• The cumulative distribution functions are important.
• Probability: \begin{alignedat}{1} P(Y = y) &= 0 \\ P(Y < y) &= P(Y \leq y) \end{alignedat}

### Normal distribution / Gaussian distribution

• Notation: $Y \text{\textasciitilde} N(\mu, \sigma^2)$
• Its graph is a bell-shaped, symmetric curve with thin tails.
• Many natural phenomena approximately follow this distribution.
• About 68% of the observations are within 1 standard deviation of the mean: $(\mu - \sigma, \mu + \sigma)$.
• About 95% of the observations are within 2 standard deviations of the mean: $(\mu - 2\sigma, \mu + 2\sigma)$.
• About 99.7% of the observations are within 3 standard deviations of the mean: $(\mu - 3\sigma, \mu + 3\sigma)$.
• Can be standardized to use the Z-table.
• These approximate a wide variety of random variables.
• Distributions of sample means with large enough sample sizes can be approximated to normal distributions.
• Its statistics are elegant to compute: the distribution is fully described by its mean and variance.
• Decisions based on normal distribution insights have a good track record.
• Mean, median, and mode are equal.
• It has no skew.
• Expected value and variance: \begin{alignedat}{1} Probability \space Distribution \space Function \space f(y) &= \frac{1}{\sigma.\sqrt{2\pi}}.e^{\frac{-(y - \mu)^2}{2\sigma^2}} \\ \\ E(Y) &= \displaystyle\int_{-\infty}^\infty y.\frac{1}{\sigma.\sqrt{2\pi}}.e^{\frac{-(y - \mu)^2}{2\sigma^2}}dy \\ E(Y) &= \frac{1}{\sigma.\sqrt{2\pi}}.\displaystyle\int_{-\infty}^\infty y.e^{\frac{-(y - \mu)^2}{2\sigma^2}}dy \\ Let \space t &= \frac{y - \mu}{\sqrt{2}.\sigma} \\ \therefore \space y &= \mu + \sqrt{2}.\sigma.t \\ \frac{dy}{dt} &= \sqrt{2}.\sigma \\ \therefore \space dy &= \sqrt{2}.\sigma dt \\ \therefore \space E(Y) &= \frac{1}{\sigma.\sqrt{2\pi}}.\displaystyle\int_{-\infty}^\infty (\mu + \sqrt{2}.\sigma.t).e^{-t^2}.\sqrt{2}.\sigma dt \\ &= \frac{\cancel{\sqrt{2}}.\cancel{\sigma}}{\cancel{\sigma}.\sqrt{\cancel{2}\pi}}.\displaystyle\int_{-\infty}^\infty (\mu + \sqrt{2}.\sigma.t).e^{-t^2} dt \\ &= \frac{1}{\sqrt{\pi}}.\displaystyle\int_{-\infty}^\infty (\mu + \sqrt{2}.\sigma.t).e^{-t^2} dt \\ &= \frac{1}{\sqrt{\pi}}.\Biggr[\mu\displaystyle\int_{-\infty}^\infty e^{-t^2} dt + \sqrt{2}.\sigma\displaystyle\int_{-\infty}^\infty t.e^{-t^2} dt\Biggr] \\ &= \frac{1}{\sqrt{\pi}}.\Biggr[\mu.\sqrt{\pi} + \sqrt{2}.\sigma.\Biggr(-\frac{1}{2}.e^{-t^2}\Biggr)_{-\infty}^\infty\Biggr] \\ &= \frac{1}{\sqrt{\pi}}.[\mu.\sqrt{\pi} + 0] \space \because \space the \space exponential \space tends \space to \space 0 \\ &= \frac{\mu.\cancel{\sqrt{\pi}}}{\cancel{\sqrt{\pi}}} \\ \boxed{E(Y) = \mu} \\ \\ Var(Y) &= E(Y^2) - [E(Y)]^2 \\ &= E(Y^2) - \mu^2 \\ &= \displaystyle\int_{-\infty}^\infty y^2.\frac{1}{\sigma.\sqrt{2\pi}}.e^{\frac{-(y - \mu)^2}{2\sigma^2}}dy - \mu^2 \\ Var(Y) &= \frac{1}{\sigma.\sqrt{2\pi}}.\displaystyle\int_{-\infty}^\infty y^2.e^{\frac{-(y - \mu)^2}{2\sigma^2}}dy - \mu^2 \\ Let \space t &= \frac{y - \mu}{\sqrt{2}.\sigma} \\ \therefore \space y &= \mu + \sqrt{2}.\sigma.t \\ \frac{dy}{dt} &= \sqrt{2}.\sigma \\ \therefore \space dy &= \sqrt{2}.\sigma dt \\ Var(Y) &= 
\frac{1}{\sigma.\sqrt{2\pi}}.\displaystyle\int_{-\infty}^\infty (\sqrt{2}.\sigma.t + \mu)^2.e^{-t^2}.\sqrt{2}.\sigma dt - \mu^2 \\ &= \frac{\cancel{\sqrt{2}}.\cancel{\sigma}}{\cancel{\sigma}.\sqrt{\cancel{2}\pi}}.\displaystyle\int_{-\infty}^\infty (\sqrt{2}.\sigma.t + \mu)^2.e^{-t^2} dt - \mu^2 \\ &= \frac{1}{\sqrt{\pi}}.\displaystyle\int_{-\infty}^\infty (\sqrt{2}.\sigma.t + \mu)^2.e^{-t^2} dt - \mu^2 \\ &= \frac{1}{\sqrt{\pi}}.\Biggr[\displaystyle\int_{-\infty}^\infty (2.\sigma^2.t^2 + 2.\sqrt{2}.\sigma.\mu.t + \mu^2).e^{-t^2} dt\Biggr] - \mu^2 \\ &= \frac{1}{\sqrt{\pi}}.\Biggr[2.\sigma^2.\displaystyle\int_{-\infty}^\infty t^2.e^{-t^2} dt + 2.\sqrt{2}.\sigma.\mu.\displaystyle\int_{-\infty}^\infty t.e^{-t^2} dt + \mu^2.\displaystyle\int_{-\infty}^\infty e^{-t^2} dt\Biggr] - \mu^2 \\ &= \frac{1}{\sqrt{\pi}}.\Biggr[2.\sigma^2.\displaystyle\int_{-\infty}^\infty t^2.e^{-t^2} dt + 2.\sqrt{2}.\sigma.\mu.0 + \mu^2.\sqrt{\pi}\Biggr] - \mu^2 \\ &= \frac{1}{\sqrt{\pi}}.\Biggr[2.\sigma^2.\displaystyle\int_{-\infty}^\infty t^2.e^{-t^2} dt\Biggr] + \frac{1}{\cancel{\sqrt{\pi}}}.\mu^2.\cancel{\sqrt{\pi}} - \mu^2 \\ &= \frac{1}{\sqrt{\pi}}.\Biggr[2.\sigma^2.\displaystyle\int_{-\infty}^\infty t^2.e^{-t^2} dt\Biggr] + \mu^2 - \mu^2 \\ &= \frac{2.\sigma^2}{\sqrt{\pi}}.\displaystyle\int_{-\infty}^\infty t^2.e^{-t^2} dt \\ &= \frac{2.\sigma^2}{\sqrt{\pi}}.\Biggr(\Biggr[-\frac{t}{2}.e^{-t^2}\Biggr]_{-\infty}^\infty + \frac{1}{2}.\displaystyle\int_{-\infty}^\infty e^{-t^2} dt\Biggr) \\ &= \frac{2.\sigma^2}{\sqrt{\pi}}.\Biggr(0 + \frac{1}{2}.\displaystyle\int_{-\infty}^\infty e^{-t^2} dt\Biggr) \\ &= \frac{\cancel{2}.\sigma^2}{\sqrt{\pi}}.\frac{1}{\cancel{2}}.\displaystyle\int_{-\infty}^\infty e^{-t^2} dt \\ &= \frac{\sigma^2}{\sqrt{\pi}}.\displaystyle\int_{-\infty}^\infty e^{-t^2} dt \\ &= \frac{\sigma^2}{\cancel{\sqrt{\pi}}}.\cancel{\sqrt{\pi}} \\ \boxed{Var(Y) = \sigma^2} \end{alignedat}
• Example and uses:
• Often observed in the distribution of size of animals in the wilderness.
• Most biological measures are normally distributed like height; length of arms, legs, nails; blood pressure; thickness of tree barks, etc.
• IQ tests.
• Stock market information.
• Heavily used in regression analysis.
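
The 68 / 95 / 99.7 percentages can be reproduced from the cumulative distribution function. This is a minimal sketch using `statistics.NormalDist` from the standard library. The mean of 100 and standard deviation of 15 are assumed example values, and the percentages do not depend on them.

```python
from statistics import NormalDist

# Assumed example parameters: mean 100, standard deviation 15
dist = NormalDist(mu=100, sigma=15)

def prob_within(k: float) -> float:
    """P(mu - k.sigma < Y < mu + k.sigma) as a difference of two CDF values."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

Because the probability of an interval is a difference of CDF values, the probability of any single point is 0, as noted for continuous distributions in general.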

#### Z-tables / Z-score tables

• Transformation:
• It is a way to alter every element of a distribution to get a new distribution.
• Adding a positive number to every element of a Normal distribution moves the graph of the distribution to the right. The standard deviation remains unchanged.
• Subtracting a positive number from every element of a Normal distribution moves the graph of the distribution to the left. The standard deviation remains unchanged.
• Multiplying every element of a Normal distribution by a number greater than 1 causes the graph of the distribution to expand in width, because the standard deviation is multiplied by that number. The mean is multiplied by the same number, so it remains unchanged only if it is 0.
• Dividing every element of a Normal distribution by a number greater than 1 causes the graph of the distribution to shrink in width, because the standard deviation is divided by that number. The mean is divided by the same number, so it remains unchanged only if it is 0.
• Controlling for the standard deviation:
• Decreasing the mean moves the graph to the left. This is the same effect as subtracting a number from every element.
• Increasing the mean moves the graph to the right. This is the same effect as adding a number to every element.
• Controlling for the mean:
• Decreasing the standard deviation makes the graph taller and narrower by increasing the amount of data in the middle and thinning the tails. For a distribution centered at 0, this is the same as dividing every element by a number greater than 1.
• Increasing the standard deviation makes the graph shorter and wider by decreasing the amount of data in the middle and fattening the tails. For a distribution centered at 0, this is the same as multiplying every element by a number greater than 1.
• Standardizing:
• A special transformation that converts the $E(X) \space to \space 0$ and the $Var(X) \space to \space 1$.
• After standardizing, a Normal distribution is called a Standard Normal distribution.
• Subtracting the $\mu$ of the original distribution from every element causes the mean of the distribution to change to $0$, thereby moving the mean of the graph to the origin.
• Dividing every resulting element by the $\sigma$ of the original distribution causes the standard deviation to change to $1$, thereby standardizing the peak and tails of the graph.
• Reasons for standardizing:
• To compare different normally distributed datasets.
• To detect normality.
• To detect outliers.
• To create confidence intervals.
• To test hypotheses.
• To perform regression analysis.
• A Z-score table is the table of the Cumulative Distribution function of a Normal distribution after it has been standardized.
• Notation of a Standard Normal distribution: $Z \text{\textasciitilde} N(0, 1)$
• Z-value: $z = \frac{y - \mu}{\sigma}$
• In the z-value mentioned above, the numerator transforms the $E(X)$ to $0$ and the denominator transforms the $STD(X)$ to $1$.

#### Student's t Distribution

• Notation: $Y \text{\textasciitilde} t(k \space or \space v) \space where \space k \space or \space v = Degrees \space of \space freedom$
• Represents a small sample size approximation of a Normal Distribution.
• Its graph is a bell-shaped, symmetric curve with fat tails and a low peak. This accounts for the higher uncertainty caused by the small sample size.
• It accounts for extreme values better than a Normal Distribution.
• Expected value and variance: \begin{alignedat}{1} If \space k > 2:& \\ E(Y) &= \mu \\ Var(Y) &= S^2.\frac{k}{k - 2} \\ t_{v, \alpha} &= \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} \end{alignedat}
• Example and uses:
• Often used in analysis when examining a small sample of data that usually follows a Normal Distribution.
• Frequently used when conducting statistical analysis.
• Used for hypothesis testing with limited data.
• Its table of Cumulative Distribution Function values is called the T-table.

#### Sampling Distribution

• Notation: $Y \text{\textasciitilde} N(\mu, \frac{\sigma^2}{n})$
• A sampling distribution is the distribution of a statistic, such as a measure of central tendency, computed from many samples of a population.
##### Central Limit Theorem
• No matter the underlying distribution (provided it has a finite variance), the sampling distribution of the means approximates a normal distribution when the samples are large enough.
• The more samples that are used, the closer the mean of the sampling distribution gets to the population mean. As a rule of thumb, a sample size of $30$ or more is usually considered large enough.
• The more samples that are drawn, the closer the approximation to the Normal distribution.
• The bigger the samples, the closer the approximation to the Normal distribution.
• This theorem allows us to perform tests, solve problems, and make inferences using the Normal distribution, even when the population is not normally distributed.
• The mean of the sampling distribution is very close to the mean of the original distribution.
• The variance of the sampling distribution is $n$ times smaller than the variance of the original distribution, where $n$ is the size of the samples.
• Used whenever we have the sum or average of many variables.
• Notation: $Y \text{\textasciitilde} N(\mu, \frac{\sigma^2}{n})$
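
The Central Limit Theorem can be illustrated with a small simulation. This is a sketch under assumed settings: the population is exponential (clearly not Normal), and the sample size, number of samples, and seed are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

lam = 1.0            # exponential population: mean 1/lam, variance 1/lam**2
sample_size = 50     # n, the size of each sample
num_samples = 5000   # how many samples are drawn

# One mean per sample: together they form the sampling distribution of the mean
sample_means = [
    fmean(random.expovariate(lam) for _ in range(sample_size))
    for _ in range(num_samples)
]

print(fmean(sample_means))      # close to the population mean, 1.0
print(pvariance(sample_means))  # close to sigma^2 / n = 1 / 50 = 0.02
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual draws are heavily skewed.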

### Chi-squared Distribution

• Notation: $Y \text{\textasciitilde} \chi^2(k) \space where \space k = Degrees \space of \space freedom$
• This is the distribution of the sum of the squares of $k$ independent Standard Normal random variables.
• Its graph is asymmetric and skewed to the right. It has a fat tail to the right and no tail to the left.
• Expected value and variance: \begin{alignedat}{1} E(Y) &= k \\ Var(Y) &= 2k \end{alignedat}
• Example and uses:
• Often used to test goodness of fit.
• Often used for hypothesis testing.
• Often used when computing confidence intervals.
• Contains a table of known values for its Cumulative Distribution function called the $\chi^2$-table.
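
The expected value $k$ and variance $2k$ can be checked by simulation. This sketch relies on the standard construction of a chi-squared variable as the sum of the squares of $k$ independent Standard Normal draws. The helper name `chi_squared_draw`, the seed, and the sizes are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # fixed seed so the sketch is repeatable

k = 5               # degrees of freedom
num_draws = 20000   # number of simulated chi-squared values

def chi_squared_draw(degrees: int) -> float:
    """Sum of squares of `degrees` independent Standard Normal draws."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(degrees))

draws = [chi_squared_draw(k) for _ in range(num_draws)]

print(fmean(draws))      # close to k
print(pvariance(draws))  # close to 2k
print(min(draws) >= 0)   # squares are never negative, so there is no left tail
```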

### Exponential Distribution

• Notation: $Y \text{\textasciitilde} Exp(\lambda) \space where \space \lambda = Rate \space (the \space scale \space is \space \frac{1}{\lambda})$
• Usually observed in events that significantly change early on.
• The Probability Density Function and the Cumulative Distribution Function plateau after a certain point.
• The rate parameter determines how quickly the Probability Density and Cumulative Distribution functions plateau.
• The rate parameter also determines the spread of the graph.
• Natural logarithm is often used to transform the values of such distributions since a table of known values is not available. The transformed values are less skewed, although they are not exactly Normally distributed.
• Expected value and variance: \begin{alignedat}{1} E(Y) &= \frac{1}{\lambda} \\ Var(Y) &= \frac{1}{\lambda^2} \end{alignedat}
• Example and uses:
• Used with dynamically changing values, like online website traffic or radioactive decay.

### Logistic Distribution

• Notation: $Y \text{\textasciitilde} Logistic(\mu, S) \space where \space S = Scale \space and \space \mu = Location \space or \space Mean$
• Observed when trying to determine how continuous variable inputs can affect the probability of a binary outcome.
• The Cumulative Distribution Function picks up near the mean.
• The smaller the scale parameter, the quicker it reaches values close to $1$.
• Expected value and variance: \begin{alignedat}{1} E(Y) &= \mu \\ Var(Y) &= \frac{S^2.\pi^2}{3} \end{alignedat}
• Example and uses:
• Often used in sports to anticipate how a player’s performance can determine the outcome of the match.

# Measures of Relationship between variables

## Coefficient of Variation

• Also known as relative standard deviation.
• Comparing the standard deviations of two datasets measured in different units or on different scales gives no meaningful information, but comparing their coefficients of variation makes sense because the coefficient is unitless.
• Coefficient of Variation: \begin{alignedat}{1} C_V &= \frac{\sigma}{\mu} \\ \widehat{C_V} &= \frac{s}{\bar{x}} \end{alignedat}
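
A short sketch of why this ratio is useful for comparison: the same hypothetical heights are expressed in centimetres and in metres, so the standard deviations differ by a factor of 100 while the ratio of standard deviation to mean stays the same. The data and the function name are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical data: the same five heights in centimetres and in metres
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
heights_m = [h / 100 for h in heights_cm]

def relative_std(data):
    """Sample estimate of the relative standard deviation: s / x_bar."""
    return stdev(data) / mean(data)

print(stdev(heights_cm), stdev(heights_m))                # differ by a factor of 100
print(relative_std(heights_cm), relative_std(heights_m))  # essentially equal
```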

## Covariance

• Used to determine relation between two variables.
• Can be positive, zero, or negative.
• Covariance: \begin{alignedat}{1} \sigma_{xy} &= \frac{\sum_{i = 1}^N(x_i - \mu_x) * (y_i - \mu_y)}{N} \\ s_{xy} &= \frac{\sum_{i = 1}^n(x_i - \bar{x}) * (y_i - \bar{y})}{n - 1} \end{alignedat}

## Linear Correlation Coefficient

• Correlation adjusts the covariance so that the relation between two variables becomes easy and intuitive to interpret.
• Correlation does not imply causation.
• Correlation of 1 means perfect positive linear correlation - an increase in one variable is accompanied by a proportional increase in the second variable.
• Correlation of 0 means there is no linear relationship between the two variables. It does not by itself imply that they are independent.
• Correlation of -1 means perfect negative linear correlation - an increase in one variable is accompanied by a proportional decrease in the second variable.
• Linear Correlation Coefficient: \begin{alignedat}{1} \rho_{xy} &= \frac{\sigma_{xy}}{\sigma_x.\sigma_y} \\ r_{xy} &= \frac{s_{xy}}{s_x.s_y} \end{alignedat}
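
The sample formulas for covariance and correlation can be written out directly. This is a minimal sketch with made-up data. The second dataset (`squared`) is an assumed example showing that a correlation of 0 only rules out a linear relationship.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """s_xy: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x . s_y)."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]   # perfectly linear in x
squared = [x * x for x in xs]      # fully determined by x, but not linearly

print(sample_correlation(xs, linear))   # approximately 1
print(sample_correlation(xs, squared))  # 0, even though squared depends on x
```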

# Estimators and Estimates

## Estimators

• An estimator is a mathematical function that approximates a population parameter depending only on sample information.
• Examples:

| Term | Estimator | Parameter |
|---|---|---|
| Mean | $\bar{x}$ | $\mu$ |
| Variance | $s^2$ | $\sigma^2$ |
| Correlation | $r$ | $\rho$ |

• Important properties:
• Bias:
• Expected value of an unbiased estimator is the population parameter. The bias in this case is $0$.
• If the expected value of an estimator is $(parameter + b)$, then the bias is $b$.
• Efficiency:
• The most efficient estimator is the one with the smallest variance.

## Estimates

• This is the output that you get from an estimator.

### Types of estimates

#### Point estimate

• Single value.
• Examples: 1, 5, 122.67, 0.32

#### Confidence Intervals

• An interval within which we are confident (with a certain percentage of confidence) that the population parameter will fall.
• The confidence interval is built around the point estimate.
• Confidence intervals are more informative than point estimates because they also express the uncertainty of the estimate.
• If ME is the margin of error, then the confidence interval is given by the following formula: $[\bar{x} - ME, \bar{x} + ME]$
• Examples: (1, 5), (12, 33), (221.78, 745.66), (-0.71, 0.11)
##### Level of confidence:
• This is represented by $(1 - \alpha)$
• We are $(1 - \alpha) * 100\%$ confident that the population parameter will fall within the specified confidence interval.
• Common values of $\alpha$:
• 0.01
• 0.05
• 0.1
##### Margin of Error
• Formula: \begin{alignedat}{1}ME &= Reliability \space Factor * \frac{Standard \space Deviation}{\sqrt{Sample \space Size}} \\ &= Z_{\frac{\alpha}{2}} * \frac{\sigma}{\sqrt{n}} \\ &= t_{v,\frac{\alpha}{2}} * \frac{s}{\sqrt{n}} \end{alignedat}
##### Effect on Confidence Interval
| Term | Effect on width of CI |
|---|---|
| $(1 - \alpha) \uparrow$ | $\uparrow$ |
| $\sigma \uparrow$ | $\uparrow$ |
| $n \uparrow$ | $\downarrow$ |
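
The margin of error and the resulting interval can be sketched for the case where the population standard deviation is known, so the reliability factor is $Z_{\frac{\alpha}{2}}$. The function name, the sample, and the value of `sigma` are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(data, sigma, alpha=0.05):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(data)
    x_bar = mean(data)
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin_of_error = reliability_factor * sigma / sqrt(n)
    return x_bar - margin_of_error, x_bar + margin_of_error

# Hypothetical sample with an assumed known population standard deviation of 3
sample = [98, 102, 101, 97, 103, 99, 100, 100, 104, 96]

low_95, high_95 = z_confidence_interval(sample, sigma=3, alpha=0.05)
low_99, high_99 = z_confidence_interval(sample, sigma=3, alpha=0.01)

print(low_95, high_95)
print(low_99, high_99)  # a higher level of confidence gives a wider interval
```

When the population variance is unknown, the same structure applies with $s$ in place of $\sigma$ and a t-based reliability factor, which the standard library does not provide.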

# Formulas for calculating test statistics and confidence intervals

| No. of Populations | Population variance | Samples | Statistic | Variance | Test Statistic Formula | CI Formula |
|---|---|---|---|---|---|---|
| One | Known | - | $z$ | $\sigma^2$ | $Z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}}$ | $\bar{x} \pm z_{\frac{\alpha}{2}} . \frac{\sigma}{\sqrt{n}}$ |
| One | Unknown | - | $t$ | $s^2$ | $T = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}$ | $\bar{x} \pm t_{n - 1, \frac{\alpha}{2}} . \frac{s}{\sqrt{n}}$ |
| Two | - | Dependent | $t$ | $s_{difference}^2$ | $T = \frac{\bar{d} - \mu_0}{\frac{s_d}{\sqrt{n}}}$ | $\bar{d} \pm t_{n - 1, \frac{\alpha}{2}} . \frac{s_d}{\sqrt{n}}$ |
| Two | Known | Independent | $z$ | $\sigma_x^2, \sigma_y^2$ | $Z = \frac{(\bar{x} - \bar{y}) - \mu_0}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$ | $(\bar{x} - \bar{y}) \pm z_{\frac{\alpha}{2}} . \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$ |
| Two | Unknown, assumed equal | Independent | $t$ | $s_p^2 = \frac{(n_x - 1).s_x^2 + (n_y - 1).s_y^2}{n_x + n_y - 2}$ | $T = \frac{(\bar{x} - \bar{y}) - \mu_0}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}}$ | $(\bar{x} - \bar{y}) \pm t_{n_x + n_y - 2, \frac{\alpha}{2}}.\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$ |
| Two | Unknown, assumed different | Independent | $t$ | $s_x^2, s_y^2$ | - | $(\bar{x} - \bar{y}) \pm t_{v, \frac{\alpha}{2}}.\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$ |
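
As one worked instance of these formulas, here is a sketch of the pooled variance and the test statistic for two independent samples whose unknown variances are assumed equal. The function names and the two small samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance(xs, ys):
    """s_p^2: sample variances weighted by their degrees of freedom."""
    n_x, n_y = len(xs), len(ys)
    weighted = (n_x - 1) * variance(xs) + (n_y - 1) * variance(ys)
    return weighted / (n_x + n_y - 2)

def pooled_t_statistic(xs, ys, mu_0=0.0):
    """T for H0: (mean of x) - (mean of y) = mu_0, assuming equal variances."""
    s_p2 = pooled_variance(xs, ys)
    standard_error = sqrt(s_p2 / len(xs) + s_p2 / len(ys))
    return (mean(xs) - mean(ys) - mu_0) / standard_error

# Hypothetical samples
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 2.0, 3.0, 4.0]

print(pooled_variance(xs, ys))     # 25/6 for these samples
print(pooled_t_statistic(xs, ys))  # sqrt(3) for these samples
```

The resulting statistic would be compared against a t-distribution with $n_x + n_y - 2$ degrees of freedom.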

# Scientific Method

It is a procedure that consists of systematic observation, measurement, experiment, and formulation, testing, and modification of hypotheses.

## Steps in Data-Driven Decision Making

1. Formulate a hypothesis.
2. Find the right test.
3. Execute the test.
4. Make a decision.

## Hypotheses

• A hypothesis is an idea that can be tested.
• It is a supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation.

### Null Hypothesis

• Notation: $H_0$
• It is the hypothesis to be tested.
• It is the status-quo: The belief that we are contesting with our test.
• It is similar to the notion: Innocent until proven guilty.
• In statistics, it is the statement that we are trying to reject.

### Alternative Hypothesis

• Notation: $H_1$ or $H_A$
• It is the change or innovation that is contesting the status-quo.
• The act of performing a test shows that we have doubts about the truthfulness of the null hypothesis.
• In general, the researcher’s opinion is contained in the alternative hypothesis.

## Hypotheses Testing

• During testing, we either reject the null hypothesis or fail to reject it (loosely called accepting it).
• In a two-tailed test:
• Rejection region: The tails of the distribution show when we reject the null hypothesis.
• Acceptance region: Everything that remains in the middle shows when we accept the null hypothesis.

# Level of Significance

• Notation: $\alpha$
• It is the probability of rejecting a null hypothesis that is true. So, it is the probability of making this error.
• Common significance levels: 0.10, 0.05, 0.01

# Types of tests

## Two-sided (two-tailed) test

It is used when the null hypothesis contains an equality ($=$) or an inequality sign ($\not =$).

## One-sided (one-tailed) test

It is used when the null hypothesis contains a directional sign ($<, >, \le, \ge$) instead of an equality or inequality sign.

# Types of errors while testing

## Type I errors

• These errors are also called False Positive errors.
• This occurs when you reject a true null hypothesis.
• The probability of making this error is $\alpha$ - the level of significance.
• Since you choose the alpha, the responsibility of making this error lies solely with you.

## Type II errors

• These errors are also called False Negative errors.
• This occurs when you accept a false null hypothesis.
• The probability of making this error is $\beta$.
• Beta depends mainly on the sample size and the population variance (it is also affected by the chosen $\alpha$ and by how far the truth is from the null hypothesis). So, if your topic is hard to test due to difficulty in sampling or high variability of the data, you are more likely to make this type of error.
• So, this type of error is not your fault.

## Power of the test

• The goal of a test is to reject a false null hypothesis.
• The probability of doing so is denoted by $1 - \beta$ and is called the power of the test.

## Rejecting the Null hypothesis

• The law in most countries states that a person is “innocent until proven guilty”.
• It comes from the Latin phrase: Ei incumbit probatio, qui dicit, non qui negat; cum per rerum naturam factum negantis probatio nulla sit.
• This translates to: The proof lies upon him who affirms, not upon him who denies; since, by the nature of things, he who denies a fact cannot produce any proof.
• So, here, the null hypothesis is that “he is innocent”.
• If we reject the null hypothesis, then we are saying that the person is guilty. But if the person is really innocent, then we have committed a Type I error.
• If we accept the null hypothesis, then we are saying that the person is innocent. But if the person is really guilty, then we have committed a Type II error.
• Example $H_0$: The person is innocent.

| $H_0 \diagdown The \space truth$ | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Accept $H_0$ | $\checkmark$ | Type II error (False Negative) |
| Reject $H_0$ | Type I error (False Positive) | $\checkmark$ |
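
The claim that the probability of a Type I error equals $\alpha$ can be illustrated by simulation. In this sketch the null hypothesis is true by construction, and a two-tailed Z-test is repeated many times. The parameters, seed, and helper name `rejects_null` are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the sketch is repeatable

alpha = 0.05
mu_0, sigma, n = 0.0, 1.0, 25  # H0 is true: the data really come from N(mu_0, sigma^2)
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

def rejects_null() -> bool:
    """Draw one sample under H0 and run a two-tailed Z-test on it."""
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu_0) / (sigma / sqrt(n))
    return abs(z) > critical_value

trials = 20000
type_one_rate = sum(rejects_null() for _ in range(trials)) / trials

print(type_one_rate)  # close to alpha: every rejection here is a false positive
```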

# p-value

• It is the smallest level of significance at which we can reject the null hypothesis, given the observed sample statistic.

## Notable p-values

• 0.000: This indicates that we reject the null hypothesis at all significance levels.
• 0.05:
• Often called the cut-off line, this indicates that we would accept the null hypothesis if our p-value is higher than $0.05$. Otherwise, we reject the null hypothesis.
• This is equivalent to testing at $5\%$ significance level.

### Decision Rules

• Reject the null hypothesis if:
• |test statistic| > |critical value|
• The p-value is less than some significance level like $0.05$.

### Formulae to calculate p-value

• One-tailed test:
• Used when the null hypothesis includes either the $<$, $\le$, $>$, or $\ge$ sign.
• Formula: 1 - cdf(test statistic) for a right-tailed test, or cdf(test statistic) for a left-tailed test
• Two-tailed test:
• Used when the null hypothesis includes $=$ or $\ne$ sign.
• Formula: 2 x (1 - cdf(|test statistic|))
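
The p-value formulas can be sketched for a Z statistic, using the Standard Normal CDF from the standard library. The function names and the value of `z` are hypothetical, and the one-tailed version shown is the right-tailed case.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # Z ~ N(0, 1)

def p_value_right_tailed(z: float) -> float:
    """One-tailed (right-tailed) p-value: 1 - cdf(test statistic)."""
    return 1 - standard_normal.cdf(z)

def p_value_two_tailed(z: float) -> float:
    """Two-tailed p-value: 2 x (1 - cdf(|test statistic|))."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

z = 1.96  # hypothetical test statistic

print(p_value_right_tailed(z))  # roughly 0.025
print(p_value_two_tailed(z))    # roughly 0.05

# Decision rule: reject the null hypothesis if the p-value is below the significance level
alpha = 0.05
print(p_value_two_tailed(2.5) < alpha)
```

For a t statistic the same formulas apply with the CDF of the t-distribution, which would need a table or a third-party library.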