Math & Statistics

Basic Theory Notebook

As an AI researcher, coding skills aside, you really need a solid background in math and statistics. Nobody knows everything, but you need enough intuition to understand how an algorithm works and where its theoretical guarantees come from.

Books and Background

  • Casella for bounds, distributions, and where random variables come from.
  • Strang for linear algebra.
  • Bishop, Statistical Learning, and Boyd for ML and optimization foundations.
  • Linear algebra and its essence, plus probability, statistics, and calculus, are non-negotiable refreshers.

From experience, that background is the minimum needed to understand what is really going on in ML rather than just copying recipes.

What This Page Covers

  • Expected value, variance, and covariance
  • Common families of distributions
  • Useful inequalities
  • Convergence of sequences of random variables
  • Regression

References in the original notes included 36705 Notes by Siva Balakrishnan and 36410 Notes by Siva Balakrishnan.

Expected Value, Variance, and Covariance

These are the basics you should be able to rederive quickly if you claim to know statistics.

Covariance and Correlation derivation

  • Linearity of expectation: $$\mathbb{E}\left(\sum_{j=1}^k c_j g_j(X)\right)=\sum_{j=1}^k c_j \mathbb{E}(g_j(X))$$
  • Variance: $${\sf Var}(X)=\mathbb{E}((X-\mu)^2)=\mathbb{E}(X^2)-\mu^2$$
  • Covariance: $${\sf Cov}(X,Y)=\mathbb{E}(XY)-\mu_X\mu_Y$$ and correlation stays between $-1$ and $1$.
  • Conditional expectation, the law of total expectation, and the law of total variance are basic tools that keep coming back.
  • Sampling reminders matter too: the sample mean, the sample variance with its $n-1$ correction, inverse-CDF sampling (as in LeetCode 528), sample to draw, sampling in 2D, and getting reparameterization right.
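The inverse-CDF reminder above can be sketched in a few lines. This is an illustrative sketch in the spirit of LeetCode 528 (Random Pick with Weight), not code from the original notes; the name `weighted_pick` is mine. The idea: the prefix sums of the weights form a discrete CDF, so drawing a uniform and binary-searching it inverts that CDF.

```python
import bisect
import itertools
import random

def weighted_pick(weights, rng=random.random):
    """Inverse-CDF sampling: return index i with probability w[i] / sum(w).

    Sketch of the LeetCode 528 idea (name `weighted_pick` is hypothetical):
    build the unnormalized discrete CDF via prefix sums, draw
    U ~ Uniform(0, total), and binary-search for the first prefix
    sum strictly greater than U.
    """
    prefix = list(itertools.accumulate(weights))  # unnormalized discrete CDF
    u = rng() * prefix[-1]                        # uniform on [0, total)
    return bisect.bisect_right(prefix, u)

random.seed(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[weighted_pick([1, 2, 7])] += 1
print([c / 10_000 for c in counts])  # roughly [0.1, 0.2, 0.7]
```

For production code you would cache the prefix sums once instead of rebuilding them per draw, which is exactly the class-based setup LeetCode 528 asks for.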

Distributions

  • Normal, chi-squared, Bernoulli, binomial, Poisson, exponential, multinomial, and gamma all appear throughout the original notes.
  • One reason to keep these close is that so many estimators and concentration arguments reduce back to these standard families.
  • The moment generating function is a useful unifying tool: two random variables whose mgfs agree on a neighborhood of zero have the same distribution.
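One way to build intuition for the mgf as a distributional fingerprint is to compare a Monte-Carlo estimate of $\mathbb{E}(e^{tX})$ against the closed form. The sketch below (my own illustration, with an assumed helper name `empirical_mgf`) does this for a Bernoulli($p$), whose mgf is $1-p+pe^t$:

```python
import math
import random

def empirical_mgf(samples, t):
    """Monte-Carlo estimate of M_X(t) = E[exp(t X)]."""
    return sum(math.exp(t * x) for x in samples) / len(samples)

random.seed(1)
p, t = 0.3, 0.5
# 50k Bernoulli(p) draws
xs = [1 if random.random() < p else 0 for _ in range(50_000)]
closed_form = 1 - p + p * math.exp(t)  # Bernoulli(p) mgf at t
print(empirical_mgf(xs, t), closed_form)  # the two should nearly agree
```

The same comparison works for any of the standard families above whenever the mgf exists near zero, which is why so many concentration arguments route through it.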

Useful Inequalities

The original notes link to an inequality cheatsheet, a concentration-inequality video, and concentration-inequality slides.

  • Markov and Chebyshev are the elementary starting points.
  • Chernoff, Hoeffding, and Bernstein help formalize concentration near the mean and in the tails.
  • Jensen, union bounds, and McDiarmid are worth remembering because they appear everywhere.
  • The theory notes also keep reminders on sub-Gaussian intuition, Lipschitz concentration, and U-statistics.

Convergence

  • Almost sure convergence
  • Convergence in probability and the weak law of large numbers
  • Convergence in quadratic mean
  • Convergence in distribution

The practical stack is still: know the definitions, know the intuition, and know what consistency means when you talk about an estimator.
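The weak law of large numbers can be made concrete by estimating $P(|\bar{X}_n - \mu| > \varepsilon)$ for growing $n$ and watching it shrink. A minimal sketch, using Uniform(0,1) draws (so $\mu = 0.5$) and a Monte-Carlo loop of my own choosing:

```python
import random

random.seed(3)
mu, eps, trials = 0.5, 0.05, 2_000

def deviation_prob(n):
    """Monte-Carlo estimate of P(|X̄_n - mu| > eps) for Uniform(0,1) draws."""
    bad = sum(
        abs(sum(random.random() for _ in range(n)) / n - mu) > eps
        for _ in range(trials)
    )
    return bad / trials

probs = [deviation_prob(n) for n in (10, 100, 1000)]
print(probs)  # shrinks toward 0 as n grows
```

This is exactly convergence in probability: for every fixed $\varepsilon$ the deviation probability goes to zero, which is weaker than almost sure convergence but implied by convergence in quadratic mean via Chebyshev.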

Regression