In fact most properties of the PGF come from the moment-generating/characteristic function, including why the second derivative is related to the variance. The second derivative of the moment generating function at zero is the second moment E[X^2], and the second derivative of the logarithm of the MGF at zero is the variance (the second cumulant).
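Spelling that last step out, with K(s) = log M_X(s), M_X(0) = 1, M'_X(0) = E[X], and M''_X(0) = E[X^2]:

    K'(s)  = M'_X(s) / M_X(s)
    K''(s) = M''_X(s) / M_X(s) - (M'_X(s) / M_X(s))^2
    K''(0) = E[X^2] - (E[X])^2 = Var(X)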
The one property that's somewhat unique to the PGF is how composition relates to drawing a randomly-sized sample, which I can see could be useful.
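If it helps, here's a minimal Monte Carlo sketch of that composition property, using an assumed example of N ~ Poisson(2) and X_i ~ Bernoulli(0.3), in which case G_S(x) = G_N(G_X(x)):

    # Sketch: the PGF of S = X_1 + ... + X_N (N itself random) is G_N(G_X(x)).
    # Assumed example: N ~ Poisson(lam), X_i ~ Bernoulli(p), all independent.
    import numpy as np

    rng = np.random.default_rng(0)
    lam, p, x = 2.0, 0.3, 0.7            # x is just a test point in [0, 1]

    # Monte Carlo estimate of E[x^S]
    N = rng.poisson(lam, size=200_000)
    S = rng.binomial(N, p)               # sum of N Bernoulli(p) draws
    mc = np.mean(x ** S)

    # Composing the two PGFs: G_N(s) = exp(lam*(s - 1)), G_X(s) = 1 - p + p*s
    composed = np.exp(lam * ((1 - p + p * x) - 1))

    print(mc, composed)                  # the two numbers should agree closely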
The definition of the MGF of a random variable with PDF f(x) is
E[e^{sX}] = int_{-inf}^{inf} f(x) e^{sx} dx
The definition of the (bilateral) Laplace transform of a signal f(t) is
F(s) = int_{-inf}^{inf} f(t) e^{-st} dt
Hence the MGF is the 'flipped' Laplace transform of the density.
Now, we know that the MGF of a sum of independent RVs is the product of their MGFs. So, taking the inverse Laplace transform, the density of the sum is the convolution of the individual densities.
Similarly, taking a derivative in the frequency domain is the same as multiplying by the variable in the time domain: so M'_X(s) is the 'flipped Laplace transform' of x f(x), and its value at s=0 is the 'DC gain' of that signal, i.e. E[X].
And so on... the properties are all immediate consequences of the definition of the MGF, and since the definition is essentially the same as that of a Laplace transform, there is an equivalent property in signals and systems as well.
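To make the time-domain side of this concrete, here's a small numerical sketch (using two independent Exp(1) variables as an assumed example): convolving the densities gives the density of the sum, and a finite-difference derivative of the MGF at s=0 recovers E[X], the 'DC gain' of x f(x).

    # Sketch of the signals-and-systems reading, for X, Y independent Exp(1).
    import numpy as np

    dx = 0.001
    x = np.arange(0, 20, dx)
    f = np.exp(-x)                                 # Exp(1) density on a grid

    # Time domain: convolution of the densities ...
    conv = np.convolve(f, f)[:len(x)] * dx
    # ... matches the known Gamma(2,1) density of X + Y, namely x*exp(-x)
    print(np.max(np.abs(conv - x * np.exp(-x))))   # ~1e-3 (discretization error)

    # MGF M_X(s) = integral of f(x)*e^{s x} dx; derivative at s = 0 is E[X]
    def mgf(s):
        return np.sum(f * np.exp(s * x)) * dx

    h = 1e-4
    print((mgf(h) - mgf(-h)) / (2 * h))            # ~1.0 = E[X], the "DC gain" of x*f(x)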
There are certain forms like that which have well-known values they converge to as you keep adding terms out to infinity. Sometimes that convergence is only possible if your domain is limited, e.g. [0,1].
Great article. For more, I really recommend Analytic Combinatorics:
In my own experience teaching probability theory to physicists and engineers, establishing this connection is often a good way of helping people build intuition for why characteristic functions are so useful, why they crop up everywhere in probability theory, and why we can extract so much useful information about a distribution by looking at the characteristic function (since this group of students tends to already be rather familiar with Fourier transforms).
I mean, it was useful for me to think of it as a translation from sets and logic (the variable x is in the set or not) into functions (a function f(x) that returns 1, or true, whenever x is in the set S)
How the heck is that a Fourier transform!?
- The characteristic function of a random variable X is defined as the function that maps t --> ExpectedValue[ exp( i * t * X ) ]
- Computing this expected value is the same as regarding t as a constant and integrating the function x --> exp( i * t * x) with respect to the distribution of X, i.e. if X has the density f, we compute the integral of f(x) * exp( i * t * x) with respect to x over the domain of f.
- on the other hand: computing the Fourier transform of f (here representing the density of X) and evaluating it at point t (i.e. computing (F(f))(t) if F represents the Fourier transform) is the same as fixing t and computing the integral of f(x) * exp( -i * t * x) with respect to x.
- Rearranging the integrand in the previous expression to f(x) * exp( i * -t * x), we see that it is the same as the integrand used in the characteristic function, only with a -t instead of a t.
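A quick numerical sanity check of that identity, using X ~ N(0,1) as an assumed example (its ch.f. is exp(-t^2/2), and since the normal density is symmetric the sign flip doesn't change anything):

    # Sketch: Monte Carlo estimate of E[exp(i*t*X)] vs. the closed-form
    # characteristic function of N(0,1), which is exp(-t^2/2).
    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.standard_normal(500_000)

    for t in [0.5, 1.0, 2.0]:
        empirical = np.mean(np.exp(1j * t * samples))
        analytic = np.exp(-t**2 / 2)
        print(t, empirical.real, analytic)   # real parts agree; imaginary part of the estimate is ~0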
Hope that helps :)
What you described is more often referred to as an “indicator function” these days, with “characteristic function” denoting the transform (Fourier, Laplace, or z, depending on context). Closely related to “moment generating functions”, to the point of being almost interchangeable.
but the new and improved 21st-century characteristic functions are n-variable and have a full continuous spectrum of values between zero (false) and one (true), though only potentially, lest the infinite realize itself (which would make the theories illogical).
this way of thinking about this makes sense to me, even if it's ever so slightly wrong by some nitpickable point https://en.wikipedia.org/wiki/Moment-generating_function
If you don't say that this is what you are doing then it all seems quite mysterious.
(Probably obvious to everyone reading, but the variables should be independent.)
And the generating function of the cumulants is the logarithm of the generating function of the distribution (its Fourier transform, i.e. the characteristic function).
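A small symbolic sketch of that, using the MGF instead of the Fourier transform to keep everything real, and a Poisson(lam) variable as an assumed example (all of its cumulants equal lam):

    # Sketch: Taylor coefficients of log(MGF) are the cumulants (divided by n!).
    # For Poisson(lam) the cumulant generating function is lam*(e^t - 1).
    import sympy as sp

    t, lam = sp.symbols('t lam', positive=True)
    mgf = sp.exp(lam * (sp.exp(t) - 1))            # MGF of Poisson(lam)
    cgf = sp.log(mgf)
    print(sp.series(cgf, t, 0, 5))                 # lam*t + lam*t**2/2 + lam*t**3/6 + ...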
…or if you flip the vector and use x=10:
6284
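(For anyone puzzled by the 6284: evaluating a polynomial with single-digit coefficients at x=10 packs the reversed coefficient vector into the decimal digits. A tiny sketch, assuming the vector in question was [4, 8, 2, 6]:)

    # Assumed coefficient vector; c0 + c1*x + c2*x^2 + c3*x^3 evaluated at x = 10
    coeffs = [4, 8, 2, 6]
    print(sum(c * 10**i for i, c in enumerate(coeffs)))   # 6284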
https://courses.cs.washington.edu/courses/cse312/20su/files/...
Well, you can. But then you essentially get the real coordinate space, and the fact that they are polynomials rather than just real-valued vectors becomes an extra story slapped on top that has no real relevance.
Polynomial vector spaces become useful when we treat polynomials as functions, and the vector space of polynomials as a subspace of some other space of functions. And with function spaces, the inner product is defined as an integral of the product of two functions (possibly with a weight function thrown in).
Or maybe the coefficient-based inner product spaces of polynomials have some uses, too. But they are less common than the function-based (integral) inner product.
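For a concrete instance of that integral inner product, here's a small sketch with Legendre polynomials on [-1, 1] (an assumed example): P1 and P3 are orthogonal as functions even though their coefficient vectors are not orthogonal as plain coordinate vectors.

    # Sketch: <p, q> = integral over [-1, 1] of p(x)*q(x) dx
    import numpy as np
    from numpy.polynomial import legendre
    from scipy.integrate import quad

    P1 = legendre.Legendre.basis(1)                # P1(x) = x
    P3 = legendre.Legendre.basis(3)                # P3(x) = (5x^3 - 3x)/2

    integral_ip, _ = quad(lambda u: P1(u) * P3(u), -1, 1)
    print(integral_ip)                             # ~0: orthogonal as functions

    c1 = np.array([0.0, 1.0, 0.0, 0.0])            # power-basis coefficients of P1
    c3 = np.array([0.0, -1.5, 0.0, 2.5])           # power-basis coefficients of P3
    print(np.dot(c1, c3))                          # -1.5: not orthogonal coefficient-wise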
> When de Moivre invented much of modern probability in the mid-1700s, he didn’t have vectors! Vectors are an 1800s invention.
Doesn't explain why we still teach them 300 years later though. That's what the second half of the article covers.
Does an encoding of a sequence in a given Gödel numbering, also somehow "retrievably" encode the probability space of the sequence's terms?
> I’m not yet good enough to intuitively get why the curvature of the probability-generating function would be related to variance, but I’d be happy to receive pointers here.
Here’s my intuition for this.
The characteristic function is the Fourier transform of the density.
If the density is in “t” units, the ch.f. is in f= 1/t units. It is the “inverse domain.” (I’m using “f” to suggest frequency, ie the Fourier coordinate.)
Of course it is not a simple coordinate transformation! But some intuition does carry over.
This is reflected in all sorts of ways. It’s one reason why the IFT formula is so functionally close to the FT formula.
Anyway.
Because of this, the behavior of the FT (ch.f.) very close to the origin (“f=0”) tells about the tails of the distribution (t = 1/f is large).
In particular, high curvature around the origin tells you the tails are heavy. That’s the variance.
This extends to the fourth moment. You can get even sharper curvature around the origin of the FT (ch.f. at f=0) with a large coefficient on the fourth order term. This corresponds to a large fourth moment of the pdf, or a high kurtosis.
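A quick numerical illustration of the variance part (a sketch with two assumed zero-mean examples, N(0,1) and N(0,3^2)): the second derivative of the ch.f. at t=0 is -E[X^2], so the wider distribution shows much sharper curvature at the origin.

    # Sketch: phi''(0) = -E[X^2], estimated by central differences on a
    # Monte Carlo estimate of phi(t) = E[exp(i*t*X)].
    import numpy as np

    rng = np.random.default_rng(2)

    def curvature_at_zero(samples, h=1e-2):
        phi = lambda t: np.mean(np.exp(1j * t * samples)).real
        return (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2

    print(curvature_at_zero(rng.normal(0, 1, size=1_000_000)))   # ~ -1: gentle curvature
    print(curvature_at_zero(rng.normal(0, 3, size=1_000_000)))   # ~ -9: sharp curvature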
It’s useful to recall that, because of analytic continuation, knowing all the derivatives at the one point f=0 determines the ch.f. everywhere, and thereby determines the complete density. This corresponds to the fact that knowing all the moments determines the full density.
So in a very real sense, you only need the ch.f. in a tight neighborhood of the origin!
(Provided all moments are finite.)