Description
We show that any conditional probability measure in a standard measure space is parametrized by a unitary operator on a separable Hilbert space. We propose unitary inference, a generalization of Bayesian inference. We study implications for classical statistical mechanics and machine learning.
In scientific research, as in Bayesian inference, there is always a prior probability distribution, and there is no prior which is better for all cases[1]: we always have to make assumptions. For instance, if we insist, we can assume that all objects in the night sky revolve around the Earth using epicycles, and even consider this a very successful theory because it was a precursor of Fourier series, which have widespread applications today[2].
The Emperor realized that the people were right but could not admit to that. He thought it better to continue the procession under the illusion that anyone who couldn’t see his clothes was either stupid or incompetent. And he stood stiffly on his carriage, while behind him a page held his imaginary mantle.
—Hans Christian Andersen (1837)
Thus, data, evidence and even mathematical proofs are not enough for us to abandon prior beliefs; there must come a moment when, in order to achieve something we want, we have to abandon them. In the case of assuming that all objects in the night sky revolve around the Earth, keeping that assumption would make many relevant calculations in modern astronomy or in the Global Positioning System impossible.
In the case of Quantum Mechanics, it was always possible to see its mathematical formalism as a mere (but very useful) parametrization of probability and classical information theories[3]. But it was also possible to describe it (although in a forceful way) as a new and exotic alternative to (or generalization of) classical information theory, one that shook our sense of reality. It is not hard to believe that the mystery surrounding the exotic point of view can often convince society to spend large amounts of money on research, at least in the short term.
The exotic point of view was sustained by a relatively constant flow of discoveries of new phenomena at higher and higher energies, which kept the mystery alive. But now even optimists say that if the Large Hadron Collider finds nothing new, it will be harder to convince the governments of the world to build the next, bigger, more expensive collider to keep research in High Energy Physics as we know it[4]. Thus, if the exotic point of view is no longer that mysterious and a parametrization point of view exists, is there another way to convince society to spend large amounts of money on research?
The answer is yes, there is another way: the parametrization point of view allows the mathematical formalism of Quantum Mechanics to be applied to probability and classical information theory (and hence to artificial intelligence) not just in edge cases, but at the core of these theories. This would not be possible with the exotic point of view, which held that Quantum Mechanics shook our sense of reality and therefore had to be an alternative to (or a generalization of) classical information theory, so that there would be no reason to expect applications to the core of classical information theory.
We call the process of updating a state of knowledge with respect to new evidence a Probability update[5]. There is little doubt that the Bayes rule should be applied to Probability updates whenever possible[5]. But there is also little doubt that the Bayes rule is often not applicable[5], and thus Probability updates are a generalization of Bayesian inference (and of statistical inference). There is no consensus on what to do when the Bayes rule is not applicable.
Machine learning[6] (including numerical analysis in general[7]) and Quantum Mechanics[8] can each be defined as a particular case of Probability updates. Thus, it is no surprise that there is a broad consensus that Machine learning and Quantum Mechanics are generalizations of Bayesian inference in some sense, but little consensus beyond that.
A Bayesian model[9] is defined by a regular conditional probability between standard measure spaces, also called a Markov kernel or likelihood. If we concatenate Bayesian models, such that the data (output) of one Bayesian model are the parameters (input) of another, we create a Markov process.
Markov processes cannot produce an arbitrary function of time, because there is an ordering (related with the concept of entropy) with respect to which all continuous-time Markov processes are monotonic[10]. Thus, Bayesian inference is irreversible.
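As a minimal illustration of the two statements above (our own sketch, with hypothetical numbers), a Bayesian model can be represented by a column-stochastic matrix, concatenation is matrix multiplication, and the result is again a Markov kernel whose inverse is generally not a Markov kernel:

```python
# Illustrative sketch (not from the original text): two Bayesian models as
# column-stochastic matrices (Markov kernels); concatenating them gives
# another Markov kernel, but no Markov kernel undoes the update.
import numpy as np

K1 = np.array([[0.9, 0.2],   # p(output | input), columns sum to 1
               [0.1, 0.8]])
K2 = np.array([[0.7, 0.4],
               [0.3, 0.6]])

K = K2 @ K1                  # concatenation of the two Bayesian models
assert np.allclose(K.sum(axis=0), 1.0)   # still a Markov kernel

# The matrix inverse exists but has negative entries, so it is not a Markov
# kernel: the concatenated update cannot be reversed by another Markov kernel.
print(np.linalg.inv(K))
```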
In this article we propose reversible statistical models and a method of statistical inference defined by a quantum time-evolution which is reversible (we call it Unitary inference). The Bayes rule is applied to the final likelihood which results from the concatenation of several reversible models followed by the application of the Born rule.
Thus, such reversible inference is fully compatible with Bayesian inference, that is: given a Bayesian model, a prior probability and data that allow us to produce a posterior probability through Bayesian inference, there is a reversible model that produces the same posterior probability as a function of the prior probability (see next section). Moreover, a quantum time-evolution can be used to define Probability updates, Machine Learning and Quantum Mechanics.
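A minimal numerical sketch of this compatibility (our own construction, with hypothetical values): the square roots of the joint probabilities defined by a prior and a likelihood form a unit wave-function; any unit vector is the image of a fixed reference state under some unitary, and the Born rule followed by conditioning on the observed data reproduces the Bayesian posterior.

```python
# Minimal sketch (our illustration, not the paper's construction).
import numpy as np

prior = np.array([0.3, 0.7])              # p(x), hypothetical values
lik = np.array([[0.8, 0.1],               # p(y | x), columns indexed by x
                [0.2, 0.9]])

joint = lik * prior                        # p(y, x) = p(y|x) p(x)
psi = np.sqrt(joint).flatten()             # wave-function, |psi|^2 = joint

# Complete psi to an orthonormal basis: U maps the reference state e0 to psi.
U, _ = np.linalg.qr(np.column_stack([psi, np.eye(len(psi))[:, 1:]]))
U *= np.sign(U[0, 0] * psi[0])             # fix the overall sign
assert np.allclose(U[:, 0] ** 2, joint.flatten())

# Born rule + conditioning on the observed data y = 0 gives Bayes' posterior.
p_joint = (U[:, 0] ** 2).reshape(joint.shape)
posterior = p_joint[0] / p_joint[0].sum()
bayes = lik[0] * prior / (lik[0] * prior).sum()
assert np.allclose(posterior, bayes)
```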
The crucial advantage of Unitary inference compared to Bayesian inference, besides the fact that it is more general, is that any prior wave-function can be transformed into any other prior wave-function through a change of basis, which also affects the unitary statistical models. In Bayesian inference, there are priors which are too wide, in the sense that the relevant sample space is too big to allow for a numerical integration. At first sight, unitary inference makes this problem worse, because instead of one numerical integration we have to do many numerical integrations since we are dealing with matrices. But it also allows for basis changes, so a prior wave-function that is too wide in one basis can become extremely simple in another basis. Since the basis change also affects the unitary statistical models, only the relation between the unitary statistical model and the prior probability distribution is relevant. In any case, any approximation (Monte Carlo, neural networks, variational inference, etc.) can be mapped to the formalism of unitary inference (because it is the most general formalism), where the approximation can be seen as the choice of a favorable basis and prior probability distribution with respect to an approximation (through truncation, for instance) of the unitary statistical model.
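A small sketch of the basis-change argument (our own example, using a discrete Fourier basis): a uniform prior wave-function, wide in the original basis, is a single basis vector in the Fourier basis, and the unitary statistical model transforms consistently, so only the relation between model and prior matters.

```python
# Hedged sketch: basis change turning a wide prior into a single basis vector.
import numpy as np

N = 8
psi_prior = np.ones(N) / np.sqrt(N)          # wide (uniform) prior wave-function
F = np.fft.fft(np.eye(N), norm="ortho")      # unitary discrete Fourier basis

print(np.round(F @ psi_prior, 10))           # a single basis vector: e_0

# A unitary statistical model U transforms as U -> F U F^dagger, so the pair
# (prior, model) describes the same inference in either basis.
rng = np.random.default_rng(0)
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
U, _ = np.linalg.qr(A)                       # a random toy unitary model
U_new_basis = F @ U @ F.conj().T
assert np.allclose(F @ (U @ psi_prior), U_new_basis @ (F @ psi_prior))
```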
Note that probabilities can always be seen as incomplete information, which in turn can be measured by entropy as defined in classical information theory. We claim that reversible models and unitary inference are not in themselves a source of entropy, but they still must deal with the entropy coming from other sources. This is guaranteed by the fact that unitary inference is fully compatible with Bayesian inference. Thus, we do not claim that unitary inference eliminates the entropy that comes from other sources.
Measure spaces (in particular, algebraic measure theory as defined below) have many applications in classical mechanics and dynamical systems[11], when non-linear equations and/or complex sample spaces become an obstacle.
Any measure space can be turned into a probability space: a probability space is a measure space where a probability density was chosen, which is any measurable function normalized such that its measure is 1. In the historical (that is, Kolmogorov’s) probability theory, a probability space has three parts: a sample space (which is the set of possible states of a system, also called phase space); a complete Boolean algebra of events (where each event is a subset of the set of possible states); and a probability measure which assigns a probability to each event. The probability is a map from complex random events to abstract random events, shifting all ambiguity about the notion of randomness to the abstract random events, as described in the following: the probability of an event is
On the other hand, a standard measure space is isomorphic (up to sets with null measure) to the real Lebesgue measure in the unit interval or to a discrete (finite or countable) measure space or to a mixture of the two. Thus, topological notions such as dimension do not apply to standard measure spaces. Most probability spaces with real-world applications are standard measure spaces. Equivalently, a standard measure space can be defined such that the following correspondence holds: every commutative von Neumann algebra on a separable (complex or real) Hilbert space is isomorphic to
To be sure, algebraic measure theory is based on commutative algebras, so it is not a non-commutative generalization of probability or information theory (see section 4). That is, there is no need for a conceptual or foundational revolution such as qubits replacing bits when switching from the historical to the algebraic probability theory[16]. Moreover, this is a common procedure in mathematics, as illustrated in the following quote[19] (note that a probability measure is related to integration):
The fundamental notions of calculus, namely differentiation and integration, are often viewed as being the quintessential concepts in mathematical analysis, as their standard definitions involve the concept of a limit. However, it is possible to capture most of the essence of these notions by purely algebraic means (almost completely avoiding the use of limits, Riemann sums, and similar devices), which turns out to be useful when trying to generalize these concepts[...]
T. Tao (2013)[19]
The relation of the algebraic measure theory with probability theory is the following: any joint probability density
We have
Since
Note that the Cauchy-Schwarz inequality implies
Since
Then
Since the Hilbert space is separable, there is an orthonormal discrete basis, and we can then enlarge the discrete part of the sample space
The converse also holds. Given a bounded operator
The linearity of the commutative algebra, the avoidance of a fixed sample space a priori, and the fact that we can map complex random phenomena to an abstract random process unambiguously are obvious advantages of algebraic measure theory when we want to compare probability theory with Quantum Mechanics, where the linearity of the canonical transformations is guaranteed by Wigner’s theorem (it is essentially a consequence of the Born rule applied to a non-commutative algebra of operators[20][22][23]); the Hilbert space of wave-functions replaces the sample space; and the canonical transformations are non-deterministic.
The algebraic measure theory is also different from defining the sample space as a reproducing kernel Hilbert space [24][25], since no sample space (whether it is a Hilbert space or not) is defined a priori. Note that defining the sample space as a Sobolev Hilbert space is common in classical field theory[26], but defining a general probability measure in such space is still an open problem.
The correspondence between geometric spaces and commutative algebras is important in algebraic geometry. It is usually argued that the phase space in quantum mechanics corresponds to a non-commutative algebra, and thus it is a non-commutative geometric space in some sense[27]. It is a fact that Quantum Mechanics may inspire a non-commutative generalization of probability theory, since the wave-function could also assign a probability to non-diagonal projections, and these non-diagonal projections would generate a non-commutative algebra[8]. However, after the wave-function collapse, only a commutative algebra of operators remains. Thus, the phase space in quantum mechanics is a sample space of a standard measure space and the standard spectral theory (where the correspondence between geometric spaces and commutative algebras plays a main role[28]) suffices.
Consider for instance the projection
But due to the wave-function collapse, Quantum Mechanics is not a non-commutative generalization of probability theory despite the appearances: the measurement of the momentum is only possible if a physical transformation of the statistical ensemble also occurs, as we show in the following.
Suppose that
If we consider a unitary transformation
Where
Where
Thus,
Without collapse, we would have
At first sight, our result that the wave-function is merely a parametrization of any probability measure resembles Gleason’s theorem [31][32]. However, there is a key difference: we are dealing with commuting projections and consequently with the wave-function, while Gleason’s theorem says that any probability measure for all non-commuting projections defined in a Hilbert space (with dimension
We can check the difference in the 2-dimensional real case. Our result is that there is always a wave-function
However, if we consider non-commuting projections and a diagonal constant density matrix
Our result implies that there is a pure state, such that:
(e.g.
And there is another possibly different pure state, such that:
(e.g.
But there is no
On the other hand, Gleason’s theorem implies that there is a
Gleason’s theorem is relevant if we neglect the wave-function collapse, since it attaches a unique density matrix to non-commuting operators. However, the wave-function collapse affects the density matrix differently when different non-commuting operators are considered, so that after measurement the density matrix is no longer unique. In contrast, without the wave-function collapse, the wave-function parametrization of a probability measure would not be possible.
Another difference is that our result applies to standard probability theory, while Gleason’s theorem applies to a non-commutative generalization of probability theory.
In Bayesian inference there is always a prior probability distribution, and there is no prior which is better for all cases[1]: we always have to make assumptions. For instance, if we choose a uniform prior in a continuous sample space then the maximum likelihood coincides with the maximum of the posterior (resulting from the prior once the data is taken into account). However, such a maximum has null measure and thus no particular meaning. If we take a sample based on the posterior, we expect the sample to be somewhere near the maximum but never exactly at the maximum. Overfitting means that the inference process produced a sample which is inconsistent with our prior beliefs and thus could not be produced by Bayesian inference with an appropriate prior. Thus, overfitting means we need to choose another prior, one more consistent with our prior beliefs.
In Bayesian inference, the likelihood of the output data including correlations and variances fully determines the statistical model; then all statistical models can be seen as a particular case of one general statistical model for particular prior knowledge about the parameters of the general statistical model[33]. Thus, there is a probability distribution (the prior) of a probability distribution (the likelihood of the output data).
But functions (such as the likelihood of the output data) in general live in infinite-dimensional spaces, so it makes sense to look for measures in infinite-dimensional spaces. While the Lebesgue measure cannot be defined in a Euclidean-like infinite-dimensional space[34][35][36][37], it has been well known for many decades that a uniform (Lebesgue-like) measure on an infinite-dimensional sphere can be defined using the Gaussian measure and the Fock-space (the Fock-space is a separable Hilbert space used in the second quantization of free quantum fields)[38]. Such a space can parametrize (we call it the free field parametrization) the probability distribution of another probability distribution, which is exactly what we need: the infinite-dimensional sphere parametrizes the space of all likelihoods of the output data, while the wave-function whose domain is the sphere parametrizes a measure on the sphere.
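A finite-dimensional sketch of this construction (our own illustration; the infinite-dimensional case uses the Gaussian measure on the Fock-space): normalizing an i.i.d. Gaussian vector gives a uniform point on the sphere, and the squared coordinates of a unit vector form a probability distribution, so the sphere indeed parametrizes likelihoods.

```python
# Hedged finite-dimensional stand-in for the free field parametrization.
import numpy as np

rng = np.random.default_rng(0)
n = 5                                     # finite stand-in for infinitely many modes
g = rng.normal(size=n)                    # Gaussian measure
point_on_sphere = g / np.linalg.norm(g)   # uniform point on the (n-1)-sphere

likelihood = point_on_sphere ** 2         # squared coordinates: a probability distribution
assert np.isclose(likelihood.sum(), 1.0)
print(likelihood)
```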
In the free field parametrization, the uniform prior over the sphere defines a vector of the Hilbert space which, when used as the prior for Bayesian inference with arbitrary data, generates an orthogonal basis for the whole Fock-space. Such a basis is related to a point process, with the number of points with a given feature corresponding to the number of modifications to the uniform prior (in the part of the sample space corresponding to such feature). Since Bayesian inference with any other prior can be seen as a combination of the results of different Bayesian inferences with the uniform prior for different data (eventually an infinite amount of data for the cases with null measure), the uniform prior in the free field parametrization is in many cases (not in all cases[1]) appropriate for Bayesian inference in the absence of any other information.
Moreover, the frequentist view of probabilities is viable because many statistical problems can be solved by a Bayesian model where the parameters defining probabilities are inferred from Binomial processes with a uniform prior (for instance, the Binomial converges to the Normal distribution under some conditions), which implies that it is often possible to incorporate prior knowledge efficiently by modifying the counting of events. This does not exclude, of course, just using an arbitrary prior (with methods such as preconditioned Crank–Nicolson MCMC for high-dimensional problems), but that might be computationally inefficient in some cases. In both cases (counting or MCMC), the free-field parametrization is crucial to the implementation of many constrained problems; it allows the uniform prior to be used in many cases where we are just interested in a subset of the sample space with null measure (as we discuss elsewhere).
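A hedged illustration of the counting remark (hypothetical counts): with a uniform prior on the probability of a Binomial process, Bayesian updating reduces to counting, and prior knowledge can be incorporated by adding pseudo-counts.

```python
# Hedged illustration (hypothetical numbers): Beta-Binomial updating by counting.
successes, failures = 7, 3          # observed events
pseudo_succ, pseudo_fail = 10, 10   # prior knowledge encoded as extra counts

a = 1 + pseudo_succ + successes     # uniform prior is Beta(1, 1); posterior is Beta(a, b)
b = 1 + pseudo_fail + failures
posterior_mean = a / (a + b)        # E[p | data] = a / (a + b)
print(posterior_mean)
```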
Thus, ensemble forecasting[39]—with many applications and where an ensemble of different statistical models is built—can be seen as sampling from a Bayesian posterior corresponding to a particular Bayesian prior which selects which models constitute the ensemble.
This leads us to classical statistical mechanics: whatever system we study, we need a probability measure on the phase-space of such system corresponding to an ensemble which defines a Bayesian prior. When the system we are studying is itself an ensemble, and thus it is defined by another probability distribution, then we can use the free field parametrization.
Quantum statistical mechanics is not different, since the Hilbert space on which the density matrix lives is merely a parametrization of a probability, due to the wave-function collapse. When the density matrix is not pure, the probability defining the ensemble is a joint probability distribution of the initial and final states of the system. We can always define the density matrix through a diagonal operator rotated by a unitary operator, with the diagonal operator defining the marginal probability of the initial state and the unitary operator defining the conditional probability of the final state given the initial state.
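A short sketch of this decomposition (our own, with illustrative values): the density matrix is a diagonal operator rotated by a unitary; the diagonal entries give the marginal of the initial state, and the squared moduli of the unitary's entries give the conditional probabilities of the final state given the initial state.

```python
# Hedged sketch: density matrix as a rotated diagonal operator.
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])                      # marginal of the initial state
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(A)                             # a toy unitary

rho = U @ np.diag(p) @ U.conj().T                  # the (mixed) density matrix
cond = np.abs(U) ** 2                              # p(final | initial), columns sum to 1
joint = cond * p                                   # joint of initial and final states

assert np.allclose(cond.sum(axis=0), 1.0)
assert np.allclose(np.diag(rho).real, joint.sum(axis=1))  # Born rule in the final basis
```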
We can also consider a statistical model which acts on the prior wave-function in a non-linear way, by redefining the Hilbert space of the prior as a tensor product of two Fock-spaces and the corresponding statistical model as unitary (and thus linear); this is in principle possible for most non-linear actions we can think of, because a conditional probability measure includes all such actions. Note that every formalism has limits, so there will always be some exotic statistical model which cannot be redefined, or at least not in a useful way.
This allows us to treat (classical or quantum) statistical processes as classical dynamical systems (where the system is itself an ensemble).
The free field parametrization defined in the previous section depends on the fact that the sphere is infinite-dimensional, corresponding to a sample space with infinitely many degrees of freedom. But often we are interested in a tensor product of sample spaces, some of which have finitely many degrees of freedom.
Of course, we can always treat finite degrees of freedom as a subset of infinite degrees of freedom. But there might be an alternative as well.
We consider the sample space
If we consider now the sample space
We need the space
The advantage is that there are certain symmetry transformations (such as the rotation for different spins in the presence of a flavor symmetry[42]) which can only be represented by bosonic free fields, while others can only be represented by fermionic free fields. This is what leads to the spin-statistics theorem[42] or to the fact that ghost fields verify an unusual spin-statistics correspondence. So we had better have an alternative to the bosonic free field parametrization discussed in the previous section.
Any conditional probability measure (in a standard measure space) which is a function of a one-dimensional parameter (which we call time, without loss of generality) can be parametrized by a quantum process: a time-ordered product integral of unitary operators, defined by a time-dependent Hamiltonian operator, at the cost of choosing a larger sample space such that the corresponding isometry can be unitary. Such a parametrization using a time-dependent Hamiltonian is particularly useful in systems where the words “time-dependent” have a clear meaning. For instance, a molecule in an ideal gas is constrained both theoretically (through its position operator and the effect that the Poincaré group has on it) and experimentally to have conditional probability measures defined by a time-independent Hamiltonian operator; also, in a deterministic time-evolution, a time-independent classical Hamiltonian corresponds to a time-independent Hamiltonian operator. Nevertheless, such a parametrization is always possible in a standard measure space, including for Markov processes; therefore quantum processes are not intrinsically exotic or weird, they are simply more general than Markov processes.
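The enlargement of the sample space mentioned above can be made concrete in a small sketch (our own construction, with a toy kernel): a Markov kernel lifts to an isometry on a larger space, the isometry completes to a unitary, and the Born rule on the larger space recovers the original kernel.

```python
# Hedged sketch: unitary dilation of a Markov kernel on an enlarged sample space.
import numpy as np

P = np.array([[0.8, 0.3],       # P(y | x), columns sum to 1 (toy kernel)
              [0.2, 0.7]])
m, n = P.shape                  # m outcomes y, n inputs x

# Isometry V: |x> -> sum_y sqrt(P(y|x)) |y, x>, acting into an m*n space.
V = np.zeros((m * n, n))
for x in range(n):
    for y in range(m):
        V[y * n + x, x] = np.sqrt(P[y, x])
assert np.allclose(V.T @ V, np.eye(n))          # columns are orthonormal

# Complete the isometry to a unitary on the enlarged space (QR completion).
U, _ = np.linalg.qr(np.column_stack([V, np.eye(m * n)]))
U[:, :n] = V                                    # reinsert V exactly (QR may flip column signs)

# Born rule on the enlarged space recovers the original kernel.
for x in range(n):
    amplitudes = U[:, x]
    prob_y = (amplitudes ** 2).reshape(m, n).sum(axis=1)
    assert np.allclose(prob_y, P[:, x])
```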
Moreover, as it was discussed in the previous two sections, a non-linear transformation of the wave-function can be redefined as a unitary operator (and thus linear) on a Fock-space. This is in principle possible for most non-linear transformations we can think of, because the Hilbert space is infinite-dimensional.
Note that Markov processes cannot produce an arbitrary function of time, because there is an ordering (related with the concept of entropy) with respect to which all continuous-time Markov processes are monotonic[10]. Moreover, a Markov process is a semigroup which is much harder to translate into an infinitesimal structure than the unitary operators which form a group, as discussed in reference[43]:
“The basic feature of Lie theory is that of using the group structure to translate global geometric and analytic problems into local and infinitesimal ones. These questions are solved by Lie algebra techniques which are essentially linear algebra and then translated back into an answer to the original problem. Surprisingly enough it is possible to follow this strategy to a large extend also for semigroups, but things become more intricate. Because of the missing inverses one has not only to deal with linear algebra but also with convex geometry at the infinitesimal level.”
A quantum process can be classified by all unitary transformations of the wave-function that may occur at each time (called canonical transformations). When a subset of canonical transformations forms a group, such a subset is always a linear unitary representation of a group, which is then called a symmetry group, and the canonical transformations are then called symmetry transformations.
In conclusion, conditional probability measures defined by a linear unitary representation of a symmetry group are certainly natural and common.
A conservative transformation
If there is a locally compact abelian group
For all
In particular, a deterministic automorphism (defined in the next section) can be made conservative without modifying its effect on
A particular case of a deterministic conservative automorphism is a measure-preserving transformation
Crucially, the symmetry transformations include all the deterministic transformations, which will be defined in the following. Thus, the symmetry transformations are a generalization of the deterministic transformations.
A deterministic transformation
In the particular case of a one-parameter continuous group, the deterministic transformations must be automorphisms. Thus, a deterministic automorphism
We conclude that an automorphism
The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve. — Eugene Wigner (1960) [46]
There is no uniform probability measure on a countably infinite set. Thus, the rationals are not enough for Probability Theory. A standard probability space (which has countable and continuous measures) seems irreducible.
However, the rules to update the probability measure are not at all obvious, because we can always consider a probability space of probability spaces (ensemble forecasting). Assuming a quantum time-evolution (that is, linear) for the ensemble forecasting leads to no relevant restriction on the time-evolution of the probability spaces (it may be non-linear, although every formalism has a domain of application, so we still need to exclude a pathological time-evolution).
The Bayes rule applies only when we do a measurement, but this is a tautology: since a measurement is not otherwise defined, a measurement is applying the Bayes rule to update a probability, that is, destroying the uncertainty (and thus the probability distribution) by converting probabilities into events, unpredictably, while reproducing the probability distribution in the limit of infinitely many measurements. The limit of infinitely many measurements creates a joint probability distribution of the results of the measurements and the probability updates after each measurement took place. It is not different from an unbiased sampling process taken over infinite time. Note that slightly biased sampling processes are common, but these can be converted to unbiased ones by enlarging the probability space (using ensemble forecasting, for instance).
Using the free field parametrization of ensemble forecasting[39] (see Sections 5, 6), we can diagonalize the quantum time-evolution, decomposing a non-linear infinite-dimensional model into a direct integral of linear models with only one boolean variable. This only solves part of the problem, since a direct integral can still be uncomputable. We have to show how to decompose a possibly continuous spectrum of the quantum time-evolution into a direct sum of sufficiently small intervals of energy, each of these intervals described by a few variables. This is not obvious, since any continuous interval, no matter how small, is still uncountable. We will prove this in another article. But it is not unreasonable to assume it is possible (for instance, adapting Krylov subspace methods for the computation of matrix exponentials[47]), as we do from now on.
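As a hedged sketch of the kind of adaptation we have in mind (our own; the function, the toy Hamiltonian and all sizes are purely illustrative), a Lanczos-type Krylov subspace approximation confines the action of exp(-iHt) on a given state to a few effective variables:

```python
# Minimal Lanczos (Krylov) sketch, without reorthogonalization, approximating
# exp(-1j*t*H) @ psi inside a k-dimensional subspace.
import numpy as np

def lanczos_expm(H, psi, t, k=25):
    n = len(psi)
    Q = np.zeros((n, k), dtype=complex)
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    Q[:, 0] = psi / np.linalg.norm(psi)
    for j in range(k):
        w = H @ Q[:, j]
        alpha[j] = np.real(np.vdot(Q[:, j], w))
        w = w - alpha[j] * Q[:, j] - (beta[j - 1] * Q[:, j - 1] if j > 0 else 0)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)  # small tridiagonal matrix
    evals, evecs = np.linalg.eigh(T)
    small = evecs @ (np.exp(-1j * t * evals) * evecs[0])       # exp(-iTt) applied to e_0
    return np.linalg.norm(psi) * (Q @ small)

# Toy check against the exact evolution of a random Hermitian H.
rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=(n, n))
H = (A + A.T) / (2 * np.sqrt(n))                 # rescaled so the spectrum is of order one
psi = rng.normal(size=n) + 0j
evals, evecs = np.linalg.eigh(H)
exact = evecs @ (np.exp(-1j * 1.0 * evals) * (evecs.T @ psi))
print(np.linalg.norm(exact - lanczos_expm(H, psi, t=1.0)))   # small error with k << n
```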
Since the wave-function defines an ensemble of deterministic transformations (that is, functions), we can easily study and calculate each possible function when we are dealing with few variables. This justifies why deterministic logic and deterministic mathematics find application in a world full of uncomputable functions.
This gives a mathematical definition of the renormalization process, which has been empirically observed to be possible in a wide range of problems (related or unrelated to Physics)[48][49]. That is, for reasonable initial conditions (because we do not have access to an infinite range of energies in the real world, using man-made experimental apparatus) we can always approximately predict the time-evolution of a system using a model with few variables, using different, unrelated models for different energy ranges.
The renormalization process is universal, and it applies to any statistical model (which defines a unitary transformation, playing the role of a quantum time-evolution, the “time” is abstract here). Thus, it allows efficient machine learning using Bayesian priors in the few most relevant variables, helping align models and incorporate prior knowledge.
In the previous section, the few variables are related to the energy-momentum space, not to the coordinate space. Here we discuss the evidence about the necessary conditions to discard probabilities in the coordinate space, recovering Classical Mechanics. Note that these conditions are not rigorous, that is, they often work but not always; thus Classical Mechanics is never enough at any energy scale, and Quantum Mechanics is always needed.
As far as we know, the world is compatible with a continuum space-time, with rational functions of a rational variable obtained through step functions and the partition of a continuum space-time. For instance, we can “see” that we have “five” fingers in our hand, but exactly why we can do this in a continuum space-time is not obvious. Mainstream mathematics depends on the fact that step, polynomial and smooth functions are relevant for applications. But it is important to look at the existing evidence on why these functions are relevant for applications.
Step, polynomial or smooth functions are dense in the
Note that Chaos and singularities in ordinary differential equations are phenomena of computable functions (computable at least for a small enough but finite time, so they still admit the partition of a large domain in numerical approximations), even though they are clues to the fact that many functions are uncomputable. Uncomputable functions are uncomputable for any finite time, unlike Chaos or singularities.