The wave-function parametrization is so important because it allows us to define mathematically a general probability measure for an infinite number of random variables. This makes Bayesian inference a general and theoretically grounded framework for Machine Learning.
In Bayesian inference there is always a prior probability distribution, and there is no prior which is better for all cases[1]: we always have to make assumptions. For instance, if we choose a uniform prior in a continuous sample space, then the maximum of the likelihood coincides with the maximum of the posterior (which results from the prior once the data is taken into account). However, such a maximum has null measure and thus no particular meaning. If we draw a sample from the posterior, we expect the sample to be somewhere near the maximum but never exactly at the maximum. Overfitting means that the inference process produced a sample which is inconsistent with our prior beliefs and thus could not have been produced by Bayesian inference with an appropriate prior. Thus, overfitting means we need to choose another prior more consistent with our prior beliefs.
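The statement about the uniform prior can be checked in the simplest conjugate setting. The following is a minimal sketch, assuming a Bernoulli likelihood with a uniform Beta(1, 1) prior (a choice made here for illustration, not taken from the text); the function name `posterior_summary` and the data values are hypothetical.

```python
import random

def posterior_summary(successes, trials, n_samples=5, seed=0):
    """Bernoulli likelihood with a uniform Beta(1, 1) prior on theta.

    Assumes 0 < successes < trials so that the posterior mode is interior.
    """
    rng = random.Random(seed)
    a = successes + 1              # posterior is Beta(a, b)
    b = trials - successes + 1
    mle = successes / trials                   # maximum of the likelihood
    map_estimate = (a - 1) / (a + b - 2)       # maximum (mode) of the posterior
    # Samples from the posterior scatter around the mode.
    samples = [rng.betavariate(a, b) for _ in range(n_samples)]
    return mle, map_estimate, samples

mle, map_estimate, samples = posterior_summary(successes=7, trials=10)
print(mle, map_estimate)
print(samples)
```

The two maxima agree, while the posterior samples land near the maximum without coinciding with it, which is the sense in which the maximum itself carries no particular meaning.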
In Bayesian inference, the likelihood of the output data including correlations and variances fully determines the statistical model; then all statistical models can be seen as a particular case of one general statistical model for particular prior knowledge about the parameters of the general statistical model[2]. Thus, there is a probability distribution (the prior) of a probability distribution (the likelihood of the output data).
But functions (such as the likelihood of the output data) in general form infinite-dimensional spaces, so it makes sense to look for measures in infinite-dimensional spaces. While the Lebesgue measure cannot be defined in a Euclidean-like infinite-dimensional space[3][4], it has been well known for many decades that a uniform (Lebesgue-like) measure of an infinite-dimensional sphere can be defined using the Gaussian measure and the Fock-space (the Fock-space is a separable Hilbert space used in the second quantization of free quantum fields)[5]. Such a space can parametrize (we call it the free field parametrization) the probability distribution of another probability distribution, which is exactly what we need: the infinite-dimensional sphere parametrizes the space of all likelihoods of the output data, while the wave-function whose domain is the sphere parametrizes a measure on the sphere.
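A finite-dimensional analogue may help to fix ideas. The sketch below assumes a real sphere of small dimension (the infinite-dimensional construction in the text is not reproduced here): normalizing a standard Gaussian vector gives a point uniformly distributed on the sphere, and the squared components of that point form a probability vector. The function names are hypothetical.

```python
import math
import random

def uniform_point_on_sphere(dim, rng):
    """Normalize a standard Gaussian vector; the result is uniform on S^(dim-1)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def probabilities_from_wave_function(psi):
    """Squared components of a unit vector are non-negative and sum to one."""
    return [x * x for x in psi]

rng = random.Random(1)
psi = uniform_point_on_sphere(dim=6, rng=rng)   # a point of the sphere
p = probabilities_from_wave_function(psi)       # the distribution it parametrizes
print(p, sum(p))
```

Each point of the sphere thus labels one probability distribution, so a measure on the sphere is a probability distribution over probability distributions.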
In the free field parametrization, the uniform prior over the sphere defines a vector of the Hilbert space which, when used as the prior for Bayesian inference with arbitrary data, generates an orthogonal basis for the whole Fock-space. Such a basis is related to a point process, with the number of points with a given feature corresponding to the number of modifications to the uniform prior (in the part of the sample space corresponding to such feature). Since Bayesian inference with any other prior can be seen as a combination of the results of different Bayesian inferences with the uniform prior for different data (possibly an infinite amount of data for the cases with null measure), the uniform prior in the free field parametrization is in many cases (not in all cases[1]) appropriate for Bayesian inference in the absence of any other information.
Thus ensemble forecasting[6] (which has many applications and in which an ensemble of different statistical models is built) can be seen as sampling from a Bayesian posterior corresponding to a particular Bayesian prior which selects which models constitute the ensemble.
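As a toy illustration of this reading of an ensemble, the sketch below assumes a hypothetical ensemble of three coin models with fixed biases and a prior that is uniform over exactly these three models. The posterior weights are proportional to prior times likelihood, and ensemble members are then drawn according to those weights. None of the names or numbers come from the text.

```python
import random

def posterior_over_models(prior, likelihoods):
    """Posterior weights proportional to prior weight times likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical ensemble: three coin models with fixed bias theta.
biases = [0.3, 0.5, 0.7]
prior = [1 / 3, 1 / 3, 1 / 3]   # the prior selects which models are in the ensemble
heads, tails = 7, 3             # illustrative data
likelihoods = [t ** heads * (1 - t) ** tails for t in biases]
weights = posterior_over_models(prior, likelihoods)

# Sampling ensemble members from the posterior over models.
rng = random.Random(0)
ensemble_draws = rng.choices(biases, weights=weights, k=5)
print(weights, ensemble_draws)
```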
This leads us to classical statistical mechanics: whatever system we study, we need a probability measure on the phase-space of that system, corresponding to an ensemble which defines a Bayesian prior. When the system we are studying is itself an ensemble, and thus is defined by another probability distribution, we can use the free field parametrization.
Quantum statistical mechanics is not different, since the Hilbert space on which the density matrix lives is merely a parametrization of a probability, due to the wave-function collapse. When the density matrix is not pure, the probability defining the ensemble is a joint probability distribution of the initial and final states of the systems. We can always define the density matrix through a diagonal operator rotated by a unitary operator, with the diagonal operator defining the marginal probability of the initial state and the unitary operator defining the conditional probability of the final state given the initial state.
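The decomposition described above can be sketched numerically in finite dimension. The example assumes a 2x2 density matrix chosen for illustration and the convention that the squared modulus of the unitary's entry (f, i) is the conditional probability of the final state f given the initial state i; both are assumptions of this sketch and the function name is hypothetical.

```python
import numpy as np

def joint_from_density_matrix(rho):
    """Split rho = U D U^dagger into a marginal and a conditional probability."""
    eigvals, U = np.linalg.eigh(rho)          # columns of U are eigenvectors
    p_initial = np.clip(eigvals, 0.0, None)   # diagonal operator D
    p_initial = p_initial / p_initial.sum()
    p_final_given_initial = np.abs(U) ** 2    # entry [f, i] = |U_fi|^2
    joint = p_final_given_initial * p_initial[np.newaxis, :]   # indexed [f, i]
    return p_initial, p_final_given_initial, joint

# Illustrative mixed state (Hermitian, unit trace, positive eigenvalues).
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])
p_initial, p_cond, joint = joint_from_density_matrix(rho)
print(joint)
print(joint.sum(axis=1), np.diag(rho))   # marginal of the final state
```

In this convention the marginal of the final state reproduces the diagonal of the density matrix, which gives a quick consistency check on the decomposition.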
This allows us to treat classical or quantum statistical processes as classical dynamical systems (where the system is itself an ensemble).
Before defining the probability distribution of a classical field (a function of a space, say
Since the second-quantization procedure can be applied recursively, we can assume without loss of generality that the Hamiltonian is quadratic in the creation/annihilation operators and thus no further regularization is needed (the wave-function parametrization can be considered a regularization by itself). For instance, in case the Hamiltonian acting in the Fock-space is not quadratic in the creation/annihilation operators, then we can consider instead a new Fock-space where the base Hilbert space is the original Fock-space. The new Hamiltonian is quadratic in the creation/annihilation operators.
We can also assume the Hamiltonian to be local, at the cost of enlarging the Fock-space. This is consistent with the fact that any non-Markov stochastic process can be described as a Markov stochastic process where some variables defining the state of the system are hidden (i.e. unknown)[7][8]. That we can assume the Hamiltonian to be local and quadratic is consistent with the fact that a complete physical system is also a free system. The free field associated to a free system can be made an orthogonal (fermion) or symplectic (boson) complex representation of the Poincaré group, depending on whether its spin is half-integer or integer respectively, regardless of the interactions occurring within the free system[9][10]. Thus in field theory the wave-function parametrization includes a free field parametrization. The remaining question now is how to define the notion of a field within the Fock-space (a separable Hilbert space).
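The remark above on non-Markov processes can be illustrated with a toy example. The sketch assumes a hypothetical binary process in which the next symbol depends on the two previous symbols (so it is not Markov in the last symbol alone); enlarging the state to the pair of the two most recent symbols yields an ordinary Markov transition matrix. The rule `p_one` and its numbers are invented for illustration.

```python
from itertools import product

# Hypothetical second-order rule: P(x_t = 1 | x_{t-1}, x_{t-2}).
p_one = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.9}

def lifted_transition_matrix(p_one):
    """Markov chain on the enlarged state (x_{t-1}, x_{t-2})."""
    states = list(product((0, 1), repeat=2))
    matrix = {s: {t: 0.0 for t in states} for s in states}
    for (prev, prev2) in states:
        p1 = p_one[(prev, prev2)]
        # The new state is (x_t, x_{t-1}); the older symbol is dropped.
        matrix[(prev, prev2)][(1, prev)] = p1
        matrix[(prev, prev2)][(0, prev)] = 1.0 - p1
    return states, matrix

states, matrix = lifted_transition_matrix(p_one)
for s in states:
    print(s, matrix[s])
```

An observer who only records the latest symbol sees a non-Markov process, while the enlarged state, whose second component plays the role of the hidden variable, evolves as a Markov chain.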
As discussed before, every commutative von Neumann algebra on a separable complex Hilbert space is isomorphic to
While there is a mathematically rigorous definition of classical field theory [12], so far the definition of a (classical) statistical field theory [13] is tied to the definition of a quantum field theory [14], which involves a lattice spacing necessary to regularize and renormalize the ultraviolet divergences of the field theory. The notion of continuum limit in a discrete lattice is that for a large enough energy scale the predictions of the theory are independent of the type of discrete regularization used [14], thus the lattice in the regularized theory is always discrete. The regularization and renormalization are related to the decomposition of a field defined in the continuum through discrete wavelets and it is roughly the translation of the products of fields into products of wavelet components[15] (a related approach involves a semivariogram [16]). Such translation of products of fields only allows polynomial Hamiltonians; in particular, when using the Fock space without regularization (which is a possible way to implement a continuous tensor product[17]), only quadratic Hamiltonians are allowed (i.e. for free fields).
This excludes a rigorous definition of the classical statistical version of many classical field theories (such as General Relativity) since so far there is no reason why the Hamiltonian of a classical field theory should be polynomial in the fields, not to mention the problems with Quantum Gravity [18]. This is unacceptable: for most classical field theories, the definition of the corresponding classical statistical field theories should be straightforward, because the real-world measurements are never fully accurate.
The above is an indication that an alternative definition of Statistical Field Theory which allows the definition of non-polynomial Hamiltonians should not be too hard to find. Indeed, the essential obstruction to an infinite-dimensional Lebesgue measure is its
The way out is that the basis of the theory is the Fock space (a separable Hilbert space, thus associated to a standard measure space) which, crucially, does not include deterministic functionals. A probability distribution with sample space given by the direct product of the base space and the field can model a non-deterministic function, since we can always choose a measure (the Gaussian measure, for instance) which covers the whole base space, with the non-deterministic function evaluated at each point of the base space given by the conditional probability distribution conditioned on that point. Note that, due to the relation between Fock-spaces and tensor products, we can consider regions of the sample space as small as we want, and there is null measure for a null marginal probability in each one of these regions and for each component of the Fourier series corresponding to such a region; thus the conditional probability distribution conditioned on each point of the base space is well defined everywhere except in sets with null measure.
Thus, the free field parametrization is appropriate for non-deterministic functions but not for deterministic functions, since the prior (the uniform measure of an infinite-dimensional sphere, which we call the uniform prior from now on) attributes null measure to deterministic functions. If we were interested in deterministic functions, we would be in trouble[3][4]. This does not come for free either, since we need to sacrifice deterministic functionals in general and thus classical field theory (which may be welcome from a quantum foundations point of view). That includes sacrificing the evaluation of a function at a point and thus replacing a classical field by a quantum field (in the Fock-space sense). Without the evaluation of a function at a point, the notions of continuity and differentiability must be redefined for quantum fields, including the generalized Stokes theorem (or fundamental theorem of calculus). Fortunately, as we will see in the following, this is relatively straightforward because the smooth wave-functions are dense in the Hilbert space.
In the quantum version, the integration domain in the Stokes theorem is replaced by a uniform measure or any other probability measure, while a set of constraints defines a non-deterministic geometry which is not significantly more complex than differential geometry (which is great for High Energy Physics and Gravity).
On the other hand, differential geometry and most machine learning methods are in the dual space of a commutative von Neumann algebra but not in the pre-dual space (which corresponds to the Fock-space). Such a dual space has some good properties but not all those necessary to define an information theory (that is, some consistent way of defining how much we know and do not know about anything; the Fock-space, on the other hand, has all the required properties). For instance, differential geometry would be of little use whenever there is uncertainty about the value of a parameter of a system (which happens most of the time), if there did not exist another similar version of differential geometry which can be applied to non-deterministic systems.
The (bosonic or fermionic) Fock-space can be used as the wave-function parametrization of an arbitrary probability distribution of a classical field over space (for instance the 3D space), in the following way. For a Hamiltonian involving at most first-order derivatives of the fields, the propositions about a (non-free) field at an arbitrary discrete finite number
Above, we assume no coinciding points such as
In (quantum or classical) statistical field theory, the problem we want to solve is about a probability distribution so it is about an eigenvalue problem and diagonalizing a time-evolution operator, because the eigenfunction need not even exist and we use ideals instead. This allows us to know approximately the probability distribution of the initial and final state through numerical methods for partial differential equations and regression (Gaussian process regression [19] or statistical finite element method [20], for instance). On the other hand, in classical field theory (including in numerical calculations such as the finite element method [21][22]) it is about the fields themselves and so the solution must be part of a Hilbert space (because completeness of the space is crucial for the existence proofs) and we need an alternative to the
The cost of the free field parametrization is that we need to implement derivatives and coordinates in continuum space as an extra structure at the local level using constraints, which allows well-defined products of fields and their derivatives at the same point in the continuum space. For a Hamiltonian which depends on the field derivatives up to first order, the constraints are
Crucially, due to the Bianchi identity and that the Hamiltonian commutes with the constraints, we have:
Hamiltonians that depend on second-order (or higher, even infinite-order) derivatives can be redefined as Hamiltonians that depend only on first-order derivatives of the fields, by introducing more fields and constraints which relate some fields to the derivatives of other fields. Also, a space-dependent Hamiltonian can be redefined as a space-independent one, by introducing extra fields with space-derivatives which are constrained to be the unit in one of the space coordinates.
Note that in classical Lagrangian (and Hamiltonian) Field Theory there are also derivatives of fields (through jets of modules [26]), which are consistent with the commutation relations [27] for operator fields and constraints defined above.
We can easily check that these constraints are verified for a wave-function with one creation operator and the field derivatives are given by
Note that in classical Lagrangian (and Hamiltonian) Field Theory there are also derivatives of fields (through jets of modules [26][27]), which are consistent with the commutation relations [27] for operator fields defined in the following.
Thus we can define the Hamiltonian dependent on only the first-order derivative of the field
These constraints still allow for spontaneous symmetry breaking, which is about the correlation of the fields at two points in space separated by an infinite distance [29].
The resulting time-evolution operator is translation-invariant. There is still no infinite-dimensional, translation-invariant and
We may now define Hamiltonians which are non-polynomial functions of the field
The momentum constraint also generates a gauge symmetry, once we consider a spectral measure where the fields are functions of space. Then the momentum constraint always modifies the spectral measure and so we have a complete unconstrained gauge-fixing.
Since all operators must be invariant under a translation in space, how can we define local operators? Strictly speaking we cannot, but we can define operators which are linear combinations of local operators and which behave effectively as local operators. Consider the operator
The (symmetric) Weyl ordering of operators [30][31] preserves the exponential of operators, unlike normal-ordering [32]. This is an important property of the ordering, because we often use the Trotter product formula [33][34] (e.g. in the time-evolution operator).
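For readers less familiar with the Trotter product formula, the following generic finite-dimensional sketch checks it numerically. It assumes two Hermitian 2x2 matrices (the Pauli matrices, chosen here only for illustration and unrelated to the operators of the text) and compares the product of short-time evolutions with the exact evolution generated by the sum. The function names are hypothetical.

```python
import numpy as np

def exp_i_hermitian(h, t):
    """Return exp(-i t h) for a Hermitian matrix h via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def trotter(a, b, t, n):
    """First-order Trotter product (exp(-i t a / n) exp(-i t b / n))^n."""
    step = exp_i_hermitian(a, t / n) @ exp_i_hermitian(b, t / n)
    return np.linalg.matrix_power(step, n)

# Illustrative non-commuting Hermitian generators.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

exact = exp_i_hermitian(sx + sz, 1.0)
errors = [np.linalg.norm(trotter(sx, sz, 1.0, n) - exact) for n in (1, 10, 100)]
print(errors)
```

The error shrinks as the number of steps grows, which is why an ordering that is compatible with exponentials is convenient when the time-evolution operator is built from such products.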
However, the Weyl ordering is not appropriate for the second-quantization of an infinite-dimensional Hilbert space, since it leads to infinite vacuum expectation values of quadratic operators. Thus, we introduce the Quadratic Ordering which is a kind of “incomplete” normal ordering, which conserves the exponential of operators. That is, we only order operators which conserve the number operator such that these operators are a product of quadratic operators (where each quadratic operator conserves the number operator and has null vacuum expectation value). All other operators can be generated from these products of quadratic operators and operators with exclusively creation (or exclusively annihilation) operators which are automatically ordered.
The quadratic operators are normalized by normal operators acting in the Hilbert space
In the following, we solve the hardest part of the Millennium Prize problem “Yang-Mills existence and mass gap”, by showing that if the Hamiltonian of the Yang-Mills theory is positive-definite, then Yang-Mills theory can be reformulated with or without a mass gap without any observable consequences. To solve the remaining part of the problem, we need to define the Yang-Mills theory with mathematical rigor within our formalism, prove that it exists and is non-trivial, and show that its Hamiltonian is positive-definite; all this we will do in another chapter.
The mass gap is the eigenvalue of the Hamiltonian which is closest to the null eigenvalue corresponding to the vacuum state. We now show that the mass gap (in a field theory defined through the wave-function parametrization as above) cannot be physically measured and can be modified at will. That is, a Hamiltonian bounded from below with a null mass gap (the free electromagnetic field, for instance) can be modified to a Hamiltonian bounded from below with an arbitrary mass gap without any observable consequence.
This is so because the commutative algebra of observables is generated by hermitian operators
Note that in our formalism as in Quantum Mechanics, the Hamiltonian can be bounded from below because it is defined a priori with a correct (Quadratic) ordering. On the other hand, already in the second quantization of free fields (and in perturbative Quantum Field Theory in general) the Hamiltonian cannot be bounded from below in an infinite space-time because it is defined using space coordinates where it cannot have a correct ordering of operators[36] (only in momentum coordinates can normal ordering be applied) and some kind of renormalization is needed.
The Hilbert space is the tensor product of the symmetric and antisymmetric Fock spaces
The
We define:
where
The Hamiltonian for the Navier-Stokes equations is defined as:
where
We can check that since
In conclusion, we proved that the solution to the Navier-Stokes differential equation in a standard probability space exists and it is unique. Since a smooth deterministic field can be approximated with an uncertainty as small as we want by the initial conditions in a standard probability space, then we solved the problem of existence and uniqueness of the Navier-Stokes equations (at least as far as it can be solved).