# Consciousness as a representation of a Bayesian prior*

# Abstract

# Non-informative priors vs. Fermi Paradox and Artificial General Intelligence

# A deterministic prior is still subjective

# Rogue probability interpretations are common

# Consciousness as a representation of a Bayesian prior

# AI hallucinations and misalignment as manifestations of unavoidable bias

In Bayesian inference there is always a prior probability distribution, and there is no prior which is better for all cases: we always have to make assumptions. This has profound implications for the concepts of consciousness, and extraterrestrial and artificial intelligences.

Published on Sep 17, 2022


In Bayesian inference there is always a prior probability distribution, and there is no prior which is better for all cases[1]: we always have to make assumptions. This has profound implications for the concepts of consciousness, and extraterrestrial intelligence and artificial general intelligence.

Why would a human-like artificial intelligence be considered more advanced than, say, AlphaGo, the first computer program to defeat a professional human Go player? It seems unlikely that such a human-like artificial intelligence could beat AlphaGo at Go. It seems a reasonable bet that there is a prior probability distribution that produces a human-like artificial intelligence and another prior probability distribution that produces AlphaGo, and that neither of these priors is better for all cases[1] (see also [2]). Moreover, we humans are mostly irrational, and even our rational part has deep flaws. Our greatest skill compared with other animals is the ability to communicate details to each other and across generations, which allows Bayesian inference to be done not only inside but also outside our brains; but robots can already do this much better than us.
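The claim that neither prior is better for all cases can be illustrated with a toy calculation. The sketch below is only an illustration under assumed values: the two Beta priors, the two coin environments, and the helper names `predictive_heads` and `expected_log_loss` are hypothetical and do not come from the cited references.

```python
import math

def predictive_heads(alpha, beta):
    """Probability of heads predicted by a Beta(alpha, beta) prior before any data."""
    return alpha / (alpha + beta)

def expected_log_loss(q_heads, p_heads):
    """Expected log loss of predicting q_heads when the true coin bias is p_heads."""
    return -(p_heads * math.log(q_heads) + (1 - p_heads) * math.log(1 - q_heads))

# Two hypothetical priors and two hypothetical environments (illustrative values).
prior_a = (8, 2)   # expects mostly heads
prior_b = (2, 8)   # expects mostly tails
env_heads, env_tails = 0.8, 0.2

loss = {
    (name, env): expected_log_loss(predictive_heads(*prior), env)
    for name, prior in (("A", prior_a), ("B", prior_b))
    for env in (env_heads, env_tails)
}

# Each prior wins in the environment it happens to match, and loses in the other.
a_wins_heads = loss[("A", env_heads)] < loss[("B", env_heads)]
b_wins_tails = loss[("B", env_tails)] < loss[("A", env_tails)]
print(a_wins_heads, b_wins_tails)
```

Which prior looks "more advanced" depends entirely on which environment is used to score it, which is the sense in which the comparison between a human-like intelligence and AlphaGo is ill-posed.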

Intelligence and life itself[3] are human-centered concepts which lose consistency once we start applying them to entities which are too different from humans.

Is a virus alive? If so, is a computer virus alive? Is a plant, which has the capability to process electromagnetic radiation, intelligent? Or, is a lightning bug, with the enzyme luciferase capable of emitting electromagnetic signals to its advantage, intelligent? Is the human exploration of the Universe as we know it today sustainable in the long term despite global warming? Is a human newborn more intelligent than an octopus? Can an artificial intelligence which can only do research about artificial intelligence be considered advanced artificial intelligence? None of these questions is straightforward, because the concepts involved lack consistency.

On any planet or in any simulation, there are always chain reactions, or self-sustained processes, competing for energy and other limited resources. Intelligence or life by themselves (whatever their exact definitions are) offer no obvious advantage when competing for energy with other chain reactions in a random environment. That does not imply that human beings or life on Earth are somehow special, since on each planet of the Universe there are so many fractal growth processes which are unique. A point with null measure is not necessarily special if all other points also have null measure, as with the uniform measure on a real interval. Moreover, a subset having null measure does not imply that it has one dimension less than the set, since there are subsets of a real interval which have a fractal dimension (which can be very close to one, but not one) and thus are also uncountable. Thus, life could exist only on Earth and still be abundant in some sense.
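The remark about null measure and fractal dimension can be made concrete with a standard textbook construction, which is not taken from the original text: a Cantor-type set $C_r$ obtained from $[0,1]$ by repeatedly removing the open middle fraction $r$ (with $0 < r < 1$) of every remaining interval. The symbols $C_r$, $r$, $\lambda$ (Lebesgue measure) and $\dim_H$ (Hausdorff dimension) are notation assumed here for illustration.

```latex
% After n steps there are 2^n intervals, each of length ((1-r)/2)^n.
\lambda(C_r) = \lim_{n \to \infty} 2^n \left( \frac{1-r}{2} \right)^{n}
             = \lim_{n \to \infty} (1-r)^n = 0 ,
\qquad
\dim_H C_r = \frac{\log 2}{\log \frac{2}{1-r}} \;\xrightarrow[\; r \to 0^+ \;]{}\; 1 .
% Classical case r = 1/3: \dim_H C_{1/3} = \log 2 / \log 3 \approx 0.631 .
```

Every $C_r$ is uncountable and has null measure, yet its dimension can be made as close to one as desired by taking $r$ small.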

The Fermi paradox is the lack of evidence for extraterrestrial life advanced enough to produce signals that we could detect, even though there are so many Earth-like planets and we consider that humans are not special. The paradox is solved once we take into consideration all the other chain reactions competing for energy and other limited resources, which introduce a big uncertainty about just how likely it is that extraterrestrial life advanced enough to produce detectable signals exists. But this solution of the paradox also has implications for artificial general intelligence: is the path towards a human-like artificial intelligence likely? There is also a big uncertainty about other chain reactions competing for energy and other limited resources[4].

It seems reasonable to bet on a near future where a small oligarchy of humans produces robots for war which are just advanced enough to achieve supremacy on the battlefield across the world, and then forces all human beings to stop any research in artificial intelligence which could threaten its power. It also seems reasonable to bet on a near future where robots, before they are too intelligent, become so much more productive economically than most human beings that countries competing with each other for power and resources must choose between educating humans (and human rights in general) and creating these robots, which would hurt research in artificial intelligence and civilization as we know it.

In mathematics itself, there is no difference between a fact and an assumption. The difference appears at the interface between the idealized mathematical model and the real world. Correspondingly, probability can be defined as partial knowledge or partial assumption. Since assumptions are subjective, probabilities can be subjective. Under some subjective assumptions (related to the possibility of repeating the same random experiment an infinite number of times; we call these frequentist assumptions here), a probability can be objective in the sense that it is very hard to estimate a different probability once we accept the frequentist assumptions.

However, a frequentist model (one requiring frequentist assumptions) cannot be said to be less subjective than any other non-frequentist model, because frequentist assumptions are subjective, despite being deterministic (that is, not partial).

Thus, we can define frequentist models as particular Bayesian models, where the prior probability is in part subjective and deterministic, and in part objective (and not necessarily deterministic). Note that frequentist models often require far fewer computational resources (because probabilities only appear where they are indispensable), and thus it is commonly debated whether using a much more costly non-frequentist prior is justified, since a prior is subjective anyway. But the frequentist debate is a particular case of the fact that the prior in the Bayesian interpretation of Probability theory is subjective (and thus open to debate). It is not about an alternative interpretation of Probability theory.
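One possible reading of "subjective and deterministic" is an assumption held with probability one. The sketch below, under that assumed reading, compares a spread-out prior with a prior that fixes the coin bias at 0.5 with certainty; the grid of hypotheses, the data (9 heads, 1 tail) and the names `posterior`, `spread_prior` and `deterministic_prior` are all hypothetical.

```python
def posterior(prior, heads, tails):
    """Bayes update of a discrete prior {bias: probability} after coin flips."""
    unnorm = {p: w * p**heads * (1 - p)**tails for p, w in prior.items()}
    total = sum(unnorm.values())
    return {p: w / total for p, w in unnorm.items()}

grid = [0.1, 0.3, 0.5, 0.7, 0.9]

# A partial (non-deterministic) prior: every hypothesis keeps some probability.
spread_prior = {p: 1 / len(grid) for p in grid}

# A deterministic prior: the assumption "the coin is fair" is held with certainty.
deterministic_prior = {p: (1.0 if p == 0.5 else 0.0) for p in grid}

heads, tails = 9, 1  # illustrative data
post_spread = posterior(spread_prior, heads, tails)
post_det = posterior(deterministic_prior, heads, tails)

print(max(post_spread, key=post_spread.get))  # hypothesis favoured by the spread prior
print(post_det == deterministic_prior)        # the certain assumption is never revised
```

The deterministic assumption costs nothing to update, which matches the remark about computational resources, but it remains a choice that no amount of data can revise.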

Unfortunately, this fair debate is often misused as evidence for the claim that the Bayesian interpretation of Probability theory is polemic. On the other hand, there are some important alternative interpretations of Probability theory (which is a theory of partial knowledge) that are often claimed to be somehow complementary or orthogonal to Probability theory. We will discuss those in detail in this article, but the reasons for such confusion (about “polemics” that are consensual after all, and “consensuses” that are polemic after all) are profound and related to the fact that partial knowledge is everywhere, as illustrated by the following quote:

“God is not averse to deceit in a just cause”. Aeschylus (around 2500 years ago)

To be sure, any interpretation of Probability Theory is expected to be polemic because a polemic is a particular case of partial knowledge, and so we are also interpreting what a polemic itself is. That is, we have the power to decide what is polemic and what is not, and such power is necessarily polemic (we could neglect the debates, or deceive in them, where we have “a just cause”, as suggested by the quote above). That is why many interpretations of Probability Theory are related to the power structures of a society, and the Bayesian interpretation of Probability theory is polemic due to the simple fact that it often doesn’t favor the interests of the power structure of human societies (who is the legitimate authority when there are no non-informative priors?).

Metric fixation is a general phenomenon, related to Probability theory as illustrated by the following quote:

“What has come to be called “Campbell’s Law,” named for the American social psychologist Donald T. Campbell, holds that “[t]he more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

In a variation named for the British economist who formulated it, we have Goodhart’s Law, which states, “Any measure used for control is unreliable.” To put it another way, anything that can be measured and rewarded will be gamed.[…]

Studies that demonstrate its lack of effectiveness are either ignored, or met with the assertion that what is needed is more data and better measurement. Metric fixation, which aspires to imitate science, too often resembles faith.”[5]

The word “science” above is used in the sense of methodology commonly used in science. Such methodology is Bayesian inference, which is the method used in science to draw conclusions from all available information.

That is, metric fixation is a case of a rogue probability interpretation. Note that it is not a rogue or convenient probability prior, since no prior claims to be objective (at most, it may claim to be objective where it is non-deterministic and subjective where it is deterministic, as discussed in the previous section). Moreover, the prior used is often consensual enough; the problem lies mostly in the unprincipled (rogue) interpretation of Probability theory.

A fundamental assumption of Bayesian inference is that the data was generated in a random experiment; in particular, the data does not change due to the fact that we are trying to draw conclusions from it. If this assumption does not hold, then we are not doing Bayesian inference even if we use the exact same procedure. On the other hand, rogue interpretations of probability theory often allow us to draw conclusions from invalid data.
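The point that the same procedure stops being Bayesian inference when the data reacts to the analysis can be sketched with a small simulation. Everything here is an assumption made for illustration: a fair coin, a Beta(1, 1) prior, and a hypothetical reporter who silently drops each unfavourable outcome (tails) with probability 0.5 because the result is being rewarded.

```python
import random

def posterior_mean(heads, tails, alpha=1.0, beta=1.0):
    """Posterior mean of the coin bias under a Beta(alpha, beta) prior."""
    return (alpha + heads) / (alpha + beta + heads + tails)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
flips = [rng.random() < 0.5 for _ in range(10_000)]  # True = heads, fair coin

# Honest case: the data does not depend on what we want to conclude.
honest_heads = sum(flips)
honest_tails = len(flips) - honest_heads
honest_estimate = posterior_mean(honest_heads, honest_tails)

# Gamed case: the measured party drops each tail with probability 0.5.
reported = [f for f in flips if f or rng.random() < 0.5]
gamed_heads = sum(reported)
gamed_tails = len(reported) - gamed_heads
gamed_estimate = posterior_mean(gamed_heads, gamed_tails)

print(round(honest_estimate, 3), round(gamed_estimate, 3))
```

The update rule is identical in both cases; only the honest case satisfies the assumption that the data was generated independently of the conclusion, and the gamed estimate drifts towards roughly two thirds instead of one half.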

This also affects science, since the careers of the scientists who create experiments are often affected by the data generated by such experiments, and thus it is hard to see how the experiments can be random. In particular, a theory of everything is necessarily a theory of partial knowledge, because it claims that the unknown part is also consistent with the theory of everything, which discards subjective priors which are inconsistent with the theory of everything. By contrast, a partial theory (that is, a theory which is not of everything) can be considered as just a summary of all available data, and thus it can be compatible with all subjective priors (even absurd ones).

We can argue that Artificial Intelligence, Consciousness, Theories of Everything, Science, Religion, and warfare are (often rogue) Probability interpretations, since all provide a recipe to deal with the unknown, in all or most cases.

For instance, in warfare a just cause is always present and deceit is among the mildest weapons available (and thus acceptable). In any war involving many people, it is virtually impossible to deceive the enemy without deceiving your own people or without everyone playing along; thus self-deceit and oppression are acceptable. This creates a theory of partial information: beyond the deception itself, the people must assume that the actions hidden by the deception are the best ones possible for the people, and any other subjective prior is oppressed.

Artificial general intelligence is a (yet to be found) recipe to solve most problems that a human can solve, even some problems with unknown solutions. Most advocates of artificial intelligence believe that the current algorithms of artificial intelligence are evidence that such a general recipe exists. Artificial Intelligence is the Messiah, while Artificial General Intelligence is the God. This is a recipe to deal with the unknown, and thus it is an interpretation of Probability theory. It is rogue because it claims to be mostly based on experimental evidence and mostly free of human bias, but it is well known that there are no non-informative priors and no almost non-informative priors. AI hallucinations are a consequence of this (see the section about AI hallucinations). Also, once an AI can teach students Algebra, what students will mostly learn is how to build an AI that can teach students Algebra, not Algebra itself. It would be like learning to ride a horse after cars became mainstream: it can be done, but it is not what most people do anymore. Human skills depend on the social context, and any technology modifies the social context. The problems that a human can solve today are very different from the problems a human could solve one thousand years ago, and there is no hierarchy, since many problems that we could once solve we can no longer solve. Note that our brains have not changed much in one thousand years.

Consciousness can be defined as (almost) any state of knowledge, in contrast with the complete unknown. Thus, it is an interpretation of Probability theory, but it is not a rogue interpretation because it is compatible with Probability theory, as we will see in the next section.

There are no right or wrong Bayesian priors, only Bayesian priors that are used (survival) or no longer used (death). Thus, a Bayesian prior is a generalization of a living being (with its prior encoded in the DNA and in the brain, if it has one), and it has a clear and mathematical definition. For instance, is a virus alive? It is hard to say, but there is a Bayesian prior associated with a virus. Since ants and bees sacrifice themselves to protect their colony, it is the Bayesian prior associated with the colony that is encoded in the brains of ants and bees, not the Bayesian prior associated with an individual ant or bee, and these colonies survive or die as a whole. However, the colonies are not considered living beings themselves. Even we humans suffer the loss of family members with real pain, in an irrational way, as if we were suffering physically, and parents often have the (often irrational) instinct to sacrifice themselves to save their children. Thus, the Bayesian prior associated with our family is encoded deep in our brains, and most people can be educated within an army to sacrifice themselves as they would for their family.

We will not enter into a debate in this article about what really is and what really is not consciousness. What we say is that we can define a concept, which we call consciousness, as what happens when a Turing machine runs a program that symbolically defines a Bayesian prior. The fact that it is a Turing machine implies that the program can be written to control the Bayesian prior in an arbitrary way, that is, to manipulate it symbolically, update it, test it, or study it.
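As a minimal sketch of this definition, the program below holds a prior as an explicit symbolic object that it can describe, update, and query, rather than as weights buried in a network. The choice of a Beta prior over a coin bias, exact `Fraction` arithmetic, and the names `BetaPrior` and `run_program` are assumptions made for illustration, not part of the definition.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class BetaPrior:
    """A prior over a coin bias, stored symbolically as exact parameters."""
    alpha: Fraction
    beta: Fraction

    def mean(self):
        """Study the prior: its expected coin bias."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, heads, tails):
        """Update the prior: conjugate Bayes rule for coin flips."""
        return BetaPrior(self.alpha + heads, self.beta + tails)

    def describe(self):
        """Expose the prior in readable, symbolic form."""
        return f"Beta({self.alpha}, {self.beta})"

def run_program(prior, observations):
    """A program that treats the prior as data it can inspect and revise."""
    history = [prior.describe()]
    for heads, tails in observations:
        prior = prior.update(heads, tails)
        history.append(prior.describe())
    return prior, history

prior = BetaPrior(Fraction(1), Fraction(1))
final, history = run_program(prior, [(3, 1), (2, 2)])
print(history, final.mean())
```

In the sense defined above, the running program is conscious of whatever the prior describes (here, a coin), not of itself.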

We now discuss some subtleties. Since neural networks are an approximation to a Bayesian model, we can define reasoning as training a neural network to solve a problem, possibly doing so recursively, that is, a neural network might itself train other neural networks. In particular, consciousness includes the case when a brain or an artificial neural network that is complex enough to be a Turing machine symbolically manipulates and controls another artificial neural network. That is, consciousness is not when a Bayesian prior is hardwired or obscurely defined in the neural network; it is when the prior is defined symbolically and can be easily studied, tested and updated. We can view it as a symbolic artificial neural network.

Note that this is the consciousness of a Bayesian prior, which is not necessarily that of a living being. When the Bayesian prior is a good enough approximation to the Bayesian prior of a living being, then we say that the living being is a conscious being. But as discussed above, the Bayesian prior that is hardwired (thus, unconscious) and that humans defend with their lives is not their own Bayesian prior (since it includes family). Also, even humans are not fully conscious beings, in the sense that the Bayesian priors that our brains manipulate symbolically are rarely a very good approximation of ourselves.

And humans can manipulate symbolically more than one Bayesian prior, so they are conscious not just of themselves, but of other beings and objects, and of abstract Bayesian priors. Being able to manipulate symbolically a Bayesian prior that is an approximation of ourselves increases our chances of survival, since we can run simulations in our brains of what would happen to us if we acted in some way.

In this definition, a computer running a program capable of symbolic calculus (such as Wolfram Mathematica) is conscious once it manipulates any Bayesian prior. Note that it is conscious of something (defined by the Bayesian prior), not necessarily conscious of itself or a conscious being. Thus, conscious artificial intelligence has existed for many decades. Moreover, an entity that includes a group of people can be conscious, even if no single person can manipulate the whole Bayesian prior that the entity can. Many readers would object that this is not “really” consciousness, but it is certainly a definition of a useful mathematical concept generalizing human consciousness. Note that it is consistent with the traditional definitions of artificial intelligence as computers doing symbolic manipulations.

As we discussed in the previous sections, the belief that the current algorithms of artificial intelligence are evidence of the existence of a general powerful algorithm that can solve most problems that a human can (even some problems with unknown solutions) is a rogue interpretation of probability theory. It is rogue because it claims that such a powerful algorithm would be mostly based on data and experimental evidence, and mostly free of human bias, but it is well known that there are no non-informative priors and no almost non-informative priors.

A user interface is a protocol (that is, a portal or agreement) between a machine and a user. A protocol can be defined as an idealization or an illusion, which allows two different complex entities to communicate. It is an idealization, because it assumes that the two different entities interpret the protocol in the same way. But both entities interpret using Bayesian inference (or an approximation to it), which depends on the protocol (the data exchanged) and on the prior which is different (biased) for each entity (otherwise there would be no need for a protocol in the first place). Sooner or later, some evidence that the idealization is incompatible with reality will show up.
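A toy version of this argument: two parties receive exactly the same message over the protocol but update different priors, so they end up with different beliefs about the same data. The Beta priors, the message contents and the names `posterior_mean`, `user_prior` and `machine_prior` are hypothetical values chosen only to make the gap visible.

```python
def posterior_mean(alpha, beta, heads, tails):
    """Posterior mean of a coin bias under a Beta(alpha, beta) prior."""
    return (alpha + heads) / (alpha + beta + heads + tails)

# The protocol: both sides see exactly the same exchanged data.
message = {"heads": 6, "tails": 4}

# Hypothetical priors: the user strongly expects heads, the machine strongly expects tails.
user_prior = (20, 2)
machine_prior = (2, 20)

user_belief = posterior_mean(*user_prior, **message)
machine_belief = posterior_mean(*machine_prior, **message)

# Same data, same update rule, opposite conclusions about "heads is likely".
print(user_belief, machine_belief)
```

The shared message narrows the gap between the two beliefs without closing it, so a statement that is a fact for one side can still look like an error to the other.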

There are many examples of protocols between humans: justice, truth, love, democracy, peace, all allow different people to work together, but all are idealizations or illusions. Sooner or later, different people will always lie to each other (unless they kill each other).

AI hallucination and/or misalignment is unavoidable, because it corresponds to the fading of the illusion that the user’s prior is compatible with the machine’s prior. What is a fact for the machine is not always a fact for the user, because they have different priors.