# Monoids, weighted automata and algorithmic philosophy of science

The Algorithmic Thinkers
original art by Auguste Rodin & Eric Joyner
modified by Kate Zen

If pressed to find a passion and unifying theme behind my interests, I would say that my goal is to emancipate theoretical computer science from the current tyranny of technology and engineering, and restore it to its original position of asking and helping find answers for fundamental questions in science and philosophy. I’ve already written on progress toward an algorithmic theory of biology, wherein I permitted myself to foray into the philosophy of science. I want to continue the expedition with this post because I think that cstheory can be painlessly integrated into philosophy as an extension of analytic philosophy — algorithmic philosophy.

A precise definition of analytic philosophy is difficult to pin down, and a cheeky characterization might be as “the philosophic tradition that has its roots in British thought of the late 19th and early 20th centuries and is currently practiced mostly in American departments of philosophy.” Although largely true, there is a certain deeper ‘family resemblance’ between analytic philosophers in seeking clarity of argument through the application of (often formal) logic and a respect for mathematics and the natural sciences. At times this focus might seem as silly as mathematical physicist John Baez writes:

[I]n the US many academic philosophers have “mathematics envy”, just as economists have “physics envy”. They admire the precision and rigor of mathematics, especially mathematical logic. But ask them to compute the cohomology of a CW complex, or solve an elliptic PDE, and they get cold feet.﻿

But it is this envy that cstheorists can leverage to enter into modern philosophy. This is not to say that I am comfortable with cohomology or elliptic PDEs — heck, even a large system of ODEs will have me reaching for a warm pair of socks — but a theory oriented computer science education leaves one relatively comfortable with math. More importantly, the experience of regularity conversing with a computer is great preparation for formalizing the ‘obvious’.

Of course, I am not the first person to push for this sort of philosophy. As I alluded in the opening, I believe the early work of computational pioneers like Church, Godel, Post, Turing, von Neumann and others was fundamentally scientific and philosophical in nature. However, I believe that much of this spirit was lost to the seduction of engineering and technology and is only now being recaptured with the work of thinkers like Aaronson, Chaitin, Deutsch, de Wolf, Papadimitrious, and Valiant (to name the few whose work I am familiar with). For some exemplary work, I recommend Ronald de Wolf’s master’s thesis Philosophical Applications of Computational Learning Theory (1997), and Scott Aaronson’s Why Philosophers Should Care About Computational Complexity (2011) and recent meditation on free-will (2013).

Traditional philosophers were not passive in their “mathematics envy”, either. German philosophers pioneered mathematical philosophy and pushed it to the point of creating the Munich Center for Mathematical Philosophy. They even have an introductory Coursera course starting on July 29th:

They are extending analytic philosophy past formal logic, and touching on domains of particular interest to me such as game theory. But, they are not using the insights of theoretical computer science, and only using the computational tools of simulation — a roadmap that I fear will lead their expedition to the curse of computing instead of its grail. Although I am looking forward to taking their course — and hope that you will join me in the G+ study group — I don’t think Leitgeb and Hartmann are guiding analytic philosophy in the same direction as a theoretical computer scientist would.

### Scientific inquiry as machine learning

To start exploring algorithmic philosophy of science, the last tenet of analytic philosophy that we need — and the only part of logical positivism that continues to resonate with me — is a deference to science on what is foundational or fundamental, and a focus on maintaining a minimally restrictive metaphysics. As such, if we want to look at the philosophy of science, we should begin with an instrumentalist or operationalist perspective. The latter stance originated with Percy Williams Bridgman (1927) The Logic of Modern Physics and most of its subsequent impact was in psychology (rather than physics) where it was introduced by Stevens (1935a,b) and Hull (1943) as a relaxation of Watson and Skinner’s behaviorism. In fact, in physics it is often blamed for the anti-philosophical stance of “shut-up and calculate”, although to me it seems essential to modern work on foundations of quantum mechanics. However, we will use it primarily because of its agnostic treatment of realism and leave the metaphysical questions of ‘what is really real‘ out of this philosophy of science.

Through the lens of operationalism, a scientist or naturalist sets up an experiment or takes note of conditions leading up to their observation and then measures — either actively or passively — some quantity. We can think of the experiment (or notes on conditions) as a string of symbols from some alphabet of basic experimental (or observational) steps. Let us assume that this alphabet is fixed and finite, but allow for arbitrarily long experimental descriptions; if you need a more concrete proxy then think of the alphabet as English words (or if that isn’t fixed enough, then letters) and the strings as the method sections of papers (although in reality we would need a slightly better behaved description of experiments; English is just too quirky). A measurement in just a real number that corresponds to some quantity of interest, however we will at times restrict our measurements to even simpler binary indicators.

In this setting, the world is just a mapping from experimental procedures to measurements. If our set of basic experimental observations is $\Sigma$ then any experiment is described by a word $w \in \Sigma^*$. Thus, the real world is the functions:

$f: \; \Sigma^* \rightarrow \mathbb{R}$

And we hope to learn some hidden structure underlying this mapping. Since I am only interested in mechanistic theories, I will assume that no magic is happening and our theory is expressible as terms in lambda calculus. Now suppose that this structure corresponds to some states that map from an initial configuration given by lambda-term $\alpha$, and is measured by some function $\omega$ to the reals. In other words (reading our operations from left to right as a programming language, not right to left as is typical for functions):

$f(w) = \alpha F(w) \omega$

and the goal of a theoryfull scientist is to learn $F$ and the state space it operates on (if that can be represented as something simpler than all possible lambda-terms). Note that this does not assume realism, since $F(w)$ does not need to describe a ‘real’ state of the world, but could just specify our description; if you are thinking in terms of quantum mechanics — $F(w)$ could just specify a transformation on the Hilbert space with no philosophical assumptions about the Hilbert space being ‘real’. However, we will ask these states to be well behaved in the sense that we can think of experimental procedures acting on them step by step. In particular, if our instructions $w = uv$ then $F(uv) = F(u)F(v)$. Our hidden world has some configuration after $u$ that is independent of if $u$ is followed by a measurement or other experimental manipulation $v$. We will also ask for a ‘nothing’ experimental operation $\epsilon$ that corresponds to the identity on our space $F(\epsilon) = \mathrm{\lambda x.x}$.

The above two assumptions on $F$ mean that it has the structure of a monoid. From representation theory, we know that the actions of a monoid with basis $\Sigma$ can be represented by a set of matrices $\{M_\sigma\}_{\sigma \in \Sigma}$ acting on a real vector space. Therefore, our hypothesis class is restricted the set of vectors $\alpha$, $\omega$ in some real vector space, and matrices $\{M_\sigma | \sigma \in \Sigma\}$ acting on that vector space. Since we can only describe a finite amount of information, we might as well make our last restriction: the underlying vector space is finite dimensional.

For a given alphabet $\Sigma$ and finite dimensional vector space $\mathbb{R}^n$ we can define a weighted automaton (Esik & Kuich 2009; Mohri, 2009) on it as the tuple:

$A = \langle \alpha, \omega \in \mathbb{R}^n, \{M_\sigma \in \mathbb{R}^{n \times n} | \sigma \in \Sigma \} \rangle$

Here $\alpha$ is the initial state, $\omega$ is a final measurement, and for each $\sigma \in \Sigma$ we have a corresponding transition matrix $M_\sigma$. The dimensionality of the underlying vector space is the size of the machine, i.e. $\text{size}(A) = n$. The world $f_A$ described by this machine is:

$f_A(\sigma_1...\sigma_m) = \alpha M_{\sigma_1} ... M_{\sigma_m} \omega$

Weighted automata are the perfect hypothesis class to describe an arbitrary real world within our assumptions. Monoids are central, but our last restriction on the existence of a finite dimensional representation means that only some monoids are valid hypotheses. Thankfully, we can qualify this condition in terms of the basic structure of our world $f$. Let us introduce the Hankel matrix $H_f: \Sigma^* \times \Sigma^* \rightarrow \mathbb{R}$, defined as $H(u,v) = f(uv)$, to give us the theorem:

$\mathrm{rank}(H_f) \leq n$ if and only if there exists a weighted automata representation of $f$ of size $n$.

Inherent in the proof of the above theorem is a way to construct the weighted automaton corresponding to $f$ from the Hankel matrix $H_f$ for $f$. Unfortunately, the Hankel matrix is an infinite object but its rank gives us finite handles with which to wrangle it in.

The best part about weighted automata is that they work not only over the reals, but any field (actually, any semiring in the most general treatments, but the linear algebra becomes more hairy). As such, if we want to compare to more standard models of computation then we can replace $\mathbb{R}$ by either the Boolean semiring or the field of two elements $\mathbb{GF}(2)$ ($\mathbb{Z}_2$ for short).

In this case, we can think of measurements as yes-no questions, and our real world as a function:

$f: \Sigma^* \rightarrow \{0,1\}$

And simply consider the subset $L_f \subseteq \Sigma^*$ on which $f$ is $1$. In that case, we can see what restrictions our common sense assumptions put on the operationalist theories we can represent: our functions can only be the indicators for regular languages. This might seem depressingly simple at first, but it actually takes us into a rich area of computational learning theory. Just like Valiant (2009) framed evolution (and ecorithms more generally) as a formal subset of machine learning, algorithmic philosophy allows us to look at the act of scientific inquiry as a formal subset of machine learning. This lends credibility to Valiant’s view of human activity as ecorithms. I will deal with implications, limitations, and critiques of this viewpoint in future posts. I hope that you will join me for the rest of this expedition!

### References

Aaronson, S. (2011). Why philosophers should care about computational complexity. arXiv preprint arXiv:1108.1791.

Aaronson, S. (2013). The Ghost in the Quantum Turing Machine. arXiv preprint arXiv:1306.0159.

de Wolf, R. M. (1997). Philosophical Applications of Computational Learning Theory: Chomskyan Innateness and Occzam’s Razor (Master’s thesis, Erasmus Universiteit Rotterdam).

Esik, Z., & Kuich, W. (2009). Finite Automata. Handbook of Weighted Automata, 69-104.

Hull, C. L. (1943). Principles of behavior: An introduction to behavior theory.

Mohri, M. (2009). Weighted automata algorithms Handbook of Weighted Automata, 213-254 DOI: 10.1007/978-3-642-01492-5_6

Stevens, S. S. (1935a). The operational basis of psychology. The American Journal of Psychology, 47(2), 323-330.

Stevens, S. S. (1935b). The operational definition of psychological concepts. Psychological Review, 42(6), 517.

Valiant, L.G. (2009) Evolvability. Journal of the ACM 56(1): 3.

From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

### 25 Responses to Monoids, weighted automata and algorithmic philosophy of science

1. Martino says:

I’m not familiar with much of the mathematics, but it sounds fascinating.
(Note: twice in the text there are formulas that fail to be interpreted as latex and are shown within ).

• Thank you!

I fixed the LaTeX typos, WordPress has this annoying feature where you must write dollar sign followed by ‘latex’ instead of just opening with a single dollar sign and when I type quickly I tend to forget.

2. ishanuc says:

“our functions can only be the indicators for regular languages.”–

are you referring to the fact that regular languages are decidable while CFLs are not?

Also, dont you think the notion of the physical experiment as $\Sigma^\star \rightarrow \mathbb{R}$ is too simplistic? What about experiments that yield probabilistic outcomes…. wont it be better to think of experiments mapping words to a probability measure?

• are you referring to the fact that regular languages are decidable while CFLs are not?

No, finite dimensional weighted automata recognize the set of regular languages (i.e. WA are just another way of writing finite automata). Of course, as you note, if we had a more powerful language class then most problems we would want to ask about theories would be undecidable. However, we are not getting this is just a convenient feature, and not a built in assumption. The actual assumptions that lead to this are the monoid structure on how we write down experimental procedures. One could easily object here that we should employ a much richer grammar for writing down experiments and I have no good response to that apart from saying: “but it works so much nicer if we use this, so why not start here?”

We could try to argue for using the free monoid on $\Sigma$ as our way of writing experiments by saying: “if we use something more complicated then answering most questions about theories will become impossible”. Unfortunately, I prefer not to take this route, since there is the old economist defense: “well, when I say that we are looking at a more complicated structure, I don’t mean any arbitrary (or adversarially) chosen members of that structure, but just the ones that ‘play nice’.” I call this the economist defense because it is how the argue that complexity theoretic results about equilibria in game theory and models of economies are irrelevant. Of course, economists are not the only (or even the first) to make these arguments. I am sure plenty of linguists argued something similar after Gold (and I would have joined them).

I don’t think $\Sigma^* \rightarrow \mathbb{R}$ is too simplistic, or at least I don’t think “lets put something more complicated than $\mathbb{R}$ on the RHS is a powerful objection. A probability is just another measurement: a real number that happens to be between 0 and 1, so WA over reals can work with them. But if you insist on the replacement then as I mentioned, WA are pretty robust and we can replace the right hand side by any finite set or infinite set that can form a module over our underlying semiring.

The more serious objection in my opinion though, is If you are getting a probability measurement out then you probably have an instruction like “repeat 1000 times” somewhere in your experiment. This means that your measurement is noisy (although maybe you can make the noise pretty small). Unfortunately, when we work with weighted automata, we are usually looking at properties like rank, and it is a simple fact that if we have a rank deficient matrix and we perturb its entries with random noise then the matrix jumps up to full rank with high probability (over the reals or any noncountable set, it would be with probability 1). Thus, the real issue is how to deal with noisy measurement, in practice we can just introduce some sort of regularization and ask: “is there a matrix of a significantly lower rank ‘near’ our measured data?”

Note: I hope you don’t mind, but I edited your comment to make your latex render. WordPress has this annoying feature where instead of just writing dollar-sign to open math-mode you need to write dollar-sign-‘latex’ all as one word. Closing math mode is still done with a single dollar-sign.

3. Starry D says:

A correction: “last tenant” should be “last tenet”.

A tenant occupies a rental property.

A tenet is a principal or belief.

4. Starry D says:

But you’re not alone: other examples of this eggcorn: http://eggcorns.lascribe.net/english/300/tenant/

5. isaacthanghoe says:

Reblogged this on Sleepless With My Endeavours and commented:
Very good post by Artem Kaznatcheev. :D

6. JohnnyB says:

Thank you, this was wonderful.