Monoids, weighted automata and algorithmic philosophy of science
June 26, 2013
A precise definition of analytic philosophy is difficult to pin down, and a cheeky characterization might be “the philosophic tradition that has its roots in British thought of the late 19th and early 20th centuries and is currently practiced mostly in American departments of philosophy.” Although largely true, there is a certain deeper ‘family resemblance’ between analytic philosophers in seeking clarity of argument through the application of (often formal) logic and a respect for mathematics and the natural sciences. At times this focus might seem silly, as mathematical physicist John Baez writes:
[I]n the US many academic philosophers have “mathematics envy”, just as economists have “physics envy”. They admire the precision and rigor of mathematics, especially mathematical logic. But ask them to compute the cohomology of a CW complex, or solve an elliptic PDE, and they get cold feet.
But it is this envy that cstheorists can leverage to enter into modern philosophy. This is not to say that I am comfortable with cohomology or elliptic PDEs (heck, even a large system of ODEs will have me reaching for a warm pair of socks), but a theory-oriented computer science education leaves one relatively comfortable with math. More importantly, the experience of regularly conversing with a computer is great preparation for formalizing the ‘obvious’.
Of course, I am not the first person to push for this sort of philosophy. As I alluded to in the opening, I believe the early work of computational pioneers like Church, Gödel, Post, Turing, von Neumann and others was fundamentally scientific and philosophical in nature. However, I believe that much of this spirit was lost to the seduction of engineering and technology and is only now being recaptured with the work of thinkers like Aaronson, Chaitin, Deutsch, de Wolf, Papadimitriou, and Valiant (to name the few whose work I am familiar with). For some exemplary work, I recommend Ronald de Wolf’s master’s thesis Philosophical Applications of Computational Learning Theory (1997), and Scott Aaronson’s Why Philosophers Should Care About Computational Complexity (2011) and recent meditation on free will (2013).
Traditional philosophers were not passive in their “mathematics envy”, either. German philosophers pioneered mathematical philosophy and pushed it to the point of creating the Munich Center for Mathematical Philosophy. They even have an introductory Coursera course starting on July 29th.
They are extending analytic philosophy past formal logic, and touching on domains of particular interest to me such as game theory. But, they are not using the insights of theoretical computer science, and only using the computational tools of simulation — a roadmap that I fear will lead their expedition to the curse of computing instead of its grail. Although I am looking forward to taking their course — and hope that you will join me in the G+ study group — I don’t think Leitgeb and Hartmann are guiding analytic philosophy in the same direction as a theoretical computer scientist would.
Scientific inquiry as machine learning
To start exploring algorithmic philosophy of science, the last tenet of analytic philosophy that we need (and the only part of logical positivism that continues to resonate with me) is a deference to science on what is foundational or fundamental, and a focus on maintaining a minimally restrictive metaphysics. As such, if we want to look at the philosophy of science, we should begin with an instrumentalist or operationalist perspective. The latter stance originated with Percy Williams Bridgman’s The Logic of Modern Physics (1927), and most of its subsequent impact was in psychology (rather than physics), where it was introduced by Stevens (1935a,b) and Hull (1943) as a relaxation of Watson and Skinner’s behaviorism. In fact, in physics it is often blamed for the anti-philosophical stance of “shut up and calculate”, although to me it seems essential to modern work on foundations of quantum mechanics. However, we will use it primarily because of its agnostic treatment of realism and leave the metaphysical questions of ‘what is really real’ out of this philosophy of science.
Through the lens of operationalism, a scientist or naturalist sets up an experiment or takes note of conditions leading up to their observation and then measures, either actively or passively, some quantity. We can think of the experiment (or notes on conditions) as a string of symbols from some alphabet of basic experimental (or observational) steps. Let us assume that this alphabet is fixed and finite, but allow for arbitrarily long experimental descriptions; if you need a more concrete proxy then think of the alphabet as English words (or if that isn’t fixed enough, then letters) and the strings as the method sections of papers (although in reality we would need a slightly better behaved description of experiments; English is just too quirky). A measurement is just a real number that corresponds to some quantity of interest; however, we will at times restrict our measurements to even simpler binary indicators.
In this setting, the world is just a mapping from experimental procedures to measurements. If our set of basic experimental observations is $\Sigma$ then any experiment is described by a word $w \in \Sigma^*$. Thus, the real world is the function:
$$f : \Sigma^* \rightarrow \mathbb{R}$$
And we hope to learn some hidden structure underlying this mapping. Since I am only interested in mechanistic theories, I will assume that no magic is happening and our theory is expressible as terms in lambda calculus. Now suppose that this structure corresponds to some states $\phi(w)$ that map from an initial configuration given by lambda-term $s$, and is measured by some function $m$ to the reals. In other words (reading our operations from left to right as a programming language, not right to left as is typical for functions):
$$f(w) = s \, \phi(w) \, m$$
and the goal of a theoryfull scientist is to learn $\phi$ and the state space it operates on (if that can be represented as something simpler than all possible lambda-terms). Note that this does not assume realism, since $\phi$ does not need to describe a ‘real’ state of the world, but could just specify our description; if you are thinking in terms of quantum mechanics, $\phi(w)$ could just specify a transformation on the Hilbert space with no philosophical assumptions about the Hilbert space being ‘real’. However, we will ask these states to be well behaved in the sense that we can think of experimental procedures acting on them step by step. In particular, if our instructions are $w = uv$ then $\phi(w) = \phi(u)\phi(v)$. Our hidden world has some configuration after $u$ that is independent of whether $u$ is followed by a measurement or by another experimental manipulation $v$. We will also ask for a ‘nothing’ experimental operation $\epsilon$ that corresponds to the identity on our space: $\phi(\epsilon) = I$.
The above two assumptions on $\phi$ mean that it has the structure of a monoid. From representation theory, we know that the actions of a monoid generated by $\Sigma$ can be represented by a set of matrices acting on a real vector space. Therefore, our hypothesis class is restricted to the set of vectors $\alpha, \beta$ in some real vector space, and matrices $A_\sigma$ (one for each $\sigma \in \Sigma$) acting on that vector space. Since we can only describe a finite amount of information, we might as well make our last restriction: the underlying vector space is finite dimensional.
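To make the two monoid assumptions concrete, here is a minimal sketch of a matrix representation for words over a two-letter alphabet. The alphabet, the particular matrices, and the names `MATRICES`, `matmul` and `phi` are all my own illustrative choices, not anything forced by the argument; the point is only that composing experiments corresponds to multiplying matrices and that the empty experiment corresponds to the identity.

```python
from functools import reduce

# Hypothetical 2x2 matrices for a two-letter alphabet {"a", "b"}.
# "a" swaps the two coordinates; "b" leaves the state untouched.
MATRICES = {
    "a": ((0, 1), (1, 0)),
    "b": ((1, 0), (0, 1)),
}
IDENTITY = ((1, 0), (0, 1))


def matmul(x, y):
    """Multiply two square matrices stored as tuples of tuples."""
    n = len(x)
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def phi(word):
    """Map a word to the left-to-right product of its letters' matrices."""
    return reduce(matmul, (MATRICES[c] for c in word), IDENTITY)


# Composition: the matrix of a concatenation is the product of the matrices.
assert phi("ab" + "ba") == matmul(phi("ab"), phi("ba"))
# Identity: the empty word (the 'nothing' operation) maps to the identity.
assert phi("") == IDENTITY
```

Any other choice of square matrices would satisfy the same two checks, since both follow from the associativity of matrix multiplication.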
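For a given alphabet $\Sigma$ and finite dimensional vector space $\mathbb{R}^n$ we can define a weighted automaton (Esik & Kuich, 2009; Mohri, 2009) on it as the tuple:
$$A = \langle \alpha, \beta, \{A_\sigma\}_{\sigma \in \Sigma} \rangle$$
Here $\alpha \in \mathbb{R}^n$ is the initial state, $\beta \in \mathbb{R}^n$ is a final measurement, and for each $\sigma \in \Sigma$ we have a corresponding transition matrix $A_\sigma \in \mathbb{R}^{n \times n}$. The dimensionality of the underlying vector space is the size of the machine, i.e. $|A| = n$. The world described by this machine is:
$$f_A(w_1 w_2 \cdots w_k) = \alpha^T A_{w_1} A_{w_2} \cdots A_{w_k} \beta$$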
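As a small sketch of how such a machine assigns a measurement to an experiment, the function below multiplies the initial vector by the transition matrix of each letter in order and then takes the inner product with the final vector. The example automaton (size 2, counting how many times the step "a" occurs) and the names `evaluate`, `ALPHA`, `BETA` and `TRANSITIONS` are hypothetical choices for illustration only.

```python
def evaluate(alpha, transitions, beta, word):
    """Return alpha^T A_{w_1} ... A_{w_k} beta, reading word left to right."""
    state = list(alpha)
    for letter in word:
        matrix = transitions[letter]
        # Row vector times matrix: new_state[j] = sum_i state[i] * matrix[i][j]
        state = [
            sum(state[i] * matrix[i][j] for i in range(len(state)))
            for j in range(len(matrix[0]))
        ]
    return sum(s * b for s, b in zip(state, beta))


# Hypothetical size-2 automaton over {"a", "b"} that counts occurrences of "a".
ALPHA = [1.0, 0.0]
BETA = [0.0, 1.0]
TRANSITIONS = {
    "a": [[1.0, 1.0], [0.0, 1.0]],  # adds 1 to the running count
    "b": [[1.0, 0.0], [0.0, 1.0]],  # identity: "b" does not change the count
}

print(evaluate(ALPHA, TRANSITIONS, BETA, "abaa"))  # prints 3.0
```

The first coordinate of the state stays at 1 and the second accumulates the count, so a two-dimensional space is enough for this particular world.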
Weighted automata are the perfect hypothesis class to describe an arbitrary real world within our assumptions. Monoids are central, but our last restriction on the existence of a finite dimensional representation means that only some monoids are valid hypotheses. Thankfully, we can characterize this condition in terms of the basic structure of our world $f$. Let us introduce the Hankel matrix $H_f$, with rows and columns indexed by words in $\Sigma^*$ and defined as $H_f(u,v) = f(uv)$, to give us the theorem:
$\mathrm{rank}(H_f) \leq n$ if and only if there exists a weighted automaton representation of $f$ of size $n$.
Inherent in the proof of the above theorem is a way to construct the weighted automaton corresponding to $f$ from its Hankel matrix $H_f$. Unfortunately, the Hankel matrix is an infinite object, but its rank gives us finite handles with which to wrangle it in.
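Since the full Hankel matrix is infinite, in practice one can only ever look at a finite block of it, indexed by some finite set of prefixes and suffixes. The sketch below builds such a block for a toy world (the number of "a" steps in the experiment) and computes its rank exactly. The toy world and the helper names `words_up_to`, `hankel_block`, `rank` and `count_a` are my own assumptions for illustration; the rank of a finite block is only a lower bound on the rank of the whole matrix.

```python
from fractions import Fraction
from itertools import product


def words_up_to(alphabet, max_len):
    """All words over alphabet of length at most max_len, shortest first."""
    return [
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(alphabet, repeat=n)
    ]


def hankel_block(f, prefixes, suffixes):
    """Finite block of the Hankel matrix: entry (u, v) is f(uv)."""
    return [[Fraction(f(u + v)) for v in suffixes] for u in prefixes]


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def count_a(word):
    """Toy 'world': the measurement is the number of 'a' steps."""
    return word.count("a")


basis = words_up_to("ab", 2)  # "", "a", "b", "aa", "ab", "ba", "bb"
block = hankel_block(count_a, basis, basis)
print(rank(block))  # prints 2
```

Every entry of this block is the count in the prefix plus the count in the suffix, which is why the rank cannot exceed 2 no matter how many prefixes and suffixes are added.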
The best part about weighted automata is that they work not only over the reals, but over any field (actually, any semiring in the most general treatments, but the linear algebra becomes more hairy). As such, if we want to compare to more standard models of computation then we can replace $\mathbb{R}$ by either the Boolean semiring or the field of two elements ($\mathbb{F}_2$ for short).
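As a rough sketch of the Boolean case, the same matrix recipe can be run with OR in place of addition and AND in place of multiplication, at which point the machine is just a nondeterministic finite automaton. The example automaton (accepting words with an even number of "a" steps) and the names `accepts`, `INITIAL`, `FINAL` and `TRANSITIONS` are hypothetical and chosen only to illustrate the idea.

```python
def accepts(initial, transitions, final, word):
    """Evaluate a weighted automaton over the Boolean semiring (OR as +, AND as *)."""
    state = list(initial)
    n = len(state)
    for letter in word:
        matrix = transitions[letter]
        # Boolean row-vector-times-matrix product.
        state = [any(state[i] and matrix[i][j] for i in range(n)) for j in range(n)]
    return any(s and b for s, b in zip(state, final))


# Hypothetical 2-state automaton over {"a", "b"}:
# state 0 means "even number of a's so far", state 1 means "odd".
INITIAL = [True, False]
FINAL = [True, False]
TRANSITIONS = {
    "a": [[False, True], [True, False]],  # flip parity
    "b": [[True, False], [False, True]],  # keep parity
}

print(accepts(INITIAL, TRANSITIONS, FINAL, "abab"))  # prints True
print(accepts(INITIAL, TRANSITIONS, FINAL, "a"))     # prints False
```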
In this case, we can think of measurements as yes-no questions, and our real world as a function:
$$f : \Sigma^* \rightarrow \{0, 1\}$$
And simply consider the subset $L \subseteq \Sigma^*$ on which $f$ is $1$. In that case, we can see what restrictions our common sense assumptions put on the operationalist theories we can represent: our functions can only be the indicators for regular languages. This might seem depressingly simple at first, but it actually takes us into a rich area of computational learning theory. Just as Valiant (2009) framed evolution (and ecorithms more generally) as a formal subset of machine learning, algorithmic philosophy allows us to look at the act of scientific inquiry as a formal subset of machine learning. This lends credibility to Valiant’s view of human activity as ecorithms. I will deal with implications, limitations, and critiques of this viewpoint in future posts. I hope that you will join me for the rest of this expedition!
Aaronson, S. (2011). Why philosophers should care about computational complexity. arXiv preprint arXiv:1108.1791.
Aaronson, S. (2013). The Ghost in the Quantum Turing Machine. arXiv preprint arXiv:1306.0159.
de Wolf, R. M. (1997). Philosophical Applications of Computational Learning Theory: Chomskyan Innateness and Occam’s Razor (Master’s thesis, Erasmus Universiteit Rotterdam).
Esik, Z., & Kuich, W. (2009). Finite Automata. Handbook of Weighted Automata, 69-104.
Hull, C. L. (1943). Principles of behavior: An introduction to behavior theory.
Mohri, M. (2009). Weighted automata algorithms. Handbook of Weighted Automata, 213-254. DOI: 10.1007/978-3-642-01492-5_6
Stevens, S. S. (1935a). The operational basis of psychology. The American Journal of Psychology, 47(2), 323-330.
Stevens, S. S. (1935b). The operational definition of psychological concepts. Psychological Review, 42(6), 517.
Valiant, L. G. (2009). Evolvability. Journal of the ACM, 56(1), 3.