## Parallelism, products, and automata

I have an unreasonable love of alliterations, so I wish I knew a common word for automata that started with P. I even considered calling this post “Paralleilism, Products, Producers”, but feared introducing non-standard nomenclature and confusion into Professor Prakash Panangaden’s class. Hence, the current title; not too bad? If we count “and automata” as an alliteration then I can claim to have introduced an example of parallelism as used in rhetoric right in the title. Unfortunately, the post’s on parallelism in processing; sorry, having too much fun.

Proving that the left half of a regular language is regular was the hardest question on the first assignment. It was also challenge for me as a TA, because I couldn’t find many avenues for hints and advice that didn’t reveal the answer directly. From grading however, I was impressed by the number of students that solved it, but a bit disappointed by those that Googled “half of a regular language” and clicked on the first result like Ben Reichardt’s notes. Since there are solutions already online, I decided that I can give answers, too. Although Prakash provided a full solution set, I thought I would sketch a pedantic treatment of question 5.

One of the best tools from theory of computing is showing that potentially distant seeming languages or even models of computation actually have the same or similar complexity. The regular languages serve as a microcosm to learn and hone these tools. When you first see the boringly serial finite state machines presented, a natural question is: what if I run two DFAs in parallel, is that still a DFA? Well, pedantically no, it doesn’t match the definitions, but in reality — yes, we just have to be more precise.

## Limits on efficient minimization and the helpfulness of teachers.

Two weeks ago, I lectured on how we can minimize and learn deterministic finite state automata. Although it might not be obvious, these lectures are actually pretty closely related since minimization and learning often go hand-in-hand. During both lectures I hinted that the results won’t hold for non-deterministic finite state automata (NFA), and challenged the students to identify where the proofs break down. Of course, these is no way to patch the proofs I presented to carry over to the NFA case, in fact we expect that no efficient algorithms exist for minimizing or learning NFAs.

## How teachers help us learn deterministic finite automata

Many graduate students, and even professors, have a strong aversion to teaching. This tends to produce awful, one-sided classes that students attend just to transcribe the instructor’s lecture notes. The trend is so bad that in some cases instructors take pride in their bad teaching, and at some institutions — or so I hear around the academic water-cooler — you might even get in trouble for being too good a teacher. Why are you spending so much effort preparing your courses, instead of working on research? And it does take a lot of effort to be an effective teacher, it takes skill to turn a lecture theatre into an interactive environment where information flows both ways. A good teacher has to be able to asses how the students are progressing, and be able to clarify misconceptions held by the students even when the students can’t identify those conceptions as misplaced. Last week I had an opportunity to excercise my teaching by lecturing Prakash Panangaen’s COMP330 course.

## Minimizing finite state automata

Computer science is a strange mix of engineering, science, and math. This is captured well by the historic roots of deterministic finite state automata (DFAs). The first ideas that can be recognized as a precursor to DFAs can be found with Gilberth & Gilberth (1921) introducing flow process charts into mechanical and industrial engineering. Independently, McCullock & Pitts (1943) used nerve-nets as a model of neural activity. This preceded Turing’s 1948 entry into brain science with B-type neural networks, and Rosenblatt’s perceptrons (1957). Unlike Turing and Rosenblatt, McCullock & Pitts model did not incorporate learning. However, the nerve nets went on to have a more profound effect on ratiocination because — as Kleene (1951) recognized — they became the modern form of DFAs. Although DFAs are now in less vogue than they were a few decades ago, they are an essential part of a standard computer science curriculum due to their centrality in computer science. Yesterday, I had the privilege of familiarizing students with the importance of DFAs by giving a lecture of Prakash Panangaden’s COMP 330 course. Since Prakash already introduced the students to the theory of DFAs, regular expressions, and languages, I was tasked with explaining the more practical task of DFA minimization.

## Monoids, weighted automata and algorithmic philosophy of science

The Algorithmic Thinkers
original art by Auguste Rodin & Eric Joyner.

If pressed to find a passion and unifying theme behind my interests, I would say that my goal is to emancipate theoretical computer science from the current tyranny of technology and engineering, and restore it to its original position of asking and helping find answers for fundamental questions in science and philosophy. I’ve already written on progress toward an algorithmic theory of biology, wherein I permitted myself to foray into the philosophy of science. I want to continue the expedition with this post because I think that cstheory can be painlessly integrated into philosophy as an extension of analytic philosophy — algorithmic philosophy.

## Machine learning and prediction without understanding

Big data is the buzzword du jour, permeating from machine learning to hadoop powered distributed computing, from giant scientific projects to individual social science studies, and from careful statistics to the witchcraft of web-analytics. As we are overcome by petabytes of data and as more of it becomes public, it is tempting for a would-be theorist to simply run machine learning and big-data algorithms on these data sets and take the computer’s conclusions as understanding. I think this has the danger of overshadowing more traditional approaches to theory and the feedback between theory and experiment.

## Egalitarians’ dilemma and the cognitive cost of ethnocentrism

Ethnocentrism (or contingent altruism) can be viewed as one of many mechanisms for enabling cooperation. The agents are augmented with a hereditary tag and the strategy space is extended from just cooperation/defection to behaviour that can be contingent on if the diad share or differ in their tag. The tags and strategy are not inherently correlated, but can develop local correlations due to system dynamics. This can expand the range of environments in which cooperation can be maintained, but an assortment-biasing mechanism is needed to fuel the initial emergence of cooperation (Kaznatcheev & Shultz, 2011). The resulting cooperation is extended only towards the in-group while the out-group continues to be treated with the cold rationality of defection.

Suppose that circles are the in-group and squares the out-group. The four possible strategies and their minimal representations as finite state machines is given.

The four possible strategies are depicted above, from top to bottom: humanitarian, ethnocentric, traitorous, and selfish. Humanitarians and selfish agents do not condition their behavior on the tag of their partner, and do not require the cognitive ability to categorize. Although this ability is simple, it can still merit a rich analysis (see: Beer, 2003) by students of minimal cognition. By associating a small fitness cost $k$ with categorization, we can study how much ethnocentric (and traitorous) agents are willing to pay for their greater cognitive abilities. This cost directly changes the default probability to reproduce ($\text{ptr}$), with humanitarians and selfish agents having $\text{ptr} = 0.11$ and ethnocentrics and traitorous agents having $\text{ptr} = 0.11 - k$. During each cycle, the $\text{ptr}$ is further modified by the game interactions, with each cooperative action costing $c = 0.01$ and providing a benefit $b$ (that varies depending on the simulation parameters) to the partner. For more detailed presentation of the simulation and default parameter, or just to follow along on your computer, I made my code publicly available on GitHub. Pardon its roughness, the brunt of it is legacy code from when I first build this model in 2009 for Kaznatcheev (2010).

Number of agents by strategy versus evolutionary cycle. The lines represent the number of agents
of each strategy: blue — humanitarian; green — ethnocentric; yellow — traitorous; red — selfish. The width of the line corresponds to standard error from averaging 30 independent runs. The two figures correspond to different
costs of cognition. The left is k = 0.002 and is typical of runs before the cognitive cost phase transition. The right is k = 0.007 and is typical of runs after the cognitive cost phase transition. Figure is adapted from Kaznatcheev (2010).

The dynamics for low $k$ are about the same as the standard no cognitive cost model as can be seen from the left figure above. However, as $k$ increases there is a transition to a regime where humanitarians start to dominate the population, as in the right figure above. To study this, I ran simulations with a set $b/c$ ratio and increasing $k$ from 0.001 to 0.02 with steps of 0.001. You can run your own with the command bcRun(2.5,0.001*(1:20)); some results are presented below, your results might differ slightly due to the stochastic nature of the simulation.

Proportion of humanitarians (blue), ethnocentrics (red), and cooperative interactions (black) versus cognitive cost for b/c = 2.5. Dots are averages from evolutionary cycles 9000 to 10000 of 10 independent runs. The lines are best-fit sigmoids and the dotted lines mark the steepest point; I take take this as the point for the cognitive cost phase transition. Data generated with bcRun(2.5,0.001*(1:20)) and visualized with bcPlot(2.5,0.001*(1:20),[],1)

Each data-point is the average from the last 1000 cycles of 10 independent simulations. The points suggest a phase transition from a regime of few humanitarians (blue), many ethnocentrics (red), and very high cooperation (black) to one of many humanitarians, few ethnocentrics, and slightly less cooperation. To get a better handle on exactly where the phase transition is, I fit sigmoids to the data using fitSigmoid.m. The best-fit curves are shown as solid lines; I defined the point of phase transition as the steepest (or inflection) point on the curve and plotted them with dashed lines for reference. I am not sure if this is the best approach to quantifying the point of phase transition, since the choice of sigmoid function is arbitrary and based only on the qualitative feel of the function. It might be better to fit a simpler function like a step-function or a more complicated function from which a critical exponent can be estimated. Do you know a better way to identify the phase transition? At the very least, I have to properly measure the error on the averaged data points and propogate it through the fit to get error bounds on the sigmoid parameters and make sure that — within statistical certainty — all 3 curves have their phase transition at the same point.

The most interesting feature of the phase transition, is the effect on cooperation. The world becomes more equitable; agents that treat out-groups differently from in-group (ethnocentrics) are replaced by agents that treat everyone with equal good-will and cooperation (humanitarians). However, the overall proportion of cooperative interactions decreases — it seems that humanitarians are less effective at suppressing selfish agents. This is consistent with the free-rider suppression hypothesis that Shultz et al. (2009) believed to be implausible. The result is egalitarians’ dilemma: by promoting equality among agents the world becomes less cooperative. Should one favour equality and thus individual fairness over the good of the whole population? If we expand our moral circle to eliminate out-groups will that lead to less cooperation?

In the prisoners’ dilemma, we are inclined to favor the social good over the individual. Even though it is rational for the individual to defect (securing a higher payoff for themselves than cooperating), we believe it is better for both parties to cooperate (securing a better social payoff than mutual defection). But in the egalitarians’ dilemma we are inclined to favour the individualistic strategy (fairness for each) over the social good (higher average levels of cooperative interactions). We see a similar effect in the ultimatum game: humans reject unfair offers even though that results in neither player receiving a payoff (worse for both). In some ways, we can think of the egalitarians’ dilemma as the population analogue of the ultimatum game; should humanity favor fairness over higher total cooperation?

I hinted at some of these questions in Kaznatcheev (2010) but I restrained myself to just $b/c = 2.5$. From this limited data, I concluded that since the phase transition happens for $k$ less than any other parameter in the model, it must be the case that agents are not willing to invest much resources into developing larger brains capable of categorical perception just to benefit from an ethnocentric strategy. Ethnocentrism and categorical perception would not have co-evolved, the basic cognitive abilities would have to be in place by some other means (or incredibly cheap) and then tag-based strategies could emerge.

Value of k at phase transition versus b/c ratio. In blue is the transition in proportion of humanitarians, red — proportion of ethnocentrics, and black – proportion of cooperative interactions. Each data point is made from a parameter estimate done using a sigmoid best fit to 200 independent simulations over 20 values of k at a resolution of 0.001.

Here, I explored the parameter space further, by repeating the above procedure while varying the $b/c$ ratio by changing $b$ from 0.02 to 0.035 in increments of 0.0025 while keeping $c$ fixed at 0.01. Unsurprisingly, the transitions for proportion of ethnocentrics and humanitarians are indistinguishable, but without a proper analysis it is not clear if the transition from high to low cooperation also always coincides. For $b/c > 2.75$, agents are willing to invest more than $c$ before the phase transition to all humanitarians, this invalidates my earlier reasoning. Agents are unwilling to invest much resources in larger brains capable of categorical perception only for competitive environments (low $b/c$). As $b$ increases, the agents are willing to invest more in their perception to avoid giving this large benefit to the out-group. This seems consistent with explicit out-group hostility that Kaznatcheev (2010b) observed in the harmony game. However, apart from simply presenting the data, I can’t make much more sense from this figure. Do you have any interpretations? Can we learn something from the seemingly linear relationship? Does the slope (if we plot $k$ versus $b$ then it is about 0.5) tell us anything? Would you still conclude that co-evolution of tag-based cooperation and categorical perception is unlikely?

### References

Beer, Randall D. (2003). The Dynamics of Active Categorical Perception in an Evolved Model Agent. Adaptive Behavior. 11(4): 209-243.

Kaznatcheev, Artem (2010). The cognitive cost of ethnocentrism Proceedings of the 32nd annual conference of the cognitive science society

Kaznatcheev, A. (2010b). Robustness of ethnocentrism to changes in inter-personal interactions. Complex Adaptive Systems – AAAI Fall Symposium.

Kaznatcheev, A., & Shultz, T. R. (2011). Ethnocentrism maintains cooperation, but keeping one’s children close fuels it. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. 3174-3179.

Shultz, T. R., Hartshorn, M., & Kaznatcheev, A. (2009). Why is ethnocentrism more common than humanitarianism? Proceedings of the 31st Annual Conference of the Cognitive Science Society. 2100-2105.