How teachers help us learn deterministic finite automata
September 24, 2013
Many graduate students, and even professors, have a strong aversion to teaching. This tends to produce awful, one-sided classes that students attend just to transcribe the instructor’s lecture notes. The trend is so bad that in some cases instructors take pride in their bad teaching, and at some institutions (or so I hear around the academic water-cooler) you might even get in trouble for being too good a teacher. Why are you spending so much effort preparing your courses, instead of working on research? And it does take a lot of effort to be an effective teacher: it takes skill to turn a lecture theatre into an interactive environment where information flows both ways. A good teacher has to be able to assess how the students are progressing, and be able to clarify misconceptions held by the students even when the students can’t identify those conceptions as misplaced. Last week I had an opportunity to exercise my teaching by lecturing Prakash Panangaden’s COMP330 course.
It might be hard to guess from this blog, with all that discussion of biology, cancer, cognitive science, economics, evolution, philosophy, and social science, but the closest label to what I study officially is probably machine learning and automata theory (I am tempted to throw ‘theoretical’ in front of that, to save myself from writing actual code). As such, on Tuesday I explained a standard automata theory topic, minimizing DFAs, but on Thursday I moved on to learning DFAs, a topic seldom covered in an undergraduate automata theory course (or even in an introductory machine learning course). It was my round-about way of justifying my existence to the students: “I can provide you with counter-example queries!”
Imagine that we are trying to learn about the world around us, and can run experiments described by strings over an alphabet $\Sigma$ that yield yes-no answers. Under some philosophical assumptions or, unfortunately for the students, by fiat, we can suppose that this world is described by a regular language $L \subseteq \Sigma^*$ and that an experiment $w \in \Sigma^*$ just tells us if $w \in L$ or not. As students of the world, we want to produce a DFA $M$ such that $L(M) = L$ and we want to do this quickly: in time polynomial in the size of the minimal DFA representing $L$. If we are allowed only membership queries (questions of the form “is $w \in L$?”) then Angluin (1978) and Gold (1978) showed that this task is NP-hard. Even if we’re allowed to do improper learning and don’t have to output a DFA but can output any machine that will answer questions correctly, the task is at least as hard as cracking the RSA cryptosystem that protects our credit card information online (Kearns & Valiant, 1994; see Lev Reyzin’s blogpost for more details on the proper-vs-improper learning distinction for automata). In other words, we believe that this is impossible.
Not so if we add a teacher. Angluin (1987) defines a minimal adequate teacher (MAT) as an oracle that, on top of membership queries, can answer counterexample queries. The student can give a candidate DFA $M$ and if $L(M) \neq L$ then the teacher provides a counterexample $z$ such that $z \in L$ but $z \notin L(M)$ (or vice versa); otherwise the teacher returns “correct” and the student aces her class. Angluin (1987) showed that in the MAT model, there exists an algorithm that learns any regular language $L$ in time polynomial in $n$, the number of states of the minimal DFA representing $L$, and $m$, the maximum size of a counterexample. In other words, students be thankful: having a teacher saves you from a combinatorial explosion!
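To make the two kinds of query concrete, here is a minimal sketch of a teacher that secretly holds the target DFA. The names `DFA`, `Teacher`, `member`, and `counterexample` are illustrative choices, not from Angluin's paper. The counterexample query is answered by a breadth-first search over pairs of states, which is one simple way to test two complete DFAs for equivalence.

```python
from collections import deque


class DFA:
    """A complete DFA; delta maps (state, symbol) -> state."""

    def __init__(self, alphabet, start, accepting, delta):
        self.alphabet = alphabet
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


class Teacher:
    """Hypothetical minimal adequate teacher that knows the target DFA."""

    def __init__(self, target):
        self.target = target

    def member(self, word):
        # Membership query: is `word` in L?
        return self.target.accepts(word)

    def counterexample(self, candidate):
        # Counterexample query: search pairs (target state, candidate state)
        # breadth-first for a pair on which the two machines disagree.
        start = (self.target.start, candidate.start)
        seen = {start}
        queue = deque([(start, "")])
        while queue:
            (p, q), word = queue.popleft()
            if (p in self.target.accepting) != (q in candidate.accepting):
                return word  # a shortest string on which the machines differ
            for a in self.target.alphabet:
                nxt = (self.target.delta[(p, a)], candidate.delta[(q, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, word + a))
        return None  # the teacher says "correct"


# Example target (an assumption): strings over {a, b} with an even number of a's.
even_as = DFA("ab", 0, {0},
              {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
# A wrong one-state guess that accepts everything.
guess = DFA("ab", 0, {0}, {(0, "a"): 0, (0, "b"): 0})
teacher = Teacher(even_as)
```

Here `teacher.counterexample(guess)` hands back the string `"a"`, which the guess accepts but the target rejects, while a correct candidate gets `None`.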
From observation tables to machines
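We call a set $S \subseteq \Sigma^*$ prefix closed if for every $u, v \in \Sigma^*$, if $uv \in S$ then $u \in S$ or, in the language of assignment 1, if $S$ is equal to its own set of prefixes. We call a finite prefix closed set $S \subseteq \Sigma^*$, a finite set $E \subseteq \Sigma^*$, and a function $T : (S \cup S\Sigma)E \rightarrow \{0,1\}$ an observation table $(S,E,T)$. For every $s \in S \cup S\Sigma$ we define $\mathrm{row}(s) : E \rightarrow \{0,1\}$ as $\mathrm{row}(s)(e) = T(se)$.

An observation table is closed if:

$$\forall s \in S,\ a \in \Sigma \quad \exists s' \in S \quad \text{such that} \quad \mathrm{row}(sa) = \mathrm{row}(s')$$

and the observation table is consistent if:

$$\forall s_1, s_2 \in S \quad \mathrm{row}(s_1) = \mathrm{row}(s_2) \;\Rightarrow\; \forall a \in \Sigma \quad \mathrm{row}(s_1 a) = \mathrm{row}(s_2 a)$$

If an observation table is both closed and consistent (and $\epsilon \in E$) then we can define a DFA $M(S,E,T) = (Q, q_0, F, \delta)$ based on it, with $Q = \{\mathrm{row}(s) \;|\; s \in S\}$, $q_0 = \mathrm{row}(\epsilon)$, $F = \{\mathrm{row}(s) \;|\; s \in S \text{ and } T(s) = 1\}$, $\delta(\mathrm{row}(s), a) = \mathrm{row}(sa)$. Closedness guarantees that $\mathrm{row}(sa)$ is a state, and consistency guarantees that $\delta$ does not depend on which $s$ we pick to represent a row. It might still not be obvious that this DFA does what we want, and so it is useful to know two lemmas:

Lemma 1: For every $s \in S \cup S\Sigma$, $\delta(q_0, s) = \mathrm{row}(s)$.

Proof: By induction on length of $s$.

Lemma 2: For every $s \in S \cup S\Sigma$ and $e \in E$, $\delta(q_0, se) \in F$ if and only if $T(se) = 1$.

Proof: By induction on length of $e$ with Lemma 1 providing the base case.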
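The definitions above translate almost line by line into code. The following is a minimal sketch, assuming the table $T$ is stored as a dictionary from strings to booleans and that states are represented by their rows (tuples of table entries). The helper names `row`, `is_closed`, `is_consistent`, `build_dfa`, and `run`, and the example language, are illustrative.

```python
from itertools import product


def row(T, E, s):
    # row(s): one table entry T(se) per experiment e in E
    return tuple(T[s + e] for e in E)


def is_closed(S, E, T, alphabet):
    rows_of_S = {row(T, E, s) for s in S}
    return all(row(T, E, s + a) in rows_of_S for s in S for a in alphabet)


def is_consistent(S, E, T, alphabet):
    return all(row(T, E, s1 + a) == row(T, E, s2 + a)
               for s1, s2 in product(S, repeat=2)
               if row(T, E, s1) == row(T, E, s2)
               for a in alphabet)


def build_dfa(S, E, T, alphabet):
    # Assumes the table is closed and consistent and that "" is in E,
    # so that T[s] is defined for every s in S.
    start = row(T, E, "")
    accepting = {row(T, E, s) for s in S if T[s]}
    delta = {(row(T, E, s), a): row(T, E, s + a) for s in S for a in alphabet}
    return start, accepting, delta


def run(dfa, word):
    start, accepting, delta = dfa
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in accepting


def in_L(w):
    # Illustrative target language: an even number of a's.
    return w.count("a") % 2 == 0


alphabet = "ab"
S = ["", "a"]  # prefix closed
E = [""]       # a single experiment: the empty suffix
T = {s + e: in_L(s + e)
     for s in S + [s + a for s in S for a in alphabet] for e in E}
dfa = build_dfa(S, E, T, alphabet)
```

On this small table, `all(run(dfa, w) == T[w] for w in T)` holds, which is Lemma 2 checked by brute force. Shrinking `S` to `[""]` gives a table that is not closed, because the row of `"a"` then matches no row of `S`.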
Since the machine $M(S,E,T)$ yields the right result on every entry of $T$, we say that it is consistent with $T$. The most important result for the learning algorithm is that this is the smallest DFA consistent with $T$:

Lemma 3: Suppose that $M(S,E,T)$ has $n$ states. If $M' = (Q', q_0', F', \delta')$ is consistent with $T$ and has $n$ or fewer states then $M'$ is isomorphic to $M(S,E,T)$.
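We will proceed by building an isomorphism $\phi : Q \rightarrow Q'$ between $M(S,E,T) = (Q, q_0, F, \delta)$ and $M' = (Q', q_0', F', \delta')$, defined as $\phi(\mathrm{row}(s)) = \delta'(q_0', s)$ for $s \in S$. We will show this isomorphism is one-to-one and onto, thus the two machines have the same number of states, and that it preserves initial states, final states, and transitions.

Claim 3.1: $\phi$ is a bijection.

Proof: For all $q' \in Q'$ define $\mathrm{row}'(q') : E \rightarrow \{0,1\}$ as $\mathrm{row}'(q')(e) = 1$ if and only if $\delta'(q', e) \in F'$. Since $M'$ is consistent with $T$, we have for all $s \in S \cup S\Sigma$ and $e \in E$:

$$\delta'(q_0', se) \in F' \iff T(se) = 1$$

Thus, $\mathrm{row}'(\delta'(q_0', s)) = \mathrm{row}(s)$. As $s$ ranges over all of $S$, $\mathrm{row}(s)$ ranges over all of $Q$, and thus $\delta'(q_0', s)$ with $s \in S$ takes on at least $n$ distinct values. If $\mathrm{row}(s_1) \neq \mathrm{row}(s_2)$ then there exists some $e \in E$ such that $T(s_1 e) \neq T(s_2 e)$, and so $\delta'(q_0', s_1) \neq \delta'(q_0', s_2)$ by consistency of $M'$. Thus $M'$ must have exactly $n$ states, and $\phi$ maps every $\mathrm{row}(s)$ to a distinct $\delta'(q_0', s)$, so the function is a bijection, with $\mathrm{row}'$ as its inverse.

Claim 3.2: $\phi(q_0) = q_0'$, i.e. the function preserves initial states.

Proof: $\phi(q_0) = \phi(\mathrm{row}(\epsilon)) = \delta'(q_0', \epsilon) = q_0'$.

Claim 3.3: $\phi(F) = F'$, i.e. the function preserves final states.

Proof: If $\mathrm{row}(s) \in F$ then by definition $T(s) = 1$, but $\mathrm{row}'(\phi(\mathrm{row}(s))) = \mathrm{row}(s)$ and so $\phi(\mathrm{row}(s)) \in F'$ by construction.

If $\phi(\mathrm{row}(s)) \in F'$ then since $\mathrm{row}'(\phi(\mathrm{row}(s)))(\epsilon) = 1$ and $\mathrm{row}'(\phi(\mathrm{row}(s))) = \mathrm{row}(s)$, we have $T(s) = 1$ and so $\mathrm{row}(s) \in F$.

Claim 3.4: $\phi(\delta(\mathrm{row}(s), a)) = \delta'(\phi(\mathrm{row}(s)), a)$, i.e. the function preserves transitions.

Proof: Note that $\delta(\mathrm{row}(s), a) = \mathrm{row}(sa)$. Since $(S,E,T)$ is closed, there must exist some $s' \in S$ such that $\mathrm{row}(sa) = \mathrm{row}(s')$. Use that to see:

$$\phi(\delta(\mathrm{row}(s), a)) = \phi(\mathrm{row}(s')) = \delta'(q_0', s') \quad \text{and} \quad \delta'(\phi(\mathrm{row}(s)), a) = \delta'(\delta'(q_0', s), a) = \delta'(q_0', sa)$$

But $\mathrm{row}'(\delta'(q_0', s')) = \mathrm{row}(s') = \mathrm{row}(sa) = \mathrm{row}'(\delta'(q_0', sa))$ and $\mathrm{row}'$ is one-to-one, so the two states are the same.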
Angluin-Schapire learning algorithm
Angluin (1987) provided a nice algorithm for learning DFAs using observation tables, but it is no longer the best known algorithm. In his thesis, Schapire (1991) presented an improved algorithm that I will sketch here. This algorithm will never have an inconsistent table, because no two elements of $S$ ever share a row, and it will add only one element to $E$ per counterexample using a special subroutine find_extend.
- Initialize $S$ and $E$ to $\{\epsilon\}$.
- Ask membership queries for $\epsilon$ and each $a \in \Sigma$.
- Construct the initial observation table (S,E,T).
- Repeat until teacher says “Correct”:
  - While (S,E,T) is not closed:
    - Find $s \in S$ and $a \in \Sigma$ such that $\mathrm{row}(sa) \neq \mathrm{row}(s')$ for all $s' \in S$.
    - Add $sa$ to $S$ and extend T using membership queries.
  - Let $M = M(S,E,T)$ and ask the teacher if M is correct.
  - If the teacher replies with a counter-example z, then let $e = \text{find\_extend}(z, M)$, add $e$ to E and extend T using membership queries.
- Return M and halt.
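The last ingredient is find_extend, which will return a suffix of the counter-example that can serve as a witness to separate two states we previously thought to be equivalent. In particular, it will return $e \in \Sigma^*$ for which there are $s, s' \in S$ and $a \in \Sigma$ such that:

$$\mathrm{row}(sa) = \mathrm{row}(s') \quad \text{but exactly one of } sae \text{ and } s'e \text{ is in } L$$

- Using lazy evaluation, consider the partitions $z = u_i v_i$ with $|u_i| = i$ for every $0 \leq i \leq |z|$. Let $s_i$ be the state of M if you run it on $u_i$, i.e. $s_i \in S$ such that $\mathrm{row}(s_i) = \delta(q_0, u_i)$, and let $\alpha_i = 1$ if $s_i v_i \in L$ and $\alpha_i = 0$ otherwise.
- Notice that $\alpha_0 \neq \alpha_{|z|}$ because z is a counter-example: $\alpha_0$ records whether $z \in L$, while $\alpha_{|z|}$ records whether $z \in L(M)$.
- Since the two extreme values of $\alpha$ have different results, there must be some $i$ such that $\alpha_i \neq \alpha_{i+1}$. Find such an $i$ using binary search.
- Return $v_{i+1}$.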
I leave it to the reader to implement the binary search and achieve a number of membership queries equal to $O(\log m)$ (where $m$ is the size of the largest counterexample) for each run of find_extend and to check that the returned value satisfies the condition we want. Since this returned $e$ is a witness to the difference of two previously equal states, we know that the observation table will no longer be closed once $e$ is added to $E$, and we will have to add at least one more state to the candidate machine. Since the candidate machine is always the smallest that is consistent with $T$, and since $T$ always agrees with the finite subset of $L$ that it has sampled, it must be the case that the algorithm stops doing counter-example queries when the number of states reaches $n$, the number of states in the minimal DFA representing the regular language to be learned. Thus, the algorithm requires up to $n$ counterexample queries and $O(|\Sigma| n^2 + n \log m)$ membership queries.
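For readers who would rather check their answer than start from scratch, here is one possible sketch of the whole loop, including the binary search. It is not Schapire's code. States are named by their access strings in $S$, and the helper names (`learn`, `make_teacher`, `access`, `accepts`) are illustrative. As a simplifying assumption, the teacher only compares the candidate with the target on strings of length at most `max_len`, which is enough for small targets but is not a true equivalence oracle.

```python
from itertools import product


def find_extend(z, access, member):
    """Binary search for a suffix of z that splits a state of the candidate."""
    def alpha(i):
        # Swap the first i symbols of z for their access string, then query.
        return member(access(z[:i]) + z[i:])

    lo, hi, alpha_lo = 0, len(z), alpha(0)
    while hi - lo > 1:  # invariant: alpha(lo) != alpha(hi)
        mid = (lo + hi) // 2
        if alpha(mid) == alpha_lo:
            lo = mid
        else:
            hi = mid
    return z[hi:]


def learn(alphabet, member, counterexample):
    S, E, T = [""], [""], {}

    def fill():
        # Extend T with a membership query for every missing entry.
        for s in S + [s + a for s in S for a in alphabet]:
            for e in E:
                if s + e not in T:
                    T[s + e] = member(s + e)

    def row(s):
        return tuple(T[s + e] for e in E)

    while True:
        fill()
        changed = True
        while changed:  # close the table
            changed = False
            for s in list(S):
                for a in alphabet:
                    if all(row(s + a) != row(t) for t in S):
                        S.append(s + a)
                        fill()
                        changed = True
        name = {row(s): s for s in S}  # rows of S are pairwise distinct
        delta = {(s, a): name[row(s + a)] for s in S for a in alphabet}

        def access(u):
            state = ""
            for a in u:
                state = delta[(state, a)]
            return state

        def accepts(w):
            return T[access(w)]

        z = counterexample(accepts)
        if z is None:
            return S, delta, accepts
        E.append(find_extend(z, access, member))


def make_teacher(alphabet, in_L, max_len=8):
    # Bounded teacher (an assumption): only checks strings up to max_len.
    counts = {"member": 0, "counterexample": 0}

    def member(w):
        counts["member"] += 1
        return in_L(w)

    def counterexample(accepts):
        counts["counterexample"] += 1
        for n in range(max_len + 1):
            for w in map("".join, product(alphabet, repeat=n)):
                if in_L(w) != accepts(w):
                    return w
        return None

    return member, counterexample, counts


# Illustrative target: strings over {a, b} that contain "aa".
member, counterexample, counts = make_teacher("ab", lambda w: "aa" in w)
S, delta, accepts = learn("ab", member, counterexample)
```

On this target the first candidate has a single rejecting state, the teacher answers with `"aa"`, and `find_extend` returns the suffix `"a"`. Adding it to `E` forces two new states, after which the three-state candidate is accepted. The `counts` dictionary lets you compare the number of queries used against the bounds above.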
Angluin, D. (1978). On the complexity of minimum inference of regular sets. Information and Control, 39(3), 337–350.
Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and Computation, 75(2), 87–106. DOI: 10.1016/0890-5401(87)90052-6
Gold, E.M. (1978). Complexity of automaton identification from given data. Information and Control, 37(3), 302–320.
Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM, 41(1), 67-95.
Schapire, R. E. (1991). The design and analysis of efficient learning algorithms. (No. MIT/LCS/TR-493) MIT, Cambridge, USA