Error-Driven Rule Learning

Error-driven rule learning is a specific type of supervised machine learning applicable to many linguistic processing tasks. One of its primary advantages is the transparency of the patterns learned by the system. While other machine-learning techniques, such as genetic, Bayesian, or neural-network approaches, produce a statistical model that can only be investigated and understood through its performance, the output of an error-driven rule learner, typically an ordered set of transitions, can be readily traced, modified, and understood by human readers.

The learner takes as input a manually tagged ground-truth corpus (Y) and an untagged or start-state version of the same corpus (X), and produces an ordered set of transitions (the Rule List) suitable for transforming X into a workable approximation of Y. To the degree that X represents the universe of possible inputs, the Rule List can then be applied to novel inputs to produce taggings similar to those performed manually (as in Y). The behavior of the learner is governed by a set of rule templates (the Rule Space) defining all possible transition rules the system might generate, and a scoring function (score) capable of accurately ranking automatically tagged versions of X against Y. User-defined thresholds based on the number of rules generated, the score, or the scoring-function delta determine when the system has produced a ‘complete’ set of transformation rules.

Example applications of error-driven rule learning include basic linguistic processing tasks such as lexical part-of-speech tagging and phrase bracketing, but also extend to higher-level tasks such as automatically determining which subset of text in a webpage is most relevant to an image embedded in the same page (useful for automatic image tagging and/or ad placement).

1) Define Rule Space, R – The definition of the rule space will be based on the task being performed. The rules take the form of templates, where the specific parameters can be filled in based on artifacts found in the ground-truth tagged corpus. For example, in part-of-speech tagging, a template may look like: <word>|<POS> -> <word>|<POS>, where <word> could be any word in Y and <POS> could be any part-of-speech tag in Y. Note – BNF is a useful format for specifying the rule space.
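The <word>|<POS> -> <word>|<POS> template above can be instantiated exhaustively from Y. A hedged sketch (the function name and corpus are illustrative):

```python
from itertools import product

def enumerate_rule_space(Y):
    """Every instantiation of the template <word>|<POS> -> <word>|<POS>,
    with parameters drawn from the ground-truth corpus Y, a list of
    (word, tag) pairs. Retagging a tag to itself is a no-op, so skip it."""
    words = sorted({w for (w, _) in Y})
    tags = sorted({t for (_, t) in Y})
    return [(w, f, t) for w, f, t in product(words, tags, tags) if f != t]

Y = [("can", "MD"), ("can", "NN"), ("fish", "VB")]
# 2 words x 3 tags x 2 remaining tags = 12 candidate rules
print(len(enumerate_rule_space(Y)))
```

Note that even this single template yields a rule space that grows with vocabulary size times the square of the tagset size; richer templates (e.g. conditioning on neighboring tags) grow faster still, which is why the scoring loop below dominates training time.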

2) Define Scoring Function – The scoring function must be able to compare two candidate taggings of X, resulting from the application of one or more rules in the rule space, and determine which is ‘closer’ to the ground truth, Y. Note that it is the ability to define such a function, in conjunction with the Rule Space, that determines the applicability of this methodology to a given task.
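For part-of-speech tagging, a natural choice is token-level accuracy against Y. This is one possible scoring function, not the only one; any measure that ranks taggings by closeness to Y would serve:

```python
def score(Z, Y):
    """Fraction of tokens in candidate tagging Z whose tag matches the
    ground truth Y (parallel lists of (word, tag) pairs). Higher is closer."""
    assert len(Z) == len(Y)
    return sum(zt == yt for ((_, zt), (_, yt)) in zip(Z, Y)) / len(Y)

Y = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")]
Z = [("the", "DT"), ("dog", "NN"), ("runs", "NN")]
print(score(Z, Y))  # 2 of 3 tags agree
```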

3) Define Threshold – The threshold can define the minimal gain of any single rule (score delta), total number of rules, or a minimum delta between X and Y following application of the rule set (score). When the threshold condition is reached, the learner’s task is complete and the search for additional rules ceases.
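Any combination of those stopping conditions can be folded into a single predicate. The constants below are illustrative; workable values are task-dependent:

```python
def threshold_reached(score_delta, n_rules, min_gain=0.001, max_rules=500):
    """True when the learner should stop: the best candidate rule's gain
    (score delta) has fallen below min_gain, or the rule list has grown
    to max_rules."""
    return score_delta < min_gain or n_rules >= max_rules
```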

4) Gather Training Data, X – The training corpus should be of adequate size and coverage to result in a rule set that is applicable to novel input.

5) Manually Categorize X, Save as Y – X is annotated, tagged, or categorized consistent with the performance expected from the automated system which will apply the final rule list.

6) Advance to Next Rule in R – Iterate through all possible rules defined by the rule space.

7) If there are more rules, proceed to 8, Else proceed to 10

8) Apply rule to X, Save as Z – For each possible rule, apply it to X.

9) Compare Z to Y w/ Scoring Function – Using the scoring function, determine which of the rules provides the most gain (gets us closest to Y).

10) Determine Highest Scoring Rule, r

11) If we passed the threshold limit, quit, Else proceed to 12 – If the gain of the best-performing rule is outside our predefined limits, we are finished; the rule list is complete.

12) Add r to end of Rule List – Otherwise, add the new rule to the rule list.

13) Apply r to X, Save as X – Apply the new rule to X and use this as our new baseline.

14) Reset List of Rules in R – Reset the iterator over the rule space and return to step 6 to search for the next rule against the updated X.
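Steps 6–14 amount to a greedy search: on each pass, score every candidate, commit the single best rule, and rebaseline. A minimal end-to-end sketch, using the word-retagging template from step 1 and token accuracy as the scoring function (all names and the toy corpus are illustrative):

```python
from itertools import product

def score(Z, Y):
    """Token-level accuracy of candidate tagging Z against ground truth Y."""
    return sum(zt == yt for ((_, zt), (_, yt)) in zip(Z, Y)) / len(Y)

def apply_rule(corpus, rule):
    """Retag every token matching (word, from_tag) to to_tag."""
    word, from_tag, to_tag = rule
    return [(w, to_tag) if (w, t) == (word, from_tag) else (w, t)
            for (w, t) in corpus]

def learn(X, Y, min_gain=1e-9, max_rules=100):
    """Greedy loop over steps 6-14: score every rule in the rule space,
    keep the highest-scoring one, apply it to X, and repeat until the
    threshold is reached. Returns the ordered Rule List."""
    words = sorted({w for (w, _) in Y})
    tags = sorted({t for (_, t) in Y})
    rule_space = [(w, f, t) for w, f, t in product(words, tags, tags) if f != t]
    rule_list, base = [], score(X, Y)
    while len(rule_list) < max_rules:
        # Steps 6-10: try each rule on X and find the highest scorer.
        best_rule, best_score = None, base
        for r in rule_space:
            s = score(apply_rule(X, r), Y)
            if s > best_score:
                best_rule, best_score = r, s
        # Step 11: stop when no rule clears the minimum-gain threshold.
        if best_rule is None or best_score - base < min_gain:
            break
        # Steps 12-13: commit the rule and use the retagged X as the new baseline.
        rule_list.append(best_rule)
        X = apply_rule(X, best_rule)
        base = best_score
    return rule_list

rules = learn([("the", "NN"), ("dog", "NN"), ("runs", "NN")],
              [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")])
print(rules)  # two transitions that transform the start state into Y
```

On the toy corpus, two passes suffice; on realistic corpora the repeated full scan of the rule space is the bottleneck, and practical implementations index which rules can affect which tokens rather than rescoring everything.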

Reference: Eric Brill. 1992. A simple rule-based part of speech tagger. In Proceedings of the third conference on Applied natural language processing (ANLC ’92). Association for Computational Linguistics, Stroudsburg, PA, USA, 152-155. DOI=10.3115/974499.974526
