optimize recall, precision, and length when parsing a data domain

lexical analysis
database consistency and reliability

A given value will typically be parseable in … parsed as .Epsilon.*, as [A-Za-z]* [0-9]*, [0-9]* … to name a few possible structures. Structure extraction … There are three characteristics that we want … @ Recall: The structure should match as many … as possible. @ Conciseness: The structure should have … An effective way to make the tradeoff between … [Rissanen, Automatica 14:465-471, 1978], that minimizes the total length required to encode the data …

Quote: use minimum description length to select how to parse data in a column
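
The minimum-description-length idea in the quote can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the encoding used in the quoted source: the character classes, the bit costs, the MAX_LEN bound, and the names description_length and best_structure are all hypothetical. Each candidate structure is scored by the bits needed to state the structure itself (length), plus the bits needed to encode each value given that structure (narrower classes cost fewer bits per character, which rewards precision), with values that fail to match encoded as raw text (which penalizes low recall).

```python
import math
import re

# Hypothetical character classes: name -> (regex for one character, alphabet size).
CLASSES = {
    "any": (r"[\x20-\x7e]", 95),
    "alpha": (r"[A-Za-z]", 52),
    "digit": (r"[0-9]", 10),
}
RAW_BITS = math.log2(95)          # assumed cost of one unstructured printable character
MAX_LEN = 64                      # assumed upper bound on the length of any component
LEN_BITS = math.log2(MAX_LEN + 1)  # bits to state a component's length


def to_regex(structure):
    """Compile a structure, a list of ("class", name) or ("lit", text) items."""
    parts = []
    for kind, arg in structure:
        if kind == "class":
            parts.append("(" + CLASSES[arg][0] + "*)")
        else:
            parts.append(re.escape(arg))
    return re.compile("".join(parts))


def description_length(structure, values):
    """Total bits: cost of the structure plus cost of every value given it."""
    pattern = to_regex(structure)
    total = 0.0
    for kind, arg in structure:
        total += 1  # one bit to say whether the item is a class or a literal
        if kind == "class":
            total += math.log2(len(CLASSES))
        else:
            total += len(arg) * RAW_BITS
    class_names = [arg for kind, arg in structure if kind == "class"]
    for value in values:
        total += 1  # flag bit: does this value match the structure?
        match = pattern.fullmatch(value)
        if match is None:
            # Non-matching values fall back to a raw encoding.
            total += LEN_BITS + len(value) * RAW_BITS
            continue
        for name, text in zip(class_names, match.groups()):
            total += LEN_BITS + len(text) * math.log2(CLASSES[name][1])
    return total


def best_structure(candidates, values):
    """Return the name of the candidate with the smallest description length."""
    return min(candidates, key=lambda n: description_length(candidates[n], values))


# Illustrative data, not taken from the quoted source.
VALUES = ["March 17, 2000", "April 2, 1999", "June 30, 2001"]
CANDIDATES = {
    "any*": [("class", "any")],
    "digit*": [("class", "digit")],
    "alpha* digit*, digit*": [
        ("class", "alpha"), ("lit", " "),
        ("class", "digit"), ("lit", ", "),
        ("class", "digit"),
    ],
}
```

Calling best_structure(CANDIDATES, VALUES) selects the three-part date structure for this sample. The all-purpose "any*" pattern matches every value but pays the full per-character cost, while "digit*" is short but matches nothing, so every value falls back to the raw encoding.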

