Group: document preparation
Group: hypertext
Topic: attribute-value pairs as information
Topic: hypertext nodes
Topic: loosely structured data
Topic: publishing
Topic: schemas for hypertext
Topic: semi-structured text
Topic: semistructured messages for automated processing
Topic: translation of data
Topic: writing
Topic: writing hypertext
Topic: XML data type
Topic: World-Wide Web
Summary
Structured text uses markup to represent an ordered hierarchy of content objects. For example, emphasized text may be embedded in a quotation block within a paragraph, within a section, within a chapter of a book. The markup describes the content of the text without describing how that content is presented. Examples are SGML and XML. Each provides for a schema or document type declaration that defines and restricts the component elements.
Structured text is designed for automatic processing.
A linear sequence of content objects may be used instead, for example, semi-structured text or some word processing formats. While effective for simple tasks, a linear sequence loses important structural information.
Regular expressions work with structured text when they use a shortest-substring match rule. For example, one regular expression can provide the context for another regular expression. They do not work with the more common leftmost-longest match rule, which tends to extend a match across several elements. (cbb 4/98)
Subtopic: text as ordered hierarchy
Quote: people understand graphs of tree-like documents (HTML) connected by links (URL) [»boswA10_2005]
| Quote: text is an Ordered Hierarchy of Content Objects; any other model is inadequate [»deroSJ2_1990]
| Quote: text is an ordered hierarchy of content objects; invariant over layout, printing, and translation [»deroSJ2_1990]
| Quote: SGML defines a document as an ordered hierarchy of content objects (OHCO); it does not specify formatting [»deroSJ2_1990]
| Quote: an XML value is a sequence of zero or more elements and atomic values, strings or integers; an element has a name, optional type, and value [»simeJ1_2003]
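The ordered hierarchy can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree module; the element names (book, chapter, section, para, quote, emph) are hypothetical and follow the example in the summary, not any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names are illustrative only.
SOURCE = (
    "<book><chapter><section><para>"
    "<quote>Text is an <emph>ordered</emph> hierarchy.</quote>"
    "</para></section></chapter></book>"
)


def paths(element, prefix=""):
    """Yield the path of every element in document order (pre-order walk)."""
    path = f"{prefix}/{element.tag}"
    yield path
    for child in element:
        yield from paths(child, path)


root = ET.fromstring(SOURCE)
all_paths = list(paths(root))

# The deepest content object carries its full structural context.
print(all_paths[-1])
```

The path of the emphasized text records every enclosing content object, which is the information a formatter, indexer, or translator can rely on.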
| Subtopic: formalized schema
Quote: formalization of XML Schema using tree grammars; named types and structural types; matching and validation [»simeJ1_2003]
| Quote: formalization of XML Schema used for XQuery and XPath specifications; one of the first
| Subtopic: shared attribute
Quote: global object model for PeopleWeb based on a common identity for objects, individuals, and attributes [»ramaR8_2007]
| Subtopic: XML type
Quote: an untyped XML value is a sequence of untyped elements and strings; e.g., XML before validation [»simeJ1_2003]
| Quote: XML types are composed of atomic types and element types; sequence, choice, multiple occurrence; optional element name and type name [»simeJ1_2003]
| Quote: a restricted type matches the base type's structure and may be used everywhere the base type is used [»simeJ1_2003]
| Subtopic: schema validation
Quote: validation theorem -- validates iff matches and erases; roundtripping if unambiguous [»simeJ1_2003]
| Quote: instead of matching, an XML value validates against a type; either produces an internal value or it fails; e.g., element ints of type intsType { 1, 2, 3 } [»simeJ1_2003]
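A minimal sketch of validation as described above: an untyped value either yields an internal (typed) value or fails. The function validate_ints, the exception ValidationError, and the reading of intsType as "a sequence of integers" are assumptions made for illustration; this is not the formal semantics of [»simeJ1_2003].

```python
import xml.etree.ElementTree as ET


class ValidationError(Exception):
    """Raised when an untyped value does not validate (hypothetical name)."""


def validate_ints(xml_text):
    """Validate an untyped <ints> element against an assumed type
    'sequence of integers'. Return the typed value or raise."""
    element = ET.fromstring(xml_text)
    if element.tag != "ints" or len(element) != 0:
        raise ValidationError("expected an 'ints' element with text content only")
    tokens = (element.text or "").split()
    try:
        # Untyped strings become internal integer values.
        return [int(token) for token in tokens]
    except ValueError as err:
        raise ValidationError(f"not a sequence of integers: {tokens}") from err


typed = validate_ints("<ints>1 2 3</ints>")
print(typed)
```

Before validation the content is the string "1 2 3"; after validation it is a list of three integers, or the call fails.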
| Subtopic: elements vs. structure
Quote: many word processors represent text as a stream of content objects; e.g., list items and chapter headings instead of lists and chapters [»deroSJ2_1990]
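The difference between a stream and a hierarchy can be sketched as follows. The content-object kinds ("chapter-heading", "list-item") and the grouping rule are assumptions for illustration; a real word processing format would differ.

```python
def group_stream(stream):
    """Infer chapters from a flat stream of (kind, text) content objects.

    The stream holds headings and items only; the enclosing chapters and
    lists are not recorded and must be guessed from adjacency.
    """
    chapters = []
    for kind, text in stream:
        if kind == "chapter-heading":
            chapters.append({"title": text, "items": []})
        elif kind == "list-item":
            if not chapters:
                raise ValueError("list item before any chapter heading")
            chapters[-1]["items"].append(text)
        else:
            raise ValueError(f"unknown content object: {kind}")
    return chapters


# Hypothetical stream, as a word processor might store it.
stream = [
    ("chapter-heading", "Markup"),
    ("list-item", "descriptive"),
    ("list-item", "procedural"),
    ("chapter-heading", "Schemas"),
    ("list-item", "tree grammars"),
]
chapters = group_stream(stream)
print(chapters)
```

The grouping is a guess: the stream cannot say whether two adjacent list items belong to one list or to two, which is the structural information a linear sequence loses.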
| Subtopic: text as database
Quote: the OHCO model treats a document as a database of text elements for manipulation, searching, and combination into compound documents [»deroSJ2_1990]
| Quote: managing text is often more important than formatting it; should be done quickly and cheaply [»mashJR_1976a]
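Treating a document as a database of text elements might look like the sketch below: search two documents for sections and combine the hits into a compound document. The documents, the element names (doc, sec, title, para), and the helper select_sections are hypothetical; only Python's standard xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical source documents.
doc_a = ET.fromstring(
    "<doc>"
    "<sec id='a1'><title>Markup</title><para>Descriptive markup.</para></sec>"
    "<sec id='a2'><title>Layout</title><para>Fonts and margins.</para></sec>"
    "</doc>"
)
doc_b = ET.fromstring(
    "<doc>"
    "<sec id='b1'><title>Markup syntax</title><para>Tags.</para></sec>"
    "</doc>"
)


def select_sections(doc, word):
    """Return the sections of `doc` whose title contains `word`."""
    return [
        sec
        for sec in doc.iter("sec")
        if word.lower() in (sec.findtext("title") or "").lower()
    ]


# Combine the matching sections into a compound document.
compound = ET.Element("doc")
for source in (doc_a, doc_b):
    compound.extend(select_sections(source, "markup"))

ids = [sec.get("id") for sec in compound]
print(ids)
```

The query works on content objects (sections and titles) and never consults layout or formatting.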
| Subtopic: markup
Quote: use descriptive markup to identify content objects in text
| Quote: document markup should describe structure and attributes instead of processes to be performed [»goldCF6_1981]
| Quote: Generalized Markup Language, GML, does not restrict documents to an application, formatting style, or processing system [»goldCF6_1981]
| Quote: translates from Standard Generalized Markup Language; widely used [»mamrSA5_1987]
| Subtopic: markup syntax
Quote: GML identifiers delimited by : or :: and a '.'; use mnemonics for paragraph, quotation, ordered list, ... [»goldCF6_1981]
| Subtopic: configuration
Quote: SCRAM defines a project using XML; e.g., the BootStrap document includes the source code servers [»willC4_2001]
| Subtopic: processing structured text
| Quote: in GML, an attribute of the document is recognized, e.g., 'footnote', then mapped to a processing function and executed [»goldCF6_1981]
| Quote: identify search records by shortest-substrings of regular expressions; e.g., all blocks [»clarCL5_1997]
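A rough sketch of the shortest-substring rule: keep only those matches that contain no other match. This is not the algorithm of [»clarCL5_1997]. The function shortest_substrings is hypothetical, and it assumes that a non-greedy match in Python's re module returns the shortest match at each start position, which holds for the simple pattern below but not for every pattern.

```python
import re


def shortest_substrings(pattern, text):
    """Return the matches of `pattern` that contain no other match."""
    regex = re.compile(pattern, re.DOTALL)
    spans = []
    for start in range(len(text)):
        # Anchor the match at `start`; keep non-empty matches only.
        m = regex.match(text, start)
        if m and m.end() > start:
            spans.append((start, m.end()))
    minimal = [
        (s, e)
        for (s, e) in spans
        if not any(
            (s2, e2) != (s, e) and s <= s2 and e2 <= e for (s2, e2) in spans
        )
    ]
    return [text[s:e] for s, e in minimal]


text = "<b>one <b>two</b> three</b>"

# Shortest-substring rule: only the innermost block is reported.
blocks = shortest_substrings(r"<b>.*?</b>", text)

# A greedy match, standing in here for leftmost-longest, spans everything.
greedy = re.search(r"<b>.*</b>", text).group()

print(blocks)
print(greedy)
```

The shortest-substring result is a single well-delimited block, while the greedy match runs from the first start tag to the last end tag.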
| Subtopic: problems with markup
QuoteRef: goldCF6_1981 ;;[cbb: GML gets complicated and rather arbitrary in practice.]
|
Related Topics
Group: document preparation (8 topics, 180 quotes)
Group: hypertext (44 topics, 786 quotes)
Topic: attribute-value pairs as information (57 items)
Topic: hypertext nodes (19 items)
Topic: loosely structured data (20 items)
Topic: publishing (14 items)
Topic: schemas for hypertext (7 items)
Topic: semi-structured text (17 items)
Topic: semistructured messages for automated processing (22 items)
Topic: translation of data (26 items)
Topic: writing (32 items)
Topic: writing hypertext (13 items)
Topic: XML data type (22 items)
Topic: World-Wide Web (42 items)