Topic: text markup and structured text


Structured text uses markup to represent a document as an ordered hierarchy of content objects. For example, emphasized text may be embedded in a quotation, within a paragraph, within a section, within a chapter of a book. The markup describes the content of the text without describing how that content is presented. Examples are SGML and XML. Both support a schema or document type declaration that defines and restricts the component elements.
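
As a rough illustration of the nesting described above, the following sketch parses a small XML fragment with Python's standard xml.etree.ElementTree and reports the chain of containers around an emphasized phrase. The element names (book, chapter, section, para, quote, emph) and the helper path_to are assumptions made for this example; they do not come from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are illustrative only.
SOURCE = """
<book>
  <chapter>
    <section>
      <para>
        <quote>Text is an <emph>ordered</emph> hierarchy.</quote>
      </para>
    </section>
  </chapter>
</book>
"""

def path_to(root, tag):
    """Return the element names from the root down to the first `tag`."""
    def walk(node, trail):
        trail = trail + [node.tag]
        if node.tag == tag:
            return trail
        for child in node:  # children are visited in document order
            found = walk(child, trail)
            if found:
                return found
        return None
    return walk(root, [])

root = ET.fromstring(SOURCE)
emph_path = path_to(root, "emph")
print(" > ".join(emph_path))
```

Nothing in the fragment says how emphasis or quotations are rendered; a formatter would decide that separately.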

Structured text is designed for automatic processing.

A document may instead be represented as a linear sequence of content objects, as in semi-structured text or some word processing formats. While effective for simple tasks, a linear sequence loses important structural information, such as which list a list item belongs to.
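
The loss of structure can be sketched with two representations of the same content. The data, the tuple layout, and the helpers guess_lists and lists_in are all hypothetical. The point is only that a flat stream cannot tell two adjacent lists from one long list.

```python
# Hypothetical flat stream: each object records its own kind, not its container.
stream = [
    ("chapter-heading", "Markup languages"),
    ("list-item", "GML"),
    ("list-item", "SGML"),
    ("list-item", "bold"),
    ("list-item", "italic"),
]

# The same content as a hierarchy: the chapter and both lists are explicit.
tree = ("chapter", "Markup languages", [
    ("list", [("item", "GML"), ("item", "SGML")]),
    ("list", [("item", "bold"), ("item", "italic")]),
])

def guess_lists(flat):
    """Heuristic: treat each run of adjacent list items as one list."""
    lists, current = [], []
    for kind, text in flat:
        if kind == "list-item":
            current.append(text)
        elif current:
            lists.append(current)
            current = []
    if current:
        lists.append(current)
    return lists

def lists_in(chapter):
    """Read the lists directly from the hierarchy."""
    _kind, _title, children = chapter
    return [[text for _, text in payload]
            for kind, payload in children if kind == "list"]

print(guess_lists(stream))  # one merged list of four items
print(lists_in(tree))       # two lists of two items each
```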

Regular expressions can search structured text if they use a shortest-substring match rule. For example, one regular expression can provide the context for another regular expression. They do not work well with the more common leftmost-longest match rule. (cbb 4/98)
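
A minimal sketch of the shortest-substring idea, assuming a simplified setting in which a match is a start marker followed by an end marker. The function shortest_spans and the sample text are hypothetical, and this is not the algorithm of [»clarCL5_1997]. It only keeps matches that contain no other match, and contrasts them with the longest match that Python's re module returns for a greedy pattern.

```python
import re

def shortest_spans(start, end, text):
    """Return start...end matches that contain no other start...end match."""
    starts = list(re.finditer(start, text))
    ends = list(re.finditer(end, text))
    spans = []
    for i, s in enumerate(starts):
        # First end marker that begins after this start marker finishes.
        e = next((m for m in ends if m.start() >= s.end()), None)
        if e is None:
            break  # no end marker remains for this or any later start
        # If the next start marker also fits before that end marker,
        # this span would contain a shorter match, so skip it.
        nxt = starts[i + 1] if i + 1 < len(starts) else None
        if nxt is not None and nxt.end() <= e.start():
            continue
        spans.append(text[s.start():e.end()])
    return spans

TEXT = "<b>one <b>two</b> three</b>"

shortest = shortest_spans(r"<b>", r"</b>", TEXT)
longest = re.search(r"<b>.*</b>", TEXT).group()

print(shortest)  # only the innermost block
print(longest)   # the greedy pattern spans the whole string
```

Under this rule the result of one search can serve as the context for another, for example by running a second pattern over each returned span.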

Subtopic: text as ordered hierarchy

Quote: people understand graphs of tree-like documents (HTML) connected by links (URL) [»boswA10_2005]
Quote: text is an Ordered Hierarchy of Content Objects; any other model is inadequate [»deroSJ2_1990]
Quote: text is an ordered hierarchy of content objects; invariant over layout, printing, and translation [»deroSJ2_1990]
Quote: SGML defines a document as an ordered hierarchy of content objects (OHCO); it does not specify formatting [»deroSJ2_1990]
Quote: an XML value is a sequence of zero or more elements and atomic values, strings or integers; an element has a name, optional type, and value [»simeJ1_2003]
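
The data model in the last quote can be written down as a small set of Python types. This is a sketch under assumptions: the names Element, Item, Value, and count_atoms are hypothetical and are not taken from [»simeJ1_2003].

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Element:
    name: str                        # every element has a name
    value: "Value"                   # its content is itself an XML value
    type_name: Optional[str] = None  # the type annotation is optional

Item = Union[Element, str, int]      # an element or an atomic value
Value = List[Item]                   # a sequence of zero or more items

def count_atoms(value: Value) -> int:
    """Count atomic values at any depth of an XML value."""
    total = 0
    for item in value:
        if isinstance(item, Element):
            total += count_atoms(item.value)
        else:
            total += 1
    return total

ints = Element("ints", [1, 2, 3], type_name="intsType")
doc = [Element("doc", [ints, "trailing text"])]
print(count_atoms(doc))
```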

Subtopic: formalized schema

Quote: formalization of XML Schema using tree grammars; named types and structural types; matching and validation [»simeJ1_2003]
Quote: formalization of XML Schema used for XQuery and XPath specifications; one of the first

Subtopic: shared attribute

Quote: global object model for PeopleWeb based on a common identity for objects, individuals, and attributes [»ramaR8_2007]

Subtopic: XML type

Quote: an untyped XML value is a sequence of untyped elements and strings; e.g., XML before validation [»simeJ1_2003]
Quote: XML types are composed of atomic types and element types; sequence, choice, multiple occurrence; optional element name and type name [»simeJ1_2003]
Quote: a restricted type matches the base type's structure and may be used everywhere the base type is used [»simeJ1_2003]

Subtopic: schema validation

Quote: validation theorem -- validates iff matches and erases; roundtripping if unambiguous [»simeJ1_2003]
Quote: instead of matching, an XML value validates against a type; either produces an internal value or it fails; e.g., element ints of type intsType { 1, 2, 3 } [»simeJ1_2003]
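
The idea that validation either produces an internal value or fails can be sketched for the ints example. Everything here is an assumption made for illustration: the tuple layout, the functions validate_ints and erase, and the choice to model intsType as a whitespace-separated sequence of integers. It is not the formal definition from [»simeJ1_2003].

```python
class ValidationError(ValueError):
    """Raised when untyped content does not validate against the type."""

def validate_ints(name, text):
    """Validate untyped text against a hypothetical intsType (integer*).

    On success, return an internal, typed value; otherwise raise.
    """
    try:
        values = [int(token) for token in text.split()]
    except ValueError as exc:
        raise ValidationError(f"{name}: not a sequence of integers") from exc
    return (name, "intsType", values)

def erase(typed):
    """Drop the type annotation and return the untyped form."""
    name, _type_name, values = typed
    return (name, " ".join(str(v) for v in values))

typed = validate_ints("ints", "1 2 3")
print(typed)         # internal value carrying integers
print(erase(typed))  # back to the untyped text

try:
    validate_ints("ints", "1 two 3")
except ValidationError as err:
    print("failed:", err)
```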

Subtopic: elements vs. structure

Quote: many word processors represent text as a stream of content objects; e.g., list items and chapter headings instead of lists and chapters [»deroSJ2_1990]

Subtopic: text as database

Quote: the OHCO model treats a document as a database of text elements for manipulation, searching, and combination into compound documents [»deroSJ2_1990]
Quote: managing text is often more important than formatting it; should be done quickly and cheaply [»mashJR_1976a]
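
Treating a document as a database of text elements might look like the following sketch, which uses Python's standard xml.etree.ElementTree. The document content and the helpers titles, paras_containing, and compound are hypothetical; they stand in for the manipulation, searching, and combination mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names and text are illustrative only.
DOC = ET.fromstring(
    "<book>"
    "<chapter><title>Markup</title>"
    "<para>Descriptive markup names objects.</para></chapter>"
    "<chapter><title>Schemas</title>"
    "<para>A schema restricts elements.</para>"
    "<para>Validation checks a document.</para></chapter>"
    "</book>"
)

def titles(doc):
    """Manipulation: list the chapter titles in document order."""
    return [t.text for t in doc.findall("./chapter/title")]

def paras_containing(doc, word):
    """Searching: paragraphs whose text contains `word`."""
    return [p.text for p in doc.iter("para") if word in (p.text or "")]

def compound(doc, chapter_title):
    """Combination: build a new document from the selected chapters."""
    out = ET.Element("extract")
    for chapter in doc.findall("chapter"):
        if chapter.findtext("title") == chapter_title:
            out.append(chapter)
    return out

print(titles(DOC))
print(paras_containing(DOC, "schema"))
print(ET.tostring(compound(DOC, "Schemas"), encoding="unicode"))
```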

Subtopic: markup

Quote: use descriptive markup to identify content objects in text
Quote: document markup should describe structure and attributes instead of processes to be performed [»goldCF6_1981]
Quote: Generalized Markup Language, GML, does not restrict documents to an application, formatting style, or processing system [»goldCF6_1981]
Quote: translates from Standard Generalized Markup Language; widely used [»mamrSA5_1987]

Subtopic: markup syntax

Quote: GML identifiers delimited by : or :: and a '.'; use mnemonics for paragraph, quotation, ordered list, ... [»goldCF6_1981]
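
A tokenizer for this style of markup can be sketched with one regular expression. The sketch assumes, without confirmation from [»goldCF6_1981], that ':' opens an element, '::' closes it, and '.' ends the identifier. The sample line, the mnemonics p and q, and the function tokenize are hypothetical.

```python
import re

# Assumed tag shape: ':' or '::', a lowercase identifier, then '.'.
TAG = re.compile(r"(::?)([a-z][a-z0-9]*)\.")

def tokenize(source):
    """Split GML-like source into ('start'|'end'|'text', payload) tokens."""
    tokens, pos = [], 0
    for m in TAG.finditer(source):
        if m.start() > pos:
            tokens.append(("text", source[pos:m.start()]))
        kind = "end" if m.group(1) == "::" else "start"
        tokens.append((kind, m.group(2)))
        pos = m.end()
    if pos < len(source):
        tokens.append(("text", source[pos:]))
    return tokens

SAMPLE = ":p.He said :q.markup::q. and left."
tokens = tokenize(SAMPLE)
for token in tokens:
    print(token)
```

The pattern is deliberately naive: ordinary text such as "time:x." would also be read as a tag, so a real scanner would need rules about where tags may begin.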

Subtopic: configuration

Quote: SCRAM defines a project using XML; e.g., the BootStrap document includes the source code servers [»willC4_2001]

Subtopic: processing structured text

Quote: in GML, an attribute of the document is recognized, e.g., 'footnote', then mapped to a processing function and executed [»goldCF6_1981]
Quote: identify search records by shortest-substrings of regular expressions; e.g., all blocks [»clarCL5_1997]
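
The recognize, map, and execute steps in the GML quote can be sketched as a dispatch table. The names HANDLERS, INDEX_HANDLERS, and process, the mnemonics p and fn, and the output formats are all hypothetical. The second table shows one marked-up document driving a different kind of processing.

```python
def format_paragraph(text):
    return text + "\n"

def format_footnote(text):
    return "[note: " + text + "]\n"

# Formatting application: each recognized attribute maps to a function.
HANDLERS = {"p": format_paragraph, "fn": format_footnote}

# A second application over the same markup: collect footnotes only.
INDEX_HANDLERS = {"p": lambda text: "", "fn": lambda text: text + "\n"}

def process(objects, handlers):
    """Recognize each object's name, map it to a function, and execute it."""
    output = []
    for name, text in objects:
        handler = handlers.get(name)
        if handler is None:
            raise KeyError(f"no processing function for {name!r}")
        output.append(handler(text))
    return "".join(output)

# Hypothetical document, already split into (name, text) content objects.
DOCUMENT = [
    ("p", "Markup describes structure."),
    ("fn", "See the schema."),
    ("p", "Processing is chosen later."),
]

formatted = process(DOCUMENT, HANDLERS)
footnotes = process(DOCUMENT, INDEX_HANDLERS)
print(formatted)
print(footnotes)
```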

Subtopic: problems with markup

QuoteRef: goldCF6_1981 ;;[cbb: GML gets complicated and rather arbitrary in practice.]

Related Topics

Group: document preparation   (8 topics, 180 quotes)
Group: hypertext   (44 topics, 786 quotes)

Topic: attribute-value pairs as information (57 items)
Topic: hypertext nodes (19 items)
Topic: loosely structured data (20 items)
Topic: publishing (14 items)
Topic: schemas for hypertext (7 items)
Topic: semi-structured text (17 items)
Topic: semistructured messages for automated processing (22 items)
Topic: translation of data (26 items)
Topic: writing (32 items)
Topic: writing hypertext (13 items)
Topic: XML data type (22 items)
Topic: World-Wide Web (42 items)

Updated barberCB 3/06
Copyright © 2002-2008 by C. Bradford Barber. All rights reserved.
Thesa is a trademark of C. Bradford Barber.