April 22nd, 2013 by Heather Asbeck

Encoding Epistles: Old Artifacts, New Concepts

I presented this information at the 2013 CAS Colloquium:  Thinking about The Book, but I wanted to share it here as well.  I want to talk about the Flagg Project – a project that involves converting 19th century correspondence to a digital format – but before delving into the topic, I want to present a bit of background information about digitization.

Why (or why not) digitize?

It seems quite simple, really.  Converting static texts to a digital format provides accessibility for a wide range of people, while physical documents can only reside in one place and be handled by people who share (or travel to) that place.  But there are some differences in the ways that people perceive these two types of objects.  One is a physical document with a particular texture, smell, size, weight, and form.  We can see the subtle nuances of light and dark ink shading where pen nib pressure was uneven.  We can feel the texture of the paper, see its true color, and view a watermark (if it has one) by holding it up to the light.  Any embellishments that have been added to the document are also detectable in a way that is not possible via a virtual image; in short, we can engage in haptic exploration of the object.  When we create a virtual facsimile of this same object, we give up our 3D exploration of the item itself in favor of a 2D image of it.  What we gain from this exchange is a dynamic, malleable, and searchable version that is available to a much wider audience.

There is an abundance of digital exhibits featuring the correspondence of noteworthy people:  The Walt Whitman Archive, Vincent van Gogh: The Letters, Mapping the Republic of Letters, The Willa Cather Archive, and Emily Dickinson’s Correspondences, to name a few.  These sites give access either to the wider public or to academic communities, allowing individuals who would otherwise not be able to handle the documents to view them online.

Emily Dickinson

Emily Dickinson’s work provides a reminder of what may be lost and/or gained through digitization.  She manually produced, edited, and embellished her letters and poems, a topic that I have discussed previously.  In short, she was strongly opposed to the commercial printing and editing of her work, and her embellished letters are valuable reminders that documents are more than the sum of their words.

Encoding the Text

The TEI – or Text Encoding Initiative – is an international organization that develops and maintains guidelines for encoding physical, linguistic, and textual attributes of a document in a digital format.  TEI coding schemas utilize XML – extensible markup language – as an HTML-style tagging system, but to a much different end.  HTML tagging allows the user to place tags within angle brackets to alter the physical appearance of the text.  For example, the opening and closing tags <strong> </strong> can be placed around a word or phrase to change the text to a bold typeface.  Other tags can change font types and sizes, underline text, and display superscripts and subscripts (with <sup> and <sub>, respectively).  The XML markup allows us not only to alter the physical appearance of the text, but also to provide commentary about it and describe the content of that text.  For example, when a person or a place is mentioned, we can define the name type as that of a person or place.  We can be specific enough to include biographic or geographic information about it, or define the referent that a particular pronoun represents.
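
To make the idea of descriptive tagging a little more concrete, here is a minimal sketch in Python.  It assumes a single invented sentence (not taken from the Flagg letters) marked up with the TEI elements persName, placeName, and rs (a "referring string," used here for a pronoun).  The identifiers #jane and #springfield and the function name tagged_references are hypothetical, chosen only for illustration; a real project would follow its own schema and editorial conventions.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence for illustration only; not from the Flagg letters.
# The "ref" values (#jane, #springfield) are hypothetical identifiers.
SAMPLE = """
<p xmlns="http://www.tei-c.org/ns/1.0">
  <persName ref="#jane">Jane Doe</persName> wrote from
  <placeName ref="#springfield">Springfield</placeName> that
  <rs type="person" ref="#jane">she</rs> was well.
</p>
"""


def tagged_references(xml_text):
    """Return (element name, text, ref) for each tagged name or pronoun."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for elem in root.iter():
        # ElementTree reports tags as "{namespace}name"; keep only the name.
        local_name = elem.tag.split("}")[-1]
        if local_name in ("persName", "placeName", "rs"):
            results.append((local_name, elem.text, elem.get("ref")))
    return results


for kind, text, ref in tagged_references(SAMPLE):
    print(kind, text, ref)
```

Because the pronoun "she" carries the same ref value as the name "Jane Doe," a program can connect the two, which is something appearance-only tags such as <strong> cannot express.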

This form of encoding changes the way we think about both the data and the text.  It gives us a “scientist’s view of text,” which Buzzetti and McGann explain consists of “‘information coded as characters or sequences of characters’ (Day 1).  Coded information is data and data is a processible material object. . . . Digital text is a physical thing residing in the memory cells of a digital computer in a completely disambiguated condition.  That precise physical structure matters for digital text, just as the very different precise physical structure matters for paper-based text.”  So, we gain data as a material object, as well as the information encoded within the text of the data, and the associated metadata.  As a result, we have to conceive of ways of thinking, talking about, defining, and encoding our simulated objects, the information we attach to them, and the materials and means we utilize in this process.  This drives scholars to focus microscopically on each aspect of an object, from its form, to its function, material composition, history, and the language and technology necessary to communicate these characteristics in a digitally simulated format.


[To be continued…]

