Text shapes the human body in strange ways. A lifetime of reading stoops the shoulders as though books were exerting pressure against the solar plexus. Continual exertion deteriorates eyesight. Tendons and other supporting structures in the wrist swell from the repetitive stress of striking keys. The word takes its toll. A further profound change happens when we read and write along with the machine. As we interpret it, it interprets us.

Machine learning algorithms track the speed at which readers advance from paragraph to paragraph, creating a fingerprint that points to markers of gender, age, race, ethnicity, and economic status. Algorithmic agents follow the movement of eye and finger to direct the reader’s attention and to understand how the human brain connects topics. Heat maps are drawn to represent the dynamics of boredom, fatigue, focus, and desire. Supervised training algorithms use our collective philological output, sorting and commenting, to classify information autonomously and to curate content suited to our predilections.

Deep neural networks mimic the brain to build models of human behavior. These models are notoriously difficult to interpret because they are not intended for human comprehension.1 A vast archive of texts written by and for machines supports the comparatively tiny corpus of human-compatible literature.

Despite its formative effect on practices of comprehension, code, the programmatic sign, does not often figure in our theories of meaning making. Instead, we consign it to the ornamental formatting layer of document structure. We do so at our peril. Unlike passive decorative elements—fleurons, daggers, and pilcrows—the programmatic sign actively molds text to context. Words find their topography.

At the maximally blunt limit of its capabilities, format governs access. Commands render some words and sentences visible on-screen while suppressing others. The ability to hide text from view completely or to make it so small as to be illegible affects not just the style but also the politics of text. Code determines its audience, privileging certain voices and modes of reading. In this sense, the programmatic sign acquires its nonrepresentational, tactical character. Stripped of references, resemblances, and designations, it commands and controls.2

Unlike figurative description, machine control languages function in the imperative. They do not stand for action; they are action. More binding than what J. L. Austin has called speech acts—edicts such as “I pronounce you husband and wife” and commitments such as “I do”—control codes ensure regulation. Code is an exercise of power, not its representation. The difference between representation and control is one of brute force. It lies in the distinction between a restraining order and physical restraint. A restraining order signifies the calling forth of codified power. Physical restraints, for example, handcuffs, enact the exercise of codified power. Like all violence, they do not stand for anything. The handcuffs simply contort the body into the shape of submission. Absent a body, the restraints draw an empty shape. Code similarly shapes the written word. Located somewhere between screen and storage medium, formats relate matter to content. They are techniques by which immanent inscriptions, the electromagnetic charge, are transformed into transcendent digital objects: novels, songs, films, poems. Formatting imposes structure.

Think of a paragraph, for example. Writers use paragraphs to break up the flow of thoughts on a page. Paragraphs contain information. Can one imagine an empty paragraph? Could the shape of a paragraph persist outside the material confines of a page or screen? Can one imagine paragraphs that unfold spatially not in two dimensions, a rectangle, but in one, along a straight line, or in three, in the shape of a cube? These questions confound because paragraphs draw a singular figure. They are textual containers of a type. Any other shape less or more than the paragraph would go by another name; it would constitute another format. To imagine something like a one-dimensional paragraph is akin to imagining a flat shoebox. A flat shoebox cannot contain shoes. It can hold only images of footwear. A paragraph embodies a similarly singular arrangement of elements. It is a container or a data structure of a kind, made to hold a certain number of sentences.

We may liken books, paragraphs, and sentences to nesting dolls: data structures that contain within them further smaller arrangements of information. A word fits inside a sentence, the sentence within a paragraph, the paragraph within a chapter, the chapter within a book, the book within an archive, and so on.
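The nesting can be made concrete with a minimal sketch in Python. It assumes a deliberately naive rule, one among many possible: paragraphs are separated by blank lines, sentences by a period and a space, and words by whitespace. The function name `nest` and the sample text are illustrative only and do not correspond to any standard format.

```python
# A sketch of text as nested containers: a chapter holds paragraphs,
# a paragraph holds sentences, a sentence holds words.
# The splitting rules are simplifying assumptions, not a real parser.

def nest(text: str) -> list[list[list[str]]]:
    """Split text into paragraphs, then sentences, then words."""
    chapter = []
    for block in text.split("\n\n"):           # blank line ends a paragraph
        paragraph = []
        for sentence in block.split(". "):     # period plus space ends a sentence
            words = sentence.rstrip(".").split()
            if words:
                paragraph.append(words)
        chapter.append(paragraph)
    return chapter

sample = (
    "Words find their topography. Formatting imposes structure."
    "\n\n"
    "Screens are laminates."
)

chapter = nest(sample)
print(len(chapter))       # 2 paragraphs
print(len(chapter[0]))    # 2 sentences in the first paragraph
print(chapter[0][0])      # ['Words', 'find', 'their', 'topography']
```

Each level of the list is a container for the level beneath it, much as the sentence fits within the paragraph and the paragraph within the chapter.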

Formats such as the book or the broadsheet newspaper are known entities. We understand how they are made and how to unfold them in space. By contrast, computational formats change rapidly and proliferate. They contain further, as yet unexplored structural possibilities: shapes similar to the paragraph on paper but native to new media. What you see is what you get on the page. On-screen, what you see is but a small part of what you could get. We are presented with thick content, beyond visible image: the composite of all that is contained. In print, content can be gleaned from surface; there is nothing but surface expanse on a page. Screens are laminates. Light and liquid crystal, the conduits for digital media, surge between substrates in response to electric signal. Screen surfaces conceal further strata of codification, inscribed onto recondite planes of inscription: hard disks, solid-state drives, platters, drums, memory sticks, layers of copper and oxide.

A byte, made up of eight binary bits, holds a letter in a simple single-byte encoding such as ASCII. The string of letters spelling out “hello world” occupies eleven bytes on a hard drive: ten bytes for the letters and one byte for the space character. A file in the Portable Document Format (PDF) containing the same “hello world” takes up 24,335 bytes on my system. What sort of information do these extra bytes contain? Historically, such data have included machine instructions for viewing and printing, or even clandestine ciphers. The PDF specification describes features that include “accessibility of content to those with disabilities,” “digital signatures to certify authenticity,” “electronic forms to gather data,” “preservation of document fidelity independent of the device, platform, and software,” and “security and permissions to allow the creator to retain control of the document and associated rights.”3
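The arithmetic of the plain-text case can be checked directly. The following minimal Python sketch assumes an ASCII encoding, in which each character occupies exactly one byte; the variable names are illustrative. It does not attempt to reproduce the size of the PDF, which depends on the software that generated the file.

```python
# Count the bytes needed to store "hello world" as plain text.
# Assumption: ASCII encoding, one byte per character.
text = "hello world"
encoded = text.encode("ascii")

letters = sum(1 for ch in text if ch.isalpha())  # alphabetic characters
spaces = text.count(" ")                         # space characters

print(len(encoded))      # 11 bytes in total
print(letters, spaces)   # 10 1
print(len(encoded) * 8)  # 88 bits
```

Everything a formatted file stores beyond these eleven bytes belongs to the formatting layer rather than to the visible string.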

These capabilities mediate between visible image and stored information: one surface facing the human; the other, the machine. The formatting layer specifies the affordances of electronic text. More than passive conduits of meaning, electronic texts thus carry within them rules for engagement between authors, readers, and devices. In our example, the PDF encodes, among other things, ideas about reading, authenticity, fidelity, preservation, and authorship. Whatever literary-theoretical framework the reader brings to the process of interpretation must therefore meet the affordances encoded into the electronic text itself.

Jonathan Sterne, a media scholar who pioneered the study of audio formats, writes that format theory “invites us to ask after the changing formations of media, the contexts of their reception, the conjunction that shaped their sensual characteristics, and the institutional politics in which they were enmeshed.”4 Attending to the affordances of format, to paraphrase Caroline Levine, “opens a generalizable understanding of political power.”5 Constraint, what Levine calls the collision of forms, happens not on the level of representation or ideology but on the level of the physical, the phatic, and the imperative, where formatting and control codes reside.

Format and content compose what may be called thick content, which accounts for the disparity between plain and fancy text.6 Its explication requires thick description that draws on material particulates.7 These further acquire tactical significance in practice: Texts that edit themselves or collect their own fees necessitate new formalisms and strategies of interpretation. In his monograph on audio formats, Sterne argues for a study of formats that “highlights smaller registers like software, operating standards, and codes as well as larger registers like infrastructures, international corporate consortia, and whole technical systems.”8

A familiar paper paragraph structure already presents several interesting problems for analysis. A paragraph, we intuit, corresponds to a unit of thought. But there is nothing inherently paragraph-like in the neural arrangement of thoughts in our brains. Physiologically, the brain arranges information in hexagons, along the entorhinal grid.9 There is also nothing inherently paragraph-like in the arrangement of bits along the surface of electromagnetic storage. Formats thus translate between disparate systems of ordering and signification.10 We are presented with metaphors of order on-screen: paragraphs, pages, files, folders. These resemble their paper counterparts, but they represent other, less familiar and nonequivalent ordering structures on disk.

Formats mediate between data structures, transforming one into the other according to predefined rules. Mental images, information stored in the head, become inscription, information stored in the machine, which turns into a projection, content arranged on-screen (Figure 3.1). The complexities of transformation stem from a fundamental incompatibility between incommensurate languages and physicalities. Format specifications govern the transference of data structures from one medium to another at the point of contact between human, symbol, and machine.

In this chapter I move us toward a systematic study of textual formats. I argue that the history of formalism contains within it at least two contradictory intuitions about the nature of literary form. Going back to the reception of Plato, Hegel, and the Russian formalists, the English “form” renders at times the material, outward, and apparent shape of something said, written, or pictured. Just as often, it is used in the sense of a Platonic ideal: abstracted from matter, inward-facing, and in need of explication. Form in this sense is closer to the idea of an algorithm or formula; it signifies according to implicit rules.

I augment these two concepts of form with a third: “format.” In the process I show how formats developed historically from simple machine instructions for typographical layout into complex metaliterary directives related to the protection of intellectual property rights, constraints on speech, trade agreements, the politics of surveillance, and clandestine communication. In the second half of the chapter, an intellectual history of form, drawn from the annals of literary theory, meets the material history of format, drawn from computer science. I end the chapter with a discussion of smart documents, increasingly common instruments of record capable of policing their own encoded mechanisms of reader engagement: what can be read, how, and where.

FIGURE 3.1. Formats change with the medium, as shown in the arrangement of data in the brain (left), on a page (middle), and on disk (right). Image adapted by Emily Fuhrman from author’s sketches.



1. Andrews et al., “Survey and Critique”; Karim and Zhou, “X-TREPAN.”

2. Baudrillard, Simulacra and Simulation, 139–40.

3. ISO, “Document Management,” vii.

4. Sterne, MP3, 11.

5. C. Levine, Forms, 7.

6. The Unicode Consortium defines fancy text as “text representation consisting of plain text plus added information” (Unicode Consortium, Unicode Standard, 9–10).

7. Sterne wrote of the need for format theory that “demands greater specificity when we talk in general terms about media.” See Sterne, MP3, 11.

8. Sterne, MP3, 11.

9. Stensola et al., “Entorhinal Grid Map.”

10. See Wittgenstein, Philosophical Grammar, 45. On incommensurate languages, see also D. Davidson, Essays, 187–99.