Our discussion in the last class meeting was terrific! I have no other words for it: it was terrific. I think this is because the readings for this week were a bit more in our comfort zone as English majors and a bit less involved with the digital world in general.
Matthew Wilkens’ article, “Canons, Close Reading, and the Evolution of Method,” dealt with our old notion of canon(s) and canon construction, and complicated the matter by noting just how many new books are added to the backlog of texts each year. This is what he describes as a “problem of abundance” (250): with ever more novels being published each year, our old way of reading, close reading, needs to evolve to keep up with the sheer volume of books yet to be encountered. The books are able, he posits, to tell us something about the culture that produced them, and since we cannot read them all in order to extract this information, we should be devising modern, technologically assisted means of digesting these incalculable pages. His “anything and everything else” approach to “data mining” these texts takes the form, as he shows the reader through an extended example and figures in the text, of distant reading (for a good explanation click here).
I really do like the idea of distant reading, though as a supplement
to close reading, not as an alternative.
The example that Wilkens provides takes all of the books, popular and obscure, published in a 25-year window starting in 1851, data mines them to find place names, foreign and domestic, and plots these mentions (or multiple mentions) on a map. The purpose of this exercise is to note how often places are named or written about in an era thought by scholars to be dominated by the Northeast. What this example shows is that the culture that produced all of these texts had in mind places overseas, in Europe and Asia, with several mentions of South America and points plotted in Australia. In other words, texts in this period mention these places, shifting the primary focus from New England to the rest of the world—or so it would seem.
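To make concrete what this kind of place-name mining involves, here is a minimal sketch of the idea. Everything in it is my own toy illustration — the gazetteer, the sample texts, and the crude tokenizing are hypothetical stand-ins, not Wilkens’ actual method or data:

```python
from collections import Counter

# A toy gazetteer of place names (hypothetical; a real one would be
# far larger and, as Wilkens notes, would need manual correction).
GAZETTEER = {"Australia", "London", "Boston", "Paris", "Peru"}

def count_place_mentions(text: str) -> Counter:
    """Tally how often each gazetteer place name appears in a text."""
    counts = Counter()
    # Crude tokenizing: strip commas and periods, split on whitespace.
    for word in text.replace(",", " ").replace(".", " ").split():
        if word in GAZETTEER:
            counts[word] += 1
    return counts

# Invented sample "corpus" standing in for thousands of novels.
corpus = [
    "The ship left Boston for London.",
    "From Paris she wrote of Australia and of Peru.",
    "Boston in winter was quiet.",
]

totals = Counter()
for doc in corpus:
    totals.update(count_place_mentions(doc))
# totals now holds corpus-wide mention counts, e.g. Boston appears twice.
```

The counts are what get plotted on the map; note that nothing in this process preserves what the surrounding sentences actually say about each place.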
In our discussion, I had some questions about this practice. The ability of technology to extract these very distinct instances and utterances (is utterance the right word for a text?) shows us, the scholarly reader, what, exactly? That other places are mentioned? What of the context of these mentions? Were they casual in nature? Hostile? Romanticized? What were these utterances, and how do they illustrate the cultural attitudes of the time? The concept seems a bit removed from what reading itself is supposed to be: the encountering of a text. Distant reading, in this sense, seems like a brief flyover. One of my colleagues noted that the way the programs work in this situation is very much like how Google Books works: they find these words and send back results in the context of how they were found, so “Australia” would be found in the context of “Australia is an island peopled entirely by criminals,” as opposed to simply “Australia.”
This example is glib, I know, but the way that I read Wilkens’ article indicates that it is ostensibly correct. Wilkens never once mentioned text in context, only that the methodology for finding place names needed to be tweaked, since several place names coincide with personal proper names and this had to be accounted for. The reader sees three maps showing the locations, but not what these locations mean.
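What my colleague described is essentially the classic keyword-in-context (KWIC) concordance view. A minimal sketch of how such context extraction might work — my own illustration again, not anything from the published example:

```python
def kwic(text: str, keyword: str, window: int = 3) -> list[str]:
    """Return each occurrence of `keyword` with `window` words of
    context on either side, as concordance-style lines."""
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        # Strip trailing punctuation so "Australia," still matches.
        if w.strip(".,;!?\"'") == keyword:
            start = max(0, i - window)
            lines.append(" ".join(words[start:i + window + 1]))
    return lines

# Invented sentence echoing my glib example from class.
sentence = ("Australia is an island peopled entirely by criminals, "
            "or so the pamphlet claimed.")
result = kwic(sentence, "Australia")
```

A real tool would return many such lines across a whole corpus — which is precisely the context the maps, on their own, leave out.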
I appreciated that my colleague mentioned that the results would come in context, but the published example would have benefitted from proffering this information. Another colleague of mine noted that data mining in this manner is helpful beyond the confines of the literature profession, which, again, I am grateful to have heard, seeing as we are all lit. people, and thinking beyond our boundaries is, at least for me, difficult. However, despite these few moments of relief, something was still bothering me: data mining can read a text, and even provide the statistician with context, but it is in NO WAY capable of providing subtext.
What is meant by an utterance or turn of phrase cannot be picked up by a machine (anyone who has heard about how sarcasm doesn’t translate well in an e-mail will know that what I am saying here is true), so there is much that data mining will miss in relation to the culture that produced these texts.
To punctuate my final point above, I cited James Joyce as an
example. This did not go over well. At least not initially. As soon as Joyce was invoked, several of my
colleagues jumped up and shouted (not really):
Joyce is perfect for data mining.
His work is so dense. We would
benefit from DH in his case.
It seemed as though He was the first author of DH data
mining. I may have blasphemed and taken
the Lord’s name in vain.
I continued: yes, his work is dense, and much of its depth lies in the subtext of his writing, the collective, cultural meanings that are revealed only after repeated readings. These tend to differ from person to person, and in the case of universalities, they reveal themselves at different times and to different degrees. How could/would data mining be able to pull back the layers of subtext when these layers are invisible to a mechanical eye? No, we know nothing of Joyce—or at least not all of him—and it is only through reading the text and mining it ourselves that the mysteries of Joyce can be revealed.
Data mining, in this case, would be the mountaintop-removal method, as opposed to the individual prospector. Prospectors are in touch with the environment, and readers are very similar, knowing the terrain of the text. This insider knowledge, as far as the literary profession is concerned, should be at the fore, not the mechanized version. Save that for sociology.