Writing History in the Digital Age, a born-digital, open-review volume edited by Jack Dougherty and Kristen Nawrotzki

Fielding History: Relational Databases and Prose

It wasn’t until I started writing the introduction to my dissertation, “Revolution-Mongers: Launching the U.S. Foreign Service, 1775-1825,” that I realized how much building The Early American Foreign Service Database and its underlying open source software package, Project Quincy, influenced how I understand and explain my research. At that point the EAFSD had been live for four months, and I had been telling people that the two projects have a symbiotic relationship. My dissertation contains the stories, quirky situations, and historiographical analysis necessary to bring the past to scrutiny and life. The database provides the backdrop that showcases particular moments as quintessential or unusual, but it is also a standalone secondary resource, a separate publication in its own right. All of this was and remains true, but as I started that introduction, I became conscious of another way the two projects inform each other. As I described the nature of late eighteenth-century diplomacy — the difference between diplomats and consuls, the geopolitical realities of empire, the personal and commercial connections between Foreign Service officers — I found my description replicating the data structure I had built into the EAFSD, because that structure was the best way to get my background knowledge of my topic on paper.

When I realized this overlap I gave a little cheer, because I knew I had designed the EAFSD properly. Databases are normative statements about reality. If all data are theory-laden, then data structures are theories in and of themselves. When you design a database you are making proclamations about what matters (and, by implication, what can be safely ignored), and because relational databases are particularly constricting in how you can represent and link data, you are forced to be very explicit and systematic in your choices. This constriction has led some historians to abandon relational databases for more flexible data structures, like XML or semantic linking. Some of this rejection is fueled by the fact that databases and statistical packages were adopted by historians before the technology was sufficiently advanced to handle historical sources with the nuance they require.1 We should remember that eighty-column punch cards frustrated the cliometricians themselves, as well as their readers. In my opinion, much of the reaction against relational databases is simply another symptom of the split among historians that goes back to the very beginning of the discipline. As a rule of thumb, if you prefer Herodotus to Thucydides you probably want XML. It all depends on your sources and temperament. Relational databases are powerful tools, but they work best when the data you want to record and analyze consist of discrete pieces of information with clear connections between them. However, you have to be careful while designing your database to ensure that you accurately model your field of study without feeding your own preconceptions back into your analysis.

Designing a Database
Good database design involves breaking the metadata description of a data set (and therefore its logical organization) into the smallest viable components and then linking those components back together to facilitate complex analysis. This process, known as normalization, helps keep the data set free of duplicates and protects the data from being unintentionally deleted or unevenly updated.2 These components are known as entities, and the links are called relationships. Each entity represents something in the “real world” which is modeled in the database. Entities contain fields, discrete pieces of data, each with a designated name and datatype (e.g., “start_year”, an integer). Entities are sometimes referred to as tables, and fields are also called attributes.3 Entities and relationships only make sense when discussed together, because they take their form from each other. Relationships connect entities, and entities are constructed based on how they relate to each other. But while the analytic power and stability of relational databases come from their basis in relational algebra, the concepts can be hard to grasp in the abstract. So, let us turn to a concrete example: The Early American Foreign Service Database.
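To make the vocabulary concrete, here is a minimal sketch of a pair of entities joined by a relationship, written as PostgreSQL data definition statements. The table and field names (locations, individuals, birth_location_id) are illustrative only and are not drawn from the EAFSD’s actual schema.

```sql
-- Two entities: each row in "individuals" is linked to a row in "locations"
-- through a foreign key, which is how a relational database expresses a relationship.
CREATE TABLE locations (
    id    serial PRIMARY KEY,
    name  text NOT NULL
);

CREATE TABLE individuals (
    id                 serial PRIMARY KEY,
    last_name          text NOT NULL,
    first_name         text,
    birth_location_id  integer REFERENCES locations (id)  -- foreign key: one end of a relationship
);
```

The REFERENCES clause is the relationship: every individual carries a foreign key pointing back to exactly one location, and the database will reject any value that does not correspond to an existing location row.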

The heart of my dissertation is concerned with tracing written information flows to and among American Foreign Service officers who served from 1775 to 1825. The database was created to help me track these flows, which are preserved in the historical record as letters. This brings up another crucial part of designing databases for historical projects: you need to think long and hard about the nature of the sources you are using and what data you need to analyze. For the network/prosopographical4 analysis I am doing, I do not want to record the full text of the letters, although I do use the database to determine which letters should be read in full. The best databases point you back to the original sources for more information. So the database structure had to begin with the information that can be extracted from a letter.

Figure 1: Generic text of a letter with personal names, dates, and locations highlighted in blue, yellow, and green respectively.

Figure 1 illustrates the fielded data typically contained in a letter. Letters have the names of the sender and recipient. Letter writers usually indicate where they are writing and where they want to send the letter (whether the recipient is there when the letter arrives is, of course, another issue entirely). Letters also have a number of dates associated with them. There is the date the letter was begun, the date the letter was finished (with additional dates for addenda and enclosures), and, if you are very lucky, the date when the letter was received and then another date for when it was entered into an archive. So, if we are to model the data extracted from a letter, the resulting entity might look something like Figure 2.

Figure 2: List of fields in a database entity designed to model a letter.

Letters can be sent to and from individuals or organizations (two or more people acting together). They are sent to and from locations on particular dates (more on this later). Letters are given titles for when you need to cite them, and in case the same letter is sent to more than one person, you can mark it as a “circular.” The circular field is a boolean, meaning it can only hold the values ‘true’ or ‘false.’ The Letters entity also has the ever-useful “Notes” field for any information that does not fit nicely into one of the pre-chosen fields. Notice also how many of the fields are marked as “foreign keys.” A foreign key means that the field in question is in fact one end of a relationship with another entity.
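Rendered as SQL, the entity in Figure 2 might look something like the sketch below, building on the individuals and locations tables sketched earlier. This is a paraphrase for illustration, not Project Quincy’s actual schema; the field names are my own assumptions based on the figure, and it shows only individual senders and recipients.

```sql
-- A hypothetical Letters entity: the title, dates, and notes belong to the letter itself,
-- while people and places live in their own entities and are linked by foreign keys.
CREATE TABLE letters (
    id              serial PRIMARY KEY,
    title           text NOT NULL,                        -- citation title for the letter
    sender_id       integer REFERENCES individuals (id),  -- foreign keys: one end of a
    recipient_id    integer REFERENCES individuals (id),  -- relationship with another entity
    origin_id       integer REFERENCES locations (id),
    destination_id  integer REFERENCES locations (id),
    date_begun      date,
    date_finished   date,
    date_received   date,
    circular        boolean NOT NULL DEFAULT false,       -- true if sent to multiple recipients
    notes           text                                  -- catch-all for everything else
);
```

Because a letter’s sender or recipient can be either an individual or an organization, a fuller version would need either a pair of nullable foreign keys for each role or a shared “correspondent” entity to which both individuals and organizations belong; the next paragraph explains why those entities have to exist in the first place.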

This means that in order to accurately trace a correspondence network the database needs to have entities for Individuals, Organizations, and Locations. Everything else is specific to a particular letter, including the title, the notes field, and the dates. How you choose to record information about people, places, and groups depends on what information you think you will be able to reliably gather about most of the members of each category. You want to strike a careful balance between the uneven richness of sources and a relatively uniform standard for comparison. Just because a person or organization left behind more surviving documentation does not automatically make them more important, just easier to study.

As you are designing entities to describe other parts of the database, it is often helpful to create tables that hold subject keywords you want to use for classifying and later searching. Pre-selected keywords often work best when a clearly defined set of people are in charge of marking up the content. They are great for searching, and if indexed in a hierarchical structure, can provide semantically powerful groupings (especially for geographical information). As a historian, however, I am wary of keywords that are imposed on a text. If someone calls himself a “justice,” I balk at calling him a “judge” even if it means a more efficient search.

Of course, it all depends on your data and what you want to do with it, but my preferred solution is to have, at minimum, two layers of keywords. The bottom layer reflects the language in the text (similar to tagging), but those terms are then grouped into pre-selected types. You can fake hierarchies with tags, but it requires far more careful attention to tag choices than I typically associate with that methodology. For example, in the EAFSD I have an entity called AssignmentTitles that contains all the titles given to U.S. Foreign Service officers by the various American governments. However, there were forty-five distinct titles used between 1775 and 1825, and without highly specialized knowledge it is difficult to understand how they related to each other. So I created another entity, AssignmentTypes, which groups those titles into three distinct types: “diplomatic,” “consular,” and “support staff,” allowing for ease of searching among similar appointments without having to remember every term for consul, or those performing consular functions, used by the Continental Congress, the Congress of the Confederation, and the State Department. It was this three-part distinction that I unconsciously replicated in the introduction to my dissertation, which made me realize the two publications were more intimately linked than I had previously understood.
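In SQL, that two-layer arrangement can be as simple as one table referencing another. The sketch below is illustrative rather than a copy of the EAFSD’s actual tables, and the column names are my own assumptions.

```sql
-- Top layer: a small, pre-selected set of types used for grouping and searching.
CREATE TABLE assignment_types (
    id    serial PRIMARY KEY,
    name  text NOT NULL UNIQUE        -- 'diplomatic', 'consular', 'support staff'
);

-- Bottom layer: the titles as they actually appear in the sources.
CREATE TABLE assignment_titles (
    id       serial PRIMARY KEY,
    title    text NOT NULL UNIQUE,    -- e.g. 'Minister Plenipotentiary', 'Commercial Agent'
    type_id  integer NOT NULL REFERENCES assignment_types (id)
);

-- Find every consular appointment without remembering each period's terminology:
SELECT t.title
FROM assignment_titles AS t
JOIN assignment_types  AS ty ON ty.id = t.type_id
WHERE ty.name = 'consular';
```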

Modeling Time
When designing databases for historical research and teaching, it is crucial to remember that these databases are works of history. One of the great challenges of digital history, but also one of our field’s most important contributions to digital humanities in general, is the careful representation of time. Our sources do not exist in some eternal present, but are bound to the past in ways that computers find hard to understand. Computers record time in ways that are simply ridiculous when you are trying to bring the past alive. Who thinks in date-time stamps? True, someone’s life can change in the blink of an eye, but fractional seconds are not helpful in recording human experiences. In fact, they impose an anachronistic, hyper-precise gloss on events that creates an unnecessary barrier to comprehension. While building the EAFSD there was a harrowing week when I could not enter dates prior to 1999, and any date field left blank reverted to today’s date. I could not concentrate on anything else while the two historical dates I had entered into the database were wrong.

Even so, relational databases have powerful tools for analyzing dates and date ranges that can be very useful for historical purposes. The trick, therefore, is to massage the strict date-time formats so that they hold your data in ways that are properly formatted, but also intellectually honest. Interface design is your friend in this case, because you can set a whole range of options for how you want your dates to be displayed. However, it is still important to think long and hard about how you want to record dates in the database.

How you record dates will depend on what sorts of dates your sources provide. While PostgreSQL (and other relational database packages) does not know how to handle dates that are not in the Gregorian calendar, with the appropriate settings it can record dates back to the fifth millennium B.C.E.5 Figure out how you want to map your dates to the Gregorian calendar, and explain that process clearly on your site and in any documentation you provide. Depending on the age and completeness of your sources, you may need to record partial or fuzzy dates. Partial dates are dates that are missing pieces of information (e.g., June 1922). Fuzzy dates are date ranges (e.g., January 5-7, 1789). Neither is officially supported, but both can (with relative ease) be built into the data structure. For partial dates, you can choose to enter only the data you have (month and year) and leave the day as 1. Then add a series of boolean flags called “day known,” “month known,” and “year known.” Depending on which of those fields are true, the system can display the dates appropriately. This means that on average you will have a fifteen-day margin of error on any of your partial dates, but you can still use all the default date calculators. For date ranges, you can have start_date and end_date fields, or the fields can be labeled “no earlier than” and “no later than,” which is how TEI (Text Encoding Initiative) handles date ranges. Keep in mind that the more elaborate the solution, the harder it will be to extract date information. The simplest solution that can be mapped to your sources is your best bet. Once the dates are in your system, you can decide how best to display them.
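As a sketch of the partial-date convention described above (assuming PostgreSQL, with illustrative table and column names rather than the EAFSD’s actual ones):

```sql
-- Partial dates: store a complete date (defaulting the missing day to 1)
-- plus flags recording which pieces actually came from the source.
CREATE TABLE historical_dates (
    id           serial PRIMARY KEY,
    date_value   date NOT NULL,
    day_known    boolean NOT NULL DEFAULT true,
    month_known  boolean NOT NULL DEFAULT true,
    year_known   boolean NOT NULL DEFAULT true
);

-- "June 1922" becomes 1922-06-01 with day_known = false;
-- the interface layer then displays only the parts that are known.
INSERT INTO historical_dates (date_value, day_known)
VALUES ('1922-06-01', false);

-- Fuzzy dates (ranges) can instead be a pair of fields on the owning entity,
-- e.g. no_earlier_than date, no_later_than date, which keeps the default
-- date arithmetic available for both endpoints.
```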

Historical Prose
So, how does all of this affect the writing of history? One answer is that standalone secondary source databases are already a major form of publishing historical research. While I am not submitting the EAFSD as my dissertation, it is a publication in its own right. As more and more history finds its way online, databases will structure future research in ways that we need to be very careful and thoughtful about. Making data structures (and the theoretical decisions that underlie them) transparent through good documentation is a first step toward educating our colleagues and students about the material they are likely to find available in digital formats. There are not nearly enough digital resources for historical sources that carefully explain why the designers built their databases the way they did.6

Databases can also be used for note taking, which, as Ansley Erickson has shown, is a powerful tool for research.7 But designing databases brings a whole new set of issues to the forefront of the researcher’s mind: What are the structural similarities of my sources? What are the most important elements of the world I study? What are the key relationships between those elements? How do I need to represent time? It is my belief that investigating these questions in a systematic way deepens the historian’s understanding of their own source material and analytic framework. How that understanding is represented in prose (if any is generated) will depend largely on the historian and the historical subject under investigation. At a bare minimum, finding the contours of your subjects’ reality will sharpen your own understanding of what is worth including in a narrative analysis, and what is best left aggregated in the database. Earlier uses of databases by cliometricians in the 1960s and 1970s focused on large-scale analysis to discover the average experience of people in different walks of life, whether in New England townships or the U.S. Army.8 In contrast, working with a database allows me to privilege the mistakes and missed communications of individual Foreign Service officers. I have found that one of the greatest benefits of a data structure as constricting as a relational database is its ability to place the downright weird in historical context. While I was drawn to the topic because of the Foreign Service’s ability to function despite being run entirely by amateurs who, at best, learned while doing, the database has allowed me to see where the especially interesting gaps or overcompensations occurred. By making it easier to find the overall trends, I am free to explore, without overstating, any anomalies I find in the course of my research. For those of us who work on trans-Atlantic and even global topics, that freedom can prove invaluable as we sculpt arguments from an ever-expanding set of potential sources.

About the author: Jean Bauer is the Digital Humanities Librarian at Brown University. She is finishing her dissertation, “Revolution-Mongers: Creating the U.S. Foreign Service, 1775-1825,” in the Corcoran Department of History at the University of Virginia. www.jeanbauer.com

  1. William G. Thomas III, "Computing and the Historical Imagination," in A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Oxford: Blackwell, 2004), http://www.digitalhumanities.org/companion/; and Daniel Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (Philadelphia: University of Pennsylvania Press, 2005). For an older, but still excellent, discussion of digital history (past and present) see Edward L. Ayers, "The Pasts and Futures of Digital History," 1999, http://www.vcdh.virginia.edu/PastsFutures.html.
  2. For a solid overview of relational databases, see Stephen Ramsay, "Databases," in A Companion to Digital Humanities.
  3. For more technical readings on databases and relational algebra see E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," Communications of the Association for Computing Machinery 13, no. 6 (1970): 377-87; C. J. Date, The Database Relational Model: A Retrospective Review and Analysis (Reading: Addison-Wesley, 2001); and Ramez Elmasri and Shamkant Navathe, Fundamentals of Database Systems (Redwood City: Benjamin/Cummings, 2004).
  4. Prosopography, or group biography, consists of investigating common characteristics of a group of people, many of whose individual lives would be difficult to trace on their own. See Lawrence Stone, "Prosopography," Daedalus 100, no. 1 (1971): 46-71.
  5. "Date conventions before the 19th century make for interesting reading, but are not consistent enough to warrant coding into a date/time handler." This is the final line of PostgreSQL's documentation on date/time data types, found online at http://www.postgresql.org/docs/8.4/static/datatype-datetime.html. Lines like that make me laugh, because the only other option is crying.
  6. For a tool I have developed to make this easier, see http://www.jeanbauer.com/davila.html. DAVILA is an open source relational database schema visualization and annotation tool, and it generated the image of the Letters entity seen above.
  7. See Ansley Erickson's essay in this same volume as well as this earlier version: Ansley Erickson, "Historical Research and the Problem of Categories: Reflections on 10,000 Digital Notecards," Writing History: How Historians Research, Write, and Publish in the Digital Age, October 6, 2010, http://writinghistory.wp.trincoll.edu/2010/10/06/erickson-research/.
  8. Edward M. Cook, The Fathers of the Towns: Leadership and Community Structure in Eighteenth-Century New England (Baltimore: Johns Hopkins University Press, 1976); J. C. A. Stagg, "Enlisted Men in the United States Army, 1812-1815: A Preliminary Survey," The William and Mary Quarterly, 3rd Series, 43, no. 4 (October 1986): 615-645.

Source: https://writinghistory.trincoll.edu/data/fielding-history-bauer/