Column Editors’ Note: Historians, sociologists, and philosophers often struggle to trace the ways that the meaning of data shifts over time, and in particular how the ‘same’ data might be deployed to answer radically different questions. In this Mining the Past column, historian Caitlin Rosenthal introduces an important example from her work on the history of plantation slavery to emphasize the way the same data may be used not just to different ends, but also to reveal entirely new understandings of the past.
Keywords: plantation slavery, data exhaust
Over the past decade, companies and data scientists have touted the strategic importance of ‘data exhaust’—data that are a byproduct of other processes (Fourcade & Healy, 2016; Zuboff, 2019, pp. 68–70). So-called data exhaust usually refers to the trail of information that individuals leave behind as they move through the digital world.
In a sense, historians have always relied on data exhaust to do our work—archives contain trails of discarded data and documents, albeit on paper. In my own research, I rely on the thousands of plantation account books left behind by American slaveholders, many of them highly quantitative. The numerical remains of this brutal economic system offer a cautionary tale for modern researchers. Scholars who have relied on slaveholders’ data have tended to answer slaveholders’ questions. Modern data sets are no less laden with values, and revisiting the account books of American slavery can help us to recognize the risks of replicating the biases of our sources.
Consider this record of cotton picking. In August of 1859, overseer Henry L. Eggers opened a slim volume to record data about Elder Grove, a Louisiana cotton plantation (“Statement of Cotton,” 1859–1866; 1860 United States Federal Census). The picking season had just begun, and Eggers listed the name of each person whom he forced into the fields to gather cotton. All of these individuals were enslaved, and Eggers was tasked with pushing them to pick as much cotton as they could. Beside each name, he entered the number of pounds picked each day, Monday through Saturday. At bottom he tallied up this data by day, and at right, by person.
In the first week of the harvest, Eggers’s spreadsheet showed a variety of tasks. The cotton bolls were just beginning to open, and not everyone was picking cotton. A month later, the pages had become neat grids of numbers, and the daily output had doubled (Figure 1). Almost every hand was forced to pick cotton every day, and the overseer carefully noted their daily output. In the last week of September, output reached between 7,510 lbs. and 8,930 lbs. per day. Individual productivity ranged from 780 lbs. for Julian up to 2,425 lbs. for Little Ellick. In total, the enslaved workers brought in 57,260 lbs. over the course of the week. Even Ellen, who had been “out sick for the past eight months,” was deemed well enough to pick, harvesting 840 lbs. of cotton over the week. The only person working elsewhere, B. Henry, was “in the lint room” at the cotton gin, perhaps weighing the cotton so the data could be entered into the account book.
While the scale of this paper spreadsheet was, of course, much smaller than the data exhaust of today, data-minded Southern slaveholders fantasized about what they might learn from not just their own ledgers, but from hundreds or even thousands of similar records dispersed across the South. As slaveholder and accounting expert Thomas Affleck wrote in a letter, “Think of the advantage to both planters & overseers, of even 1,000 books written from day-to-day experience, scattered over the country!” (Affleck, 1855). Data from such books, kept in a standardized format, promised precise calculations and comparisons, turning Southern plantations into a vast laboratory for understanding labor productivity.
Such data promised an answer to a question that obsessed Southern slaveholders: How much cotton could they expect people to pick? This was, in a sense, the key performance indicator for increasing profits on cotton plantations. In a crop where output predicted profits and picking was the biggest bottleneck for production, knowing how much someone could pick was essential information. As South Carolina planter Plowden C. J. Weston put it, “In nothing does a good manager so much excel a bad, as in being able to discern what a hand is capable of doing and in never attempting to make him do more” (Weston, 18–, p. 8).
Yet, at the time, preformatted account books like the one used on Elder Grove plantation yielded only shadowy answers to these questions. These primitive spreadsheets enabled slaveholders to arrange and organize productivity data for each of the men, women, and children they enslaved. Planters and overseers tallied up the pounds, experimenting with incentives and brutal punishments. They made comparisons and drove people to work faster. However, without the tools for more complex analysis, definitive answers remained elusive.
More than a century and a half later, we have the tools to analyze planters’ data easily and fully. The economists Alan Olmstead and Paul Rhode have, in a sense, answered slaveholders’ questions. Toting up data from 142 plantations like Elder Grove, they have estimated changes in picking rates over time, finding that the amount of cotton each person picked increased dramatically under slavery (Olmstead & Rhode, 2008).
The increase in productivity was dramatic. By 1862, enslaved people were picking on average about four times as much cotton apiece as they had in 1801. Olmstead and Rhode have also analyzed the micro data to estimate productivity by age and by gender. They are finally answering the questions that drove Southern slaveholders to collect the data in the first place. The causes of the productivity gains remain a matter of intense debate and analysis. Yet amid this debate, scholars concerned mainly with analyzing data have rarely paused to ask whether slaveholders’ questions are the most important ones to answer.
Presented with slaveholders’ data, we have continued to ask slaveholders’ questions. Indeed, the latest round of fighting over the productivity of slavery sometimes feels like a replay of a similar clash over economists Robert Fogel and Stanley Engerman’s Time on the Cross (1974/1995). Fogel and Engerman argued that slavery was highly profitable and that slaves even had a “Protestant ethic,” not only using planters’ data but also imbibing some of the lies they told about enslaved people’s motivations. Fogel and Engerman’s findings around profitability have come to be widely accepted and have helped us to understand the extent to which slaveholders’ practices reflected the growth of capitalism. Yet even as we acknowledge the ways slaveholders blended profit seeking with brutality, we are still gravitating toward slaveholders’ numbers and prioritizing the kinds of questions those numbers can answer. Slaveholders’ data, and the way scholars have used it, reflects this fundamental limitation: Data that is the exhaust of an economic system will be most useful for answering questions about the things that economic system values.
Data ethicists have already begun to document such biases in modern data practices, demonstrating that both the composition of data sets and the ways algorithms are trained can harm vulnerable groups, entrenching and perpetuating race and gender stereotypes (Benjamin, 2019; Criado-Perez, 2019; Noble, 2018). For example, training data from sources like newspapers has produced tools that are more useful for populations seen as newsworthy—disproportionately White men. Commercially available facial recognition algorithms value Whiteness and masculinity, performing best for lighter-skinned males and worst for darker-skinned females (Buolamwini & Gebru, 2018).
More generally, the data exhaust of the online economy will be most useful for answering a narrow set of questions about a specific set of people. Though there are surely unexpected insights to be gleaned from the digital breadcrumbs that trail behind us as we move through the online world, a huge proportion of this data is gathered as we buy and sell. The data exhaust of consumption will prompt us to tell stories about consumer capitalism that highlight our identities as consumers—narrow identities that leave many (perhaps most) important things out.
Slavery’s data tells us how much cotton people could be forced to pick. Data gathered as we consume will tend to offer insights about how to sell us more.
What have economic historians missed by devoting their energy to answering slaveholders’ questions? Another group of historians have also relied on plantation account books for their research, taking a different, slower path through the data and acknowledging the violence embedded in historical archives (Fuentes, 2016; Rosenthal, 2018; Smallwood, 2007). Data scientists can learn from historians as we all seek to develop critical approaches to biased data sets.
Returning to the Elder Grove account book, we can learn as much about the institution of slavery by questioning the data as we can by analyzing it. The grueling nature of the work—invisible in the productivity calculations—is immediately apparent.
Labor was not freely given. On the neat grid of numbers for the last week of September, there is one gap, for Charles, who has “run away.” He escaped at the peak of the picking season, at the very moment when the planter most desired his labor. Following Charles’s row forward, we see that he managed to stay away for 3 full weeks. How did he survive away for so long? He stayed away on 2 days too rainy for anyone to work. Who sheltered him? Did he negotiate the conditions of his return? Did he inspire (and perhaps even aid) the escape of Mose the following week?
We can see, too, the constant sicknesses that afflicted enslaved people. As the picking season began, Ellen, Emily, Big Watson, and Angeline were all too ill to go to the fields, yet as the need for pickers increased, they were sent to work anyway. Would Francis, out for 4 days and described as “always sick” in the margins, have been so ill if he had not been constantly sent to the fields? On Residence plantation, also in Louisiana, the journal refers to a “measles gang,” a team of laborers who were still sick but had been deemed well enough to return to making sugar (Rosenthal, 2018, p. 115).
Looking around and between the numbers raises different questions—about power, struggle, and coercion. These questions are easier to ask with the benefit of hindsight, but even at the time, abolitionists sometimes remade data collected in the pursuit of profit into exhibits of the human costs of slavery. British writer James Stephen relied on data from a plantation account book to critique West Indian slavery. He used both the numbers and the horrifying descriptions of punishment interspersed between them to reconstruct the poor treatment of enslaved people, even depositing the original with his bookseller so that other individuals could reflect on it and see that his “extracts are correct” (Rosenthal, 2018, pp. 80–81).
Can we craft a slow data approach that opens up similar possibilities in modern data science? Though the scale of today’s data sets makes it more difficult to analyze them qualitatively, researchers have called for careful reflection and documentation of their biases. The hope is that through auditing and documentation (creating “Datasheets for Datasets”), the creators and distributors of databases can help users to see their biases and thus avoid misuse (Gebru et al., 2020; Raji et al., 2020). Citing lessons from historical archives, they have suggested that careful curation can help us to critique and counter underlying biases in data sets, for example by developing supplementary data sets that counter biases in representation (Jo & Gebru, 2019).
Developing adequate documentation and supplementary data sets will not be fast. Indeed, artificial intelligence ethicists have been clear that these processes are “not intended to be automated” (Gebru et al., 2020, p. 3). While automation may be fast and cheap, it compounds the risk of unthinkingly replicating data’s biases and overlooking what is left out (Eubanks, 2018).
Nor will such ‘slow data’ approaches make data exhaust useful for every question. More often, careful documentation and critique will expose what is missing, making clear what questions ‘data exhaust’ simply will not answer. Here, too, scholars of slavery can show the way. Even as historians have uncovered stories about enslaved people in the data exhaust of slavery, they have also recognized how little the records often tell us (Fuentes, 2016; Hartman, 2007; Morgan, 2016). No amount of reading between the numbers will wring full stories about enslaved people from records of their value as labor and capital. If historians have sometimes found the human in the data, more often we are left searching.
Yet here modern data scientists have the advantage over historians, who cannot go back in time to ask different questions. Today’s data scientists can reckon with bias, crafting new research questions and recognizing that answering them fully will require multiple kinds of data, qualitative and quantitative, fast and slow.
1860 United States Federal Census, Ward 3, Carroll, Louisiana (p. 368). Family History Library Film 803409. https://www.ancestrylibrary.com/discoveryui-content/view/38477521:7667
Affleck to Hammond, 3 January 1855, Box 32, Folder 10, Thomas Affleck Papers, Louisiana and Lower Mississippi Valley Collections, LSU Libraries, Baton Rouge, LA.
Benjamin, R. (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81, 77–91. http://proceedings.mlr.press/v81/buolamwini18a.html
Criado-Perez, C. (2019). Invisible women: Data bias in a world designed for men. Abrams Press.
Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin’s Press.
Fogel, R. W., & Engerman, S. L. (1995). Time on the cross. W. W. Norton & Company. (Original work published 1974)
Fourcade, M., & Healy, K. (2016). Seeing like a market. Socio-Economic Review, 15(1), 9–29. https://doi.org/10.1093/ser/mww033
Fuentes, M. J. (2016). Dispossessed lives: Enslaved women, violence, and the archive. Early American Studies. University of Pennsylvania Press.
Gebru, T., Morgenstern, J., Vecchione, B., Wortman Vaughan, J., Wallach, H., Daumé III, H., & Crawford, K. (2020). Datasheets for datasets. ArXiv. https://arxiv.org/abs/1803.09010v7
Hartman, S. V. (2007). Lose your mother: A journey along the Atlantic Slave Route. Farrar, Straus and Giroux.
Jo, E. S., & Gebru, T. (2019). Lessons from archives: Strategies for collecting sociocultural data in machine learning. FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 306–316). ACM. https://doi.org/10.1145/3351095.3372829
Morgan, J. L. (2016). Accounting for “The Most Excruciating Torment”: Gender, slavery, and Trans-Atlantic passages. History of the Present, 6(2), 184–207. https://doi.org/10.5406/historypresent.6.2.0184
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.
Olmstead, A. L., & Rhode, P. W. (2008). Biological innovation and productivity growth in the antebellum cotton economy. The Journal of Economic History, 68(4), 1123–1171. https://doi.org/10.1017/S0022050708000831
Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. ArXiv. https://arxiv.org/abs/2001.00973
Rosenthal, C. (2018). Accounting for slavery: Masters and management. Harvard University Press.
Smallwood, S. E. (2007). Saltwater slavery: A middle passage from Africa to American diaspora. Harvard University Press.
“Statement of Cotton,” 1859–1866, Robert H. Stewart Account Books, Mss. 404, 4732, Louisiana and Lower Mississippi Valley Collections, LSU Libraries, Baton Rouge, LA.
Weston, P. C. J. (18–). Rules for the government and management of ___ plantation: To be observed by the overseer. A. J. Burke.
Zuboff, S. (2019). The age of surveillance capitalism: The fight for a human future at the new frontier of power. PublicAffairs.
This article is © 2021 by the author(s). The editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license (https://creativecommons.org/licenses/by/4.0/legalcode), except where otherwise indicated with respect to particular material included in the article. The article should be attributed to the authors identified above.