Author interview: Doug Lambert on Oral History Indexing

In OHR’s 2023 special issue on “Disrupting Best Practices,” Douglas Lambert presented an overview of the last several decades of oral history indexing, a strategy for creating entry points in oral history audio and video. Here, we ask Doug a few questions about the practice and what it means for the state of the field of oral history.

Tell us what Oral History Indexing is and why practitioners and archivists do it. What should an oral historian know about OHI?

Oral History Indexing (OHI) is a set of practices that involve creating access to digital audio/video collections at the timecode level. In 2021, I brought some established practitioners together at the Oral History Association meeting to talk about their indexing systems and methods. Those presentations became the basis of case studies for my OHR paper, where I summarized the work of the Shoah Foundation, the University of Kentucky (OHMS), and several other remarkable online sites featuring dynamic, electronically linked access to oral history collections.

Better access is why we do OHI. Recording oral history on long tapes or even digital files is not enough. Transcribing them may help, depending on your goals. But if the goal is to publicly present whole, raw interviews, which is often the case in oral history, you need to give users meaningful mileposts and thematic cues by which to navigate. OHI is about developing roadmaps to browse and explore within and across interviews, because consuming them linearly is typically not feasible.

OHI is serious A/V content management, and oral historians embraced it early on and went deep. It is much more sophisticated than, say, the thematic chapter labels in YouTube videos, which appeared only recently. I coined “OHI” for the paper, first to characterize existing applications in oral history, but also to have a framework for understanding these institutions and systems, which evolved mostly independently, as a collective. Building from this initial survey, we can begin to look across systems, see what they do and do not have in common, and move toward planning future development intentionally, building on the wisdom gained in the first three decades of OHI.

Is OHI driven by technology or need?

I would say OHI is driven by the intersection of a three-way Venn diagram: technology, need, and ambitious oral historians who dive in and put it all together. I see it like this:

OHI is predicated on technology. Technology is the road, or the infrastructure. Computers, multimedia, databases, instantaneous linkage to timecodes, the internet: these have been available for decades.

The “need” is the ability to provide access points if you expect and want people to consume raw interview content. If there was any point in recording an interview collection in the first place, there is a need to provide decent access to it, and OHI aims to do just that.

The third factor is the human intelligence of oral historians. Technology just sits there until we make something of it. The real OHI work still involves creating or editing the segment elements: timecodes, summaries, digests, keywords, titles, etc. The contextual choices OHI curators make regarding these elements create the access points.

What conclusions did you draw from analyzing various collections’ approaches to OHI?

My article was mostly an inventory of various OHI approaches in the oral history field over 25 years, accompanied by some commentary. The real analysis is yet to begin, where we examine elements across systems: How do different OHI practices create segments, and why? How do they most effectively deploy controlled vocabularies? Analyzing parameters across systems will help optimize future OHI practices, and it can also reveal where and how AI might best fit in.

How do you imagine OHI may change as artificial intelligence continues to evolve?

First, I assume that anyone serious about their research or collections will never leave everything up to AI. There will always be a handshake between human intelligence and aesthetics and the tools that make the processes easier and better. AI automatic speech recognition (ASR) transcripts, though imperfect, are fast and cheap to make. Availability of better ASR is already affecting choices, like when and how to create transcripts, or even whether to build an interface for synchronized transcripts or an index.

The most exciting thing I see is that because AI/ASR transcripts are loaded with timecodes linked to the media down to the word, they are already a form of index in themselves. If you take an ASR transcript and periodically add a theme or title next to a timecode, you essentially have a thematic index. My friend Mike Frisch has been pursuing methods in this vein for years, essentially creating hybrid index-transcripts. He does this using indexing software called TIM that I developed with TheirStory, which we originally built to leverage AI/ASR transcripts in building OHMS indexes.
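The idea that a word-timed ASR transcript plus a few timecoded titles already amounts to a thematic index can be illustrated with a short Python sketch. This is not how TIM or OHMS work internally; the `Word` and `Segment` types, the `build_hybrid_index` and `fmt_timecode` functions, and the input format (a start time in seconds for each word, and headings given as (seconds, title) pairs) are all hypothetical assumptions, since real ASR output formats vary.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording (assumed ASR format)


@dataclass
class Segment:
    start: float
    title: str
    words: list[str] = field(default_factory=list)


def build_hybrid_index(words, headings):
    """Hypothetical helper: attach each ASR word to the latest heading whose
    timecode is at or before the word's start time. Words spoken before the
    first heading go into an untitled preamble segment."""
    headings = sorted(headings)
    starts = [t for t, _ in headings]
    segments = [Segment(start=t, title=title) for t, title in headings]
    preamble = Segment(start=0.0, title="(untitled)")
    for w in words:
        i = bisect_right(starts, w.start) - 1  # index of latest heading <= word time
        (segments[i] if i >= 0 else preamble).words.append(w.text)
    return ([preamble] if preamble.words else []) + segments


def fmt_timecode(seconds):
    """Format seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Made-up example data: a word-timed transcript and two curator-added titles.
words = [
    Word("I", 0.0), Word("grew", 0.4), Word("up", 0.7), Word("on", 1.0),
    Word("a", 1.1), Word("farm.", 1.2),
    Word("Then", 65.0), Word("I", 65.3), Word("enlisted.", 65.6),
]
headings = [(0.0, "Childhood"), (65.0, "Military service")]

for seg in build_hybrid_index(words, headings):
    print(f"[{fmt_timecode(seg.start)}] {seg.title}: {' '.join(seg.words)}")
# [00:00:00] Childhood: I grew up on a farm.
# [00:01:05] Military service: Then I enlisted.
```

The only human input here is the list of titles and their timecodes. Everything else is derived from the word-level timing the ASR already provides, which is why the line between a synchronized transcript and an index starts to blur.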

These improved ASR transcripts with timecodes may start to make the distinction I outlined between synchronized transcripts and indexes obsolete. With ASR in the mix, new tools, methods, backend workflow systems, front-end display systems, etc. will look like neither the synchronized transcripts nor the OHI interfaces we have known. The entire OHI enterprise is ready to enter its next evolutionary stage. We should take a closer look at where we’ve been and select the best bits, while we embrace what AI has to offer and see what grows. I’m eager to attend and present at the upcoming Oral History Association “AI In OH” virtual symposium, July 15-19, 2024, and learn more about how to harness AI for good in our field.

See the supplement to my article here, with lots of links and demos. 

Douglas Lambert is an engineer who began working in the field of oral history and audio/video content management in the early 2000s, when he joined Randforce Associates, a consulting firm established by oral historian Michael Frisch, to pursue new practices in thematic, timecode-level indexing for long-form recordings. As Randforce’s Director of Technology, he led dozens of projects, helping clients develop multimedia data and online displays for better access to oral histories and other A/V content. Building on his master’s degree in environmental engineering and supported by a National Science Foundation fellowship, he earned a PhD in civil engineering using oral history interviewing and indexing methods. His dissertation analyzed the results of a multidisciplinary NSF study in which a team of researchers recorded anecdotal and experiential knowledge from technical and nontechnical professionals about Superfund-era groundwater contamination. Lambert went on to a postdoctoral fellowship at the Centre for Contemporary and Digital History at the University of Luxembourg, where he codeveloped the initial version of the Timecode Indexing Module (TIM) software tool. He is currently a research scientist and project manager in the Department of Civil, Structural and Environmental Engineering at the State University of New York at Buffalo (SUNY). He continues to apply approaches and methods from oral history indexing in multidisciplinary projects and to develop the open-source version of TIM.