What it takes to make 165 years of journalism available online

Aldana Vales
Building The Atlantic
8 min read · Apr 5, 2023


Digitizing The Atlantic’s archive and turning it into a product for readers are two different things. Here’s how we did both.

The Atlantic’s archive logo and a collection of a few magazine covers.

It was, above anything else, an editorial imperative. The right thing to do for our readers.

As The Atlantic’s deputy executive editor Sarah Yager puts it, making journalism is really about creating a record of the world around us. It’s about documenting what is happening and hopefully helping readers to make sense of it. And there’s enormous value in being able to learn from what was happening in the past, and how history was interpreted in real time.

The Atlantic had wanted to digitize its archive ever since it started publishing online in 1995. At that point, a website for our magazine gave people a new way to read, online, the journalism we were publishing that day. Our magazine, however, had been continuously printed and published since the autumn of 1857. There were a lot of stories that our readers couldn’t access online. From time to time editors would manually reproduce archival articles, but the massive task of making the archive available remained elusive for almost three decades.

How does one take tens of thousands of words from our print pages and publish them on the internet? How would The Atlantic go from not offering this content to having it in its data structures and on its website? What are the mechanical steps to getting there? These were some of the questions that our product and technology colleagues asked themselves in May 2021, when the magazine committed to the challenge of bringing the archive online.

First steps

At the beginning of a project of this scale, there are many different visions of what it should look like and of all the features it could include. As Executive Director of Product Carson Trobich explains, it can be hard to figure out how to break something that large into first steps.

“You need to identify the limits to your ambition and put the initial excitement into research.”

Carson Trobich, Executive Director, Product

To orient our visions and find first steps, our product colleagues researched 20 publishers to see how they were resurfacing and repackaging archival content.

The team identified that some publishers’ archives only consisted of scans of printed pages, while others transformed pages into digital text. An archive could be fully available online or just in part. It could live on a publisher’s website adjacent to modern content, or it could be its own separate product, with additional functionalities. Some archives even live off platform.

The Atlantic decided early on that it was our ambition to make the full archive available. For transparency to our readers, and for the historical record, we wanted to share it all — from our most enduring reporting to some stories that have rightly fallen into obscurity. As our editor in chief Jeffrey Goldberg wrote in an editor’s note introducing the project, “It’s all here: the good, the bad, the brilliant, the offensive, the ridiculous. We knew from the start that we would engage in no censorship, trimming, or dodging.”

By building space for the archive to live on the current website, our product colleagues worked on digitizing and presenting past articles in our modern article template. These are the steps they followed to get there:

1) Transcribing the content: The Atlantic came into this project with PDF scans of all the pages that it had ever published. To make sense of all that information, our engineering team worked with a vendor specializing in digitizing magazine archives. The contractors used optical character recognition and high-resolution scans to identify different regions and zones within each page, mapping the position of everything The Atlantic ever printed.

This first step also required a schema definition, which taught the vendor to recognize what they were digitizing and laid the foundation for content ingestion. This way, the vendor learned how to identify content types (e.g. headlines or page numbers) and tag them in a way that our own internal systems could understand.

This process produced highly detailed packs of XML files, PDFs, and JPEGs that amounted to 400 GB of data.
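To make the idea of schema-tagged zones concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the vendor’s actual format: the element names (page, zone), the type attribute, and the sample page are all hypothetical, since this post doesn’t describe the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of one digitized page. The real vendor schema is not
# described in this post; element and attribute names are assumptions.
SAMPLE_PAGE_XML = """
<page number="12">
  <zone type="headline" x="40" y="60">The Example Headline</zone>
  <zone type="body" x="40" y="120">First paragraph of the story.</zone>
  <zone type="body" x="40" y="200">Second paragraph of the story.</zone>
  <zone type="page_number" x="300" y="780">12</zone>
</page>
"""


def zones_by_type(page_xml: str) -> dict[str, list[str]]:
    """Group the text of each tagged zone on a page by its content type."""
    root = ET.fromstring(page_xml.strip())
    grouped: dict[str, list[str]] = {}
    for zone in root.iter("zone"):
        content_type = zone.get("type", "unknown")
        text = (zone.text or "").strip()
        grouped.setdefault(content_type, []).append(text)
    return grouped


if __name__ == "__main__":
    for content_type, texts in zones_by_type(SAMPLE_PAGE_XML).items():
        print(content_type, texts)
```

Tagging each zone with a content type is what lets a downstream system treat a headline differently from a page number without looking at the scan itself.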

2) Ingesting the data: The next step was to import it all into our CMS. Before our engineering colleagues could do this, they needed to assess what types of content were in the archive, to determine what should or shouldn’t be imported. To achieve this, they built an index of 110,000 pieces of content. Not all of that would turn into article pages during the importing process. Just over half, for example, were advertising.

Engineering, as our Systems Architect Chris Barna describes, can often be loaded with resource constraints. They had the opposite issue here: too much information. The team narrowed down what The Atlantic wanted to republish in the digital archive: articles, short stories, poetry. Or better, what we didn’t want to publish: ads and tables of contents. Then, they took that index and turned it into articles in our CMS, the same way our contemporary magazine articles are uploaded. It worked great, until they started running into problems.
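The narrowing-down step can be sketched as a filter over the index. This is a simplified illustration, not our actual import code: the IndexEntry type, the content-type labels, and the sample entries are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One indexed piece of content (hypothetical shape)."""
    title: str
    content_type: str


# Content types we would skip during import (labels are assumptions).
EXCLUDED_TYPES = {"advertisement", "table_of_contents"}


def select_for_import(index: list[IndexEntry]) -> list[IndexEntry]:
    """Keep every entry whose content type is not explicitly excluded."""
    return [entry for entry in index if entry.content_type not in EXCLUDED_TYPES]


if __name__ == "__main__":
    sample_index = [
        IndexEntry("An Example Essay", "article"),
        IndexEntry("An Example Poem", "poetry"),
        IndexEntry("Example Soap Ad", "advertisement"),
        IndexEntry("Contents", "table_of_contents"),
        IndexEntry("An Example Story", "short_story"),
    ]
    # Count what the index holds before deciding what to import.
    print(Counter(entry.content_type for entry in sample_index))
    print([entry.title for entry in select_for_import(sample_index)])
```

Filtering by exclusion rather than inclusion mirrors the “what we didn’t want to publish” framing: any content type nobody thought to list is kept rather than silently dropped.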

“For our system, it’s easy to publish 30 articles at a time. We needed to publish a thousand articles at a time.”

Chris Barna, Systems Architect

3) Publishing thousands of articles at a time: A lot of the later stages of the conversion and importing process were about taking our bulk actions and making them bulkier. That is, scaling from operating on a single magazine issue to operating on a year’s or even a decade’s worth of content.
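One common way to make bulk actions “bulkier” is to split a large set of articles into fixed-size batches and publish each batch in turn. The sketch below shows that general technique only; the publish_batch callable and the batch size are hypothetical stand-ins, not a description of our CMS.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def publish_in_batches(
    article_ids: Iterable[str],
    publish_batch: Callable[[list[str]], None],
    batch_size: int = 1000,
) -> int:
    """Call `publish_batch` once per batch and return the number of batches."""
    count = 0
    for batch in batched(article_ids, batch_size):
        publish_batch(batch)  # hypothetical hook into a publishing system
        count += 1
    return count


if __name__ == "__main__":
    ids = [f"article-{n}" for n in range(2500)]
    total = publish_in_batches(ids, lambda batch: print(len(batch), "published"))
    print(total, "batches")
```

Keeping the batch size as a parameter means the same code path can handle a single issue, a year, or a decade of content.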

While our product and engineering colleagues figured out the digitizing part of the process, the audience research team talked with readers to identify whether and how they would use The Atlantic’s archive. That research mattered because digitizing the archive and turning it into a product that readers will enjoy are two different things:

Digitization is the transcription of a print archive into a consumable and useful format.

Productization consists of the ideation, research, and execution of how we present this content to our audiences.

Understanding readers’ experience with the depth of our journalism was key to seeing what kind of product development we should be focusing on. How would all this fit in our readers’ lives, if at all?

After conducting interviews with Atlantic readers, the audience research team identified two primary profiles of archive discovery. We called them the seekers and the surfers.

Seekers typically researched a specific topic or a time period, mostly for professional purposes. They engaged deeply and often used search functionality to find archives.

Surfers would come across archives serendipitously while surfing the internet. They were motivated by curiosity and wanted historical context on contemporary issues.

There were more surfers than seekers, but they engaged less deeply and less frequently. Whatever product and design direction we chose, we wanted to serve our readers in both modes.

Context that wouldn’t underestimate our readers

From a design perspective, the archive publication presented an opportunity to move people in a particular direction on our webpage, to offer cues and help them navigate our site. Our design team wanted people to remember that it was, in nature, a tool. They wanted readers to use it to browse and find things.

This design portion started around December 2021. Our design team planned to use the standard Atlantic article page and introduce a couple of new features for the archive experience. Every archival page, for example, would incorporate a recirculation module to encourage readers to continue exploring. It would also prompt them to read the full issue that featured that article. Essentially, it’s the same approach we follow in our contemporary magazine.

Offering context was also key. All these articles from the past were suddenly going to be available to our readers. Since The Atlantic was going to use the same format that it uses for our modern magazine, the design team aimed to give readers the tools to quickly identify when each article was published. They hoped that readers would understand when something was from the archives, as opposed to current, new information. And they wanted to achieve this without underestimating our audience, providing awareness without making readers feel warned.

Christopher Chester, one of our senior product designers, wanted to make sure that people understood that they could easily jump into the archive to browse the magazines. If they are in the archive and want to see the latest issue, they can do that. He also wanted to link our magazine pages back to the archive, so that readers always see they have a quick way to navigate the rich history of The Atlantic.

“To me, the archive felt like it was a celebration of the magazine. I started thinking that, if we’re looking at our archive, I wanted to tie it back to our physical editions.”

Christopher Chester, Senior Product Designer

The structure of Atlantic articles has evolved since 1857. In our magazine, we now have stories with a subhead or dek, which hadn’t always been the case. For decades, articles didn’t include a description below the headline. Luckily, our modern article structure is adaptable to each story’s content. For example, we can publish stories with lead images and without them. This flexibility allowed us to respect the original material and faithfully republish archival content as it appeared in The Atlantic’s pages.

Focus on the main goal

The Atlantic archive launched in July 2022, a little over a year after our product and technology teams were tasked with building it. It includes, besides the modified article pages, its own landing page and a completely redesigned magazine section that not only helps readers navigate 165 years of Atlantic covers, but also celebrates writers who contributed to The Atlantic throughout its history.

Scrolling of the archive landing page.

When it launched, the archive still faced some challenges. No archival digitization is perfect: occasional typos and formatting issues will always come with this kind of process. Despite this, our team focused on the number one goal: putting the archives in the hands of the readers.

It’s easy to get bogged down in details and strive for perfection when creating a product. In reality, perfection is not only unattainable; striving for it can also be counterproductive. It’s important to recognize that you can still deliver an excellent experience and fulfill all kinds of readers’ needs without achieving perfection.

As a product team, there are still many opportunities that we can explore. For instance, our readers haven’t gotten the chance to explore advertising through the decades. That’s just one example. Our content is so vast and we still have the door open to build other things.

Carson Trobich, Chris Barna, Christopher Chester and Sarah Yager contributed to this post. Additional acknowledgements: Jefferson Rabb, Kristen House, Emily Goligoski, Mollie Leavitt.
