Newspaper Navigator: A New Image Search Tool from the Library of Congress
September 24, 2020
The Library of Congress has just debuted Newspaper Navigator, a new tool for searching over 1.5 million newspaper images in its collections. The Library of Congress and its partners have digitized over 16 million pages of newspapers from across American history as part of the Chronicling America database. Newspaper Navigator uses machine learning to let users search the database both by keyword and through a new “similar image” search. Newspaper Navigator currently covers roughly 90% of newspaper images from 1900-1963. Everything accessible through the application is in the public domain, so all of the images can be used for free.
Researchers can search for images in the collection in multiple ways. A search can be limited to a particular range of years or to the state where a paper was published. Users can then enter keywords to search for images; Newspaper Navigator extracts keywords from the newspaper to identify matching images. Users can also use the newly designed similar image search, which uses machine learning algorithms to identify images that resemble an initial image. Newspaper Navigator uses machine learning to continually improve its search capabilities. More information about these search processes can be found here.
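For readers curious how these search options fit together, here is a minimal Python sketch. It is only an illustration and not Newspaper Navigator's actual code: the `Clipping` record, its fields, the sample data, and the use of cosine similarity over small made-up feature vectors are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class Clipping:
    """Hypothetical record for one newspaper image (all fields are assumed)."""
    title: str
    year: int
    state: str
    text: str                # text associated with the image
    embedding: List[float]   # toy feature vector standing in for image features


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(items: List[Clipping], keyword: Optional[str] = None,
           start: Optional[int] = None, end: Optional[int] = None,
           state: Optional[str] = None) -> List[Clipping]:
    """Filter by year range and state, then by a case-insensitive keyword."""
    results = []
    for item in items:
        if start is not None and item.year < start:
            continue
        if end is not None and item.year > end:
            continue
        if state is not None and item.state != state:
            continue
        if keyword is not None and keyword.lower() not in item.text.lower():
            continue
        results.append(item)
    return results


def most_similar(items: List[Clipping], query: Clipping, k: int = 3) -> List[Clipping]:
    """Rank all other images by similarity to the query image, best first."""
    others = [item for item in items if item is not query]
    others.sort(key=lambda item: cosine(item.embedding, query.embedding), reverse=True)
    return others[:k]


# Made-up sample data for illustration only.
clippings = [
    Clipping("Baseball team portrait", 1912, "Ohio",
             "the champion baseball team", [0.9, 0.1, 0.0]),
    Clipping("Ballpark crowd", 1915, "Ohio",
             "crowd at the baseball game", [0.8, 0.2, 0.1]),
    Clipping("Steamship at dock", 1920, "New York",
             "steamship arrives in harbor", [0.0, 0.2, 0.9]),
]

keyword_hits = search(clippings, keyword="baseball", start=1910, end=1913, state="Ohio")
similar_hits = most_similar(clippings, clippings[0], k=1)
print([c.title for c in keyword_hits])
print([c.title for c in similar_hits])
```

In this sketch the keyword search narrows the results with the date and state filters first, while the similar image search ignores the text entirely and compares only the feature vectors, which is why the two approaches can surface different images.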
Newspaper Navigator was designed by Ben Lee as part of the Innovator in Residence Program at the Library of Congress. The program supports “innovative and creative uses” of the Library’s collections. Past residents also created applications utilizing the Library’s sound and art collections. Lee was inspired to apply because of an earlier crowd-sourced project with the Library called Beyond Words, in which members of the public identified keywords for images in the newspaper collections. Newspaper Navigator builds on that project, using the keyword metadata and adding the similar image search.
Lee also studied the problem of algorithmic bias and explored its effects on this system. He used four photographs of W.E.B. Du Bois as a case study and published a paper discussing the results here.
Newspaper Navigator and the code used to build it are all in the public domain. The developers have also published a white paper on the website, as well as data sets and other materials related to the tool, all for public use.
Newspaper Navigator gives researchers new ways to explore images from across millions of newspapers and further develops search technologies for all libraries and databases. Whether you’re a legal scholar looking for a relevant political cartoon, a lecturer looking for that perfect image to illustrate a point in a slide show, or a student trying to understand a moment in American history, Newspaper Navigator provides new ways of accessing some of the Library of Congress’s core digital collections.