Why Captioning is the Standard Par Excellence for Accessibility in Video Content


February 14, 2023

Guest Post by Hagen Mendrykowski*

In recent years, the importance of captioning training, introductory, or demonstration videos has become increasingly recognized in the private, governmental, and higher education sectors. The drive to make digital media content more accessible to larger audiences has led to the adoption of captioning as a standard practice. The benefits of captioning extend far beyond enlarging the potential audiences that digital offerings can reach; captioning also creates new ways for everyone to interact with content.

This blog post will explore the significance of captioning video content and its impact on accessibility. We will also examine the ways in which new technologies, such as speech recognition software and machine learning algorithms, have made the process of captioning easier and faster. Moreover, we will look at captioning standards and implementation practices at UNC’s University Libraries. Finally, we will explore the potential opportunities that captioning video content unlocks for content creation and use cases in user communities.

Accessibility

Captioning videos has become a standard across the private, governmental, and higher education worlds in recent years as institutions—and individuals—push to make their digital media content more accessible to larger audiences. This is the result of top-down mandates as well as grassroots intra-institutional initiatives.

Captioning video content is essential for accessibility because it allows people who are deaf or hard of hearing to understand the audio portions of videos. Additionally, captions can also be helpful for people who are not native speakers or for people watching videos in a noisy public place. Finally, captions provide a written audio transcript that can be read, understood, and indexed on the web, making the video more searchable via search engines. Captioning is the keystone in making video content more inclusive and accessible for a wider audience and is a crucial step towards creating a more inclusive digital landscape for all users.
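Because caption cues pair text with timestamps, they double as a searchable transcript of the video. As a rough illustration, the sketch below builds a tiny inverted index from caption cues, mapping each spoken word to the moments it occurs; the cue data is hypothetical, and real caption files (WebVTT, SRT) would be parsed rather than hand-written:

```python
# Captions double as a transcript: build a tiny inverted index
# mapping each word to the caption start times where it is spoken.
# Cue data here is hypothetical, in (start_seconds, text) form.
cues = [
    (1.0, "Welcome to the eBooks research guide"),
    (4.5, "First, open the guide from the library homepage"),
    (9.0, "The eBooks tab lists every platform we license"),
]

def build_index(cues):
    """Map each lowercased word to the list of cue start times containing it."""
    index = {}
    for start, text in cues:
        for word in text.lower().split():
            index.setdefault(word.strip(".,"), []).append(start)
    return index

index = build_index(cues)
# index["ebooks"] → [1.0, 9.0]: both moments where "eBooks" is spoken
```

The same idea, applied at web scale by search engines, is what makes a captioned video discoverable by the words spoken in it.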

Captioning Technology

Captioning video content is a substantial stride toward accessibility, but captioning can be a difficult process without the aid of captioning software. Captioning technologies such as automatic speech recognition (ASR) provide a new tool to meet user expectations for captioned content. Speech recognition software and machine learning algorithms have greatly reduced the time and resource costs traditionally weighed against making content more accessible. Using these tools not only lowers the barriers to accessibility but also resets the baseline for what we consider accessible content.

As I write this post, technology companies like Google’s YouTube continue to improve their services and performance in this area, making it easy to upload video content and auto-generate subtitles for free. Other services may charge by the minute of video play-time for auto-captioning; however, this is not cost-prohibitive. Companies like Rev charge as little as $0.25 a minute for AI-generated captions, which are surprisingly accurate, more so than even free services like YouTube.

These subtitles are not perfect; most auto-captioning services still make regular errors. The strength of these services, however, lies not in impeccable transcription but in timing. While auto-generated subtitles may not catch every word correctly, they do record when each word is spoken and when it ought to appear in the video. That timing data makes correction much faster: once the text is fixed, the captions appear in sync without any extra fussing over when they display.
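To make the correction workflow concrete, auto-captions typically arrive in a timed format such as SubRip (.srt), where each cue pairs timestamps with text. A minimal Python sketch (the cue wording is invented for illustration) of swapping in a human editor's corrected text while keeping the machine-generated timing untouched:

```python
# A machine-generated SRT cue block: the timing is usually right,
# even when the transcribed words are not.
auto_srt = """1
00:00:01,000 --> 00:00:03,500
click the a books tab on the guide

2
00:00:03,500 --> 00:00:06,000
then choose a platform from the list
"""

# Human corrections, keyed by cue number.
corrections = {1: "Click the eBooks tab on the guide"}

def correct_cues(srt_text, fixes):
    """Replace cue text while preserving the auto-generated timestamps."""
    cues = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        number = int(lines[0])          # cue sequence number
        timing = lines[1]               # "HH:MM:SS,mmm --> HH:MM:SS,mmm"
        text = fixes.get(number, "\n".join(lines[2:]))
        cues.append(f"{number}\n{timing}\n{text}")
    return "\n\n".join(cues) + "\n"

fixed = correct_cues(auto_srt, corrections)
```

The editor only touches the words; the `-->` timing lines pass through unchanged, which is exactly why correcting auto-captions is so much faster than captioning from scratch.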

These auto-captioning services can only be expected to improve in the coming years as machine-learning techniques advance. The more data and use these services receive, the better and faster they will become. Likewise, speech recognition technology improves as quickly as we decide to use it: the more human-captioned videos we add to the internet and the more auto-generated subtitles we correct, the faster speech recognition algorithms improve.

Using Captioning at the UNC Law Library

Here at the Law Library, we recently created and published a new research guide: Using eBooks at UNC. We employed video demonstrations as an additional mode of using the guide, creating a new access point to our content offerings and more ways for our users to approach the information we’re sharing. We also captioned the videos, providing yet more structure and another pathway for content utilization and interaction. UNC’s University Libraries provide in-house implementation guidelines for using YouTube’s automatic captioning.

These guidelines made implementation straightforward and user-friendly, and I found the process even easier in practice than on paper. While dictating speech-to-text from scratch can be monotonous, error-checking already-generated captions is much more engaging: all one needs to do is follow along with the video and skim the transcript for errors.

Captioning Potential

As we become increasingly comfortable with digital content creation, we can also take note of the new potential for transforming our relationship to content and materials. Not only can we create content that replicates older modes of service for our patrons and user communities, but we can also imagine new futures and ways of creation for our offerings and services.

As speech recognition technologies improve, we’re finding new applications for them all the time. For example, we can now record and caption lectures in real time for students to watch and read along. Looking forward, automatic diarization, the ability to differentiate and label separate speakers, seems right around the corner for some automatic speech recognition software. Auto-translation for non-native speakers is already available on platforms like YouTube, and I imagine that in the coming years this technology may be used over Zoom so that students of multiple languages can participate in real time in classrooms taught in a non-native language of instruction.

Furthermore, as institutions continue to pursue search engine optimization (SEO) strategies, captioning is a great way to ensure video content is optimally placed in users’ search engine results, allowing for increased discovery of and engagement with content.

The future seems primed to take advantage of practices like captioning video content for accessibility, and of the speech recognition software that can aid in the process. These new modalities also open up possibilities for forward-thinking ambitions. Adding rich multimedia approaches to digital offerings pushes beyond what traditional mediums can accomplish and scaffolds new pathways for content creation and new use cases within user communities. Not only can we accomplish the same old goals and agendas more efficiently, but we can explore new potentials more effectively with these new tools. Encountering unexpected insights, creating content untethered to old constraints, and charting new use-case opportunities for our digital offerings are among the several opportunities afforded by captioning.

*Hagen Mendrykowski is a Public Services Assistant at the Kathrine R. Everett Law Library and MSLS Candidate at the School of Information and Library Science, University of North Carolina.