The rise of synthetic media: Get ready for AI avatars at work
Despite ongoing concerns about the rise of deepfake videos — online content typically created or manipulated for nefarious purposes, from election interference to emerging cybersecurity threats — digital “synthetic media” offers real-world benefits in the workplace.
That’s the promise, at least, offered by a variety of startups that have turned to generative AI (genAI) tools and deep-learning algorithms to create human-like digital avatars. In particular, the new tools can be used by enterprises to generate in-house communications, training videos for employees, how-to manuals for specific jobs or tasks, and even customer-facing marketing videos.
Primed with a pre-written script, digital avatars can be used in video content without the usual production and editing costs and effort. The result: faster content creation, greater personalization, and the ability to translate communications into a range of languages without hiring a voice-over actor — all while still delivering useful information to employees and customers.
Ritu Jyoti, group vice president for AI at IDC, sees “huge potential” for AI-based video creation tools in a business context. “Enterprises are going to use it for marketing, for education, training, creating video manuals,” she said.
In most cases, it’s immediately clear that a video has been created artificially. But the technology has advanced to a sufficient level of realism that AI video generation tools are now suitable for corporate communications.
“They look very realistic,” Jyoti said of the synthetic avatars. “Now they can blink, they can move their eyes, their cheeks, the lip movement is there….”
A number of startups have emerged in recent years that promise to help businesses create life-like digital avatars of their employees. That list includes Synthesia, which has received $156 million in funding in the past two years; D-ID; HeyGen; and Hour One. (Another, Rephrase.ai, was recently acquired by a “leading technology company” reported to be Adobe.)
Larger players are also developing similar features: Microsoft unveiled its Azure AI Speech service in November, with the tool currently in preview.
“I think that we will continue to see a market grow out of this, both on the large tech vendor side as well as the startup side,” said Rowan Curran, senior analyst at Forrester.
For now, though, the market is in its early stages, he said, at least in terms of enterprise uptake. “We’re still in a very nascent period with these tools, more so in terms of the adoption than in terms of the actual functionality,” he said.
Video to replace text documents?
The basic process of creating AI-generated content in most applications is fairly straightforward. Users typically choose either an off-the-shelf, generic avatar from a range of options, or upload video footage (or in some cases, just an image) of an employee to create a digital representation. A voice is selected, a text script is then added, and other customized aspects such as background can be included, too.
Once all the parts are in place, a video is generated that can be used on its own or embedded into files — a talking head in a PowerPoint presentation, for instance.
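As a rough illustration of how that workflow might look when driven programmatically rather than through a web interface, the sketch below walks through the same steps (pick an avatar, pick a voice, supply a script, request a render). The endpoint, payload fields, and credentials are illustrative assumptions, not the API of any specific vendor.

```python
# Hypothetical sketch of a script-to-video workflow. The endpoint and
# payload fields are placeholders, not any specific vendor's API.
import time
import requests

API_BASE = "https://api.example-avatar-vendor.com/v1"  # placeholder URL
API_KEY = "YOUR_API_KEY"                               # placeholder credential
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def create_avatar_video(avatar_id: str, voice_id: str, script: str,
                        background: str = "office") -> str:
    """Submit a render job: avatar + voice + pre-written script + background."""
    resp = requests.post(
        f"{API_BASE}/videos",
        headers=HEADERS,
        json={
            "avatar": avatar_id,       # off-the-shelf or custom avatar
            "voice": voice_id,         # selected synthetic voice
            "script": script,          # text the avatar will speak
            "background": background,  # optional customization
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["video_id"]


def wait_for_video(video_id: str, poll_seconds: int = 10) -> str:
    """Poll until the render finishes, then return a download URL."""
    while True:
        resp = requests.get(f"{API_BASE}/videos/{video_id}",
                            headers=HEADERS, timeout=30)
        resp.raise_for_status()
        job = resp.json()
        if job["status"] == "done":
            return job["download_url"]
        time.sleep(poll_seconds)


if __name__ == "__main__":
    vid = create_avatar_video(
        avatar_id="generic_presenter_01",
        voice_id="en-US-female-1",
        script="Welcome to the team! This video covers your first-week checklist.",
    )
    print("Finished video:", wait_for_video(vid))
```

The same generated script could be re-rendered with a different voice or language without reshooting anything, which is where the claimed cost savings come from.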
The key advantage for business is reduced costs, Victor Riparbelli, CEO and co-founder of Synthesia, said in an email interview. (The company’s customers include multinational firms such as Heineken, Zoom, and DuPont.)
“The price of employing a video production team, as well as paying for expenses like equipment and studio time, can make video production impossible for many organizations,” Riparbelli said.
Customers can cut the time required to produce videos, he said, and make changes without the need for reshoots. The tools also allow a broader array of workers to create video within an organization, without the need for video production know-how.
Aside from marketing content, the most prevalent business use at the moment is creating learning and development content, said Riparbelli, with onboarding and hiring videos among the other common examples.
Officials at D-ID, whose customers include Fortune 500 firms, explained that video created via a genAI-based platform often replaces traditional office documents for purposes such as employee learning and development.
“Whereas that content used to be predominantly written, like PowerPoint slides or whatever, we can now help them create that content and make it be video,” said Matthew Kershaw, vice president for commercial strategy at D-ID. People are more likely to watch a video than read a written document or presentation slides, he said — and more likely to retain that information afterwards.
In addition to video, D-ID is also focused on the use of AI avatars for close-to-real-time interactions with enterprise customers or in-house employees. The idea is to marry synthetic media with the content-generation power of AI — essentially making avatars the “face” of large language model (LLM)-based chatbots, Kershaw said.
“You can then create this digital human avatar that you can talk to in real time and ask it questions,” he said. “LLMs are very limited. It’s still text: you put text in and you get text back. What we have is the ability to chat to it in a much more natural human way.”
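A minimal sketch of the pattern Kershaw describes appears below, assuming a generic chat-completion call and an avatar-rendering service; llm_reply() and render_avatar_speech() are hypothetical stand-ins, not D-ID functions. Each user turn is routed through the LLM and the reply is handed to the avatar layer instead of being returned as raw text.

```python
# Minimal sketch: an avatar as the "face" of an LLM chatbot.
# llm_reply() and render_avatar_speech() are hypothetical placeholders for a
# chat-completion API and an avatar-rendering service, respectively.

def llm_reply(history: list[dict], user_text: str) -> str:
    """Placeholder for a chat-completion call to any hosted LLM."""
    history.append({"role": "user", "content": user_text})
    answer = "You are entitled to paid leave for jury service."  # model output would go here
    history.append({"role": "assistant", "content": answer})
    return answer


def render_avatar_speech(text: str) -> bytes:
    """Placeholder: turn reply text into a short talking-head video clip."""
    return b""  # video bytes from the avatar service would go here


def chat_turn(history: list[dict], user_text: str) -> bytes:
    """One conversational turn: text question in, avatar video of the reply out."""
    reply = llm_reply(history, user_text)
    return render_avatar_speech(reply)


if __name__ == "__main__":
    history: list[dict] = [
        {"role": "system", "content": "You are a helpful HR assistant."},
    ]
    clip = chat_turn(history, "How much leave do I get for jury service?")
    # In a real deployment the clip would be streamed back to the user
    # in near real time rather than returned as a finished file.
```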
The company hopes eventually to be able to include sentiment analysis to track the emotional flow of conversation, he said. (This is not currently a feature of D-ID’s product.)
“So if it’s a customer service thing — and the customer is getting frustrated or angry — the avatar can recognize that and say, ‘I hear you’re quite frustrated,’” said Kershaw. Another example could be for HR-related purposes, he said, with the ability to ask an avatar a question about company rules (guidelines when selected for jury service, for instance) rather than having to consult an employee handbook that might even be in a different language.
With AI avatars, caution needed
As with the use of any genAI tool, analysts advise businesses to take precautions around security and governance when deploying AI video-creation tools. “Any company that is considering using these [applications] should do rigorous testing, risk assessments,” said Curran. That includes user acceptance testing to understand how employees respond to these tools in practice.
Businesses should also be wary of the outputs of AI video creation tools, said Jyoti. Just as text-based tools like ChatGPT can have “hallucinations,” an avatar’s conversation might diverge from the script input. This can especially be a problem when text is translated into multiple languages. Businesses should ensure content filtering is in place to mitigate hallucinations and any “toxic” outputs, Jyoti said.
It’s also important to ensure controls are in place to adjust an avatar’s delivery so it matches the intended tone of the message. “Make sure that you test it out, experiment with it well, and use it for simpler, less riskier use cases [first],” said Jyoti.
The use of avatars also raises real questions about the ownership of data. AI-based video creation tools make it easy for an employer to continue to create video content based on an employee’s likeness even after the person leaves the company, for instance. “Some of these things are answered in some employment contracts already, but there are going to be additional grey areas,” said Curran.
And while concerns about the misuse of these tools to create deepfakes or unauthorized content are real, vendors are taking steps to prevent this from happening. Kershaw noted, for example, that videos created using D-ID’s software will contain a logo (either of D-ID itself or from the customer) or a disclaimer to indicate that the video is not “real.”
A coming influx of synthetic media?
AI video-generation tools in some ways represent the evolutionary next step in the genAI wave that began in late 2022. Early tools like OpenAI’s ChatGPT relied more on text generation, but that’s likely to change.
Curran predicts a “big refocus on image and video generation” in 2024, “instead of just the text generation that we’ve seen as the focus of the generative AI boom over the past year.”
Beyond AI-generated avatars for video, there are other text-to-video tools under development, including voice- and audio-generation technologies that are starting to gain traction. The combination of these technologies could dramatically increase the amount of content generated by businesses and across the internet. People could soon be viewing or interacting with synthetic media created “at a rate that can actually meet the demands of enterprise channels,” said Curran.
That’s not to say genAI will replace the need for human involvement in content creation anytime soon. AI-generated content may be unsuitable for certain types of communication where a human connection is desirable — a CEO addressing staffers during a crisis within the organization, for instance.
Kershaw said the point of tools such as D-ID isn’t to replace video production in all scenarios, but to make it possible to create video where it hasn’t been practical to do so before.
“The reality is there will still be video production, because there are things you can do with real video that you can’t do at the moment with AI,” he said. “What this does enable you to do is put video in more places — places where you never normally might have had it.
“There used to be a lot of print in black and white,” he said. “Now you almost can’t print in black and white; everything is in color. And I think we’re going to see a similar thing with videos: video is just going to become the norm in communications in business.”