Integration Guides

Didimo aims to make it easy for developers to build the digital human solutions that work for them. Third-party services and interfaces are commonly used to extend digital humans with specific functionality, bridging the gap between a static 3D digital human and a virtual being that acts, performs, moves, speaks, and emotes in a believable manner.

We strive to ensure that didimos are compatible with these third-party solutions and technologies out of the box wherever possible, prioritizing those for which our customers have a significant need.

For a service to be designated as a Supported Service, Didimo carries out extensive testing and provides a how-to example integration or use case in the How-To Guides section of this Developer Portal.
To date, Didimo has chosen to support the following services:

Amazon Polly (AWS)

Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk and build entirely new categories of speech-enabled products. Polly's Text-to-Speech (TTS) service uses advanced deep learning technologies to synthesize natural-sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications that work in many different countries.

For an example AWS Polly integration, please see Integration: Text-To-Speech
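
As a rough illustration of how a Polly request might feed a talking digital human, the sketch below asks Polly for both MP3 audio and viseme speech marks for the same text. It is not Didimo's integration. The function name `synthesize_with_visemes` and the default voice are assumptions made for this example, and `polly` is expected to behave like the client returned by `boto3.client("polly")`.

```python
import json


def synthesize_with_visemes(polly, text, voice_id="Joanna"):
    """Return (mp3_bytes, viseme_marks) for `text`.

    `polly` is assumed to behave like boto3.client("polly").
    The function name and the default voice are illustrative choices.
    """
    # First request: the spoken audio as an MP3 stream.
    audio = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # Second request: speech marks, returned as one JSON object per line.
    marks = polly.synthesize_speech(
        Text=text,
        OutputFormat="json",
        VoiceId=voice_id,
        SpeechMarkTypes=["viseme"],
    )
    mp3_bytes = audio["AudioStream"].read()
    raw_marks = marks["AudioStream"].read().decode("utf-8")
    # Each non-empty line is a mark with "time" (ms), "type", and "value".
    viseme_marks = [
        json.loads(line) for line in raw_marks.splitlines() if line.strip()
    ]
    return mp3_bytes, viseme_marks
```

With boto3 installed and AWS credentials configured, calling `synthesize_with_visemes(boto3.client("polly"), "Hello there")` should return the audio bytes plus a list of timed viseme marks that an animation system could play back in sync with the audio.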

Apple ARKit

ARKit is Apple's framework for building augmented reality experiences. It includes real-time face tracking, which allows for lightweight, easy-to-use motion capture on Apple devices.

For an example ARKit integration, please see Integration: ARKit Face Capture

Oculus Lipsync

Oculus Lipsync offers a Unity plugin for use on Windows or macOS that can sync avatar lip movements to speech sounds and laughter. Oculus Lipsync analyzes the audio input stream from microphone input or an audio file and predicts a set of values called visemes, which are gestures or expressions of the lips and face that correspond to a particular speech sound. The term viseme is used when discussing lip reading and is a basic visual unit of intelligibility. In computer animation, visemes may be used to animate avatars so that they look like they are speaking.

For an example Oculus Lipsync integration, please see VR Showcase (Didimo + Oculus Lipsync)
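
To make the viseme idea more concrete, here is a minimal, language-neutral sketch of what an application might do with one frame of viseme weights: find the dominant viseme and smooth the weights between frames to reduce jitter. The actual plugin runs inside Unity and is not called here. The viseme names follow the 15-entry set used by Oculus Lipsync, while the helper names `dominant_viseme` and `smooth_weights` and the smoothing factor are assumptions made for this example.

```python
# The 15 viseme labels used by Oculus Lipsync, in order ("sil" is silence).
OVR_VISEMES = [
    "sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
    "nn", "RR", "aa", "E", "ih", "oh", "ou",
]


def dominant_viseme(weights):
    """Return the label of the viseme with the largest weight in one frame."""
    if len(weights) != len(OVR_VISEMES):
        raise ValueError("expected one weight per viseme")
    index = max(range(len(weights)), key=lambda i: weights[i])
    return OVR_VISEMES[index]


def smooth_weights(previous, current, alpha=0.5):
    """Blend the previous frame toward the current one (hypothetical helper).

    alpha=1.0 follows the new frame exactly; smaller values react more slowly.
    """
    if len(previous) != len(current):
        raise ValueError("frames must have the same length")
    return [p + alpha * (c - p) for p, c in zip(previous, current)]
```

In a real integration the smoothed weights would typically drive the matching facial blend shapes on the character each frame.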


We are working on adding more services to this list, and are open to requests from prospective customers or companies wishing to have Didimo support their service. Please submit a feature request at Feature Requests or Contact Us to find out more.