Automating Agent Actions with Cognitive Services Speech Recognition in USD

Microsoft’s Cognitive Services offer a range of intelligent APIs that let developers build new apps, and enhance existing ones, with machine-based AI. These services allow users to interact with applications and tools in a natural, contextual way. Interactions like these are highly applicable to customer service, where they can improve the experience of both customers and agents.

One example is using the Cognitive Services Bing Speech API within the agent’s experience to transcribe the spoken conversation with customers and to automate agent activities within Unified Service Desk (USD), such as automatically triggering a search for contextually relevant, authoritative knowledge based on the captured text.

Geoff Innis, Technical Specialist at Microsoft, has published a walkthrough on his blog showing how to use Cognitive Services for speech transcription and automatic knowledge searching through a custom Unified Service Desk hosted control.

When agents speak with customers by telephone, the details of the customer’s need are conveyed by voice during the call. Agents typically listen to this information and then take specific actions in the agent desktop to progress the case or incident. In many instances, agents need to consult authoritative knowledge base articles for guidance, which often means typing queries into a knowledge base search to find the right articles. By transcribing speech during the conversation, we can not only capture a text record of some or all of the conversation, but also use the captured text to perform tasks automatically, such as presenting contextually relevant knowledge without requiring the agent to type. Benefits include:

  • Quicker access to relevant knowledge, reducing customer wait time and overall call time
  • Better retention of issue context, through ongoing access to previously transcribed text both during and after the call
  • Improved agent experience through natural, intuitive interaction

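The core idea above, turning recognized speech into a knowledge search without the agent typing, can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not the hosted control from Geoff’s post (which runs inside USD). It assumes recognized phrases arrive as plain strings from a speech recognizer and uses a simple stop-word filter to build a query. The names build_search_query, TranscriptSearcher, STOP_WORDS and the search callback are illustrative assumptions, not part of the Speech API or Unified Service Desk.

```python
from typing import Callable, List

# Hypothetical stop-word list; a real deployment would use a fuller list
# or rely on the knowledge base's own query handling.
STOP_WORDS = {
    "a", "an", "the", "i", "my", "me", "is", "it", "to", "and", "of",
    "on", "in", "at", "for", "with", "can", "you", "help", "please",
    "was", "that", "this", "have", "has", "but", "so", "um", "uh",
    "hi", "hello", "there", "thanks",
}


def build_search_query(phrase: str, max_terms: int = 6) -> str:
    """Turn one recognized phrase into a short keyword query."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,!?;:\"'").lower() for w in phrase.split()]
    keywords = [w for w in words if w and w not in STOP_WORDS]
    return " ".join(keywords[:max_terms])


class TranscriptSearcher:
    """Keeps a running transcript and fires a knowledge search per phrase.

    `search` is a hypothetical stand-in for whatever actually runs the
    knowledge base search (in USD, for example, an action call).
    """

    def __init__(self, search: Callable[[str], None], min_terms: int = 2):
        self.search = search
        self.min_terms = min_terms
        self.transcript: List[str] = []

    def on_phrase_recognized(self, phrase: str) -> None:
        # Keep every phrase so the agent can review context later.
        self.transcript.append(phrase)
        query = build_search_query(phrase)
        # Skip searches on filler phrases with too few meaningful words.
        if len(query.split()) >= self.min_terms:
            self.search(query)


if __name__ == "__main__":
    issued: List[str] = []
    searcher = TranscriptSearcher(search=issued.append)
    searcher.on_phrase_recognized("Um, hi there.")
    searcher.on_phrase_recognized("My credit card was declined at the store.")
    print(searcher.transcript)
    print(issued)
```

Requiring a minimum number of keywords is one simple way to avoid firing searches on greetings and filler; the hosted control described in Geoff’s post may take a different approach.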
This 30-second video (with sound) shows how Cognitive Services speech recognition can automate a contextual knowledge search, using the sample custom hosted control from Geoff’s post:

Real-time recognition and transcribing of speech within the agent desktop opens up a wide array of possibilities for adding value during customer interactions. Extensions of this functionality could include:

  • Speech-driven interactions on other channels such as chat and email
  • Other voice-powered automation of the agent desktop, aided by the Language Understanding Intelligent Service (LUIS), which is integrated into the Speech API and helps determine the intent of spoken commands. For example, this could allow an agent to say “send an email” or “start a funds transfer”
  • Saving the captured transcript as an Activity on the associated case, as a record of the conversation, where the telephony provider does not already provide automated call transcription
  • Extending the voice capture to both agent and customer, through the Speech API’s ability to transcribe speech from live audio sources other than the microphone
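To make the LUIS idea more concrete: LUIS returns a predicted intent with a confidence score for an utterance, and the agent desktop then has to map that intent to an action. The hypothetical Python sketch below shows only that mapping step. The intent names (SendEmail, StartFundsTransfer), the 0.7 threshold, and the action functions are illustrative assumptions rather than LUIS or USD APIs, and calling LUIS itself is out of scope.

```python
from typing import Callable, Dict, Optional


def send_email() -> str:
    # Hypothetical placeholder for opening an email form in the agent desktop.
    return "Opening a new email"


def start_funds_transfer() -> str:
    # Hypothetical placeholder for launching a funds transfer workflow.
    return "Starting a funds transfer"


# Hypothetical mapping from intent names to desktop actions.
INTENT_ACTIONS: Dict[str, Callable[[], str]] = {
    "SendEmail": send_email,
    "StartFundsTransfer": start_funds_transfer,
}


def dispatch_intent(intent: str, score: float, threshold: float = 0.7) -> Optional[str]:
    """Run the action for a recognized intent if the model is confident enough.

    Returns the action's result, or None when the intent has no mapped
    action (for example, LUIS's built-in "None" intent) or the score is
    below the threshold.
    """
    action = INTENT_ACTIONS.get(intent)
    if action is None or score < threshold:
        return None
    return action()


if __name__ == "__main__":
    print(dispatch_intent("SendEmail", 0.92))
    print(dispatch_intent("StartFundsTransfer", 0.40))
    print(dispatch_intent("None", 0.95))
```

Requiring a minimum score before acting on a spoken command helps avoid triggering desktop actions from ordinary conversation with the customer.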

For a step-by-step walkthrough of building a custom Unified Service Desk hosted control that uses the Cognitive Services Bing Speech API to transcribe the spoken conversation with customers and automate agent activities, see the full post here.