Contact Center Solutions Featured Article

Q&A with Tellme on Speech Recognition

February 01, 2010

Make a call or press a button on a voice-activated device to reach a contact center, or receive a call from one, and chances are growing that you will be listening and speaking to a machine, i.e., speech recognition, or speech rec. The odds are high enough that you may already be trained to converse with it via slow, clear and minimal-accent pronunciation when you hear that automated voice.

Tellme, part of Microsoft's business division, is helping to make speech rec a more feasible and popular means of interacting with organizations. More than 40 million people already use Tellme’s solutions every month to reach the people, businesses and information they need on the phone and on the go.
Grant Shirk is director of industry solutions at Tellme Business Solutions at Microsoft Corp. ContactCenterSolutions caught up with Grant to ascertain his take on speech rec trends and to find out what new developments are in the works from his firm.
ContactCenterSolutions: What trends are you seeing in applying speech rec to contact centers? Are you seeing increasing, the same, or decreasing interest and why? Discuss the impact of the downturn and the prospects of a slow recovery on speech rec demand.
GS: In part due to the downturn, we see speech technology and voice self-service applications entering a period of accelerating change. Consumers want faster access to smarter, more personalized customer service and expect the businesses they interact with to make it easier to do business across online and offline channels. Leading companies in turn will provide a higher level of cost-effective customer service that builds their brands (instead of diminishing them). This is well aligned with macro trends we are seeing in the market, where businesses consistently identify two high-priority strategic goals: (1) improving the customer experience and (2) managing service costs.
Microsoft sees three key trends that will help drive these changes over the next few years: 
A focus on personalized, cross-channel experiences
As noted above, consumers are becoming more mobile and more connected across more devices, increasing their expectations for great customer service. In response, businesses will need to manage each of these customer interactions as one part of a larger conversation that spans multiple touch points. This personalized, unified view of the customer will result in more contextually relevant service events for customers, and stronger relationships between people and products. Microsoft is investing in the Tellme platform to give our clients the capabilities to deliver these rich services, including more powerful speech technologies, dynamic personalization engines, and unified inbound and outbound services.
Speech is approaching an inflection point
Speech has been on a steady trajectory of improvement for the last ten years, benefitting from the millions of utterances that have been recorded, transcribed, and analyzed on platforms like Tellme. But, more importantly, the rise of distributed computing resources like Microsoft’s Azure will significantly transform our ability to accurately and reliably recognize both directed dialog and natural language speech. Harnessing this computing power promises more accurate recognition, greater automation per task, and higher user satisfaction. Together these will help us deliver on our vision to make speech a resource (not just an application) that more people and more devices can utilize.
More services moving into the network
In addition to the emergence of distributed computing platforms for speech recognition, we expect to see more IVR services moving into the network as businesses seek the performance improvements and lower costs that on-demand platforms can provide. Virtualization of queuing and routing is a logical next step that can drive higher agent utilization within the contact center and reduce the cost and necessity of standalone CTI services. Tellme expects this virtualization to also improve the customer experience by getting callers to the right agent at the right time (avoiding unnecessary transfers) and enabling the growth of innovative services like virtual hold and scheduled call backs.
ContactCenterSolutions: For what purposes are you seeing speech rec deployed? Rank them in importance and explain why:
1. Divert calls from live agents
2. Replace DTMF IVR systems to improve service and reduce 'zero outs'
3. Shorten live agent calls
4. Cut call queues and lower costs and bolster service by asking customers to leave V/Ms that are converted to text, and forwarded to agents as e-mails for e-mail response or scheduled outbound calls
GS: Within the contact center, anywhere from 50 percent to 70 percent of the costs are labor, training, and management. As a result, it is no surprise that the primary reason enterprises are investing in speech self-service is the opportunity to divert more calls from live agents. Closely following this is businesses’ desire to improve their overall customer experiences (as noted above); speech can play a key role in this endeavor as well. By making phone interactions more natural, more efficient, and more effective than legacy touchtone applications, speech can unlock previously unavailable automation opportunities and free up agents to spend time on higher-order tasks. Another key reason we see businesses adopting speech as a key technology is to drive brand and service differentiation in the market.
Finally, we are seeing enterprises that already have mature speech deployments looking to experts like Microsoft to help them further optimize their utilization of speech, with a particular focus on shortening agent tasks. By tuning speech applications and consulting with Voice User Interface (VUI) experts, these enterprises can find abundant opportunities to save agent time through “partial automation” (e.g., increasing authentication, capturing more complex intent from callers).
ContactCenterSolutions: What new developments are you seeing in speech rec technology?
GS: There is a strong trend in the market moving speech resources into the network, taking advantage of the power of distributed computing and services architectures. As a result, compelling speech interfaces are becoming more useful and prevalent for new usage scenarios, including automotive, mobile, search, and proactive customer care. This evolution of the technology is also driving the adoption of more natural, open speech interfaces that speed callers’ time to task and improve routing across agent skills.
Together with Ford Motor Company and Kia, Microsoft and the Tellme platform pioneered the use of network-based speech to drive in-car experiences with Ford's SYNC product and Kia's UVO (“your voice”), both profiled at CES and SpeechTEK 2009. The Ford SYNC service accesses the Tellme platform to provide drivers with hands-free access to local business search, driving directions, and other information. We expect to see more manufacturers moving toward a network-based model in the near future.
In addition, speech is quickly becoming an integral part of the mobile device interface. A great example that showcases the power of speech and language processing technologies is the recently launched Bing Mobile client. To provide mobile users the best possible speech performance for these advanced tasks, the speech features need to take advantage of network-based (rather than embedded) recognition capabilities.
Finally, we see strong adoption of speech for proactive customer care applications (Outbound IVR). Adding speech capabilities to services that were previously agent-only or touchtone gives businesses the opportunity to create more compelling, efficient campaigns for early-stage collections, alerts, or customer care scenarios. Because the scope of outbound applications is much smaller than that of inbound ones (they are usually focused on completing a very specific task), on-premise speech deployments were often cost-prohibitive. However, with the advent of on-demand speech as a service, more companies now have access to the technology in an affordable way.
Following this trend, we expect to see speech adoption in many more traditionally consumer scenarios over the next several years. Indeed, the adoption of more natural user interface (NUI) technologies like speech, touch, and gesture is already under way, with one of the most prominent implementations being Microsoft’s Project Natal (Xbox 360).
ContactCenterSolutions: Pricing, both for the software and for the total cost of deployment, has been a key issue with speech rec deployment. What trends are you seeing there? Are the costs and install time coming down and, if so, by how much (percentage/months) and what is driving the decline?
GS: To ensure our clients are able to achieve the lowest possible cost for deploying and operating speech applications, Tellme offers several flexible pricing options, and many of Tellme’s clients have decided to work with Tellme specifically because of these models. These pricing models include per minute, per subscriber, per automated or partially automated call, and per port. Additional variants of these pricing models exist, but over 90 percent of our current customers have chosen per-minute or performance-based pricing models.
This usage-based pricing helps bring down the cost of deploying and operating speech solutions by minimizing up-front capital expenditures and eliminating costly maintenance and upgrade fees. Additionally, because our on-demand platform is open-standards based, companies can quickly develop and deploy very rich solutions, without the hassle of purchasing, provisioning, and integrating new hardware platforms.
ContactCenterSolutions: What's new and coming down the pike with your speech tools?
GS: Tellme continues to drive momentum and significant interest in the adoption of the Microsoft speech engine. In 2009, we answered over 1.3 billion calls (nearly 50 percent of our total annual traffic) on this engine, and our clients are observing significant improvements in recognition accuracy, automation, and task completion across their applications. Task completion has improved by three percent on average when moving to the new engine, out of the box.
The close relationship between the delivery and R&D teams within the Speech at Microsoft group allows us to continually influence the evolution and enhancement of the speech engine to best meet the evolving needs of our customers. As a result, clients on the Tellme platform can expect:
Continuous improvement in recognition accuracy based on an unmatched volume of caller data.
Faster release cycles: upgrades to the core speech engine are made available immediately to Tellme clients, including core algorithm improvements, acoustic model tuning updates, and tools releases.
A virtuous feedback loop that fuels additional research and advances in speech processing technology.
In the next year, Tellme customers can expect to receive enhanced capabilities for speech-enabled outbound (an improvement over the rich speech services already available on the platform), new performance optimization tools to help customers improve task completion rates, continued core engine improvements and the launch of cloud-based routing and queuing services to optimize contact center resource utilization. We will also debut technology to open up our speech platform to more channels, enabling speech interfaces for mobile, online, and other devices.

Brendan B. Read is ContactCenterSolutions’s Senior Contributing Editor. To read more of Brendan’s articles, please visit his columnist page.

Edited by Amy Tierney