Most voice systems fall into a number of well-defined categories. The classic applications are audiotex, automated attendant, voice mail and interactive voice response. More recently, fax messaging and fax-on-demand have been added to the list of standard configurations. Most of the available software development tools cater well for these traditional applications.
However, there has always been a minority of voice systems that are awkward, if not impossible, to develop using the ordinary development tools. Some require features that the normal development platforms do not provide, such as audio file analysis, PABX protocols, inter-channel communications, and vocabulary switching. Others need to integrate with third-party software to provide features such as speech recognition, text-to-speech, database searches and facsimile analysis. Still others need to run with state-of-the-art hardware such as ISDN interfaces, MVIP & SCSA digital buses, conference bridges and Group 4 fax modems. Systems such as these require bespoke programming work.
The demand for bespoke systems does not appear to be diminishing. On the contrary, I expect its share of the market to grow. New kinds of system such as Computer Supported Telephony and Integrated Messaging are getting ever more complex. Management will look for ways to leverage their investment in voice systems by integrating them with the rest of their IT strategy. And as the standard applications become packaged products and the network operators offer competing services, the voice processing trade will look for new ways to add value to their sales.
As voice processing applications get larger and increasingly complex, we can expect their scripts to look more like conventional software programs. Many of the issues that need to be considered when developing computer software will come to the fore when developing voice systems. These issues include source integrity, version control, structured programming techniques and conformance to open system standards (both in software and hardware). We are already seeing a massive migration from proprietary hardware to PC-based hosts. As voice systems get larger, the underlying operating system will need to be upgraded from DOS to more sophisticated platforms such as UNIX, OS/2 and Windows NT. The development language is likely to change from proprietary script languages and application generators to C, the dominant software language for technical systems.
A mature programming language like C has many advantages over proprietary solutions. For a start it will give you a choice of development tools, such as compilers, lexical analysers, debuggers and editors. Almost all other software that you might want to integrate into your system is written as a C library. That means that you can add database libraries (to access any type of database), networking libraries (to communicate with remote computers), windowing libraries (to create a graphical user interface) and communications libraries (to connect to a PABX call logging port). You can also directly control fax modems, speech recognition devices, text-to-speech software and practically anything else you care to name. All this gives you more control and flexibility than any other method, whilst safeguarding your development effort from obsolescence.
If you develop a system in C rather than with a voice application generator or script interpreter, you will need a C voice library. Such a library gives your program the capability to control voice processing resources in much the same way as the proprietary tools do. Some commercial voice libraries go much further and incorporate other software useful for building bespoke voice and fax systems. The remainder of this article describes the features that you should look for in products of this type.
Dialogues are structured conversations which achieve a high-level goal common to many applications. There are three main types of dialogue: those that convert information into speech (e.g. speaking a number), those that gather information from the user (e.g. collecting a telephone number) and those that get a decision from the user (e.g. a menu of available choices). You should definitely look for dialogues that can speak and gather all the basic data types such as numbers, currency amounts, dates, times and telephone numbers. There are two main types of decision that you normally want from the user: a yes/no decision and a choice from a menu of available options. Dialogues to handle menus should be flexible enough to allow your program to selectively enable each menu item. This means that the list of options spoken to the user must be dynamically composed by the dialogue code.
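To make the idea of a dynamically composed menu concrete, here is a minimal sketch in C. It is not taken from any particular voice library: the MenuItem structure, the prompt names and the functions menu_compose and menu_select are all hypothetical, and the playing of the prompts themselves is left out. It shows only how the list of options spoken to the caller can be built from whichever items the program has enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical menu item: the DTMF key that selects it, the name of the
   recorded prompt that describes it, and an enable flag set by the
   application. */
typedef struct {
    char        key;      /* DTMF digit that selects the item */
    const char *prompt;   /* name of the recorded prompt */
    int         enabled;  /* non-zero if the item is currently offered */
} MenuItem;

/* Build the list of prompts to speak, covering enabled items only.
   Returns the number of prompt names written to out (at most max_out). */
size_t menu_compose(const MenuItem *items, size_t n_items,
                    const char **out, size_t max_out)
{
    size_t count = 0;

    assert(items != NULL && out != NULL);
    for (size_t i = 0; i < n_items && count < max_out; i++) {
        if (items[i].enabled)
            out[count++] = items[i].prompt;
    }
    return count;
}

/* Return the index of the enabled item selected by key, or -1 if the key
   does not correspond to an enabled item. */
int menu_select(const MenuItem *items, size_t n_items, char key)
{
    assert(items != NULL);
    for (size_t i = 0; i < n_items; i++) {
        if (items[i].enabled && items[i].key == key)
            return (int)i;
    }
    return -1;
}
```

Because a disabled item is neither spoken nor accepted, the caller never hears an option that the program is not prepared to honour.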
Dialogue libraries need to have many options and settings so that they can accommodate any conversational style. For example, calendar dates may need to be spoken with or without the day of the week. Similarly, telephone numbers can be spoken as local, national or international codes, with or without a spoken country or exchange name. Dialogues also need different scripts for each method of data input, including voice recognition, grunt detection and DTMF.
Each human language requires a different set of dialogue functions to implement its grammatical rules. For example, the way numbers are spoken in English is quite different from the way they are spoken in Japanese. What is not so apparent is that there are significant differences between British English and American English.
Dialogue libraries contain hundreds of prompts. Just to speak numbers requires a large number of audio segments. These need to be extremely carefully recorded to have the correct stress and to match each other in volume and pitch. They then need to be edited with the correct amounts of silence so that the concatenated speech sounds completely natural.
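As an illustration of both points, the sketch below turns a number from 0 to 999 into the sequence of audio segments to be concatenated. The one dialect difference it models is the "and" that British speakers insert after the hundreds ("one hundred and twenty three") and American speakers usually omit. The function speak_number, the Dialect type and the segment names are assumptions made for this example; a real library would cover a far larger range and would also choose between differently stressed recordings of the same word.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DIALECT_BRITISH, DIALECT_AMERICAN } Dialect;

/* Hypothetical names of the recorded audio segments. */
static const char *const UNITS[20] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
};
static const char *const TENS[10] = {
    NULL, NULL, "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"
};

/* Write the segment names for n (0..999) to out, which must have room
   for at least five entries. Returns the number of segments written. */
size_t speak_number(unsigned n, Dialect dialect, const char **out)
{
    size_t count = 0;

    assert(n <= 999 && out != NULL);
    if (n >= 100) {
        out[count++] = UNITS[n / 100];
        out[count++] = "hundred";
        n %= 100;
        if (n == 0)
            return count;               /* an exact hundred: nothing follows */
        if (dialect == DIALECT_BRITISH)
            out[count++] = "and";       /* British usage only */
    }
    if (n >= 20) {
        out[count++] = TENS[n / 10];
        if (n % 10 != 0)
            out[count++] = UNITS[n % 10];
    } else {
        out[count++] = UNITS[n];
    }
    return count;
}
```

Even this small range draws on 29 distinct recordings, which gives some idea of why a full dialogue library runs to hundreds of prompts.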
Many applications need to work with telephone and fax numbers. For example, a typical fax-on-demand application asks the user to dial in a fax number. This number is then dialled and a fax transmission is attempted. Most systems merely treat the entered fax number as a sequence of digits and make no attempt to interpret it. If they speak the number back to the user, it is generally spoken without intonation, which is hardly user-friendly. Systems that do not block unsuitable telephone numbers are exposed to a variety of frauds (e.g. dialling a premium rate service) or pranks (e.g. dialling the emergency services). The solution to both problems is to integrate into your software a knowledge of the telephone numbering plan. The dialogue library can use this information to speak telephone numbers with gaps in the appropriate places, and to recognise when an entered number is complete without waiting for a timeout at the end or requiring a termination digit.
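A numbering plan can be represented as a table of prefixes, as in the following sketch. The entries in PLAN are invented for illustration and do not describe any real country's plan; the names PlanEntry, NumStatus and plan_classify are likewise hypothetical. The function is meant to be called after each digit is entered, so that the dialogue knows at once whether to keep collecting, to stop, or to refuse the number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    NUM_INCOMPLETE,   /* more digits are expected */
    NUM_COMPLETE,     /* a full, diallable number */
    NUM_BLOCKED,      /* a number the system must refuse to dial */
    NUM_INVALID       /* too many digits for its range */
} NumStatus;

typedef struct {
    const char *prefix;   /* leading digits that identify the range */
    size_t      length;   /* total digits in a complete number */
    int         blocked;  /* non-zero if the range must not be dialled */
} PlanEntry;

/* Illustrative entries only: an assumption, not a real numbering plan. */
static const PlanEntry PLAN[] = {
    { "999",  3, 1 },   /* emergency services */
    { "09",  11, 1 },   /* premium rate services */
    { "0",   11, 0 },   /* national numbers */
    { "",     6, 0 }    /* anything else: local numbers */
};

/* Classify the digits entered so far, using the longest matching prefix. */
NumStatus plan_classify(const char *digits)
{
    const PlanEntry *best = NULL;
    size_t best_len = 0;
    size_t n;

    assert(digits != NULL);
    n = strlen(digits);
    for (size_t i = 0; i < sizeof PLAN / sizeof PLAN[0]; i++) {
        size_t plen = strlen(PLAN[i].prefix);
        if (plen <= n && strncmp(digits, PLAN[i].prefix, plen) == 0 &&
            (best == NULL || plen > best_len)) {
            best = &PLAN[i];
            best_len = plen;
        }
    }
    if (best == NULL)
        return NUM_INVALID;     /* not reached while PLAN has a "" entry */
    if (best->blocked)
        return NUM_BLOCKED;
    if (n < best->length)
        return NUM_INCOMPLETE;
    return n == best->length ? NUM_COMPLETE : NUM_INVALID;
}
```

The same table could be extended with digit group sizes for each range, so that the routine which speaks a number back knows where to place the pauses.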
Most voice development systems only permit a script to control one telephone line. This is adequate for many applications including audiotex and interactive voice response. However, more complex applications need to control two or more telephone channels, either simultaneously or consecutively. This enables them to set up a second leg for a telephone call while still providing service to the first. The most logical way to implement fax-on-demand, for example, is for the script that took the order to continue running to supervise the fax transmission. It could even be enhanced to ring back the original caller to confirm the fax delivery. These and many other scenarios require a system architecture in which the tasks that run the scripts are independent from the telephone channels and the voice processing resources.
It is also often necessary for scripts controlling different calls to share data to provide services such as call queueing, predictive dialling and automatic call distribution.
With the arrival of digital buses and digital matrix switches, voice systems are increasingly being expected to perform all the channel switching instead of using an external PABX. The system needs to manage the switching and endpoint resources in the system to provide exclusion between the tasks. To do this it needs to build a software model of the system based on the hardware installed.
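One simple form such a software model might take is a table of resources, each recording which task currently owns it. The sketch below is an assumption about how this could be done rather than a description of any product: the ResourcePool type and the pool_ functions are hypothetical, and a real multitasking system would also have to protect the table with a lock or run it inside a single manager task.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RESOURCES 16
#define NO_OWNER      (-1)

typedef enum { RES_LINE, RES_FAX, RES_CONFERENCE } ResKind;

typedef struct {
    ResKind kind;
    int     owner;    /* id of the owning task, or NO_OWNER if free */
} Resource;

typedef struct {
    Resource res[MAX_RESOURCES];
    size_t   count;
} ResourcePool;

void pool_init(ResourcePool *p)
{
    assert(p != NULL);
    p->count = 0;
}

/* Register one installed resource. Returns its index, or -1 if full. */
int pool_add(ResourcePool *p, ResKind kind)
{
    assert(p != NULL);
    if (p->count >= MAX_RESOURCES)
        return -1;
    p->res[p->count].kind = kind;
    p->res[p->count].owner = NO_OWNER;
    return (int)p->count++;
}

/* Give the first free resource of the requested kind to a task.
   Returns the resource index, or -1 if none is free. */
int pool_acquire(ResourcePool *p, ResKind kind, int task)
{
    assert(p != NULL && task != NO_OWNER);
    for (size_t i = 0; i < p->count; i++) {
        if (p->res[i].kind == kind && p->res[i].owner == NO_OWNER) {
            p->res[i].owner = task;
            return (int)i;
        }
    }
    return -1;
}

/* Release a resource. Fails with -1 unless the task really owns it. */
int pool_release(ResourcePool *p, int index, int task)
{
    assert(p != NULL);
    if (index < 0 || (size_t)index >= p->count || p->res[index].owner != task)
        return -1;
    p->res[index].owner = NO_OWNER;
    return 0;
}
```

Because ownership belongs to a task rather than to a telephone line, a single task can hold several resources at once, for instance the caller's line together with a second line or a fax resource for the outgoing leg.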
Bespoke systems are likely to incorporate multiple hosts, probably connected over a network. The software that runs the application script may not be located in the same computer that drives the voice processing resources. This means that, on top of all the other changes that voice processing systems are undergoing, they must support a client-server architecture as well.
The above is only a small selection of the tools available in commercial voice libraries. You may need the following features:
You should also expect the following:
As the market matures, voice and fax processing applications are becoming mainstream computer systems. All the management issues that apply to software engineering will apply to voice processing as well. Just as in the software industry, the goal posts are moving all the time. Products that led their market one year can be well down the pack the next year unless constantly enhanced. Systems must be maintainable, future-proof and capable of unlimited improvement. Developers need all the help they can get to build such systems economically and quickly.