SYNERGISTIC MULTIMODAL COMMUNICATION
IN COLLABORATIVE MULTIUSER ENVIRONMENTS

J. Flanagan, G. Burdea, C. Kulikowski, I. Marsic, P. Meer, and J. Wilder

Center for Computer Aids for Industrial Productivity (CAIP)
Rutgers University

CONTACT INFORMATION

James Flanagan
CAIP Center
Rutgers University
Piscataway, New Jersey 08855-1390
Phone: (732) 445-3443
Fax: (732) 445-0547
Email: jlf@caip.rutgers.edu

WWW PAGE

http://www.caip.rutgers.edu/multimedia/multimodal/

PROGRAM AREA

Adaptive Human Interfaces.

KEYWORDS

multimodal interfaces, gesture recognition, intelligent agents, sensory fusion, collaborative multiuser environments, conversational interaction, image understanding, gaze tracking

1997 ANNUAL PROGRESS REPORT

A brief summary of achievements, attached to the 1997 Annual Progress Report, is available for download (7 MB PostScript).

PROJECT SUMMARY

Motivation. Advances in networking and computing open new opportunities for collaborative work by geographically separated participants. The challenge is to employ technology to extend human intellectual capabilities. Success depends upon natural, easy human/machine communication, and upon strategies that permit the machine to serve as a "value-added mediator." Technologies for human/machine communication, though as yet imperfect, have individually advanced sufficiently that they can serve simultaneous, multimodal communication in computer interfaces. The aim is to emulate the features and advantages of multisensory human communication.

Objective. This research establishes, quantifies, and evaluates design methodologies for the synergistic combination of human/machine communication modalities in collaborative multiuser environments.

Method. This research creates a multiuser, collaborative environment with multimodal human/machine communication in the dimensions of sight, sound, and touch. The network vehicle (called DISCIPLE, for Distributed System for Collaborative Information Processing and Learning) is an object-oriented groupware system (presently evolving under DARPA sponsorship) that runs over Internet TCP/IP as well as over an Asynchronous Transfer Mode (ATM) intracampus network.
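
For illustration, the sketch below shows the kind of event distribution such a groupware vehicle performs: a shared-workspace operation is broadcast over TCP/IP sockets to the collaborating stations, each of which applies it to its local replica. The Java class and method names are hypothetical and do not reflect DISCIPLE's actual interfaces.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.net.Socket;
    import java.util.List;

    // Minimal sketch of groupware event distribution over TCP/IP.
    // Names are illustrative assumptions, not DISCIPLE's API.
    class CollaborationClient {

        // Encode one shared-workspace operation as a text line and push it
        // to every connected peer station.
        static void broadcast(String sender, String objectId, String operation,
                              List<Socket> peers) throws IOException {
            String message = sender + "\t" + objectId + "\t" + operation;
            for (Socket peer : peers) {
                PrintWriter out = new PrintWriter(peer.getOutputStream(), true);
                out.println(message);  // autoflush sends the update immediately
            }
        }
    }

A production groupware bus would add message ordering, concurrency control, and state transfer for late joiners; the sketch conveys only the basic TCP/IP event path.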

At three user stations, CAIP-developed technologies for sight (eye-tracking, foveating sensing, image and face recognition), sound (automatic speech and speaker recognition, speech synthesis, distant-talking autodirective microphone arrays) and touch (gesture and position sensing, force-feedback gloves, and multitasking tactile software) are integrated into DISCIPLE for simultaneous multimodal use. The system so constituted provides a test bed for measuring benefits and synergies. With participation from cognitive science and human-factors engineering, a realistic application scenario is designed to evaluate combinations of modalities and to quantify performance.
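
To make the notion of simultaneous, synergistic use concrete, the sketch below illustrates one common fusion technique: time-windowed resolution of a spoken deictic command (e.g., "move that") against the most recent gaze or gesture fixation captured within a short co-occurrence window. The class names and window length are assumptions chosen for clarity; this is not the project's actual fusion agent.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // One event from any modality, stamped with its capture time.
    class MultimodalEvent {
        final String modality;   // e.g. "speech", "gaze", "gesture"
        final String value;      // recognized token or target object id
        final long timestampMs;  // capture time at the user station

        MultimodalEvent(String modality, String value, long timestampMs) {
            this.modality = modality;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    // Hypothetical fuser: pairs a spoken deictic with a co-occurring fixation.
    class DeicticFuser {
        private static final long WINDOW_MS = 1500;  // co-occurrence window
        private final Deque<MultimodalEvent> fixations = new ArrayDeque<>();

        // Record gaze/gesture fixations as candidate referents, discarding
        // those that have aged out of the window.
        void onFixation(MultimodalEvent e) {
            fixations.addLast(e);
            while (!fixations.isEmpty()
                    && e.timestampMs - fixations.peekFirst().timestampMs > WINDOW_MS) {
                fixations.removeFirst();
            }
        }

        // Resolve a spoken deictic against the latest fixation inside the
        // window; returns null if no referent co-occurred with the utterance.
        String resolve(MultimodalEvent speech) {
            MultimodalEvent best = null;
            for (MultimodalEvent f : fixations) {
                if (Math.abs(speech.timestampMs - f.timestampMs) <= WINDOW_MS) {
                    best = f;
                }
            }
            return best == null ? null : best.value;
        }
    }

A fuller fusion agent would also weigh recognition confidences from each modality and arbitrate among conflicting referents; the window-based pairing above captures only the timing aspect of the synergy.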

Application scenarios that might be served by the system embrace activities as disparate as collaborative design, cooperative data analysis and manipulation, battlefield management, corporate decision making, and telemedicine. An initial experimental scenario is chosen to encompass ingredients of these collaborative tasks. The experimental scenario is based on the design, layout and equipment acquisition for a digital signal processor laboratory. Subjects are sets of three collaborators who are to share and work in the facility. Measurements of the time to achieve a satisfactory solution and the quality of the solution (as judged by a technical panel) quantify the utility of multimodal communication.

Significance. This research formulates and establishes methods for designing networked computer systems for multiuser collaborative tasks in which multimodal human/machine communication produces a demonstrable benefit. An additional impact of the research is the graduate training of four Ph.D. candidates in this newly emerging field.

PROJECT REFERENCES

G. Burdea, Force and Touch Feedback for Virtual Reality, John Wiley & Sons, New York, 1996.

D. Comaniciu, P. Meer, "Robust Analysis of Feature Spaces: Color Image Segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'97), San Juan, Puerto Rico, pp.750-755, June 1997.

J.L. Flanagan, "Technologies for Multimedia Communications," Proceedings of the IEEE, Vol.84, No.4, pp.590-603, April 1994.

J.L. Flanagan, "Multimodality", In R. Cole and J. Mariani (editors), Survey of the State of the Art of Human Language Technology, Chapter 9, National Science Foundation and Directorate General XIII of the European Commission, Cambridge University Press, pp.277-300, 1996.

J.L. Flanagan and E.-E. Jan, "Sound Capture with Three-Dimensional Selectivity," Acustica (in press).

J.L. Flanagan and I. Marsic, "Issues in Measuring the Benefits of Multimodal Interfaces," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), Munich, Germany, April 1997.

L. Gong and C.A. Kulikowski, "Composition of Image Analysis Processes Through Object-Centered Hierarchical Planning," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.17, No.10, pp.997-1009, October 1995.

N. Langrana, G. Burdea, K. Lange, D. Gomez, and S. Deshpande, "Dynamic Force Feedback for Virtual Knee Palpation", Artificial Intelligence in Medicine, Vol.6, pp.321-333, 1994.

Q. Lin, C.-W. Che, D.-S. Yuk, and J. L. Flanagan, "Robust Distant Talking Speech Recognition," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96), Atlanta, GA, pp.21-24, May 1996.

E. Pere, D. Gomez, G. Burdea, and N. Langrana, "PC-Based Virtual Reality System with Dextrous Force Feedback," Proceedings of the ASME Winter Annual Meeting, Atlanta, GA, pp.495-502, November 1996.

J. Wilder, P.J. Phillips, C. Jiang, and S. Wiener, "Comparison of Visible and Infra-Red Imagery for Face Recognition," Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, Killington, VT, pp.182-187, October 1996.

A. Shaikh, S. Juth, A. Medl, I. Marsic, C. Kulikowski, and J.L. Flanagan, "An Architecture for Fusion of Multimodal Information," Workshop on Perceptual Interfaces, Alberta, Canada, October 20-21, 1997 (submitted).

AREA BACKGROUND

Collaborative work is a hallmark of human achievement. As societies become global, knowledge work conducted by geographically separated collaborators can be supported by the technologies of networked computing and communications. The machine system now becomes a "mediator" in human cooperation, with the opportunity to enhance human intellect and to expand the capacity for collaborative work.

For information exchange and joint decision-making, humans typically depend upon the dimensions of sight, sound, and touch, used simultaneously and in combination. Emulation of this natural "multimodal" communication promises comfort and ease of use in collaborative systems. While component technologies for human-machine communication are as yet imperfect, they are sufficiently advanced that, with engineering prudence and intelligent software agents, they can be employed to human benefit.

But the design of a multimodal collaborative system also depends on other factors, encompassing the human user, the task domain, the environment, and the intellectual context. To produce an optimally effective system, all of these elements must be considered in an integrated design. The nature of these varied parameters suggests an interdisciplinary effort, combining the cognitive and social sciences with computer science and engineering.

The purpose of this research, therefore, is to establish a new science of design and its methodology for engineering human-centered multimodal collaborative systems.

AREA REFERENCES

M.M. Blattner and E.P. Glinert, "Multimodal Integration," IEEE Multimedia, Vol.3, No.4, pp.14-24, Winter 1996.

A. Waibel, M.T. Vo, P. Duchnowski, and S. Manke, "Multimodal Interfaces," Artificial Intelligence Review, Vol.10, No.3-4, 1995.

P.R. Cohen, L. Chen, J. Clow, M. Johnston, D. McGee, J. Pittman, and I. Smith, "Quickset: A Multimodal Interface for Distributed Interactive Simulation," Proceedings of the UIST'96 demonstration session, Seattle, 1996.

D.B. Moran, A.J. Cheyer, L.E. Julia, and D.L. Martin, "The Open Agent Architecture and its Multimodal User Interface," Proceedings of the 1997 International Conference on Intelligent User Interfaces (IUI97), Orlando, Florida, pp.61-68, January 1997.

C. Mignot and N. Carbonell, "Oral and Gestural Command: An Empirical Study," Techniques et Sciences Informatiques, Vol.15, No.10, pp.1399-1428, 1996.

C.L. Bajaj and V. Anupam, "SHASTRA -- An Architecture for Development of Collaborative Applications," International Journal of Intelligent and Cooperative Information Systems, Vol.3, No.2, pp.155-172, 1994.

D.B. Roe and J.G. Wilpon, Voice Communication Between Humans and Machines, National Academy Press, Washington, DC, 1994.

V.I. Pavlovic, R. Sharma, and T.S. Huang, "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.19, No.7, July 1997.

RELATED PROGRAM AREAS

Speech and Natural Language Understanding.

Other Communication Modalities.

Usability and User-Centered Design.

Intelligent Interactive Systems for Persons with Disabilities.

Virtual Environments.

POTENTIAL RELATED PROJECTS

Justine Cassell (MIT): A Unified Framework for Multimodal Conversational Behaviors in Interactive Humanoid Agents.

Jerry Hobbs and Andrew Kehler (SRI International): Multimodal Access to Spatial Data.

Barbara Grosz and Stuart Shieber (Harvard University): Human-Computer Communication and Collaboration.

Francis Quek and Rashid Ansari (University of Illinois at Chicago): Gesture, Speech and Gaze in Discourse Management.

Thomas Huang and Rajeev Sharma (The Beckman Institute): Vision-based Hand Gesture Analysis in a Multimodal Interface for Controlling Virtual Environments.