Proactive Attentive Support System

June 22, 2017 | Author: Alfons Salden | Category: Video Surveillance, Support System

Johan de Heer, Sorin M. Iacob, Alfons H. Salden, Wouter B. Teeuw
Telematica Instituut, Enschede, The Netherlands
{johan.deheer, sorin.iacob, alfons.salden, wouter.teeuw}@telin.nl

Abstract—In this paper we describe our endeavour of developing a Proactive Attentive Support System (PASS). We explain that PASS is bio-inspired and can be grounded on mathematical physics. We demonstrate its successful applicability in the video surveillance domain.

I. INTRODUCTION

At the Artificial Cognitive Systems group¹ of the Telematica Instituut [http://aims.telin.nl] we study how models of natural intelligence proposed by cognitive scientists can inspire new computational models of artificial intelligent systems, and how the latter models can be used in a societally relevant context. Our approach is to study biological phenomena of natural cognition and to examine how models that account for those phenomena can be exploited by partly embedding, embodying and sustaining them in cognitive artificial intelligent systems. We also study to what extent these artificial intelligent systems can be applied to problem areas of the future information society, and what their socio-economic implications are within various application domains.

We do not believe that cognitive artificial systems will come about through a simple translation of cognitive research results for biological systems into manually handcrafted knowledge-based information, computing and communication technologies. Rather, we plead for a new research paradigm for developing artificial intelligent systems, in which biological cognitive systems and artificial intelligent systems co-evolve, interact and are aligned for the purposes of various problem domains. We believe that this paradigm shift will boost cross-disciplinary cognitive and computer science research resulting in new scientific community networks, will encourage new research areas, and will strengthen the scientific and methodological foundations of the cognitive sciences.

In the remainder of this paper we take you on our journey to develop our Proactive Attentive Support System (PASS). Its exploration illustrates the beginnings of our vision becoming reality. In essence, PASS takes on a specialized task, namely focussing on relevant information in huge amounts of data (selective attention) and making sensible predictions in the course of action (anticipation). Finally, we sketch an example in the surveillance domain in which PASS pre-selects and guides a multi-camera system to attend to objects (left luggage) in its video image (a train station) and to anticipate alarms if needed. In the latter case the selected images are presented to a human observer for further inspection. This illustrates how PASS reduces the cognitive load of a human observer.

Our paper is organized as follows. In section 2 we examine how humans filter and select information to act upon [1, 2], and we argue that PASS follows a similar strategy in performing these processes. In section 3, we show that PASS can be given a mathematical-physical grounding [3, 4]. The proposed framework illustrates the experimental and theoretical complexities we have to face and tackle to ‘translate’ cognitive research results into applicable technologies. In section 4, following the framework of the previous section, we describe a PASS architecture [5, 6] that is capable of selective visual attention and anticipation comparable to that of a human visual system. Finally, in section 5 we showcase the applicability of PASS [7] to the video surveillance domain.

¹ The authors are a cognitive psychologist, a computer scientist, a physicist, and a business developer, respectively. They are listed in alphabetical order since they contributed equally to this paper.

II. PASS IS BIO-INSPIRED

In [1, 2] we described an Attention Selection Model (ASM), which has its roots in various psychological and biological models of attention. This ASM model explains why, how, and when humans selectively attend to objects in their visual field of view. We investigated selection probability distribution functions over objects that are visually and/or cognitively distinctive/salient/conspicuous vis-à-vis their contextual elements. ASM builds on various models of attention viz. Filter Theory [8][9]; Response selection theory of attention [10]; Capacity Theory [11]; Resource Theory [12][13][14][15]; Treisman’s theories of attention [16][17] and the Feature Integration Theory (FIT) [18]; Van der Heijden’s [19], Allport’s [20] and Neumann’s [21] unlimited capacity to process theories; and computational theory of attention [22].

Figure 1. Information flows in ASM (blocks: Eye, IN, LCM, ICD)

Attention selection according to the ASM model can be specified as follows: visual information enters the input map (IN) and is relayed to two separate modules that undertake specialized tasks, i.e., the Identity Conspicuity Domain (ICD) mapping and the Location Conspicuity Map (LCM) (see Figure 1). The LCM provides a saliency map and follows a winner-takes-all (WTA) procedure that detects the most conspicuous location in the visual scene. The most conspicuous location is based on grounded spatial relations between objects in the visual scene, i.e., where-relations. For the ICD it is proposed that a similar WTA procedure localizes the most meaningful objects in the visual scene. Just like the LCM, the ICD is assumed to be a saliency domain mapping. In the ICD, object conspicuity is based on the identity relations for and between the meaningful objects, i.e., the what-relations. The most conspicuous object vis-à-vis its contextual objects can be simultaneously selected at the where (early) or what (late) level. At the LCM, object conspicuity is based upon crude sensory features, such as contrast, brightness, color, outline, size, shape and movement, vis-à-vis their contextual items. At the ICD, object conspicuity is based on the (lack of) fit of an object’s meaning with its surroundings: its placement vis-à-vis its context contrasts with expectations.

There are two feedback loops: (1) one from the LCM to the input map (IN), and (2) one from the ICD to the input map (IN). The operation of the feedback loops can be regarded as selective attention triggered by position information and/or by identity information. The map of locations and the domain of identities are the sources of the fields of attention. Location information and identity information, when fed back to the input module, make attention operational; in that sense they are attention. We therefore proposed that the feedback loops select the localized and identified information for memory storage and action.

We also investigated the structural and strategic aspects of our ASM component within visual systems. We posited two hypotheses that concern our ASM model with respect to exploratory and goal-directed visual searches:

• In the case of unintentional or exploratory acts we assume that the most conspicuous object (either at the where or what level) in a visual scene is a cue for an effectiveness response (selection-for-memory). In other words, the visual system seems to select the most conspicuous item in a visual scene.

• In the case of goal-directed or intentional acts, the subject is assumed to be in control of attaching weights to individual items that make up a scene irrespective of their conspicuity (that is, conspicuity based on the where or what item relationships). In principle, the subject is able to overrule the default value of the system in the case of intentional acts (selection-for-action).
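The two hypotheses can be illustrated with a minimal Python sketch. It is not the ASM implementation from [1, 2]: the names `SceneObject`, `conspicuity` and `select_attended` are hypothetical, the conspicuity scores are assumed to be given as numbers, and taking the maximum of the LCM and ICD scores is only one assumed way to let the winner emerge "either at the where or what level". In goal-directed mode, a subject-supplied weight is assumed to replace the default conspicuity of an item.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SceneObject:
    name: str
    location_conspicuity: float  # assumed LCM score (where-relations)
    identity_conspicuity: float  # assumed ICD score (what-relations)


def conspicuity(obj: SceneObject) -> float:
    """Default score: the stronger of the where-level and what-level conspicuity."""
    return max(obj.location_conspicuity, obj.identity_conspicuity)


def select_attended(objects: List[SceneObject],
                    weights: Optional[Dict[str, float]] = None) -> SceneObject:
    """Winner-takes-all selection over the objects in a scene.

    Exploratory mode (weights is None): the most conspicuous object wins.
    Goal-directed mode: a weight attached to an item overrules its default
    conspicuity; items without a weight keep their default score.
    """
    if not objects:
        raise ValueError("the scene contains no objects")
    if weights is None:
        return max(objects, key=conspicuity)
    return max(objects, key=lambda o: weights.get(o.name, conspicuity(o)))
```

With a scene containing a poster, a suitcase that does not fit its context, and a person, the exploratory call selects the suitcase, while passing a high weight for the person makes the person win regardless of conspicuity.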

Thus, our ASM model has structural as well as strategic aspects. Here ‘structural’ refers to the processing or exploratory capabilities of the visual system, whereas ‘strategic’ refers to its external or internal goal-directed control capabilities. In sum, the ASM model explains why, how, and when humans selectively attend to objects in their visual field of view (see [1] for more details and experimental results).

III. PASS GROUNDED ON MATHEMATICAL PHYSICS

In [3, 4], we took our cognitive scientific perspective [2] as the starting point and built a case for what we then called Natural Anticipation and Selection of Attention (NASA). We described why, when and how exploratory and goal-directed acts by humans could be controlled while optimizing the changing and limited structural and functional capabilities of multimodal sensor, cognitive and actuator systems. In addition, we made explicit how NASA is embedded and embodied in what we denoted as Sustainable Intelligent Multimodal Systems (SIMS), such that humans and systems can interact while taking their own and environmental contexts into account. We argued [3, 4] that the question of how to realize NASA (pre-) schemes in cybernetic systems is hardly ever adequately tackled in computational or cognitive science. We suggested that the Nobel Lecture of Ilya Prigogine [23], which addresses perception-decision-action problems, and the seminal work of Roger Penrose and Stuart Hameroff [24] on consciousness and quantum computation in the brain offer much inspiring material for answering this question. Like Salden [25-27], these authors stress the importance of unravelling physical laws and symmetry-breaking mechanisms before one may even think of reaching any sensible level of consciousness (awareness), understanding or intelligence. Based on this line of reasoning we proposed our framework in [3, 4] by applying an appropriate dynamic scale-space paradigm. This paradigm provides a robust statistical-physics grounding for, and extension of, the connectionist models we suggested earlier in [1, 2].

Figure 2. SIMS enacting itself and its environment

We discerned in our SIMS architecture various functional system components and relations between them, including feed-forward and feedback loops (see Figure 2). The feedback and feed-forward loops appear as control streams originating from the multimodal dialogue decision and planning system, such that it can manage resources in line with our NASA (pre-) schemes. In particular we focused on the functions of the multimodal dialogue decision and planning component. This component looks after reinforcement learning and self-organization. During reinforcement learning the fitness and utility of NASA (pre-) schemes are assessed, memorized and selected for possible action. Subsequent to self-organization, NASA pre-schemes are available to contextualization (pre-) schemes, disambiguation (pre-) schemes, and indexing, retrieval, querying, association, and inference (pre-) schemes. On the basis of the input streams and the embedded and embodied NASA (pre-) schemes, appropriate explorative and goal-directed multimodal dialogue decision and planning acts can be launched that serve the multimodal and multi-actuator fission components. Thus, our SIMS architecture can orchestrate, gauge and renormalize the SIMS components in an intelligent way in compliance with various usage contexts, keeping in mind the intentions of the users, the environment and the multimodal dialogue systems, their foci of attention, and their capacity and capability constraints. For further details regarding the mathematical physics the reader is referred to [3, 4].

IV. PASS ARCHITECTURE

In [5, 6] we designed a simplified architecture for attention and anticipation. As we argued, attention (or rather selection of attention) is one of the most fundamental mechanisms that allow cognitive systems to interact efficiently with their environment [28]. Through attention, cognitive systems are able to limit their complex processing effort to the information that is immediately relevant for completing their tasks. This definition assumes a certain functional model (typical for cognitive systems) in which task completion depends on collected environmental data, and the interaction is conducted in cycles of perception and action. It also assumes that the cognitive system possesses some short-term memory, a fast function for data pre-processing, and an optimisation strategy, such that the appropriate subsets of incoming (i.e. perceived) data are selected for further analysis. The efficiency of the attention selection mechanism is indirectly estimated through a utility measure that shows the degree to which the selected information contributes to task completion. In principle, this utility measure can be further used by the cognitive system for fine-tuning its pre-processing algorithms and optimisation strategy.

Anticipation allows the cognitive system to optimally achieve a complex goal involving completion of multiple (possibly concurrent) tasks. The essential difference from attention is the required global optimisation over multiple components of the cognitive system, and over a longer period of time. An interpretation of this global optimisation is that the system is steered to a desired future state based on an empirical model of previous system and environmental states, given the current system and environmental state. In Figure 3 a particular implementation of attention and anticipation is presented. An important assumption made here is that the environment’s state varies smoothly, and that its changes between two consecutive cycles of perception and action are finite. The selection of attention works as follows. The input data set acquired through perception is processed on two parallel pathways, pre-attentive and post-attentive.

Figure 3. Architecture for attention and anticipation

The pre-attentive pathway may require only a uniformly sampled subset of the input data, which is stored in the “short-term memory”. Although straightforward, such uniform sampling must ensure that the sampled data preserve the topological properties of the full data set. The “feature extraction” algorithm detects the subsets that possess a given (simple) property. All these subsets are then analysed by the selection function, which assigns a priority to each of them. The subset with the highest priority is then extracted by the “subset extraction” function from the corresponding original input data and passed to the post-attentive analysis pathway. The subset analysis function detects the presence of more complex patterns, and produces a utility measure indicating the strength of the detected patterns. This value, together with the selected subset, is used by the reinforcement learning function for adjusting the simple feature extraction algorithm. The pattern recognition function calculates a distance between known target patterns and those detected by the subset analysis function. A second utility measure is derived from the reliability of recognition, and is used for locking the attention until the expected pattern is reliably detected. To avoid possible stable states, the anticipation algorithm monitors the realization of the global optimum and overrides, as needed, the focus or fixation of attention.

This approach has brought us from bio-inspiration and empirical findings via mathematical grounding to a design in which a simplified version of selective attention and anticipation is suggested.
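The pre-attentive and post-attentive pathways described above can be sketched in Python for a one-dimensional input. This is a simplified illustration under stated assumptions, not the architecture of [5, 6]: all function names are hypothetical, the "simple property" is assumed to be a threshold test, priority is assumed to equal the sampled value, utility is assumed to be the mean of the selected full-resolution subset, and the reinforcement step is an assumed rule that moves the threshold toward the observed utility. Pattern recognition, attention locking and anticipation are left out.

```python
from typing import List, Optional, Tuple


def pre_attentive_sample(data: List[float], stride: int) -> List[float]:
    """Uniformly sample the input; the result plays the role of short-term memory."""
    return data[::stride]


def extract_features(samples: List[float], threshold: float) -> List[int]:
    """Indices of samples with the simple property 'value above threshold'."""
    return [i for i, value in enumerate(samples) if value > threshold]


def select_subset(samples: List[float], candidates: List[int]) -> Optional[int]:
    """Selection function: priority is the sample value; return the top candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda i: samples[i])


def extract_subset(data: List[float], sample_index: int, stride: int) -> List[float]:
    """Map the winning sample back to the full-resolution span it represents."""
    start = sample_index * stride
    return data[start:start + stride]


def analyse_subset(subset: List[float]) -> float:
    """Post-attentive analysis: utility is the mean strength of the subset."""
    return sum(subset) / len(subset)


def attention_cycle(data: List[float], stride: int, threshold: float,
                    learning_rate: float = 0.5
                    ) -> Tuple[Optional[List[float]], float, float]:
    """One perception cycle: returns (selected subset, utility, updated threshold)."""
    samples = pre_attentive_sample(data, stride)
    winner = select_subset(samples, extract_features(samples, threshold))
    if winner is None:
        return None, 0.0, threshold
    subset = extract_subset(data, winner, stride)
    utility = analyse_subset(subset)
    # Assumed reinforcement rule: move the threshold toward the observed utility.
    new_threshold = threshold + learning_rate * (utility - threshold)
    return subset, utility, new_threshold
```

Only the winning subset is read at full resolution, which is where the saving in processing effort comes from; the stride controls how coarse the pre-attentive view is and therefore how small a pattern can be before it is missed.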

V. PASS APPLIED AND EVALUATED

The objective of the PASS demonstrator in the surveillance domain [7] is to prove that PASS reduces the computational load of a certain event detection system, while maintaining the surveillance detection performance. The surveillance case involves left luggage detection on railway platforms (Figure 4).

Figure 4. A real setting

A static camera (a in Figure 4) covers the whole observation area of approximately 20 x 50 meters, and detects some scale-invariant features (e.g. static local changes, possible faces, etc.). A Pan Tilt Zoom (PTZ) camera (b) is directed successively to all detected regions, according to an optimization criterion. The high-resolution video from the PTZ camera is further analyzed with dedicated algorithms for the detection of unattended luggage. In this experimental setting the entire surveillance area is divided into four regions of interest (ROIs). The static video camera captures a video of the surveillance area, while PASS detects and orders conspicuous (coherent and stable) changes within all ROIs. On the basis of this analysis PASS instructs the PTZ camera to point at each quadrant of the surveillance area (for convenience, each quadrant is associated with one ROI) in an order defined by the saliency of the changes occurring in the ROIs. Subsequently, a commercial software package running on the PTZ video camera system can store and analyse up to four different views of the surveillance area quadrants, but can also be requested to run the same set of left luggage detection rules for the one view needing the foremost attention given the saliency of the changes occurring in the corresponding ROI. There are two major software components: the event-detection software, and the pre-analysis and camera control software.
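As an illustration of ordering ROIs by the saliency of stable changes, the following Python sketch works on small grayscale frames. It is not the PASS change detector: the function names are hypothetical, a change is assumed to be "stable" when a pixel differs from a reference background in every recent frame, and saliency is assumed to be the number of such pixels per quadrant.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities


def stable_change_mask(background: Frame, frames: List[Frame],
                       threshold: int) -> List[List[bool]]:
    """Mark pixels that differ from the background in every given frame."""
    if not frames:
        raise ValueError("at least one frame is required")
    rows, cols = len(background), len(background[0])
    return [[all(abs(f[r][c] - background[r][c]) > threshold for f in frames)
             for c in range(cols)]
            for r in range(rows)]


def quadrant_saliency(mask: List[List[bool]]) -> List[int]:
    """Count stable pixels per quadrant.

    Quadrant indices: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
    """
    rows, cols = len(mask), len(mask[0])
    counts = [0, 0, 0, 0]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                row_half = 1 if r >= rows // 2 else 0
                col_half = 1 if c >= cols // 2 else 0
                counts[2 * row_half + col_half] += 1
    return counts


def visiting_order(counts: List[int]) -> List[int]:
    """Quadrants with at least one stable change, most salient first."""
    changed = [q for q, n in enumerate(counts) if n > 0]
    return sorted(changed, key=lambda q: counts[q], reverse=True)
```

Requiring the difference to persist across all recent frames filters out transient changes such as passers-by, so only quadrants with lasting changes are handed to the PTZ camera.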

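Figure 5. Surveillance SW architecture (blocks: cameras Ca and Cb, Stable Changes Detector, Region-of-interest selection, Coordinates quantization and mapping, Delay, Calibration interface, Event detection (commercial package); the legend distinguishes data flow in operational mode from data flow in setup mode)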

PASS is slightly modified to accommodate the requirements of our particular surveillance case. The video data from the static camera (Ca) is analyzed by the Change Detector, which produces a set of regions where possible objects of interest (unattended luggage in this case) are detected. A selection process ensures that only the static regions are considered further. Next, the coordinates of all remaining regions are quantized to determine the quadrant to which they belong. The PTZ camera is then directed successively to the quadrants where possible objects of interest were detected. A simple prioritization scheme is sufficient in this case, based on the moment at which a stable change occurred. After the PTZ camera has changed position, the PASS control mechanism should wait for an amount of time that is sufficient for the event detection software to analyse the current view of the PTZ camera, and to produce an alert if an event is detected. This delay is specified by the user during surveillance system setup. The event detection software uses one video sensor (supplied by the PTZ camera), for which four different reference views were previously recorded, one for each quadrant of the surveillance area. In order for these four images to overlap with the four quadrants in the image of the static camera, an alignment (or calibration) procedure is performed during system setup. The camera control software is implemented as a Microsoft DirectShow filter, which is in principle a multi-threaded DLL with DCOM-style interfaces.

The main outcome of our experiment is that a low-resolution camera system in combination with PASS suffices to increase the computational efficiency of a distributed surveillance system consisting of multiple high-resolution PTZ video camera systems, each remotely running a commercial software package for left luggage detection.
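The coordinate quantization, time-based prioritization and user-specified delay of the operational data flow can be sketched as follows. This is a hedged Python illustration, not the DirectShow filter itself: `StableRegion`, `quantize_to_quadrant` and `plan_ptz_visits` are hypothetical names, the earliest stable change is assumed to be served first, each quadrant is assumed to be visited once per plan, and camera travel time is ignored.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class StableRegion:
    x: int              # region centre in static-camera pixels
    y: int
    detected_at: float  # seconds; moment the change became stable


def quantize_to_quadrant(x: int, y: int, width: int, height: int) -> int:
    """Map static-camera pixel coordinates to a quadrant index 0..3 (row-major)."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("coordinates outside the image")
    col = 0 if x < width // 2 else 1
    row = 0 if y < height // 2 else 1
    return 2 * row + col


def plan_ptz_visits(regions: List[StableRegion], width: int, height: int,
                    dwell_seconds: float,
                    start_time: float = 0.0) -> List[Tuple[int, float]]:
    """Return (quadrant, arrival_time) pairs, earliest stable change first.

    The camera dwells `dwell_seconds` on each view so that the event
    detection software can analyse it before the next move.
    """
    visits: List[Tuple[int, float]] = []
    seen: Set[int] = set()
    t = start_time
    for region in sorted(regions, key=lambda r: r.detected_at):
        quadrant = quantize_to_quadrant(region.x, region.y, width, height)
        if quadrant in seen:
            continue  # this quadrant is already scheduled
        seen.add(quadrant)
        visits.append((quadrant, t))
        t += dwell_seconds
    return visits
```

For a 320 x 240 overview image, regions in the bottom-right, top-left and top-right quadrants detected at 5 s, 2 s and 9 s are visited in the order top-left, bottom-right, top-right, with one dwell period between consecutive moves.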
In our experiment the low-resolution camera system and PASS performed the change detection task for the four different surveillance area quadrants. The computational complexity of the left-luggage detection algorithm run by the event detection software on our single PTZ camera system in each view is exactly the same as in the case of one view per PTZ camera in a distributed surveillance system setting. Although additional computational load is generated by the change detection performed by PASS, this load is negligible for the low-resolution video used in our experiment (320 x 240 pixels, at 25 fps). Even at higher resolutions the computational complexity can be kept within reasonable limits by lowering the frame rate. In principle, the surveillance area can be divided into a larger number of ROIs for the stationary low-resolution overview camera running PASS. Consequently, one could replace all static surveillance cameras by one high-resolution PTZ camera. In this way an even greater reduction in computational and financial cost can be achieved.

VI. CONCLUSIONS AND FUTURE WORK

We took you on our journey to develop the Proactive Attentive Support System (PASS). We studied an empirical model for human visual selection of attention and anticipation. Subsequently, we pointed out how cognitive science models can be described by a mathematical physics framework. Accordingly, we designed a system architecture for selective attention and anticipation, and showed how PASS reduces and distributes the load of a surveillance application over remotely running camera systems on which commercial software is deployed to detect, e.g., left luggage in public areas, and to alert, e.g., security personnel. In short, explanatory models of natural intelligence proposed by cognitive scientists can inspire new computational models of artificial intelligent systems, and the latter systems can be exploited in a societal context.

REFERENCES

[1] De Heer, J., The Attention Selection Model, PhD thesis, Tilburg University, The Netherlands, Feb 2001.
[2] De Heer, J. and Salden, A.H., “Intelligent user interfacing requires Attention Selection Modeling,” submitted to IUI2007, the International Conference on Intelligent User Interfaces.
[3] Salden, A.H. and de Heer, J., “Natural Anticipation and Selection of Attention within Sustainable Intelligent Multimodal Systems by Collective Intelligent Agents,” In The 8th World Multi-Conference on Systems, Cybernetics and Informatics (SCI 2004), July 18-21, 2004, Orlando, Florida, USA.
[4] Salden, A.H. and Kempen, M.H., “Sustainable Cybernetics Systems Backbones of Ambient Intelligent Environments,” In Remagnino, P., Foresti, G.L., and Ellis, T. (eds.), Ambient Intelligence, Springer, November 2004 (see also Salden, A.H. and de Heer, J., “Mathematical Physics Framework Sustaining Natural Anticipation and Selection of Attention,” In Journal of Systems, Cybernetics and Informatics, as WMSCI 2004 best paper, in press).
[5] Salden, A.H. and Iacob, I.M., Special Session on “Embodiment of Anticipation and Attention in Cybernetic Systems” at the IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, October 10-13, 2004.
[6] Iacob, I.M. and Salden, A.H., “Attention and Anticipation in Complex Scene Analysis – An Application to Video Surveillance Systems,” In IEEE International Conference on Systems, Man and Cybernetics, October 10-13, 2004, The Hague, The Netherlands.
[7] Iacob, I.M., Salden, A.H., and de Heer, J., “The PASS demonstrator: Detailed design and experimental results,” Enschede, The Netherlands, 2006. https://doc.telin.nl/dscgi/ds.py/Get/File64559/INES_PASS_D3.pdf
[8] Broadbent, D.E. (1958). Perception and Communication. London: Pergamon Press.
[9] Broadbent, D.E. (1971). Decision and Stress. London: Academic Press.
[10] Deutsch, J.A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
[11] Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.
[12] Navon, D. and Gopher, D. (1979). On the economy of the human processing system. Psychological Review, 86, 214-255.
[13] Norman, D.A. and Bobrow, D.G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.
[14] Wickens, C.D. (1980). The structure of attentional resources. In R.S. Nickerson (Ed.), Attention and Performance 8. New York: Academic Press.
[15] Wickens, C.D. (1984). Processing resources in attention. In R. Parasuraman and D.R. Davies (Eds), Varieties of Attention. New York: Academic Press.
[16] Treisman, A.M. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental Psychology, 12, 242-248.
[17] Treisman, A.M. (1964). Verbal cues, language, and meaning in selective attention. American Journal of Psychology, 77, 206-219.
[18] Treisman, A.M. (1988). Features and objects: The XIVth Sir Frederic Bartlett memorial lecture. Quarterly Journal of Experimental Psychology, 40A, 201-237.
[19] Van der Heijden, A.H.C. (1992). Selective Attention in Vision. London: Routledge and Kegan Paul.
[20] Allport, D.A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer and A.F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
[21] Neumann, O. (1987). Beyond capacity: A functional view of attention. In H. Heuer and A.F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
[22] Itti, L., Koch, C., and Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 20(11), pp. 1254-1259.
[23] Prigogine, I., Time, structure and fluctuations, Nobel Lecture, 8 December 1977.
[24] Penrose, R. and Hameroff, S., “Quantum computation in brain microtubules? The Penrose-Hameroff ‘Orch OR’ model of consciousness,” Philosophical Transactions Royal Society London (A), Vol. 356, 1998, pp. 1869-1896.
[25] Salden, A.H. and Kempen, M., “Sustainable Cybernetics Systems Backbones of Ambient Intelligent Environments,” In P. Remagnino, G.L. Foresti and T. Ellis (eds.), Ambient Intelligence, Springer, November 2004.
[26] Kempen, M. and Salden, A.H., The Way Forward for Cognitive Environments; Improvement requirements for use in the design process of mobile applications and services, In Proceedings of 11th International Conference on Human-Computer Interaction HCI International 2005, Las Vegas, USA, July 2005.
[27] Salden, A.H., Dynamic Scale-Space Paradigms, Ph.D. Thesis, Utrecht University, The Netherlands, 1996.
[28] Pylyshyn, Z., “Some Primitive Mechanisms of Spatial Attention,” Cognition, vol. 50, pp. 363-384, 1994.
