Anthropomorphism as a pervasive design concept for a robotic assistant

Preprint. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), 2003.

Ioannis Iossifidis, Christoph Theis, Claudia Grote, Christian Faubel, and Gregor Schöner
Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany
Phone: +49 234 3225567, Email: {jannis,theis,grote,faubel,gregor}@neuroinformatik.rub.de

Abstract— CoRA is a robotic assistant whose task is to collaborate with a human operator on simple manipulation or handling tasks. Its sensory channels, comprising vision, audition, haptics, and force sensing, are used to extract perceptual information about speech, gestures, and gaze of the operator, and to recognize objects. The anthropomorphic robot arm makes goal-directed movements to pick up and hand over objects. The human operator may mechanically interact with the arm by pushing it away (haptics) or by taking an object out of the robot's gripper (force sensing). The design objective has been to exploit the human operator's intuition by modeling the mechanical structure, the senses, and the behaviors of the assistant on human anatomy, human perception, and human motor behavior.

1 INTRODUCTION

Industrial robots perform preprogrammed actions in a specifically prepared, highly structured environment. They therefore have little need for perception and no on-the-task interaction with human users. Robotic assistants, by contrast, are expected to share an environment with a human operator [21], [14], [13], [18], [4]. Those environments will be less structured and partially unknown. They include the human operator, with whom the assistant must interact on the job. When a robot assists in an assembly task or serves as a handyman, the human operator in effect "programs" the robot as the work unfolds. Two fundamental requirements arise. First, a robotic assistant must possess a certain degree of autonomy, so that its perceptual processes can obtain information about the environment, including the human user. Second, robotic assistants must be programmable by the human operator, on-line and with a very intuitive user interface. Such programming requires two-way interaction, with human instructions being picked up by the robot and, conversely, feedback about the robot state and current command being provided to the user.

Traditional means of communication between people and machines involve classical human-computer interfaces such as keyboard-mouse-monitor or touch panels and the like. These means of interaction are not well suited to the typical task setting of robotic assistance. To instruct a robot to grasp a particular object, for instance, the user might need to specify the position of the object in terms of numerical coordinates, or by selecting an appropriate item from a menu [17], [2], [23]. This means that the user adjusts his or her conception of the task to the concepts of the robotic system (coordinates, items in lists, etc.). Given that robotic assistants are meant to come into play in as yet non-computerized aspects of manual work, such interfaces are clearly undesirable. Moreover, classical interfaces are inefficient, as they may require the solution of difficult but unnecessary problems (e.g., obtaining a complete scene analysis in which all visible objects are identified and located, only in order to generate the selections in a menu). Furthermore, classical interfaces interfere with the user's own manipulation tasks, as they may require the user's hand to leave the assembly field to type or control a pointing device.

The need for a natural and intuitive user interface is thus obvious. Such an interface may be based on the robot's recognition of user instructions. For instance, the user may specify an object by naming it, exploiting the object recognition capacities of the system, or by pointing at it, exploiting the gesture recognition capacities of the system. To command the robot to hand over an object, it might be enough if the user grasps the object in the robot's gripper, the sensed forces being sufficient to trigger the release. More generally, the idea is to exchange information via the naturally available communication channels of people, that is, speech, vision (gesture, gaze), and mechanics (touch, force) [4], [18], [21].

The autonomous robotic manipulator CoRA (Cooperative Robot Assistant) is endowed with perceptual systems enabling such interaction based on natural communication channels. CoRA is designed anthropomorphically at multiple levels. Its overall shape, size, and arrangement can be immediately recognized as human-like (see Fig. 1), and thus offer a natural setting for cooperative work with the human operator. The actuators, a pan/tilt stereo camera unit, a seven-degree-of-freedom arm, and a one-degree-of-freedom torso, are modeled on the mechanical structure of the human head, arm, and torso. This makes it easy for the human user to understand the mechanical state of the robot. Because movements are generated to mimic basic properties of human movement (e.g., straight-line end-effector trajectories, lifting the elbow to avoid obstacles), it is easy for the user to predict the robot's behavior.


Fig. 1. The service and assistance robot CoRA. A seven-DoF manipulator arm is mounted on a one-DoF trunk, which is fixed to a table. The robot possesses two two-DoF stereo camera heads, the upper one equipped with microphones.

All sensory channels (vision, audition, artificial skin, and a force/torque sensor) have their equivalents in the human. The perceptual modules built on these channels are likewise analogous to human perceptual processes (e.g., perception of gaze direction, perception of the direction of a pointing movement). Because of these isomorphisms at the structural, sensory, perceptual, and behavioral levels, the intuition of the human user supports efficient cooperation with the robotic assistant. CoRA has a number of elementary behaviors such as finding and tracking the hand of a human user, detecting and recognizing objects, grasping objects, transporting objects, and releasing objects. A software architecture organizes these behaviors, which are processed in parallel. More complex behavioral sequences arise from combinations of the elementary behaviors. We conceive of CoRA as a demonstration of capabilities that may find application in a number of areas in which people and robots interact intensively, embedded in natural environments that are not specifically adapted to the robot. Such applications may include robotic assistants on the shop floor, robotic assistants in field assembly situations, service robots in a commercial or home setting, as well as toy, entertainment, or concierge robots.

2 HARDWARE

CoRA (see Fig. 1) is fixed to a table and meant to physically interact with a human sitting across the table. It has a head with two degrees of freedom (DoF): pan and tilt. The head carries a stereo color camera system and microphones. The vision system performs tasks such as object recognition, gesture recognition, the estimation of the human's gaze direction, and the estimation of the 3D position and orientation of objects. To perform gaze-direction detection in parallel with the other perceptual tasks, a second camera head is mounted beside CoRA's trunk. CoRA's body consists of a redundant seven-DoF manipulator connected to a one-DoF trunk, which is fixed to the edge of the table. CoRA can exploit the redundant eight DoF of the arm-trunk configuration, which guarantees a high degree of flexibility with respect to manipulation tasks under external constraints. Grasping, for instance, is possible in the whole workspace with different arm postures, without the need to change the position or orientation of the end-effector. By turning the trunk joint, the robot can also change its configuration from right- to left-handed.

The sensor equipment and the configuration of the joints in CoRA's body and manipulator arm are anthropomorphic, which means that they are structurally similar to the human body. Two of the manipulator arm's modules are covered with a touch-sensitive, so-called artificial skin. By means of these sensors, the operator can correct CoRA's arm movements, e.g., by lifting its elbow while the robot grasps an object (see [11]). A force/torque sensor mounted on the robot's wrist provides the ability to perform force-feedback-dependent actions such as putting an object onto the table, handing over objects, or being guided by the human partner. The computational power is provided by a network of five PCs with 1.46 GHz Athlon processors running Linux.

When a human partner is sitting at the opposite side of the table, the robot and the human partner share the same eye level. Relying on the stereo cameras, the microphones, and the artificial skin, CoRA uses sensor channels similar to those available to the human partner. The restriction to audition, vision, and touch and the redundant configuration of the arm make high demands on the control system of the robot. The goal of our research is a robot system that can perform the following tasks: a) visual identification of objects presented by a human teacher, b) recognition of these objects in scenes with many objects, c) grasping and handing over objects, d) performing simple assembly tasks. All behaviors are realized under acoustic and haptic control by the human partner. This user interaction consists of speech commands and correction of the manipulator configuration via touch. All of CoRA's capabilities are realized on the basis of the dynamic approach to robotics. In the following, we describe the interaction channels realized using the described hardware.

3 INTERACTION CHANNELS

3.1 Speech

The most natural way for a person to give on-line instructions to an assistant is the most powerful human communication channel, natural speech. Conversely, speech feedback from the robot is an efficient channel through which information about the state and current command mode of the robot may be passed while avoiding interference with other, ongoing activities of both user and robot. Speech input is processed by the speaker-independent speech recognizer ESMERALDA [5], which we adjusted syntactically and lexically to our needs. The recognized words are evaluated by an ongoing process that realizes the behavioral organization. Each recognized word furnishes a condition that may bring a particular behavior into execution. The behavior specified by an oral command is activated if all its conditions are fulfilled. For example, the grasping behavior is only executed if an object has been found. A spoken command can also immediately terminate a running behavior at any time. Our speech interface is complemented by a speech output module that is managed by the text-to-phoneme converter hadifix [15] and the phoneme synthesizer mbrola [3].

3.2 Arm movement

The seven-degree-of-freedom arm can perform goal-directed movements. This is controlled by a dynamical system formulated at the level of the end-effector's movement direction in 3D [11], taking into account the movement goal and obstacles. The end-effector trajectory is transformed into a trajectory in seven-dimensional joint space using the analytical solution to the inverse kinematics problem [12]. Input from the haptic sensor is used to induce null-space motion, which is motion of the elbow around an axis linking the shoulder to the wrist joint of the anthropomorphic arm. For more details see [?].

3.3 Haptics

Input from the artificial skin is used to detect events when the user touches the arm. The direction of the touch on the arm's rigid body is determined and fed into the motor control algorithm to bring about motion in the null space of the seven-degree-of-freedom arm. A force applied to the upper cuff is interpreted as an external force on the elbow. Such a force vector can therefore be used to change the position of the elbow during a grasping trajectory. Due to CoRA's mechanical redundancy, the elbow position can be varied without affecting the position and the orientation of the end-effector. Controlling the position of the elbow by applying external haptic force to the artificial skin is used to teach in grasping trajectories. In the arm control dynamics, the force is directly proportional to the acceleration α̈ of the so-called elbow angle α, which defines the angular position of the elbow relative to a rotation axis that goes through the wrist and shoulder [10].
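As an illustration of this coupling, the following is a minimal sketch of how a sensed skin force could drive the elbow angle in the null space of the arm. The gains, the time step, the added damping term, and the helper that projects the force onto the elbow's circular path are assumptions made for the sketch, not CoRA's actual controller parameters.

```python
import numpy as np

# Illustrative constants (assumed, not the values used on the robot).
K_FORCE = 2.0   # elbow-angle acceleration per Newton of tangential force
DAMPING = 4.0   # damping of the elbow-angle velocity, keeps the motion smooth
DT = 0.01       # integration time step in seconds

def tangential_force(force_vec, elbow, shoulder, wrist):
    """Project the force measured by the artificial skin onto the tangent of the
    elbow circle, i.e. the direction in which the elbow can move without
    changing the end-effector pose (rotation about the shoulder-wrist axis)."""
    axis = (wrist - shoulder).astype(float)
    axis /= np.linalg.norm(axis)
    radial = (elbow - shoulder).astype(float)
    radial -= axis * np.dot(radial, axis)      # component orthogonal to the axis
    tangent = np.cross(axis, radial)
    tangent /= np.linalg.norm(tangent)
    return float(np.dot(force_vec, tangent))

def step_elbow_angle(alpha, alpha_dot, force_vec, elbow, shoulder, wrist):
    """One Euler step of the elbow-angle dynamics: the angular acceleration is
    proportional to the sensed tangential force, plus a damping term."""
    alpha_ddot = (K_FORCE * tangential_force(force_vec, elbow, shoulder, wrist)
                  - DAMPING * alpha_dot)
    alpha_dot += DT * alpha_ddot
    alpha += DT * alpha_dot
    return alpha, alpha_dot
```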


Fig. 2. A human touches the artificial skin during grasping behavior.

The experimental results are shown in Fig. 2. In A, the robot arm starts its trajectory towards the small object. To avoid a collision between the elbow and an obstacle that has not been detected by the vision sensor, the human operator touches the skin on the upper cuff of the robot's arm (A-D) in order to force the robot to lift its elbow. The system detects the force, determines its magnitude, and calculates the force direction with respect to the current arm posture. On the basis of this estimate, the robot starts (B) its elbow movement in the direction of the detected force. In A-E the robot moves its elbow without changing the intended trajectory of the end-effector. In E the robot completes its trajectory and grasps the object (F).

3.4 Force

In order to measure external forces that act on the gripper, the gravitational force which stems from the gripper modules' own weight must be subtracted from the output of the force/torque sensor. This is done by computing the gravitational force and torque components for each arm posture using the geometrical model of the arm [11]. This gravitationally compensated input from the force/torque sensor is used for several force-feedback-dependent tasks. When a measured external force exceeds a given threshold, for instance during a hand-over task, the arm movement is stopped and the gripper opens to deliver the object into the operator's hand. In other cases, based on the same kind of force sensing, the operator may manually guide the robot gripper to correct the gripper position and orientation during a grasping task.
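The following sketch illustrates the gravity compensation and the threshold test that could trigger the release during a hand-over. The gripper mass, the source of the rotation matrix, and the threshold value are placeholder assumptions; the real system derives the gravity term from the geometrical arm model.

```python
import numpy as np

GRIPPER_MASS = 0.8          # kg, assumed effective mass of the gripper modules
G_WORLD = np.array([0.0, 0.0, -9.81])   # gravity in the world frame
RELEASE_THRESHOLD = 3.0     # N, assumed external-force magnitude that triggers release

def external_force(raw_force_sensor, R_world_to_sensor):
    """Subtract the gripper's own weight (expressed in the sensor frame)
    from the raw force reading to obtain the external force."""
    gravity_in_sensor = R_world_to_sensor @ (GRIPPER_MASS * G_WORLD)
    return raw_force_sensor - gravity_in_sensor

def handover_should_release(raw_force_sensor, R_world_to_sensor):
    """Return True once the compensated external force exceeds the threshold,
    e.g. when the object touches or is pulled by the operator's hand."""
    f_ext = external_force(raw_force_sensor, R_world_to_sensor)
    return bool(np.linalg.norm(f_ext) > RELEASE_THRESHOLD)
```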


Fig. 3. Sequence of visual change images of a waving hand (time step between images: 0.1 seconds; normal reading order).


3.5 Vision

Obtaining visual information about unstructured scenes is notoriously difficult. Rather than solving this general problem, we have designed relatively simple individual modules which provide low-level information about objects, gestures, or gaze, taking into account the specific task setting to make the computer vision problem as simple as possible. It is then possible to combine several such elementary modules to achieve more advanced visual functions. For example, from two basic features in the image space, human skin color and image change, we obtain information about displacements of the hand of the human user. By extracting skin-colored image areas we obtain an idea of where gestural instructions may take place at all, or where the hand is located for object transfer. Because the human user often moves his or her hand in the visual field, while the rest of the scene might be largely stationary (unless objects are moved by the human user), visual change is another cue to the location of the human hand. While the visual detection and identification of objects may take some time and lead to an intermittent mode of operation, other aspects require close to real-time visual information. If reasonable manipulator speeds are used, then changes in the visual scene that may introduce new obstacles (e.g., the human operator's arm within the robot's workspace) or shifted targets must be detected on a fast time scale.

3.5.1 Visual change detection

Interaction of the user with the robot will typically be associated with motion in the workspace, leading to visual change. This is why it is a good idea to use visual change detection to control perceptual processing (see Fig. 3). To get a fast estimate of where in the image visual change has occurred, we use an adaptive reference image [6]. For each pixel of the image we define a separate range of variation for that pixel's value. If the pixel value in the next captured image is inside this range, the pixel is classified as not displaying visual change. The variation range is adapted iteratively over time. The adaptation rate of each border of a pixel's variation range depends on where the current pixel value lies within the current range (see Fig. 4). If the value lies well above the lower border of the variation range, the lower border is increased; if the value is only slightly above it, the lower border is decreased (the upper border is adapted analogously, with the directions reversed). This adaptive mechanism accommodates small and slow changes in scene luminance without triggering a change detection. If there is visual motion, the current pixel value leaves the range of variation and visual change is detected for this pixel. Subsequently, the violated border is shifted steeply towards the changed value, to enable adaptation to the changed scene. An advantage of this reference-image based approach is that for visual changes induced by the movement of an object in the scene, we obtain not only a detection of the parts of the image at which the movement induced change, but, more specifically, the parts of the image to which the object has moved.
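A compact NumPy sketch of such a per-pixel adaptive variation range is given below. The adaptation rates and the margin that distinguishes "well above" from "slightly above" a border are assumed values, since the paper does not specify them.

```python
import numpy as np

ADAPT_FAST = 0.2      # assumed rate for steep re-adaptation after a detected change
ADAPT_SLOW = 0.02     # assumed rate for slow tracking of illumination drift
MARGIN = 10.0         # assumed intensity margin that counts as "slightly above" a border

def update_range(lower, upper, frame):
    """Classify each pixel as changed/unchanged and adapt its variation range.

    lower, upper, frame: float arrays of identical shape (gray-scale image).
    Returns (changed_mask, lower, upper)."""
    changed = (frame < lower) | (frame > upper)

    # Unchanged pixels: tighten a border when the value lies far from it,
    # relax it slightly when the value is close to it.
    far_from_lower = frame > lower + MARGIN
    lower = np.where(~changed & far_from_lower, lower + ADAPT_SLOW * (frame - lower), lower)
    lower = np.where(~changed & ~far_from_lower, lower - ADAPT_SLOW * MARGIN, lower)

    far_from_upper = frame < upper - MARGIN
    upper = np.where(~changed & far_from_upper, upper - ADAPT_SLOW * (upper - frame), upper)
    upper = np.where(~changed & ~far_from_upper, upper + ADAPT_SLOW * MARGIN, upper)

    # Changed pixels: move the violated border steeply towards the new value
    # so that the reference adapts to the changed scene.
    lower = np.where(changed & (frame < lower), lower + ADAPT_FAST * (frame - lower), lower)
    upper = np.where(changed & (frame > upper), upper + ADAPT_FAST * (frame - upper), upper)

    return changed, lower, upper
```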


Fig. 4. The intensity variation range adaptation for one pixel over time, caused by arm movements with fast (first 3 peaks), medium (2 peaks), and slow (last big peak) speed.

3.5.2 Gesture perception

Although we are not always aware of it, gestures are a powerful human communication channel that may provide additional (in particular, metric) information not easily conveyed by spoken language. Even infants who are too young to speak use their hands to express their desires, for instance by pointing towards objects, sometimes even accompanying this with a symbolic grasping gesture [8]. We use this natural behavior for two interaction channels. First, an open hand directed towards the robot can be used to express the command that the object held by the robot be handed over. Second, from a pointing hand the pointing direction is extracted and used to identify which object must be picked up. Both approaches require segmentation of image patches containing the human user's hand. We base this segmentation on skin color, extracting skin-colored areas in the camera images. The skin colors are specified simply through a lower and upper limit on hue and saturation in the HSI color space. Based on hypotheses about expected hand positions [20], the relevant hand in the visual scene can be detected.
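A rough sketch of this segmentation step is shown below, using OpenCV's HSV space as a stand-in for the HSI space. The hue and saturation limits are placeholders, since on CoRA the operator's skin color is calibrated at the start of a session (see Section 5).

```python
import cv2
import numpy as np

HUE_LO, HUE_HI = 0, 25      # assumed hue limits (OpenCV hue range is 0..179)
SAT_LO, SAT_HI = 40, 255    # assumed saturation limits

def skin_mask(bgr_image):
    """Return a binary mask of skin-colored pixels."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower = np.array([HUE_LO, SAT_LO, 0], dtype=np.uint8)
    upper = np.array([HUE_HI, SAT_HI, 255], dtype=np.uint8)
    return cv2.inRange(hsv, lower, upper)

def largest_skin_blob(mask):
    """Keep only the largest connected skin region as the hand hypothesis."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return np.zeros_like(mask)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```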


Fig. 5. Three pointing gestures with different hand configurations.

Fig. 6. The detected iris and eye corners of two different users, as used for the estimation of gaze direction.

Depending on the communication context, the current hand position is interpreted as a hand-over position or as a pointing gesture. There are a number of different methods to determine human hand configurations [22]. Some of these approaches use complex hand models or reconstruct hand posture from three-dimensional data sets, often with the goal of identifying symbolic gestures. In our case, detailed information about each finger's position is not needed. We merely need the pointing direction of the hand as a whole, but we need this information at a fast rate. Our approach takes all segmented hand pixels into account and computes their principal axis. Due to the hand's anatomy, the resulting line qualitatively approximates the hand's pointing direction, projected into 2D (compare Fig. 5). Cases in which the hand points directly into the camera lead to misinterpretations, but these can typically be detected using the second-order principal axis in combination with the persistence of the principal axis over time. To obtain the three-dimensional pointing direction, we first determine the first and last intersections of the two-dimensional principal axis with the skin-colored hand cluster. These two points are transformed into world space by obtaining their disparity values from a shiftable-windows disparity algorithm [9], [16]. The 3D world positions of the two points then provide the support points of the pointing direction.
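The pointing-direction estimate can be sketched as follows: the principal axis of all segmented hand pixels is obtained from the eigenvectors of the 2x2 covariance of their image coordinates, and the ratio of the eigenvalues serves as a plausibility check for the hand-pointing-into-the-camera case. The conventions and the elongation measure are illustrative assumptions.

```python
import numpy as np

def principal_axis(hand_mask):
    """Return (centroid, direction, elongation) of the segmented hand pixels.

    hand_mask: 2D boolean or 0/255 array marking hand pixels."""
    ys, xs = np.nonzero(hand_mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    direction = eigvecs[:, -1]                # principal axis in image coordinates
    # Eigenvalue ratio close to 1 means the blob is roughly round, e.g. the
    # hand points into the camera, and the direction estimate is unreliable.
    elongation = eigvals[-1] / max(eigvals[0], 1e-9)
    return centroid, direction, elongation
```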

3.5.3 Gaze direction

Before even pointing at objects of interest, people typically direct their gaze towards an interesting part of the visual array. Observing gaze direction therefore provides useful information about a user's intentions. To build an algorithm that extracts gaze direction [19], the face must first be found in the image. This can be done based on all skin-colored areas in the image, somewhat analogously to the detection of pointing direction, with a number of additional computations. Given a fixed face geometry, the eyes can be found by a corner detection algorithm. Based on a hypothesis about the eye position, the location of the iris can be estimated in two ways. First, the iris area is segmented by filling the darkest area of the eye region using a region-growing algorithm. A pattern matching procedure using a geometrical eye model improves the result. Second, we apply a Hough transformation to get another estimate of the location of the iris. Fusing the two estimates leads to a precise determination of the iris position. The eye contours can be reconstructed by creating an eye contour model that is then matched with the edge information generated from the image using a Canny filter with deterministic annealing. In a final step, the iris position is related to the position of the eye corners, from which a rough estimate of gaze direction can be made (Fig. 6).

3.5.4 Object detection and recognition

In addition to these various visual channels of information about user instructions and user attention, perceptual information is needed about the scene to enable autonomous reaching and grasping. In particular, we need a method to detect, segment, and recognize objects on the work table. This task is facilitated by specific assumptions we can make about the visual scene. All objects which are not part of the human-machine interaction (HMI) must lie within the valid working area, that is, on the surface of the work table. From the pan/tilt configuration of the camera head and the fixed position of the working surface, the camera geometry relative to the table can be computed and updated as the camera head moves. This makes it possible to extract the table area in the image. Within that area, the unoccupied parts of the table surface can be eliminated based on the homogeneous table color using a histogram analysis. Each remaining region in the image, separated by the extracted table surface, contains one or more objects. Building a color histogram of each of those image regions, we obtain their characteristic color distributions. If a single object presented in isolation must be learned, this characteristic color distribution is a good feature to thin out the scene during the search phase. To merely detect objects, this additional information is not needed. In either case, the remaining image regions are now candidates for object hypotheses. The determination of the disparity values of all pixels in those remaining regions generates 3D world coordinates for the corresponding points on the objects. This is typically a very large set of points. To simplify the recognition process, we project all these points into the table plane. When learning an object, the separated projection near the expected position is saved under different rotation angles. When detecting objects, each separated projection pattern is assumed to be an object. When searching for an object, the saved projections from the learning phase are used to find the object in the projection plane by pattern matching. Because the patterns are saved under different rotation angles, the object can be found approximately invariant to translation and rotation in the table plane.
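A toy version of the table-plane projection and the rotation-template match might look as follows. The grid resolution, the number of stored rotation angles, and the overlap score are assumptions made only for the sketch; the real system works with the projection patterns stored during the teaching phase.

```python
import numpy as np

CELL = 0.01   # assumed grid resolution of 1 cm in the table plane

def project_to_table(points_3d):
    """Drop the height coordinate (table frame, z up) and rasterize the object
    points into a binary occupancy grid centered on the point cloud."""
    xy = points_3d[:, :2] - points_3d[:, :2].mean(axis=0)
    idx = np.round(xy / CELL).astype(int)
    half = int(np.abs(idx).max()) + 1
    grid = np.zeros((2 * half + 1, 2 * half + 1), dtype=bool)
    grid[idx[:, 1] + half, idx[:, 0] + half] = True
    return grid

def rotate_points(points_3d, angle):
    """Rotate the point cloud about the table normal."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points_3d @ rot.T

def learn_templates(points_3d, n_angles=36):
    """Store the projection of a presented object under several rotations."""
    return [project_to_table(rotate_points(points_3d, a))
            for a in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)]

def _pad_to(grid, shape):
    """Center-pad a grid to the given shape."""
    dy, dx = shape[0] - grid.shape[0], shape[1] - grid.shape[1]
    return np.pad(grid, ((dy // 2, dy - dy // 2), (dx // 2, dx - dx // 2)))

def match_score(grid_a, grid_b):
    """Intersection-over-union of two centered occupancy grids."""
    shape = (max(grid_a.shape[0], grid_b.shape[0]),
             max(grid_a.shape[1], grid_b.shape[1]))
    a, b = _pad_to(grid_a, shape), _pad_to(grid_b, shape)
    return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)
```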


4 BEHAVIORAL ORGANIZATION

Organizing the human-machine interaction and the episodes of autonomous behavior requires organization at the level of the underlying computing processes and at the level of action selection.

4.1 Software Infrastructure

Because the different motor and sensor hardware is attached to dedicated PCs, and because the information provided by the sensors and needed by the motors must be processed in parallel, there is a need for interprocess communication across the computer network. For this task, a communication system based on the Parallel Virtual Machine (PVM) [7] has been developed. In order to deal with a continuous flow of information (e.g., the user's hand position during tracking) and with events (e.g., a user's speech command), a channel-based communication scheme was developed, called PAPS (Parallel Asynchronous Process Synchronization) (see Fig. 7). A specific type of information, such as the current position of the user's hand, corresponds to a communication channel. Different processes can subscribe to these channels to receive the corresponding data. Processes delivering information do not send it to other processes directly but to a defined channel. Subscription here means that a process checks in a continuous loop whether there is information on a channel; the underlying PVM ensures that messages on a channel are buffered until they are fetched by all processes subscribed to that channel. The information flow is defined at start-up of all processes in a configuration script, which specifies for every process the channels to which it subscribes. This can be seen as wiring up the informational connections. With such a start-up script, processes can easily be "rewired" by changing only the configuration script, without recompiling the processes. With this scheme, a flexible and modular architecture can be built, as sketched below.
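The following is a minimal single-process sketch of the PAPS idea: named channels, a per-subscriber buffer, and wiring read from a configuration mapping. The real system runs across several PCs on top of PVM; the class, the channel names, and the wiring shown here are illustrative stand-ins.

```python
import queue
from collections import defaultdict

class ChannelBroker:
    """In-process stand-in for the channel-based PAPS scheme."""

    def __init__(self):
        self._buffers = defaultdict(dict)   # channel -> {subscriber: Queue}

    def subscribe(self, channel, subscriber):
        """Register a subscriber; it gets its own buffer, so messages are kept
        until this particular process has fetched them."""
        self._buffers[channel][subscriber] = queue.Queue()

    def publish(self, channel, message):
        """Senders write to a channel, never to other processes directly."""
        for buf in self._buffers[channel].values():
            buf.put(message)

    def fetch(self, channel, subscriber, timeout=None):
        """Called in the subscriber's continuous loop; blocks until data arrives."""
        return self._buffers[channel][subscriber].get(timeout=timeout)

# "Wiring" as it might appear in the start-up configuration script:
# which process listens to which channels (names are illustrative).
WIRING = {
    "arm_control": ["hand_position", "speech_command"],
    "arbitrator":  ["speech_command", "object_pose", "force_event"],
}

broker = ChannelBroker()
for process, channels in WIRING.items():
    for ch in channels:
        broker.subscribe(ch, process)

broker.publish("speech_command", {"word": "stop"})
print(broker.fetch("speech_command", "arbitrator"))
```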




Fig. 7. CoRA's communication scheme PAPS (Parallel Asynchronous Process Synchronization). Several processes running on different networked PCs and connected to different logical channels for receiving or sending messages are depicted.

4.2 Action Selection

In the behavior-based paradigm [1], intelligent high-level behavior emerges from the coordination of single low-level behaviors by generating complex behavioral sequences with complete control flow from the sensors to the actuators. The implementation of such an approach requires the parallel and asynchronous execution of several subsystems, such as sensory and motor behavior modules. Using the communication infrastructure described in the previous subsection, all sensory interpretation and motor control processes are modeled as elementary behaviors. A particular ongoing process acts as an arbitrator: it continuously receives the output of the sensory behaviors, computes the so-called sensory context from it, decides which motor behavior is appropriate, and effects the activation of this behavior. One sensory context is defined for each elementary behavior and describes the sensory reality that renders the activation of this behavior appropriate. The sensory contexts of several elementary behaviors are allowed to coincide. By this means, we achieve a competition among suitable basic actions, realizing a form of redundancy with respect to the different possibilities of completing a behavioral goal. The behavioral organization of our robotic system CoRA is designed to provide for autonomous behavior in close interaction with the human partner.
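The arbitration principle can be sketched as follows, with behavior names, conditions, and a priority-based tie-break chosen purely for illustration; CoRA's actual repertoire and competition mechanism are richer.

```python
# Each elementary behavior declares the sensory context (a set of required
# conditions) that makes it appropriate; the arbitrator activates one of the
# behaviors whose context is satisfied. Names and conditions are illustrative.
BEHAVIORS = {
    "grasp_object":  {"object_localized", "gripper_empty", "command_grasp"},
    "hand_over":     {"object_in_gripper", "open_hand_detected"},
    "search_object": {"command_grasp", "object_unknown"},
    "stop":          {"command_stop"},
}

PRIORITY = ["stop", "hand_over", "grasp_object", "search_object"]  # assumed tie-break

def arbitrate(sensory_context):
    """Return the behavior to activate for the current sensory context,
    or None if no context is satisfied."""
    candidates = [name for name, required in BEHAVIORS.items()
                  if required <= sensory_context]
    if not candidates:
        return None
    # Several contexts may coincide; resolve the competition by a fixed priority.
    return min(candidates, key=PRIORITY.index)

# Example: a "stop" command arrives while an object is already in the gripper.
context = {"object_in_gripper", "open_hand_detected", "command_stop"}
assert arbitrate(context) == "stop"
```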


Furthermore, the overall behavior has to be both flexible and robust towards abrupt changes in the environment, which are mainly due to interventions of the user in the robot's application flow. Because of the continuous coupling of sensory information and motor control through the sensory context, a behavioral sequence can be interrupted and continued starting with another elementary behavior simply by changing the sensory context, which is mostly achieved by a speech command that defines a new goal. The system also requests feedback from the user to avoid unintentional failures, e.g., it asks the user to move her or his hand if a hand is expected in the workspace but cannot be detected. In particular, any behavior can be aborted immediately at any time by the speech command "stop" without causing a system breakdown.

5 RESULTS

Making use of the interaction channels with which the robot is equipped, different complex behaviors are realized. By means of an assembly task, we show how the integration of the interaction channels described above leads the system to the desired goal. The scenario requires the robot to assemble, cooperatively with a human operator, a work piece consisting of two parts. We discuss some characteristic sequences, focusing on the role of the interaction channels. At the beginning of an assembly session, the robot initializes its hardware and reports its internal state via synthetic speech output. Then the skin color of the human operator is determined, to make the system robust against lighting changes and differences among operators. When the user needs a particular object to work on the assembly task, he or she instructs the robot by spoken command to deliver the desired object. CoRA then searches the workspace for the known object. In doing so, it moves its head and always directs its gaze onto the object of interest, so that the user understands which object was selected by the robot and may intervene. If the search is successful, the robot segments the found object, calculates its position and orientation, and then directs the gripper towards the position of the object, grasps it, and picks it up. Next, the system searches for the operator's open hand to deposit the object. Once the hand has been identified, it is tracked, and as soon as the hand stops moving, the object is handed over into the operator's hand. The opening of the gripper is triggered when the force is sensed that results when the object touches the operator's hand (see Fig. 9).


Fig. 8. Autonomous grasping with user intervention.

Fig. 9. The robot recognizes the hand gesture, moves the object towards the operator's hand, and waits for force feedback.

In case the robot cannot recognize or disambiguate the desired object in the workspace, it requests feedback from the user in order to specify the object to be grasped and asks the operator to point at the item (see Fig. 12). The image of the user's hand is segmented on the basis of skin color, the pointing direction is determined, and the goal object is identified. CoRA waits for the operator to remove his or her hand from the workspace and completes the reaching and grasping behavior to deposit the designated object into the user's hand. During all these actions of the robot, the possibility of physical interaction enables the human operator to prevent obstacle collisions by correcting the position of the gripper itself (see Fig. 8) or by changing only the arm configuration without influencing the gripper. The first is done by manually guiding the gripper, the second by touching the artificial skin (see Figs. 10 and 11). In the same way, the grasping position can be manipulated by a directed force applied to the gripper. The arm pose is then controlled autonomously by the robot.

Fig. 10. Grasping the object from the side.

Fig. 11. Grasping the object from the side and moving the elbow down due to user intervention.

Fig. 12. Pointing at an object to specify it.

6 CONCLUSION

Using close analogies between the structure, elementary behaviors, and behavioral organization of human users and the robotic assistant, a robotic assistant was developed that interacts with users through speech, hand gestures, gaze, and mechanical interaction. Composite behaviors such as autonomous object retrieval and hand-over are organized from elementary behaviors, which can be initiated by the user with natural means of communication. The pervasive anthropomorphism makes the man-machine interface intuitive. The system was exhibited at the Hannover industrial fair to an (informed) lay audience. Initial reactions confirmed that the interface is intuitive. Limitations, which also became obvious, are the still very limited behavioral repertoire, the limited range of objects we can handle so far, and the still rather rigid and inflexible behavioral organization of the system.

ACKNOWLEDGMENT

This work is supported by the BMBF grant MORPHA (grant no. 01 IL 902 H1).

REFERENCES

[1] R. C. Arkin. Behavior-Based Robotics. MIT Press, 1998.
[2] Rainer Bischoff, Arif Kazi, and Markus Seyfarth. The morpha style guide for icon-based programming. In Proceedings of the 2002 IEEE Int. Workshop ROMAN, pages 482–487, September 2002.
[3] B. Bozkurt, M. Bagein, and T. Dutoit. From mbrola to numbrola. In Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Blair Atholl, Scotland, pages 127–130, 2001.
[4] M. Ehrenmann, R. Becher, B. Giesler, B. Zöllner, O. Rogalla, and R. Dillmann. Interaction with robot assistants: Commanding ALBERT. In Proceedings of the 2002 IEEE/RSJ Int. Conference IROS, 2002.
[5] Gernot A. Fink. Developing HMM-based recognizers with ESMERALDA. In Václav Matoušek, Pavel Mautner, Jana Ocelíková, and Petr Sojka, editors, Text, Speech and Dialogue, volume 1692 of Lecture Notes in Artificial Intelligence, pages 229–234. Springer, Berlin Heidelberg, 1999.
[6] Udo Frese, Bertold Bäuml, Steffen Haidacher, Guenter Schreiber, Ingo Schaefer, Matthias Hähnle, and Gerd Hirzinger. Off-the-shelf vision for a robotic ball catcher, 2001.
[7] Al Geist. PVM - Parallel Virtual Machine: A User's Guide & Tutorial for Network Parallel Computing (Scientific and Engineering Computation). MIT Press, 1994.
[8] S. Goldin-Meadow, M. Wagner Alibali, and Breckinridge Church. Transitions in concept acquisition: Using the hand to read the mind. Psychological Review, 100:279–297, 1993.
[9] Heiko Hirschmüller. Improvements in real-time correlation-based stereo vision. In Proceedings of the IEEE Workshop on Stereo and Multi-Baseline Vision, pages 141–148, December 2001.
[10] I. Iossifidis and A. Steinhage. Control of an 8 dof manipulator by means of neural fields. In FSR 2001, International Conference on Field and Service Robotics, Helsinki, Finland, June 11–13, 2001. Yleisjäljennös-Painopörssi.
[11] Ioannis Iossifidis and Axel Steinhage. Controlling a redundant robot arm by means of a haptic sensor. In ROBOTIK 2002, Leistungsstand - Anwendungen - Visionen, VDI-Berichte 1679, pages 269–274, Ludwigsburg, Germany, June 2002. VDI/VDE, VDI Verlag.
[12] K. Kreutz-Delgado, M. Long, and H. Seraji. Kinematic analysis of 7-dof manipulators. The International Journal of Robotics Research, 11:469–481, 1992.
[13] Maurizio Miozzo, Pietro Morasso, Antonio Sgorbissa, and Renato Zaccaria. Locomaid (the locomotion aid) - a distributed architecture for planning and control. In Proceedings of the 2002 IEEE Int. Workshop ROMAN, pages 164–169, September 2002.
[14] Akihisa Ohya. Human robot interaction in mobile robot applications. In Proceedings of the 2002 IEEE Int. Workshop ROMAN, pages 5–10, September 2002.
[15] T. Portele, B. Steffan, R. Preuss, W. F. Sendlmeier, and W. Hess. Hadifix - a speech synthesis system for German. In Proc. ICSLP '92, Banff, pages 1227–1230, 1992.
[16] Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47:7–42, April–June 2002.
[17] A. Stopp, T. Baldauf, R. Hantsche, S. Horstmann, S. Kristensen, F. Lohnert, C. Priem, and B. Rüscher. The manufacturing assistant: Safe, interactive teaching of operation sequences. In Proceedings of the 2002 IEEE Int. Workshop ROMAN, pages 386–391, September 2002.
[18] Rio Suda. Handling of object by mobile robot helper in cooperation with a human using visual information and force information. In Proceedings of the 2002 IEEE/RSJ Int. Conference on Intelligent Robots and Systems, volume 2, pages 1102–1107, October 2002.
[19] Christoph Theis and Kathrin Hustadt. Detecting the gaze direction for a man machine interface. In Proceedings of the 11th IEEE International Workshop ROMAN, pages 536–541, 2002.
[20] Christoph Theis, Ioannis Iossifidis, and Axel Steinhage. Image processing methods for interactive robot control. In Proceedings of the 10th IEEE International Workshop ROMAN, pages 424–429, 2001.
[21] G. v. Wichert, C. Klimowicz, W. Neubauer, Th. Wösch, G. Lawitzky, R. Caspari, H.-J. Heger, P. Witschel, U. Handmann, and M. Rinne. The robotic bar - an integrated demonstration of man-robot interaction in a service scenario. In Proceedings of the 2002 IEEE Int. Workshop ROMAN, pages 374–379, September 2002.
[22] Ying Wu and Thomas S. Huang. Hand modeling, analysis, and recognition. IEEE Signal Processing Magazine, pages 51–60, May 2001.
[23] R. Zöllner, O. Rogalla, R. Dillmann, and M. Zöllner. Understanding users intention: Programming fine manipulation tasks by demonstration. In Proceedings of the 2002 IEEE/RSJ Int. Conference on Intelligent Robots and Systems, pages 1114–1119, October 2002.
