PhD Defense, 2014
Distributed Tracking and Re-identification in a Camera Network
Santhoshkumar Sunderrajan
Advisor: Prof. B. S. Manjunath
Vision Research Lab, Department of Electrical and Computer Engineering, University of California, Santa Barbara
Wide-Area Camera Networks
• Surveillance
• Crowd analytics for business intelligence
Thesis Focus: Object Tracking in Camera Networks
Object tracking is difficult and challenging:
• Irregular illumination changes
• Complex appearance and motion changes
• Occlusions cause confusion
Automated Analysis of Camera Networks: Thesis Outline
• Single Camera Tracking: ICIP'13
• Multi-Camera Tracking, Overlapping Views: ICDSC'13 (Qualifiers), Journal (Today)
• Multi-Camera Tracking, Non-Overlapping Views: ICDSC'13 (Qualifiers), Journal (Today)
Contributions
• Single Camera Tracking
  – Robust Tracking by Detection (ICIP'13)
• Multi-Camera Tracking with Overlapping Views
  – Multiple View Discriminative Appearance Modeling (ICDSC'13, Excellent Paper Award)
  – Robust Multi-Camera Tracking with Appearance and Spatial Contexts (to be submitted to PAMI)
• Multi-Camera Object Search and Retrieval with Non-Overlapping Views
  – Context-Aware Graph Modeling for Object Search and Retrieval (ICDSC'13)
  – Context-Aware Hypergraph Modeling for Summarization (to be submitted)
Object Tracking
• Associate objects from one frame to the next
• The object state x_t at time t is represented by:
  – Location (x_t, y_t)
  – Scale (s_t)
Tracking with Overlapping Views
• Estimate the object location on both the image plane (x_t) and the common ground plane (x_t^(g)), consistent across multiple camera views
Assumptions
• The ground-plane homography is pre-computed
• Cameras are time synchronized
• Object associations across multiple cameras are known
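Since the ground-plane homography is assumed pre-computed, mapping an image-plane point to the ground plane is a single projective transform. A minimal sketch (the matrix H below is an arbitrary illustrative homography, not one calibrated in the thesis):

```python
import numpy as np

# A hypothetical pre-computed image-to-ground homography (3x3).
# In practice this is calibrated offline, once per camera.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])

def to_ground_plane(pt, H):
    """Project an image-plane point (x, y) to the common ground plane."""
    x, y = pt
    p = H @ np.array([x, y, 1.0])   # lift to homogeneous coordinates
    return p[:2] / p[2]             # perspective division

print(to_ground_plane((4.0, 5.0), H))  # translation-only H -> (6.0, 8.0)
```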
Related Work
• Multi-camera People Tracking with a Probabilistic Occupancy Map (Fleuret et al., PAMI'09)
  – Dynamic programming with simple color-based appearance modeling for global trajectory estimation (centralized)
• Distributed Multi-target Tracking in a Self-configuring Camera Network (Soto et al., CVPR'08)
  – Consensus filtering on the ground plane for trajectory estimation (distributed)
Perform estimation in the common ground plane
• Tracker failures on the image plane lead to inaccurate ground-plane fusion due to outliers
Observations
• Objects in a given scene exhibit similar motion patterns
  – (1) Leverage contextual information to guide the ground-plane fusion and reject outliers
• Feedback from the ground-plane fusion can be used to improve image-plane tracking
  – (2) Active collaboration
1. Leveraging Contextual Information
• Location and appearance of other co-occurring objects: relative distances and appearances vary similarly
• Scene information: e.g., entry and exit points, obstacles
2. Active Collaboration
• Closed-loop interaction between the image-plane and ground-plane trackers
Notations
• N: number of cameras
• t: time instance
• i, j: object indices
• c: camera index (c = 1, ..., N)
• z: image-plane measurements
• x: object estimate on the image/ground planes
• z_{1:t}: measurements from time 1 to t
Global Centralized Tracking
• Estimate the object location given the measurements from all the cameras
• Direct maximization of the objective function, e.g., Mittal IJCV'03, Kim ECCV'06, Fleuret PAMI'09, Khan PAMI'09, Eshel IJCV'10, Berclaz PAMI'11
Modeling Assumptions for Distributed Tracking
• No raw image data is transferred
• Each camera forms the object estimate using measurements from camera c alone
• We perform independent estimation for every object i
Bayesian Formulation
• Assumptions
  – The measurement of object i at time t is conditionally independent of the measurements from time 1:t-1, given the object estimate on the ground plane
  – The object estimate at time t does not depend on the measurements of other objects (i ≠ j) at time t
Likelihood Modeling
• An appearance-based likelihood model, combined with the ground-plane posterior density
Solving MAP Estimation from t-1 to t
• MAP estimation using particle filters
  – Markov Chain Monte Carlo
  – Generalizes to non-linear and non-Gaussian models
Particle Filters
• Let X_t = [x_t, y_t, s_t] be the object state and Z_t be the measurement
• In Sequential Importance Resampling (SIR) particle filters, the posterior distribution is approximated by a set of R weighted particles
• Each step predicts, updates, and resamples the particle set
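The SIR loop (predict, update, resample) can be sketched in a few lines. This is a generic toy version, not the thesis implementation: a 2-D state, a Gaussian proposal, and a synthetic likelihood peaked at a known location:

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights, likelihood, motion_std=1.0):
    """One Sequential Importance Resampling step.

    particles : (R, d) array of states
    likelihood: maps states to unnormalised measurement scores
    """
    # Predict: Gaussian proposal around the previous states
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: re-weight by the measurement likelihood, then normalise
    weights = weights * likelihood(particles)
    weights = weights / weights.sum()
    # Resample: draw R particles proportionally to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy usage: the particles should concentrate near the "true" location (5, 5)
particles = rng.uniform(0, 10, size=(500, 2))
weights = np.full(500, 1.0 / 500)
lik = lambda p: np.exp(-0.5 * np.sum((p - 5.0) ** 2, axis=1))
for _ in range(5):
    particles, weights = sir_step(particles, weights, lik, motion_std=0.2)
print(particles.mean(axis=0))  # close to [5, 5]
```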
Particle Filter Based Multi-View Tracking
• Given a particle set from time t-1:
  – Predict the particle set at time t using the proposal distribution (assumed to be Gaussian)
  – At time t, update the particle weights with the appearance model
  – Update the appearance model with context
Ground Plane Fusion with Spatial and Scene Contexts
Per-camera pipeline (the same steps run in every camera c = 1, ..., N; weak classifiers and average particle states are shared over the network channel):
1. Particle state prediction from time t-1
2. Update particle weights with the classifier H_t^{c,i}
3. Update the classifier H_t^{c,i} using appearance context
4. ADMM consensus with spatial context for ground-plane posterior density estimation (drawing on the trajectory database)
5. Update particle weights with the ground-plane posterior density
Appearance Modeling
• An ensemble classifier is used to learn a discriminative appearance model from positive and negative examples
• Features: Histogram of Oriented Gradients and LAB color pixels
Appearance-Based Particle Weighting
• Particle weights are computed from the classifier response (confidence map)
Appearance Context
• The appearance of co-occurring objects is helpful in learning a better appearance model
• Discriminative appearance context: use highly confident positive examples of co-occurring objects as negative examples for the object of interest
Missing Features
• Objects undergo complex appearance (shape) changes between frames t and t+1
• Exploit multi-view appearance information
Exploiting Multi-View Appearance
• Share the best-performing weak classifiers across multiple views (e.g., between Camera 1 and Camera 2)
Context-Aware Distributed Consensus
• Enforces agreement on the ground-plane estimate across cameras
Dynamic Scene Context
• Forces the ground-plane consensus estimate to be closer to the location predicted from scene dynamics (illustrated with cameras C1-C6)
Static Scene Context
• Guides the ground-plane consensus estimate towards the closest exit location (summed over the exit zones)
Spatial Context
• Constrains the relative distances between co-occurring objects to remain consistent (summed over the co-occurring objects)
Context-Aware Distributed Consensus
• The consensus cost function is solved using the Alternating Direction Method of Multipliers (ADMM)
Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1-122.
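For intuition, here is ADMM consensus in its simplest form: each camera holds a scalar local estimate and all cameras must agree on a common value. The spatial and scene context terms of the thesis are omitted; with plain quadratic costs the consensus converges to the average of the local estimates:

```python
import numpy as np

def admm_consensus(local_obs, rho=1.0, iters=50):
    """Consensus on per-camera estimates via ADMM (context terms omitted).

    Each camera c minimises 0.5 * (x_c - local_obs[c])^2 subject to
    x_c = z, where z is the shared consensus variable.
    """
    a = np.asarray(local_obs, dtype=float)
    x = a.copy()                 # per-camera local variables
    u = np.zeros_like(x)         # scaled dual variables
    z = x.mean()                 # consensus variable
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # local x-update
        z = (x + u).mean()                     # consensus z-update
        u = u + x - z                          # dual update
    return float(z)

print(admm_consensus([1.0, 2.0, 6.0]))  # -> 3.0, the mean of the local estimates
```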
Active Feedback
• The ground-plane Kalman filter is updated using the consensus estimate
• Particles are re-weighted using the posterior density of the Kalman filter
• The final estimate (bounding box) on the image plane is obtained by averaging the particles
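The ground-plane Kalman filter update driven by the consensus estimate can be sketched with the standard predict/update equations (the matrices below are illustrative, not the thesis's motion model):

```python
import numpy as np

def kalman_update(mu, P, z, F, Q, H, R):
    """One predict+update cycle of a Kalman filter; here the ADMM
    consensus estimate z plays the role of the measurement."""
    # Predict with the motion model F and process noise Q
    mu = F @ mu
    P = F @ P @ F.T + Q
    # Update with measurement z, observation model H, noise R
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    mu = mu + K @ (z - H @ mu)
    P = (np.eye(len(mu)) - K @ H) @ P
    return mu, P

# Toy usage: stationary model; the consensus measurement pulls the state toward it
I2 = np.eye(2)
mu, P = kalman_update(np.zeros(2), I2, np.array([1.0, 1.0]),
                      I2, 0.01 * I2, I2, 0.01 * I2)
print(mu)  # close to the measurement [1, 1]
```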
Datasets
• Algorithms were evaluated on outdoor sequences (Kirby), PETS-2009, and indoor sequences (HFH)
• Videos (640x480) were captured at a variable frame rate (~20 FPS)
Indoor Sequences (qualitative comparison): Proposed, Mean Shift, Struck, No Scene Information, Multiple Instance Learning
Outdoor Sequences (Views 1 and 2, qualitative comparison): Only Appearance, Online Adaboost, Multiple Instance Learning, With Ground Plane Fusion
PETS-2009 Sequences (qualitative comparison): Appearance Based, Proposed, No Spatial-Context
Distributed Filtering Based Tracker (PETS-2009 Sequences)

Algorithm   MT   PT   ML   IDS   MOTA   MOTP
Proposed     5    0    0     0    100   99.51
MTIC         1    4    0     0     45   58.28
ICF-NN       1    4    0     0     45   58.27
JPDA-KCF     1    4    0     0     45   55.58

CLEAR-MOT metrics: MT - Mostly Tracked, PT - Partially Tracked, ML - Mostly Lost, IDS - ID Switches, MOTA - Multiple Object Tracking Accuracy, MOTP - Multiple Object Tracking Precision
PETS-2009 Sequences
• Static scene context forms the weakest constraint
• Computational complexity increases as context terms are added
Evaluation Metrics
• RMS pixel error and VOC detection scores (overlap between the ground-truth box B_gt and the predicted box B_p) are used for comparing single-object appearance-based trackers
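The VOC detection score is the standard intersection-over-union between the ground-truth box B_gt and the predicted box B_p; a small sketch:

```python
def voc_overlap(bgt, bp):
    """VOC detection score: area(Bgt ∩ Bp) / area(Bgt ∪ Bp).
    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(bgt[0], bp[0]), max(bgt[1], bp[1])
    ix2, iy2 = min(bgt[2], bp[2]), min(bgt[3], bp[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(bgt) + area(bp) - inter
    return inter / union if union > 0 else 0.0

print(voc_overlap((0, 0, 10, 10), (5, 0, 15, 10)))  # -> 0.333... (50 / 150)
```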
Outdoor Sequences

Algorithm   OAB   OAB-PF   MS     MIL    MIL-PF   Struck   Proposed
RMS          33       36   57      36       21        16          8
VOC         0.3     0.31   0.15   0.39     0.43      0.51       0.69
Indoor Sequences

Algorithm   OAB   OAB-PF   MS     MIL    MIL-PF   Struck   Proposed
RMS          16       11   18      17       14        22          9
VOC         0.52    0.54   0.45   0.49     0.56      0.49       0.64
PETS-2009

Algorithm   OAB   OAB-PF   MS     MIL    MIL-PF   Struck   Proposed
RMS          63      128   93      92      128       104         16
VOC         0.38     0.1   0.22   0.4      0.1       0.2        0.66
Distributed Filtering Based Tracker
• The STRUCK tracker is used to generate measurements on the image plane
• Ground-plane measurements for the other multi-camera trackers are obtained using a pre-computed homographic transformation

(Outdoor Sequences)
Algorithm   Mean Error (m)   Error Standard Deviation (m)
Proposed    0.31             0.45
ICF         11.3             1.8
GKCF        11.7             4.7
KCF         11.3             1.8
CKF         11.2             1.8
Discussion
• A distributed tracking algorithm with active collaboration
• The proposed multi-camera tracker takes approximately 1 second per frame in a MATLAB implementation on a machine with 8 GB RAM and a 2.67 GHz processor
• The proposed approach sends 0.5 kB of data per object per frame
(Demo: image-plane and ground-plane tracking results)
Tracking with Non-Overlapping Views
• Associate objects between two different views using
  – Appearance
  – Spatial-temporal dynamics
Related Work
• KnightM: A Real-Time Surveillance System for Multiple Overlapping and Non-overlapping Cameras (Javed et al., CVPR'05)
  – Associates global trajectories using appearance and spatial information
• Unsupervised Salience Learning for Person Re-identification (Zhao et al., CVPR'13)
  – Finds salient regions in image patches and performs pairwise similarity matching
• Clothing Cosegmentation for Recognizing People (Gallagher et al., CVPR'08)
  – Performs clothing segmentation
Challenges
• Appearance-based association: viewpoint, pose, and lighting changes
• Clothing-based association: almost impossible to parse the clothing configuration in surveillance videos
• Spatial-temporal association: needs complete knowledge of the network
(Example images: View 1 vs. View 2)
Observations
• Color drifts across views due to illumination/lighting changes
Contributions
• A color-drift-aware graph matching framework for associating objects
Assumptions
• No live streaming of data is available
• Ground-truth object associations along with timestamps are available (e.g., TS = 1000 and TS = 1020)
• Cameras can perform primitive tasks
Color Histogram Features
• A multi-dimensional LAB-space color histogram (D = 288)
  – Patch size of 10x10 on a grid with a step size of 4 pixels
  – Images down-sampled into 3 levels
  – Average-pool the L2-normalized dense color histograms
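A sketch of such a dense, average-pooled LAB histogram. The bin counts (4 x 6 x 12 = 288) are an assumption chosen only to match the stated dimensionality, and the 3-level down-sampling is omitted:

```python
import numpy as np

def dense_color_histogram(img_lab, bins=(4, 6, 12), patch=10, step=4):
    """Average-pooled, L2-normalised dense colour histograms.
    img_lab: HxWx3 array of LAB pixels. Bin counts are illustrative."""
    h, w, _ = img_lab.shape
    ranges = [(0, 100), (-128, 128), (-128, 128)]  # nominal LAB channel ranges
    hists = []
    for y in range(0, h - patch + 1, step):
        for x in range(0, w - patch + 1, step):
            p = img_lab[y:y + patch, x:x + patch].reshape(-1, 3)
            hst, _ = np.histogramdd(p, bins=bins, range=ranges)
            hst = hst.ravel()
            n = np.linalg.norm(hst)
            hists.append(hst / n if n > 0 else hst)  # L2-normalise per patch
    return np.mean(hists, axis=0)                    # average pooling

feat = dense_color_histogram(np.zeros((32, 32, 3)))
print(feat.shape)  # (288,)
```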
Offline Learning of Multi-view Color Drift
• At the offline training stage, the color histogram associations (z_i, z_j) are known, for both body regions: (z_i^Torso, z_j^Torso) and (z_i^Legs, z_j^Legs)
• A random forest classifier is trained with positive (matching) and negative (non-matching) histogram pairs
Color-Drift Score
• Given any two color histograms (z_i, z_j), the color-drift score is the posterior score averaged over the forest's trees
Wide-Area Camera Network (system overview)
• Cameras 1-10 in the camera network transfer object records over the network channel to data storage units (DB)
• Color-drift-aware hypergraph modeling is performed at the central node, followed by hypergraph-based ranking
• A human network analyst issues a query (e.g., finding object instances in cameras 1 and 4 between 10:00 am and 10:02 am)
• The visualization module returns matching video frames and snapshots (e.g., Camera 1 at 10:01:25 am, Camera 2 at 10:01:45 am, Camera 4 at 10:01:35 am)
1. Detection and Tracking
• A mean-shift tracker with background subtraction is used for detection/tracking
• Remote cameras send an abstracted record to the central node: Camera ID, Object ID, Timestamp, Object Bounding Box, Object Image Data
2. Observation Modeling
• At the central node, the relationship between tracklets i and j is computed based on
  – Appearance (w_ij^Appearance)
  – Spatial-Temporal (w_ij^Spatial-Temporal)
Superpixel Graph Representation
• At the testing stage, the object is over-segmented using SLIC superpixel segmentation (Views 1 and 2)
Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk, SLIC Superpixels Compared to State-of-the-art Superpixel Methods, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2282, May 2012.
Color-Drift-Aware Graph Matching
• Given two superpixel-based graphs from two different views, match segments using Balanced Graph Matching
• The global affinity matrix combines, under an affine constraint (k, k', k1, k2 are superpixel indices):
  – Node-to-node similarity: the likelihood of a color histogram getting drifted
  – Edge-to-edge similarity: similar feature difference vectors between two nodes get a larger score
Cour, T., Srinivasan, P., & Shi, J. (2007). Balanced graph matching. Advances in Neural Information Processing Systems, 19, 313.
Appearance Weighting
• Given the pairwise graph associations (k, k*), the appearance weighting score is the color-drift score multiplied by the histogram intersection, summed over the edges
• Computational complexity: O(m_i m_j)
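A sketch of this weighting, assuming the matched superpixel pairs and a drift-score callable are already available:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima."""
    return float(np.minimum(h1, h2).sum())

def appearance_weight(matches, drift_score):
    """Appearance weight between two tracklets: sum over matched
    superpixel pairs (k, k*) of drift-score x histogram intersection.
    `matches` is a list of (hist_k, hist_kstar) pairs; `drift_score`
    is any callable returning a [0, 1] colour-drift likelihood."""
    return sum(drift_score(hk, hks) * histogram_intersection(hk, hks)
               for hk, hks in matches)

h = np.array([0.5, 0.3, 0.2])
print(histogram_intersection(h, h))  # identical histograms -> 1.0
```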
Example Matches
Spatial-Temporal Weighting (w_ij^Spatial-Temporal)
Building a Spatial-Temporal Topology Model
• Let t_d be the time delay for an object to travel between any two locations (l_i, l_i*) in two cameras
• The training samples y = [l_i, l_i*, t_d] are clustered using a GMM
Spatial-Temporal Topology Weighting
• The weight is a Gaussian mixture density with learned weights, means, and co-variances
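Fitting the GMM over transition samples y = [l_i, l_i*, t_d] and using its density as the spatial-temporal weight can be sketched with scikit-learn (1-D locations and invented cluster parameters, purely for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Toy transitions: objects leave a location near 2.0, reappear near 8.0,
# with a travel delay of about 20 s (values are illustrative).
li  = rng.normal(2.0, 0.1, 200)
lis = rng.normal(8.0, 0.1, 200)
td  = rng.normal(20.0, 1.0, 200)
Y = np.column_stack([li, lis, td])   # training samples y = [li, li*, td]

gmm = GaussianMixture(n_components=2, random_state=0).fit(Y)

def st_weight(li, lis, td):
    """Spatial-temporal weight: GMM density of the observed transition."""
    return float(np.exp(gmm.score_samples([[li, lis, td]]))[0])

# A plausible transition scores higher than an implausible one
print(st_weight(2.0, 8.0, 20.0) > st_weight(2.0, 8.0, 200.0))  # True
```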
2. Hypergraph Representation
• A hypergraph accounts for local grouping and models higher-order relationships
• Network-wide graph G = (V, E, W)
  – V: vertices (objects/tracklets)
  – E: edges
  – W: diagonal hyperedge weight matrix
• A pair of hyperedges is created per vertex
  – E_App: appearance-based hyperedges
  – E_ST: spatial-temporal based hyperedges
3. Initial Label Vector Construction
• Object searching: "Find all objects related to the selected object from camera 8 at time 9:33:02 am"
• A tracklet/object of interest is chosen as the query to the system
• An initial label vector r is defined with r_i = 1 at the query entry ( ... 0 0 1 0 0 ... )
4. Hypergraph Based Ranking
• A graph-based semi-supervised ranking algorithm: nodes sharing many incident hyperedges are guaranteed to obtain similar labels
• The ranking scores are computed from the initial label vector using the hypergraph Laplacian L = I - Θ; a scalar factor controls the contribution of the initial label vector
Huang, Y., Liu, Q., Zhang, S., & Metaxas, D. N. (2010). Image retrieval via probabilistic hypergraph ranking. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3376-3383.
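A sketch of this style of hypergraph ranking, consistent with the L = I - Θ notation on the slide (a small toy incidence matrix, not the thesis's graphs):

```python
import numpy as np

def hypergraph_rank(H, w, y, alpha=0.9):
    """Semi-supervised hypergraph ranking, sketched.
    H : |V| x |E| incidence matrix, w : hyperedge weights,
    y : initial label vector (1 at the query vertex)."""
    Dv = (H * w).sum(axis=1)             # weighted vertex degrees
    De = H.sum(axis=0)                   # hyperedge degrees
    Dvi = np.diag(1.0 / np.sqrt(Dv))
    Theta = Dvi @ H @ np.diag(w / De) @ H.T @ Dvi
    # Closed form of the iteration f = alpha*Theta*f + (1-alpha)*y,
    # i.e. a resolvent of the hypergraph Laplacian L = I - Theta
    return np.linalg.solve(np.eye(len(y)) - alpha * Theta, (1 - alpha) * y)

# 4 vertices; hyperedge 0 groups {0, 1, 2}, hyperedge 1 groups {2, 3}
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
f = hypergraph_rank(H, w=np.array([1.0, 1.0]), y=np.array([1.0, 0, 0, 0]))
print(f.argmax())  # 0: the query vertex ranks highest
```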
Bike Path Dataset
• A wide-area camera network test bed consisting of 10 cameras (Cisco wireless-G WVC2300)
Searching Results (Cam IDs 8 and 7)
Searching Results (Cam IDs 2 and 3)
ICDSC'13
• Results obtained by averaging the appearance and spatial-temporal (ST) weights, shown against the number of results returned
VIPeR Dataset
• A benchmark dataset for person re-identification
• Contains 632 pairs of person images
• 316 pairs are used for training and 316 pairs for testing
• The Cumulative Matching Curve (CMC) is used for comparing person re-identification methods: it shows the percentage of queries ranked correctly at different ranking levels
Gray, D., Brennan, S., & Tao, H. (2007). Evaluating appearance models for recognition, reacquisition, and tracking. IEEE International Workshop on Performance Evaluation of Tracking and Surveillance.
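Computing a CMC curve from a query-gallery distance matrix is straightforward; a sketch:

```python
import numpy as np

def cmc_curve(dist):
    """Cumulative Matching Characteristic from a query x gallery distance
    matrix, where the true match of query i is gallery identity i."""
    n = dist.shape[0]
    # rank of the correct match for every query (0 = ranked first)
    ranks = np.argsort(dist, axis=1).argsort(axis=1)[np.arange(n), np.arange(n)]
    # CMC[k] = fraction of queries whose match appears within the top k+1
    return np.array([(ranks <= k).mean() for k in range(n)])

# Toy 3x3 distances: queries 0 and 1 match at rank 1, query 2 only at rank 3
dist = np.array([[0.1, 0.5, 0.9],
                 [0.4, 0.2, 0.8],
                 [0.3, 0.1, 0.7]])
print(cmc_curve(dist))  # -> [0.666..., 0.666..., 1.0]
```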
Person Re-identification Results
• Re-identification results at different ranks
• Marginal improvements due to learning the color drift
Discussion
• Color-drift-aware appearance matching
• Hypergraphs encode contextual information
• Extensive experimentation on real-world, large-scale distributed camera networks
Summary
• Distributed analysis of big data
• Exploit contextual information to improve robustness
  – Object tracking with overlapping views
  – Object search and retrieval with non-overlapping views
Future Work
• Object Tracking
  – Multi-view tracking by detection using human detector responses
  – Crowd-sourced object tracking with measurements obtained from multiple object tracking algorithms
• Object Search and Retrieval
  – Human-motivated weakly supervised saliency learning for person re-identification
Publications
Journals:
• Sunderrajan, S., Jagadeesh, V., Manjunath, B. S. (2014). Robust Multiple Camera Tracking with Spatial and Appearance Contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence (to be submitted).
• Sunderrajan, S., Manjunath, B. S. (2014). Context-Aware Hypergraph Modeling for Summarization (to be submitted).
• Thakoor*, N., Sunderrajan*, S., Bhanu, B., Manjunath, B. S. (2014). Tracking People in Camera Networks, IEEE Computer (*equal-contribution authors).
• Kuo, T., Ni, Z., Sunderrajan, S., Manjunath, B. S. (2014). Calibrating a Wide-Area Camera Network with Non-Overlapping Views using Mobile Devices, ACM Transactions on Sensor Networks.
• Xu, J., Jagadeesh, V., Ni, Z., Sunderrajan, S., Manjunath, B. S. (2013). Graph-based Topic-focused Retrieval in Distributed Camera Network, IEEE Transactions on Multimedia.
Conference Proceedings:
• Summarization-Driven Activity Analysis in Camera Networks (in preparation).
• Sunderrajan, S., Manjunath, B. S. (2013). Multiple View Discriminative Appearance Modeling with IMCMC for Distributed Tracking, ACM/IEEE International Conference on Distributed Smart Cameras (Excellent Paper Award).
• Kuo, T., Sunderrajan, S., Manjunath, B. S. (2013). Camera Alignment using Trajectory Intersections in Unsynchronized Videos, IEEE International Conference on Computer Vision.
• Sunderrajan, S., Xu, J., Manjunath, B. S. (2013). Context-Aware Graph Modeling for Object Search and Retrieval in a Wide Area Camera Network, ACM/IEEE International Conference on Distributed Smart Cameras.
• Sunderrajan, S., Karthikeyan, S., Manjunath, B. S. (2013). Robust Multiple Object Tracking by Detection with Interacting Markov Chain Monte Carlo, IEEE International Conference on Image Processing.
• Ni, Z., Sunderrajan, S., Rahimi, A., Manjunath, B. S. (2010). Distributed particle filter tracking with online multiple instance learning in a camera sensor network, IEEE International Conference on Image Processing.
• Ni, Z., Sunderrajan, S., Rahimi, A., Manjunath, B. S. (2010). Particle filter tracking with online multiple instance learning, IEEE International Conference on Pattern Recognition.
Workshops:
• Sunderrajan, S., Pourian, N., Hasan, M., Zhu, Y., Manjunath, B. S., Chowdhury, A. R. (2012). Discriminative Reranking based Video Object Retrieval, TRECVID Workshop Technical Report.
• Hasan, M., Zhu, Y., Sunderrajan, S., Pourian, N., Manjunath, B. S., Chowdhury, A. R. (2012). Activity Analysis in Unconstrained Surveillance Videos, TRECVID Workshop Technical Report.
Research Contributions
• Datasets
  – Indoor Multi-Camera Tracking (HFH) with 5 cameras
  – Outdoor Multi-Camera Tracking (Kirby) with 6 cameras
  – Bike Path Object Search and Retrieval with 10 cameras
• Open Source Code
  – Framework for Multi-Camera Tracking (C++)

Programming Language   #Lines of Code
C++                    ~12k
Matlab                 ~30k

http://vision.ece.ucsb.edu/~santhosh/software.html
Acknowledgments
• Prof. B. S. Manjunath (Chair)
• Prof. Kenneth Rose
• Prof. Matthew Turk
• Prof. Michael Liebling
• Fellow lab members