File(s) under permanent embargo
Learning Spatial Fusion and Matching for Visual Object Tracking
Siamese network based trackers have achieved outstanding performance in visual object tracking; in essence, they apply an efficient cross-correlation operation as the matching function. However, experiments show that the cross-correlation based matching function struggles to produce accurate tracking results in some challenging scenarios, such as background clutter and fast motion. To address this, we propose a new Siamese-based tracker named SiamFAM. Specifically, from the perspective of feature fusion, a new matching function named Concatenation is introduced into our tracker; it reduces the influence of background clutter through fine-grained matching at little computational overhead. Meanwhile, an adaptively spatial feature fusion (ASFF) module is proposed, which makes full use of multi-layer features and reduces inaccurate predictions during the prediction process. In addition, a refinement module is adopted to reduce the occurrence of tracking drift. Extensive experiments on six challenging benchmarks, namely VOT2016, VOT2019, UAV123, NFS, OTB100, and LaSOT, demonstrate that our tracker is practical and achieves leading performance.
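The two operations the abstract contrasts and builds on can be illustrated with a small NumPy sketch. It is not the SiamFAM implementation. It shows (a) depth-wise cross-correlation, the standard Siamese matching function, where the template feature map slides over the search feature map and each channel yields its own response, and (b) fusion in the style of the usual ASFF formulation, where per-pixel softmax weights across layers decide how much each layer contributes. The abstract does not say whether SiamFAM fuses features or responses. It also does not say how its Concatenation matching works, so that part is not sketched. The function names `depthwise_xcorr` and `asff_fuse` are hypothetical, and the random arrays stand in for backbone features and learned 1×1-convolution weight maps.

```python
import numpy as np


def depthwise_xcorr(search, template):
    """Depth-wise cross-correlation (hypothetical helper, illustration only).

    search:   (C, H, W) feature map of the search region
    template: (C, h, w) feature map of the target template
    returns:  (C, H - h + 1, W - w + 1) response map, one channel per feature channel
    """
    C, H, W = search.shape
    _, h, w = template.shape
    out = np.zeros((C, H - h + 1, W - w + 1), dtype=search.dtype)
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search[:, i:i + h, j:j + w]
            # Sum over the spatial window only; channels stay separate
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out


def asff_fuse(features, weight_logits):
    """ASFF-style fusion (assumed formulation): per-pixel softmax over L layers.

    features:      (L, C, H, W) same-resolution maps from L layers
    weight_logits: (L, H, W) unnormalised importance of each layer at each pixel
    returns:       (C, H, W) fused map
    """
    # Numerically stable softmax across the layer axis: weights at each pixel sum to 1
    shifted = weight_logits - weight_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Broadcast (L, 1, H, W) against (L, C, H, W), then sum out the layer axis
    return (weights[:, None, :, :] * features).sum(axis=0)


# Usage with random stand-ins for three backbone layers (4 channels each)
rng = np.random.default_rng(0)
template = rng.standard_normal((4, 3, 3))
layers = [rng.standard_normal((4, 7, 7)) for _ in range(3)]
responses = np.stack([depthwise_xcorr(s, template) for s in layers])
print(responses.shape)  # (3, 4, 5, 5)

logits = rng.standard_normal((3, 5, 5))  # stand-in for learned weight maps
fused = asff_fuse(responses, logits)
print(fused.shape)  # (4, 5, 5)
```

The design choice behind ASFF-style fusion is that the weighting is spatial. One layer can dominate near the target while another dominates elsewhere, whereas a fixed weighted sum applies the same mix everywhere. With all-equal logits, the fusion reduces to a plain average of the layers.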
Volume: 13631 LNCS
Pagination: 352-367
ISSN: 0302-9743
eISSN: 1611-3349
ISBN-13: 9783031208676
Publication classification: E1.1 Full written paper - refereed
Title of proceedings: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Nature Switzerland
Series: Lecture Notes in Computer Science