Human parsing has been extensively studied recently (Yamaguchi et al. 2012; Xia et al. 2017) due to its wide range of applications in many important scenarios. Mainstream fashion parsing models (i.e., parsers) focus on parsing high-resolution, clean images. However, directly applying parsers
trained on benchmarks of high-quality samples to a particular application scenario in the wild, e.g., a canteen, airport
or workplace, often gives unsatisfactory performance due to domain shift. In this paper, we explore a new and challenging cross-domain human parsing problem: taking a benchmark dataset with extensive pixel-wise labeling as the source domain, how can we obtain a satisfactory parser for a new target domain without any additional manual labeling? To this end, we propose a novel and efficient cross-domain human parsing model that bridges the cross-domain differences in visual appearance and environmental conditions and fully exploits the commonalities shared across domains. Our
proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain
differences. A discriminative feature adversarial network is
introduced to supervise the feature compensation to effectively reduces the discrepancy between feature distributions
of two domains. Besides, our proposed model also introduces
a structured label adversarial network to guide the parsing
results of the target domain to follow the high-order relationships of the structured labels shared across domains. The
proposed framework is end-to-end trainable, practical, and scalable in real applications. Extensive experiments are conducted with the LIP dataset as the source domain and four different unannotated datasets, covering surveillance videos, movies, and runway shows, evaluated as target domains. The results consistently confirm the data efficiency and performance advantages of the proposed method on the challenging cross-domain human parsing problem.
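To make the described components concrete, the following is a minimal, schematic sketch of one generator-side training step in a PyTorch-style framework. The module names (FeatureCompensation, PatchDiscriminator), network depths, and loss weights are illustrative assumptions rather than the authors' implementation; only the roles (feature compensation on target features, a feature-level adversary, and a structured-label adversary on parsing maps) follow the description above.

```python
# Minimal schematic of one training step for the described framework (PyTorch).
# Module names, network depths, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureCompensation(nn.Module):
    """Residual network that compensates target-domain features so they
    resemble source-domain features (assumed residual design)."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, feat):
        return feat + self.body(feat)  # compensate rather than replace

class PatchDiscriminator(nn.Module):
    """Small convolutional discriminator, reused for the feature-level and
    structured-label-level adversaries (architecture is an assumption)."""
    def __init__(self, in_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, stride=2, padding=1))  # source/target score map

    def forward(self, x):
        return self.body(x)

def generator_step(feats_tgt, logits_src, logits_tgt, labels_src,
                   comp, d_feat, d_label, lam_feat=0.01, lam_label=0.01):
    """Supervised parsing loss on the labeled source domain plus two adversarial
    terms that push (i) compensated target features and (ii) target parsing
    maps toward the corresponding source distributions."""
    seg_loss = F.cross_entropy(logits_src, labels_src)       # source supervision only
    comp_tgt = comp(feats_tgt)                                # feature compensation
    score_f = d_feat(comp_tgt)                                # feature-level adversary
    adv_feat = F.binary_cross_entropy_with_logits(score_f, torch.ones_like(score_f))
    score_l = d_label(F.softmax(logits_tgt, dim=1))           # structured-label adversary
    adv_label = F.binary_cross_entropy_with_logits(score_l, torch.ones_like(score_l))
    return seg_loss + lam_feat * adv_feat + lam_label * adv_label
```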
Abstract—This paper presents a robust tracking method based on a Joint Discriminative appearance model, using online random forests and mid-level features (superpixels). To achieve superpixel-wise discriminative ability, we propose a joint appearance model
that consists of two random forest based models, i.e., the
Background-Target discriminative Model (BTM) and Distractor-
Target discriminative Model (DTM). More specifically, the BTM
effectively learns discriminative information between the target
object and background. In contrast, the DTM suppresses distracting superpixels, which significantly improves the tracker's
robustness and alleviates the drifting problem. A novel online
random forest regression algorithm is proposed to build the
two models. The BTM and DTM are linearly combined into
a joint model to compute a confidence map. Tracking results are estimated from the confidence map, with the position and scale of the target estimated in sequence. Furthermore, we design a model updating strategy that adapts to appearance changes over time by discarding degraded trees in the BTM and DTM and initializing new trees as replacements. We test the
proposed tracking method on two large tracking benchmarks,
the CVPR2013 tracking benchmark and the VOT2014 tracking
challenge. Experimental results show that the tracker runs at
real-time speed and achieves favorable tracking performance
compared with state-of-the-art methods. The results also suggest that the DTM improves tracking performance significantly
and plays an important role in robust tracking.
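As a rough illustration of the joint model described above, the following Python sketch scores each superpixel with two regressors standing in for the BTM and DTM (the paper's online random-forest regression is not reproduced here), linearly combines the scores into a pixel-wise confidence map, and locates the target at the window of maximal summed confidence. The function names, the weight `alpha`, and the fixed-scale window search are illustrative assumptions.

```python
# Schematic of the joint BTM/DTM confidence computation and target
# localization. The regressors, features, `alpha`, and the fixed-scale
# window search are stand-ins; the paper's online random-forest regression
# and the subsequent scale estimation are not reproduced here.
import numpy as np

def joint_confidence(superpixel_feats, superpixel_masks, btm, dtm, alpha=0.5):
    """superpixel_feats: (N, D) feature matrix, one row per superpixel.
    superpixel_masks: (N, H, W) boolean masks assigning pixels to superpixels.
    btm, dtm: regressors exposing predict(X) -> (N,) scores, e.g. in [-1, 1]."""
    s_btm = btm.predict(superpixel_feats)        # target vs. background score
    s_dtm = dtm.predict(superpixel_feats)        # target vs. distractor score
    s_joint = alpha * s_btm + (1.0 - alpha) * s_dtm
    # paint each superpixel's joint score back onto its pixels
    return np.einsum('n,nhw->hw', s_joint, superpixel_masks.astype(float))

def locate_target(conf_map, window_hw):
    """Return the top-left corner of the fixed-size window with the largest
    summed confidence, computed via an integral image."""
    h, w = window_hw
    ii = np.pad(conf_map, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    sums = ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]
    y, x = np.unravel_index(np.argmax(sums), sums.shape)
    return x, y
```

The model update described in the abstract would then periodically discard and replace individual trees inside `btm` and `dtm`; that bookkeeping is omitted here for brevity.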