Video shots are often treated as the basic elements for retrieving information from videos. In recent years, video shotcategorization has received increasing attention, but most of the methods involve a procedure of supervised learning, i.e., training a multi-class predictor (classifier) on the labeled data. In this paper, we study a general framework to unsupervisedly discover video shot categories. The contributions are three-fold in feature, representation, and inference: (1) A new feat
1