With the quick development of social websites, there are more opportunities to have different media types (such as text, image, video, etc.) describing the same topic from large-scale heterogeneous data sources. To efficiently identify the inter-media correlations for multimedia retrieval, unsupervised cross-modal hashing (UCMH) has gained increased interest due to the significant reduction in computation and storage. However, most UCMH methods assume that the data from different modalities are well paired. As a result, existing UCMH methods may not achieve satisfactory performance when partially paired data are given only. In this article, we propose a new-type of UCMH method called robust unsupervised cross-modal hashing (RUCMH). The major contribution lies in jointly learning modal-specific hash function, exploring the correlations among modalities with partial or even without any pairwise correspondence, and preserving the information of original features as much as possible. The learning process can be modeled via a joint minimization problem, and the corresponding optimization algorithm is presented. A series of experiments is conducted on four real-world datasets (Wiki, MIRFlickr, NUS-WIDE, and MS-COCO). The results demonstrate that RUCMH can significantly outperform the state-of-the-art unsupervised cross-modal hashing methods, especially for the partially paired case, which validates the effectiveness of RUCMH.