In this paper, we present a novel Two-stEp Cross-modal Hashing method, TECH for short, for cross-modal retrieval tasks. As a two-step method, it first learns hash codes based on semantic labels, while preserving the similarity in the original space and exploiting the label correlations in the label space. In the light of this, it is able to make better use of label information and generate better binary codes. In addition, different from other two-step methods that mainly focus on the hash codes learning, TECH adopts a new hash function learning strategy in the second step, which also preserves the similarity in the original space. Moreover, with the help of well designed objective function and optimization scheme, it is able to generate hash codes discretely and scalable for large scale data. To the best of our knowledge, it is the first cross-modal hashing method exploiting label correlations, and also the first two-step hashing model preserving the similarity while leaning hash function. Extensive experiments demonstrate that the proposed approach outperforms some state-of-the-art cross-modal hashing methods.