Existing unsupervised domain adaptation approaches for cross-user Wearable Human Activity Recognition (WHAR) typically assume that all users share a uni-modal sensor deployment configuration, and thus cannot transfer across different sensor modalities. In this paper, we investigate the unsupervised domain adaptation task in the more realistic cross-modal WHAR setting. This new context presents two formidable challenges: (1) how to alleviate modality heterogeneity across users, and (2) how to exploit cross-modal domain correlation for better unsupervised domain adaptation. We propose a cross-modal unsupervised domain Adaptation model with Class-Aware Sample Weight Learning (CASWL-Adapt) to address both challenges. First, a spherical modality discriminator is designed to capture the modality-specific discriminative features of each user during domain adaptation, thereby reducing the sample variance caused by modality heterogeneity. Given a user-specific modality, modality-independent domain-invariant features can be efficiently generated via the proposed modality discrimination loss and adversarial training. Second, a class-aware weight network is devised to compute a weight for each sample from its classification loss and activity class similarity. The network is trained end-to-end with meta-optimization update rules to explore inter-domain correlations. Activity classes across modalities can thus adaptively adopt different weighting schemes according to their intrinsic bias, selecting the most suitable samples for domain knowledge transfer. We demonstrate that CASWL-Adapt achieves state-of-the-art results on three challenging benchmarks, Epic-Kitchens, Multimodal-EA, and RealWorld, and is especially effective for new users with unseen sensor modalities.
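To make the class-aware sample weighting concrete, below is a minimal PyTorch sketch, not the authors' implementation: a small weight network maps each sample's classification loss and a class-similarity score to a weight in (0, 1), which then rescales the per-sample losses. All module and variable names here (`ClassAwareWeightNet`, `class_sim`, etc.) are hypothetical, the class-similarity input is assumed to be precomputed (e.g., from source/target class prototypes), and the meta-optimization outer loop that updates the weight network is omitted.

```python
# Illustrative sketch of class-aware sample weight learning (assumed PyTorch setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassAwareWeightNet(nn.Module):
    """Maps per-sample (classification loss, class similarity) pairs to weights in (0, 1)."""
    def __init__(self, hidden_dim: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden_dim),   # input: [classification loss, class similarity]
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, cls_loss: torch.Tensor, cls_sim: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([cls_loss, cls_sim], dim=1)   # (batch, 2)
        return torch.sigmoid(self.mlp(feats)).squeeze(1)  # (batch,) sample weights

# Usage: weight per-sample losses before averaging, so that samples whose
# activity class transfers well across modalities contribute more.
batch, num_classes = 8, 6
logits = torch.randn(batch, num_classes)
labels = torch.randint(0, num_classes, (batch,))
per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
# Assumed: per-sample class similarity, e.g. cosine similarity between
# source and target prototypes of each sample's class (precomputed elsewhere).
class_sim = torch.rand(batch)

weight_net = ClassAwareWeightNet()
weights = weight_net(per_sample_loss.detach(), class_sim)
weighted_loss = (weights * per_sample_loss).mean()
weighted_loss.backward()  # in the full method, the weight net itself is updated via meta-optimization
```

In this sketch the weight network receives a detached copy of the loss so that the weighting inputs do not feed gradients back through the classifier; under the meta-optimization scheme described above, its parameters would instead be updated on a held-out objective rather than by this single backward pass.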