About me
My name is Gang Zhou, and I am currently pursuing a Ph.D. at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, with an expected graduation year of 2027.
Research
My research interests include trustworthy multimodal retrieval, agentic research, and vision-language models.
* denotes equal contribution.
We propose SAAH, an adaptive spectral-gating framework for robust deep hashing. By dynamically filtering out unstable frequency components while preserving discriminative semantic cues, SAAH effectively alleviates the accuracy-robustness trade-off. Experiments show it consistently outperforms existing defense methods under diverse adversarial attacks.
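To make the idea of spectral gating concrete, here is a minimal sketch of frequency-domain gating on feature vectors. It is a generic illustration, not SAAH's actual architecture. It assumes a real FFT along the feature dimension and a sigmoid gate per frequency bin. The function name `spectral_gate` and the `gate_logits` parameterization are hypothetical. In an adaptive scheme, the logits would be predicted from the input. Here they are simply passed in, either shared across the batch or one row per sample.

```python
import numpy as np


def spectral_gate(features: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Softly attenuate frequency components of each feature vector.

    Hypothetical illustration of spectral gating (not SAAH's exact design).
    features:    (batch, d) real-valued features.
    gate_logits: (d // 2 + 1,) or (batch, d // 2 + 1) logits, one per rFFT bin;
                 an adaptive model would predict these from the input.
    """
    d = features.shape[-1]
    # Move to the frequency domain along the feature axis.
    spectrum = np.fft.rfft(features, axis=-1)
    # Sigmoid maps logits to gates in (0, 1); a gate near 0 suppresses that bin.
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    filtered = spectrum * gate
    # Return to the original domain; n=d restores the original feature length.
    return np.fft.irfft(filtered, n=d, axis=-1)
```

With large positive logits every gate is close to 1 and the features pass through unchanged. Keeping only bin 0 (the DC component) replaces each feature vector with its mean. In a trained model, the gate would learn to keep discriminative bins and damp unstable ones.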
We propose SFCL, a cross-modal hashing framework that leverages spatial-frequency complementarity to overcome the robustness-accuracy trade-off. A gating mechanism filters robust features within each domain (spatial and frequency), and a bidirectional interaction module then integrates them. Experiments show that SFCL consistently outperforms state-of-the-art defenses under diverse adversarial attacks while maintaining or improving clean retrieval accuracy.
We propose FACH, a frequency-domain adversarial attack method for deep cross-modal hashing retrieval systems. FACH combines low-frequency masking with multi-teacher gradient fusion to expose the vulnerabilities of deep hashing models in the frequency domain. Experimental results show that FACH significantly outperforms existing transfer-based attack methods, improving both the transferability and the effectiveness of the generated adversarial examples.
We propose SAAT, an end-to-end adversarial training framework that enhances the robustness of deep hashing models for retrieval. By integrating adversarial perturbations with hash code learning and similarity matrices, SAAT outperforms existing methods on both cross-modal and image retrieval tasks while remaining robust against various attacks.
We propose BACH, an adversarial attack method for deep cross-modal hashing retrieval (DCMHR) models in the black-box setting. By incorporating Random Gradient-Free (RGF) estimation into deep hashing attacks, BACH generates effective adversarial examples without prior knowledge of the target model. Experiments show that BACH achieves attack success rates comparable to those of white-box attacks.
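Random gradient-free estimation approximates a gradient using only function queries, which is what makes a black-box attack possible. The sketch below shows one common variant. It averages forward finite differences along random Gaussian directions u: g ≈ (1/q) Σ [(f(x + σu) − f(x)) / σ] u. The function name `rgf_gradient` and its defaults are hypothetical, and this is not BACH's exact procedure. In a hashing attack, `f` would be some loss computed from the target model's returned hash codes, which is also an assumption here.

```python
import numpy as np


def rgf_gradient(f, x, sigma=1e-4, num_queries=50, rng=None):
    """Estimate grad f(x) from function evaluations only (black-box queries).

    Hypothetical sketch of random gradient-free (RGF) estimation with
    Gaussian probe directions; costs num_queries + 1 calls to f.
    """
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)  # one baseline query, reused for every probe
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)  # random probe direction
        # Forward difference approximates the directional derivative along u;
        # weighting by u and averaging recovers the gradient in expectation.
        grad += (f(x + sigma * u) - fx) / sigma * u
    return grad / num_queries
```

For f(x) = Σ x², the estimate approaches the true gradient 2x as `num_queries` grows. An attack could then take PGD-style steps such as x ← x + α · sign(g), with the step size α chosen by the attacker.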
We propose STCH, a self-training-based cross-modal hashing framework for semi-supervised and semi-paired settings. The framework uses graph neural networks to capture inter-modality similarities and generate pseudo-labels, which a heuristic filter then refines to improve label consistency. Trained with an alternating self-training strategy, STCH outperforms existing methods on cross-modal hashing retrieval tasks.