Authors: Luo, Yutong; Zhong, Xinyue; Xie, Jialan; Liu, Guangyuan
Date: 2025-04-03
Year: 2025
Citation: (2025). IEEE Transactions on Emerging Topics in Computational Intelligence, PP(99), 1-13.
ISSN: 2471-285X
URI: https://hdl.handle.net/2292/71765

Abstract: Image emotion recognition, which analyzes the emotional responses of people to various stimuli in images, has attracted substantial attention in recent years with the proliferation of social media. As human emotion is a highly complex and abstract cognitive process, simply extracting local or global features from an image is not sufficient for recognizing the emotion of an image. The psychologist Moshe proposed that visual objects are usually embedded in a scene with other related objects during human visual comprehension of images. Therefore, we propose a two-branch emotion-recognition network known as the combined visual relationship feature and scene feature network (CVRSF-Net). In the scene feature-extraction branch, a pretrained CLIP model is adopted to extract the visual features of images, with a feature channel weighting module to extract the scene features. In the visual relationship feature-extraction branch, a visual relationship detection model is used to extract the visual relationships in the images, and a semantic fusion module fuses the scenes and visual relationship features. Furthermore, we spatially weight the visual relationship features using class activation maps. Finally, the implicit relationships between different visual relationship features are obtained using a graph attention network, and a two-branch network loss function is designed to train the model. The experimental results showed that the recognition rates of the proposed network were 79.80%, 69.81%, and 36.72% for the FI-8, Emotion6, and WEBEmo datasets, respectively. The proposed algorithm achieves state-of-the-art results compared to existing methods.

Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
Previously published items are made available in accordance with the copyright policy of the publisher.
© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Rights URL: https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm

Subjects: 46 Information and Computing Sciences; 4608 Human-Centred Computing; 4603 Computer Vision and Multimedia Computation; Behavioral and Social Science; Eye Disease and Disorders of Vision; Clinical Research; Mind and Body; 4611 Machine learning
Title: CVRSF-Net: Image Emotion Recognition by Combining Visual Relationship Features and Scene Features
Type: Journal Article
DOI: 10.1109/tetci.2025.3543300
Copyright: IEEE