In this paper, we address the major drawback of SMIC by introducing a hallucination mechanism that removes the need for training-time saliency images computed with one of the existing saliency algorithms. In other words, we show that the explicit saliency branch, which requires training on a saliency image dataset, can be replaced with a branch that is trained end-to-end for image classification, a task for which no saliency dataset is required. To this end, we feed the input RGB image to the saliency branch in place of the saliency image. We first pre-train this network for image classification using a subset of the ImageNet validation dataset. During this process, the saliency branch learns to identify which regions are more discriminative. In a second phase, we initialize the saliency branch with these pre-trained weights and train the system end-to-end on the fine-grained dataset using only the RGB images. Results show that the saliency branch improves fine-grained recognition significantly, especially for domains with few training images.

We briefly summarize our main contributions below:
• we propose an approach that hallucinates saliency maps, which are fused with the RGB modality via a modulation process,
• our method does not require any saliency maps for training (as is required in [Murabito et al., 2018, Flores et al., 2019]); instead, the saliency branch is trained indirectly, in an end-to-end fashion, by training the network for image classification,
• our method improves classification accuracy on three fine-grained datasets, especially for domains with limited data.
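
To make the modulation idea concrete, the following is a minimal sketch of how a hallucinated saliency map could re-weight RGB feature maps. The exact form of the modulation is not specified in this section, so the sigmoid-gated element-wise product used here is an assumption, and the names `sigmoid` and `modulate` are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def modulate(rgb_features, saliency_logits):
    """Scale every channel of the RGB features by a per-location saliency weight.

    rgb_features:    list of C channels, each an H x W nested list of floats.
    saliency_logits: H x W nested list of floats, standing in for the raw
                     output of the saliency branch (assumed, not from the paper).
    """
    height = len(saliency_logits)
    width = len(saliency_logits[0])

    # Assumption: logits are squashed to (0, 1) so they act as soft gates.
    weights = [[sigmoid(v) for v in row] for row in saliency_logits]

    modulated = []
    for channel in rgb_features:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("feature map and saliency map must share H x W")
        # The same saliency weight is applied at location (i, j) in every channel.
        modulated.append(
            [[channel[i][j] * weights[i][j] for j in range(width)]
             for i in range(height)]
        )
    return modulated
```

Because the gate is differentiable, a classification loss computed on the modulated features would propagate gradients into the branch that produces `saliency_logits`, which is how such a branch can be trained without any saliency ground truth.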