Recently, deep learning models such as Convolutional Neural Networks have been shown to perform well on various computer vision tasks. A prerequisite for such models is access to large amounts of labeled data, since the most successful ones are trained with supervised learning. Labeling data is expensive, time-consuming, tedious, and sometimes subjective, which can result in mislabeled data that negatively affects both training and validation. In this work, we propose a human-in-the-loop intelligent system that allows the agent and the human to collaborate to simultaneously solve the problem of labeling data and perform scene labeling of an unlabeled image data set with minimal guidance from a human teacher. We evaluate the proposed interactive learning system by comparing the labeled data set produced by the system to the human-provided labels. The results show that the learning system is capable of labeling an entire image data set almost completely, starting from a few labeled examples provided by the human teacher.