Autonomous scene understanding by object classification today depends crucially on the accuracy of appearance-based robotic perception. However, such perception is prone to object-detection difficulties arising from unfavourable lighting conditions and vision-unfriendly object properties. In this work, we propose a spatial-context-based system that infers object classes solely from structural information captured from scenes, in order to aid traditional perception systems. Our system operates on novel spatial features (IFRC) that are robust to noisy object detections; it also supports on-the-fly modification of learned knowledge, improving performance with practice. IFRC are aligned with the way humans express 3D space, which facilitates human-robot interaction (HRI) and hence simplifies supervised learning. Our tests show that the system can capture spatio-structural information to perform joint object classification, not only acting as a vision aid but sometimes even performing on par with appearance-based robotic vision.