In this paper we present a vision-based approach to self-localization that uses a novel scheme to integrate featurebased matching of panoramic images with Monte Carlo localization. A specially modified version of Lowe’s SIFT algorithm is used to match features extracted from local interest points in the image, rather than using global features calculated from the whole image. Experiments conducted in a large, populated indoor environment (up to 5 persons visible) over a period of several months demonstrate the robustness of the approach, including kidnapping and occlusion of up to 90% of the robot’s field of view.