In recent years, natural verbal and non-verbal human-robot interaction has attracted increasing interest. Models for robustly detecting and describing visual attributes of objects, such as colors, are therefore of great importance. However, learning robust models of visual attributes requires large data sets. To overcome the shortage of annotated training data, we acquire images from the Internet and propose a method for robustly learning natural color models. Its novel aspects with respect to prior art are: firstly, a randomized HSL transformation that reflects the slight color variations and noise observed in real-world imaging sensors; secondly, a probabilistic ranking and selection of the training samples, which removes a considerable number of outliers from the training data. These two techniques allow us to estimate robust color models that better reflect the variance seen in real-world images. Experimental evaluations confirm the advantages of the proposed method over the current state-of-the-art technique, which uses the training data without such transformation and selection. In combination, for models learnt with pLSA-bg and HSL, the proposed techniques reduce the number of mislabeled objects by 19.87% on the well-known E-Bay data set.
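To illustrate the flavor of the randomized HSL transformation described above, the following is a minimal sketch of per-pixel jitter in HSL space. The Gaussian noise model, the sigma values, and the function name are assumptions for illustration only, not the parameters or implementation from the paper.

```python
import colorsys
import random

def randomized_hsl(rgb, h_sigma=0.02, s_sigma=0.05, l_sigma=0.05):
    """Apply a small random perturbation in HSL space to one RGB pixel.

    Illustrative sketch only: the Gaussian noise model and sigma values
    are assumptions, not the settings used in the paper.
    rgb: tuple of floats in [0, 1].
    """
    r, g, b = rgb
    # colorsys uses HLS ordering (hue, lightness, saturation)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Hue wraps around the color circle; lightness and saturation are clipped
    h = (h + random.gauss(0.0, h_sigma)) % 1.0
    l = min(1.0, max(0.0, l + random.gauss(0.0, l_sigma)))
    s = min(1.0, max(0.0, s + random.gauss(0.0, s_sigma)))
    return colorsys.hls_to_rgb(h, l, s)

# Example: jitter a pure red pixel a few times to simulate sensor variation
for _ in range(3):
    print(randomized_hsl((1.0, 0.0, 0.0)))
```

Applying such a perturbation to each training image yields samples whose color distribution is broadened slightly, mimicking the sensor noise and illumination variation encountered in real-world imagery.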