Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation

Abstract

Intrinsic image decomposition and inverse rendering are long-standing problems in computer vision. To evaluate albedo recovery, most algorithms report quantitative performance as the mean Weighted Human Disagreement Rate (WHDR) on the IIW dataset. However, WHDR measures only relative albedo values and often fails to capture the overall quality of the albedo. To evaluate albedo comprehensively, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR: intensity, chromaticity, and texture. We show that existing algorithms often improve the WHDR metric but perform poorly on the other metrics.

We then finetune different algorithms on our MAW dataset, which significantly improves the quality of the reconstructed albedo both quantitatively and qualitatively. Since the proposed intensity, chromaticity, and texture metrics and WHDR are all complementary, we further introduce a relative performance measure that captures average performance. By analysing existing algorithms, we show that there is significant room for improvement. Our dataset and evaluation metrics will enable researchers to develop algorithms that improve albedo reconstruction.

Overview of MAW Dataset Collection and Evaluation Pipeline

Overview of the data collection and evaluation pipeline of the MAW dataset. For each region with homogeneous albedo, we measure the albedo and average it into a single RGB vector. For a given predicted albedo image, we select the pixels that lie inside the same region and average them into a single RGB vector as well. We then compare the predicted albedo against the ground truth using our metrics.
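The region-averaging step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' released code: the function name `region_mean_rgb`, the `(H, W, 3)` array layout, and the boolean-mask representation of a region are all assumptions made for the example. The metrics that compare the two resulting vectors are not defined on this page, so they are left out.

```python
import numpy as np


def region_mean_rgb(albedo, mask):
    """Average the RGB values of all pixels selected by a region mask.

    Hypothetical helper: `albedo` is assumed to be an (H, W, 3) array and
    `mask` an (H, W) boolean array marking one homogeneous-albedo region.
    """
    albedo = np.asarray(albedo, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if albedo.ndim != 3 or albedo.shape[2] != 3:
        raise ValueError("albedo must have shape (H, W, 3)")
    if mask.shape != albedo.shape[:2]:
        raise ValueError("mask must have shape (H, W)")
    if not mask.any():
        raise ValueError("mask selects no pixels")
    # Boolean indexing yields an (N, 3) array of the selected pixels;
    # averaging over axis 0 gives one RGB vector for the region.
    return albedo[mask].mean(axis=0)


# Toy 2x2 prediction; the region covers the top row only.
predicted = np.zeros((2, 2, 3))
predicted[0, 0] = [0.2, 0.4, 0.6]
predicted[0, 1] = [0.4, 0.6, 0.8]
region = np.array([[True, True], [False, False]])
predicted_rgb = region_mean_rgb(predicted, region)
```

The same helper could be applied to the measured albedo for that region, so that both sides of the comparison are single RGB vectors.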

WHDR Metric Alone

Many existing algorithms produce albedo with strong artifacts despite a good IIW score, owing to poor chromaticity (first row), intensity (second row), and texture (third row).
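To illustrate why WHDR alone can miss such artifacts, here is a minimal sketch of a WHDR-style score. It follows the commonly used formulation from the IIW benchmark as we understand it: each human judgement says which of two points is darker (`"1"`, `"2"`, or `"E"` for about equal), and the prediction is derived from the ratio of the two predicted intensities with a threshold `delta` (assumed here to default to 0.1). The function name `whdr`, the tuple layout of `comparisons`, and the toy data are assumptions for this example, not the benchmark's actual interface.

```python
import numpy as np


def whdr(intensity, comparisons, delta=0.1):
    """Weighted fraction of pairwise judgements the prediction disagrees with.

    `intensity` is an (H, W) array of predicted albedo intensity.
    `comparisons` is an iterable of ((r1, c1), (r2, c2), human, weight),
    where `human` is "1" (point 1 darker), "2" (point 2 darker), or "E".
    """
    intensity = np.asarray(intensity, dtype=np.float64)
    error_weight = 0.0
    total_weight = 0.0
    for (r1, c1), (r2, c2), human, weight in comparisons:
        # Clamp to avoid division by zero on black pixels.
        a = max(intensity[r1, c1], 1e-10)
        b = max(intensity[r2, c2], 1e-10)
        if b / a > 1.0 + delta:
            predicted = "1"  # point 1 is darker
        elif a / b > 1.0 + delta:
            predicted = "2"  # point 2 is darker
        else:
            predicted = "E"  # roughly equal
        total_weight += weight
        if predicted != human:
            error_weight += weight
    if total_weight == 0.0:
        raise ValueError("no weighted comparisons given")
    return error_weight / total_weight


# Toy example: three judgements on a 2x2 intensity map.
toy_intensity = np.array([[0.2, 0.8], [0.5, 0.5]])
toy_comparisons = [
    ((0, 0), (0, 1), "1", 1.0),  # agrees with the prediction
    ((1, 0), (1, 1), "E", 1.0),  # agrees with the prediction
    ((0, 1), (0, 0), "1", 2.0),  # disagrees with the prediction
]
score = whdr(toy_intensity, toy_comparisons)
scaled_score = whdr(3.0 * toy_intensity, toy_comparisons)
```

Because only intensity ratios enter the score, multiplying the whole prediction by a constant leaves it unchanged, and color never enters at all. That is consistent with the observation that a good IIW score can coexist with errors in absolute intensity and chromaticity.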

WHDR + Proposed Metrics

With WHDR + proposed metrics, we can capture these artifacts in chromaticity (first row), intensity (second row), and texture (third row).

Finetuning

Although our dataset contains only 888 images, we show that finetuning significantly improves the albedo predictions of state-of-the-art algorithms. Notice the improvements in color tinge.

BibTeX


@misc{wu2023measured,
      title={Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation},
      author={Jiaye Wu and Sanjoy Chowdhury and Hariharmano Shanmugaraja and David Jacobs and Soumyadip Sengupta},
      year={2023},
      eprint={2306.15662},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}