Lossy data compression

A patch of a simulated image, shown before compression (top left) and after compression and decompression (top right). The residuals, multiplied by a factor of 5 for clarity, are shown at the bottom.

The same comparison for an aggressive level of compression: a patch of a simulated image before compression (top left) and after compression and decompression (top right). The residuals, multiplied by a factor of 5 for clarity, are shown at the bottom.

The utility of weak lensing studies depends critically on our ability to measure galaxy shapes accurately, so that we can extract the gravitational shear from our images. Unfortunately, there are systematic effects that mimic this shear signal, stemming from the atmosphere, imperfect telescope optics, detector inefficiencies, and even the data analysis pipeline.

For instance, future space-based weak lensing surveys are expected to produce such a large volume of data that telemetry limitations will likely make lossy data compression necessary. I led a study to quantify its effects, asking: can we achieve the necessary level of data compression without significantly biasing galaxy shapes, and thus the weak lensing signal?

To answer this question we must first find an optimal compression algorithm, and then test it on astronomical images. The former is covered in Bernstein et al. (2010), where we study and refine a popular compression scheme which involves taking a weighted square-root of pixel values and then rounding the result to the nearest integer. Such a scheme is ideal for astronomical imaging, as each pixel is compressed independently and the information discarded is by construction less than the Poisson error from photon shot noise. Also, importantly, the errors induced by the compression-decompression (“codec”) process are highest for the brightest objects (such as cosmic rays) and lowest for the faintest (such as the weakly-lensed galaxies of interest).
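The square-root idea can be illustrated with a minimal sketch. This is not the refined scheme of Bernstein et al. (2010); it assumes pixel values are non-negative counts in electrons with pure Poisson noise, and the function names and the `step` parameter (the quantization step in units of the Poisson standard deviation) are hypothetical.

```python
import numpy as np

def sqrt_encode(electrons, step=1.0):
    """Map non-negative pixel values (electrons) to integer codes.

    For Poisson noise, 2*sqrt(x) has a standard deviation of about 1,
    so `step` (an assumed parameter) is the rounding step in units of
    the Poisson noise.
    """
    x = np.clip(np.asarray(electrons, dtype=float), 0.0, None)
    return np.rint(2.0 * np.sqrt(x) / step).astype(np.int64)

def sqrt_decode(codes, step=1.0):
    """Invert sqrt_encode, up to the rounding error."""
    y = np.asarray(codes, dtype=float) * step
    return (y / 2.0) ** 2

# Compare the codec error with the Poisson noise at a faint and a bright level.
rng = np.random.default_rng(0)
for level in (100.0, 10_000.0):
    pixels = rng.poisson(level, size=100_000).astype(float)
    restored = sqrt_decode(sqrt_encode(pixels))
    codec_rms = np.std(restored - pixels)
    print(f"level={level:g}  codec rms={codec_rms:.2f}  "
          f"codec rms / Poisson sigma={codec_rms / np.sqrt(level):.2f}")
```

Each pixel is encoded independently. In this sketch the absolute codec error grows with brightness while staying a roughly constant fraction of the Poisson noise, which is the behaviour described above.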

Shifts in the two components of ellipticity, e1 (blue, dotted line) and e2 (black, dashed line), as a function of the mean e1 and e2, respectively, when the galaxies are sorted into five wide bins and the shifts (codec vs. original) are averaged, for the target level of data compression. For comparison, the same is plotted for e1 (green, solid line) and e2 (magenta, dot-dashed line) for a less-severe but insufficient codec scheme.

In Vanderveld et al. (2011), I tested this compression algorithm using simulated images. I demonstrated that it does not bias the sky background and adds only a small amount of extra digitization noise to the images. As for galaxy shapes, which I measured using the RRG algorithm, I found and quantified a small increase in ellipticity measurement noise and a significant but removable ellipticity bias.
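A toy version of this kind of test might look like the sketch below. It is not the RRG algorithm and not the simulation pipeline used in the paper: it assumes a simple square-root codec, an elliptical Gaussian galaxy on a flat sky with Poisson noise, and ellipticities from Gaussian-weighted second moments with no PSF correction. All function names and parameter values are hypothetical.

```python
import numpy as np

def codec(image, step=1.0):
    """Square-root encode, round to integers, and decode (assumed scheme)."""
    y = np.rint(2.0 * np.sqrt(np.clip(image, 0.0, None)) / step) * step
    return (y / 2.0) ** 2

def ellipticity(image, sigma_w=4.0):
    """(e1, e2) from Gaussian-weighted second moments about the stamp center."""
    ny, nx = image.shape
    y, x = np.mgrid[:ny, :nx]
    dx, dy = x - (nx - 1) / 2.0, y - (ny - 1) / 2.0
    w = np.exp(-(dx**2 + dy**2) / (2.0 * sigma_w**2)) * image
    ixx, iyy, ixy = (w * dx * dx).sum(), (w * dy * dy).sum(), (w * dx * dy).sum()
    return (ixx - iyy) / (ixx + iyy), 2.0 * ixy / (ixx + iyy)

def galaxy(size=33, flux=5000.0, sx=3.5, sy=2.5):
    """Elliptical Gaussian galaxy elongated along x, normalized to `flux`."""
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    profile = np.exp(-0.5 * ((x / sx) ** 2 + (y / sy) ** 2))
    return flux * profile / profile.sum()

rng = np.random.default_rng(1)
sky = 200.0  # flat sky level in electrons (assumed)
shifts = []
for _ in range(200):
    noisy = rng.poisson(galaxy() + sky).astype(float)
    e_orig = ellipticity(noisy - sky)
    e_codec = ellipticity(codec(noisy) - sky)
    shifts.append(np.subtract(e_codec, e_orig))
shifts = np.array(shifts)

print("mean shift (e1, e2):", shifts.mean(axis=0))
print("scatter    (e1, e2):", shifts.std(axis=0))
```

The mean of the shifts plays the role of an ellipticity bias and their scatter the role of extra measurement noise. Sorting galaxies of different intrinsic shapes into bins before averaging would give a curve analogous to the one in the figure.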

Vanderveld et al. (2011) and Bernstein et al. (2010) were both published in Publications of the Astronomical Society of the Pacific.

http://adsabs.harvard.edu/abs/2011PASP..123..996V

http://adsabs.harvard.edu/abs/2010PASP..122..336B

R Ali Vanderveld

Data Scientist / Astrophysicist
