+++ date = "2023-04-24" draft = false path = "/blog/pre-commit-exif-safety" tags = ["workflow", "git"] title = "pre-commit for safe image handling" +++ Modern cameras put a *lot* of metadata into images: * GPS location * Device model * Camera software version * Subject distance * Facing direction * Colour profiles * Time, with timezone * Reasonable camera stuff like ISO, shutter speed, flash usage, focal length This is mostly good actually, since it is useful to you as the photographer. I've very frequently found use for photo geolocation of old photos. However, it can present a significant privacy risk if you ever post or send someone verbatim image files taken by such a camera; in particular, the GPS coordinates. Out of an abundance of caution, I would prefer to strip *all* of it that is not necessary to displaying the image. Image metadata, of course, is not the only way to cause yourself privacy problems with images. The *data* itself can be just as big of a problem: your OS vendor can [fuck up their filesystem APIs undocumentedly and cause an "acropalypse"][acropalypse], or particularly motivated stalkers can geolocate most distinctive things, especially if there's a background of "outside". That said, the phrase "your threat model is not my threat model but your threat model is OK" always rings true, and this may or may not actually be a consideration. [acropalypse]: https://arstechnica.com/gadgets/2023/03/google-pixel-bug-lets-you-uncrop-the-last-four-years-of-screenshots/ It's a well known bug class to forget to strip image metadata so relatively few web tools will make the mistake of not stripping it, but sometimes [people mess it up][bug1]. If I used a web-based content management system, I would double check it, but I would expect that it strips any private metadata off of images. [bug1]: https://gitlab.com/gitlab-org/gitlab/-/issues/239343 That said, this website (like many others run by computer dorks) is maintained with a static site generator, a [forked version][zola-fork] of [Zola], which takes text and image sources to generate HTML, as a compilation process: the source files are left untouched. Further, perhaps unfortunately, my [source files] are public so I had better not check in anything bad. [zola-fork]: https://github.com/lf-/zola/tree/tree-painter [source files]: https://github.com/lf-/blog [Zola]: https://getzola.org Ruh roh. Better not check in any images with metadata. I have, to date, succeeded in this purely by vigilance, but vigilance is not a robust process. Typically I stick the images into the GNU Image Manipulation Program and export fresh files without retaining EXIF metadata. Let's fix this by instituting an automated barrier that also fixes images: [`exiftool`][exiftool] conveniently supports most image formats, and can do arbitrary metadata editing. We can ensure that it is always run on files before they are checked in by using a tool like [pre-commit] to create user-friendly Git hooks. [exiftool]: https://exiftool.org [pre-commit]: https://pre-commit.com/ First, we need to find a `exiftool` invocation. We want to keep *some* metadata that is crucial to having the image display correctly: we need the colour profile so the colours are right, and we need the orientation (since phone cameras tend to rotate the image on the viewer side, probably because that makes rotation lossless). The [manual][exiftool-docs] states: [exiftool-docs]: https://exiftool.org/exiftool_pod.html > --TAG > > Exclude specified tag from extracted information. > > (...) > > May also be used following a `-tagsFromFile` option to exclude tags from being > copied (when redirecting to another tag, it is the source tag that should be > excluded), or to exclude groups from being deleted when deleting all > information (eg. `-all= --exif:all` deletes all but EXIF information). But note > that this will not exclude individual tags from a group delete (unless a > family 2 group is specified, see note 4 below). > > Instead, individual tags may be recovered using the > `-tagsFromFile` option (eg. `-all= -tagsfromfile @ -artist`). Hmm, so `-all= --icc_profile:all -tagsfromfile @ -orientation`, maybe? {% codesample(desc="exiftool output") %} ``` ยป exiftool PXL_20220116_223722991.jpg ExifTool Version Number : 12.50 File Name : PXL_20220116_223722991.jpg Directory : . File Size : 1551 kB File Modification Date/Time : 2023:04:24 15:05:02-07:00 File Access Date/Time : 2023:04:24 15:05:02-07:00 File Inode Change Date/Time : 2023:04:24 15:05:02-07:00 File Permissions : -rw-r--r-- File Type : JPEG File Type Extension : jpg MIME Type : image/jpeg Exif Byte Order : Big-endian (Motorola, MM) Orientation : Horizontal (normal) X Resolution : 72 Y Resolution : 72 Resolution Unit : inches Y Cb Cr Positioning : Centered Profile CMM Type : Profile Version : 4.0.0 Profile Class : Display Device Profile Color Space Data : RGB Profile Connection Space : XYZ Profile Date Time : 2016:12:08 09:38:28 Profile File Signature : acsp Primary Platform : Unknown () CMM Flags : Not Embedded, Independent Device Manufacturer : Google Device Model : Device Attributes : Reflective, Glossy, Positive, Color Rendering Intent : Perceptual Connection Space Illuminant : 0.9642 1 0.82491 Profile Creator : Google Profile ID : 75e1a6b13c34376310c8ab660632a28a Profile Description : sRGB IEC61966-2.1 Profile Copyright : Copyright (c) 2016 Google Inc. Media White Point : 0.95045 1 1.08905 Media Black Point : 0 0 0 Red Matrix Column : 0.43604 0.22249 0.01392 Green Matrix Column : 0.38512 0.7169 0.09706 Blue Matrix Column : 0.14305 0.06061 0.71391 Red Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract) Chromatic Adaptation : 1.04788 0.02292 -0.05019 0.02959 0.99048 -0.01704 -0.00922 0. 01508 0.75168 Blue Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract) Green Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract) Image Width : 4080 Image Height : 3072 Encoding Process : Baseline DCT, Huffman coding Bits Per Sample : 8 Color Components : 3 Y Cb Cr Sub Sampling : YCbCr4:2:0 (2 2) Image Size : 4080x3072 Megapixels : 12.5 ``` {% end %} Looks like it. It's not overwriting the file though, but it looks like there's `-overwrite_original` for that. Let's put it all together into pre-commit: we want a [repo-local hook][precommit-repolocal] because it's easier to manage, so something like this as `.pre-commit-config.yml`: [precommit-repolocal]: https://pre-commit.com/index.html#repository-local-hooks ```yaml repos: - repo: local hooks: - id: no-spicy-exif name: Ban spicy exif data description: Ensures that there is no sensitive exif data committed language: system entry: exiftool -all= --icc_profile:all -tagsfromfile @ -orientation -overwrite_original exclude_types: ["svg"] types: ["image"] ``` Check with `git add .pre-commit-config.yml image-with-gps.jpg && pre-commit run`, and it fails as expected. If we `git add` the file again, it will pass, and the file is now devoid of problematic metadata. Success!