From 955036240b290262fdae089fe5f56cef63ede454 Mon Sep 17 00:00:00 2001 From: Jade Lovelace Date: Mon, 24 Apr 2023 15:12:52 -0700 Subject: [PATCH] pre-commit for image handling --- .pre-commit-config.yaml | 2 +- content/posts/pre-commit-exif-safety.md | 181 ++++++++++++++++++++++++ 2 files changed, 182 insertions(+), 1 deletion(-) create mode 100644 content/posts/pre-commit-exif-safety.md diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 46af07e..cdc9f6c 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -7,6 +7,6 @@ repos: name: Ban spicy exif data description: Ensures that there is no sensitive exif data committed language: system - entry: exiftool -all= --icc_profile:all -overwrite_original + entry: exiftool -all= --icc_profile:all -tagsfromfile @ -orientation -overwrite_original exclude_types: ["svg"] types: ["image"] diff --git a/content/posts/pre-commit-exif-safety.md b/content/posts/pre-commit-exif-safety.md new file mode 100644 index 0000000..7e3d599 --- /dev/null +++ b/content/posts/pre-commit-exif-safety.md @@ -0,0 +1,181 @@ ++++ +date = "2023-04-24" +draft = false +path = "/blog/pre-commit-exif-safety" +tags = ["workflow", "git"] +title = "pre-commit for safe image handling" ++++ + +Modern cameras put a *lot* of metadata into images: + +* GPS location +* Device model +* Camera software version +* Subject distance +* Facing direction +* Colour profiles +* Time, with timezone +* Reasonable camera stuff like ISO, shutter speed, flash usage, focal length + +This is mostly good actually, since it is useful to you as the photographer. +I've very frequently found use for photo geolocation of old photos. However, it +can present a significant privacy risk if you ever post or send someone +verbatim image files taken by such a camera; in particular, the GPS +coordinates. Out of an abundance of caution, I would prefer to strip *all* of +it that is not necessary to displaying the image. + +Image metadata, of course, is not the only way to cause yourself privacy +problems with images. The *data* itself can be just as big of a problem: your +OS vendor can [fuck up their filesystem APIs undocumentedly and cause an +"acropalypse"][acropalypse], or particularly motivated stalkers can geolocate +most distinctive things, especially if there's a background of "outside". That +said, the phrase "your threat model is not my threat model but your threat +model is OK" always rings true, and this may or may not actually be a +consideration. + +[acropalypse]: https://arstechnica.com/gadgets/2023/03/google-pixel-bug-lets-you-uncrop-the-last-four-years-of-screenshots/ + +It's a well known bug class to forget to strip image metadata so relatively few +web tools will make the mistake of not stripping it, but sometimes [people mess +it up][bug1]. If I used a web-based content management system, I would double +check it, but I would expect that it strips any private metadata off of images. + +[bug1]: https://gitlab.com/gitlab-org/gitlab/-/issues/239343 + +That said, this website (like many others run by computer dorks) is maintained +with a static site generator, a [forked version][zola-fork] of [Zola], which +takes text and image sources to generate HTML, as a compilation process: the +source files are left untouched. Further, perhaps unfortunately, my [source +files] are public so I had better not check in anything bad. + +[zola-fork]: https://github.com/lf-/zola/tree/tree-painter +[source files]: https://github.com/lf-/blog +[Zola]: https://getzola.org + +Ruh roh. Better not check in any images with metadata. I have, to date, +succeeded in this purely by vigilance, but vigilance is not a robust process. +Typically I stick the images into the GNU Image Manipulation Program and export +fresh files without retaining EXIF metadata. + +Let's fix this by instituting an automated barrier that also fixes images: +[`exiftool`][exiftool] conveniently supports most image formats, and can do +arbitrary metadata editing. We can ensure that it is always run on files before +they are checked in by using a tool like [pre-commit] to create user-friendly +Git hooks. + +[exiftool]: https://exiftool.org +[pre-commit]: https://pre-commit.com/ + +First, we need to find a `exiftool` invocation. We want to keep *some* metadata +that is crucial to having the image display correctly: we need the colour +profile so the colours are right, and we need the orientation (since phone +cameras tend to rotate the image on the viewer side, probably because that +makes rotation lossless). + +The [manual][exiftool-docs] states: + +[exiftool-docs]: https://exiftool.org/exiftool_pod.html + +> --TAG +> +> Exclude specified tag from extracted information. +> +> (...) +> +> May also be used following a `-tagsFromFile` option to exclude tags from being +> copied (when redirecting to another tag, it is the source tag that should be +> excluded), or to exclude groups from being deleted when deleting all +> information (eg. `-all= --exif:all` deletes all but EXIF information). But note +> that this will not exclude individual tags from a group delete (unless a +> family 2 group is specified, see note 4 below). +> +> Instead, individual tags may be recovered using the +> `-tagsFromFile` option (eg. `-all= -tagsfromfile @ -artist`). + +Hmm, so `-all= --icc_profile:all -tagsfromfile @ -orientation`, maybe? + +{% codesample(desc="exiftool output") %} + +``` + ยป exiftool PXL_20220116_223722991.jpg +ExifTool Version Number : 12.50 +File Name : PXL_20220116_223722991.jpg +Directory : . +File Size : 1551 kB +File Modification Date/Time : 2023:04:24 15:05:02-07:00 +File Access Date/Time : 2023:04:24 15:05:02-07:00 +File Inode Change Date/Time : 2023:04:24 15:05:02-07:00 +File Permissions : -rw-r--r-- +File Type : JPEG +File Type Extension : jpg +MIME Type : image/jpeg +Exif Byte Order : Big-endian (Motorola, MM) +Orientation : Horizontal (normal) +X Resolution : 72 +Y Resolution : 72 +Resolution Unit : inches +Y Cb Cr Positioning : Centered +Profile CMM Type : +Profile Version : 4.0.0 +Profile Class : Display Device Profile +Color Space Data : RGB +Profile Connection Space : XYZ +Profile Date Time : 2016:12:08 09:38:28 +Profile File Signature : acsp +Primary Platform : Unknown () +CMM Flags : Not Embedded, Independent +Device Manufacturer : Google +Device Model : +Device Attributes : Reflective, Glossy, Positive, Color +Rendering Intent : Perceptual +Connection Space Illuminant : 0.9642 1 0.82491 +Profile Creator : Google +Profile ID : 75e1a6b13c34376310c8ab660632a28a +Profile Description : sRGB IEC61966-2.1 +Profile Copyright : Copyright (c) 2016 Google Inc. +Media White Point : 0.95045 1 1.08905 +Media Black Point : 0 0 0 +Red Matrix Column : 0.43604 0.22249 0.01392 +Green Matrix Column : 0.38512 0.7169 0.09706 +Blue Matrix Column : 0.14305 0.06061 0.71391 +Red Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract) +Chromatic Adaptation : 1.04788 0.02292 -0.05019 0.02959 0.99048 -0.01704 -0.00922 0. +01508 0.75168 +Blue Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract) +Green Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract) +Image Width : 4080 +Image Height : 3072 +Encoding Process : Baseline DCT, Huffman coding +Bits Per Sample : 8 +Color Components : 3 +Y Cb Cr Sub Sampling : YCbCr4:2:0 (2 2) +Image Size : 4080x3072 +Megapixels : 12.5 +``` +{% end %} + +Looks like it. It's not overwriting the file though, but it looks like there's +`-overwrite_original` for that. + +Let's put it all together into pre-commit: we want a [repo-local +hook][precommit-repolocal] because it's easier to manage, so something like +this as `.pre-commit-config.yml`: + +[precommit-repolocal]: https://pre-commit.com/index.html#repository-local-hooks + +```yaml +repos: + - repo: local + hooks: + - id: no-spicy-exif + name: Ban spicy exif data + description: Ensures that there is no sensitive exif data committed + language: system + entry: exiftool -all= --icc_profile:all -tagsfromfile @ -orientation -overwrite_original + exclude_types: ["svg"] + types: ["image"] +``` + +Check with `git add .pre-commit-config.yml image-with-gps.jpg && pre-commit +run`, and it fails as expected. If we `git add` the file again, it will pass, +and the file is now devoid of problematic metadata. Success!