pre-commit for image handling
parent e78fbaade0
commit 955036240b
2 changed files with 182 additions and 1 deletions
@@ -7,6 +7,6 @@ repos:
         name: Ban spicy exif data
         description: Ensures that there is no sensitive exif data committed
         language: system
-        entry: exiftool -all= --icc_profile:all -overwrite_original
+        entry: exiftool -all= --icc_profile:all -tagsfromfile @ -orientation -overwrite_original
         exclude_types: ["svg"]
         types: ["image"]
181
content/posts/pre-commit-exif-safety.md
Normal file
@@ -0,0 +1,181 @@
+++
date = "2023-04-24"
draft = false
path = "/blog/pre-commit-exif-safety"
tags = ["workflow", "git"]
title = "pre-commit for safe image handling"
+++

Modern cameras put a *lot* of metadata into images:

* GPS location
* Device model
* Camera software version
* Subject distance
* Facing direction
* Colour profiles
* Time, with timezone
* Reasonable camera stuff like ISO, shutter speed, flash usage, focal length

This is mostly good actually, since it is useful to you as the photographer.
I've very frequently found a use for the geolocation of old photos. However, it
can present a significant privacy risk if you ever post or send someone
verbatim image files taken by such a camera; in particular, the GPS
coordinates. Out of an abundance of caution, I would prefer to strip *all* of
it that is not necessary for displaying the image.

Image metadata, of course, is not the only way to cause yourself privacy
problems with images. The *data* itself can be just as big of a problem: your
OS vendor can [fuck up their filesystem APIs undocumentedly and cause an
"acropalypse"][acropalypse], or particularly motivated stalkers can geolocate
most distinctive things, especially if there's a background of "outside". That
said, the phrase "your threat model is not my threat model but your threat
model is OK" always rings true, and this may or may not actually be a
consideration.

[acropalypse]: https://arstechnica.com/gadgets/2023/03/google-pixel-bug-lets-you-uncrop-the-last-four-years-of-screenshots/

Forgetting to strip image metadata is a well-known bug class, so relatively few
web tools make the mistake of not stripping it, but sometimes [people mess
it up][bug1]. If I used a web-based content management system, I would double
check it, but I would expect that it strips any private metadata off of images.

[bug1]: https://gitlab.com/gitlab-org/gitlab/-/issues/239343

That said, this website (like many others run by computer dorks) is maintained
with a static site generator, a [forked version][zola-fork] of [Zola], which
takes text and image sources to generate HTML, as a compilation process: the
source files are left untouched. Further, perhaps unfortunately, my [source
files] are public so I had better not check in anything bad.

[zola-fork]: https://github.com/lf-/zola/tree/tree-painter
[source files]: https://github.com/lf-/blog
[Zola]: https://getzola.org

Ruh roh. Better not check in any images with metadata. I have, to date,
succeeded in this purely by vigilance, but vigilance is not a robust process.
Typically I stick the images into the GNU Image Manipulation Program and export
fresh files without retaining EXIF metadata.

Let's fix this by instituting an automated barrier that also fixes images:
[`exiftool`][exiftool] conveniently supports most image formats, and can do
arbitrary metadata editing. We can ensure that it is always run on files before
they are checked in by using a tool like [pre-commit] to create user-friendly
Git hooks.

[exiftool]: https://exiftool.org
[pre-commit]: https://pre-commit.com/

First, we need to find an `exiftool` invocation. We want to keep *some* metadata
that is crucial to having the image display correctly: we need the colour
profile so the colours are right, and we need the orientation (since phone
cameras tend to rotate the image on the viewer side, probably because that
makes rotation lossless).
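
Before choosing what to keep, it helps to see which tags a file carries and which group each one lives in. A minimal sketch, assuming `exiftool` is on `PATH`; the function name `list_tags_by_group` is made up for illustration:

```shell
# Hypothetical helper: print every tag of an image together with its group.
#   -a   also show duplicate tags
#   -G1  prefix each tag with its family 1 group (e.g. IFD0, GPS, ICC_Profile)
#   -s   print short tag names, which are the names used on the command line
list_tags_by_group() {
    exiftool -a -G1 -s "$1"
}

# Usage: list_tags_by_group PXL_20220116_223722991.jpg
```

The group names shown here are what options such as `--icc_profile:all` refer to.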

The [manual][exiftool-docs] states:

[exiftool-docs]: https://exiftool.org/exiftool_pod.html

> --TAG
>
> Exclude specified tag from extracted information.
>
> (...)
>
> May also be used following a `-tagsFromFile` option to exclude tags from being
> copied (when redirecting to another tag, it is the source tag that should be
> excluded), or to exclude groups from being deleted when deleting all
> information (eg. `-all= --exif:all` deletes all but EXIF information). But note
> that this will not exclude individual tags from a group delete (unless a
> family 2 group is specified, see note 4 below).
>
> Instead, individual tags may be recovered using the
> `-tagsFromFile` option (eg. `-all= -tagsfromfile @ -artist`).

Hmm, so `-all= --icc_profile:all -tagsfromfile @ -orientation`, maybe?

{% codesample(desc="exiftool output") %}

```
» exiftool PXL_20220116_223722991.jpg
ExifTool Version Number : 12.50
File Name : PXL_20220116_223722991.jpg
Directory : .
File Size : 1551 kB
File Modification Date/Time : 2023:04:24 15:05:02-07:00
File Access Date/Time : 2023:04:24 15:05:02-07:00
File Inode Change Date/Time : 2023:04:24 15:05:02-07:00
File Permissions : -rw-r--r--
File Type : JPEG
File Type Extension : jpg
MIME Type : image/jpeg
Exif Byte Order : Big-endian (Motorola, MM)
Orientation : Horizontal (normal)
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Y Cb Cr Positioning : Centered
Profile CMM Type :
Profile Version : 4.0.0
Profile Class : Display Device Profile
Color Space Data : RGB
Profile Connection Space : XYZ
Profile Date Time : 2016:12:08 09:38:28
Profile File Signature : acsp
Primary Platform : Unknown ()
CMM Flags : Not Embedded, Independent
Device Manufacturer : Google
Device Model :
Device Attributes : Reflective, Glossy, Positive, Color
Rendering Intent : Perceptual
Connection Space Illuminant : 0.9642 1 0.82491
Profile Creator : Google
Profile ID : 75e1a6b13c34376310c8ab660632a28a
Profile Description : sRGB IEC61966-2.1
Profile Copyright : Copyright (c) 2016 Google Inc.
Media White Point : 0.95045 1 1.08905
Media Black Point : 0 0 0
Red Matrix Column : 0.43604 0.22249 0.01392
Green Matrix Column : 0.38512 0.7169 0.09706
Blue Matrix Column : 0.14305 0.06061 0.71391
Red Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract)
Chromatic Adaptation : 1.04788 0.02292 -0.05019 0.02959 0.99048 -0.01704 -0.00922 0.01508 0.75168
Blue Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract)
Green Tone Reproduction Curve : (Binary data 32 bytes, use -b option to extract)
Image Width : 4080
Image Height : 3072
Encoding Process : Baseline DCT, Huffman coding
Bits Per Sample : 8
Color Components : 3
Y Cb Cr Sub Sampling : YCbCr4:2:0 (2 2)
Image Size : 4080x3072
Megapixels : 12.5
```
{% end %}

Looks like it. By default, though, `exiftool` keeps the untouched original next
to the rewritten file as a `*_original` backup, which we don't want lying
around, but it looks like there's `-overwrite_original` for that.
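
To try the full invocation without touching a real photo, one option is to run it on a scratch copy and look at what survives. This is only a sketch under the assumption that `exiftool` and `mktemp` are available; `strip_preview` is a hypothetical name, not something from exiftool or pre-commit:

```shell
# Hypothetical helper: strip a scratch copy of an image and list what is left.
# The source file itself is never modified.
strip_preview() {
    src=$1
    tmpdir=$(mktemp -d) || return 1
    cp "$src" "$tmpdir/" || return 1
    copy="$tmpdir/$(basename "$src")"

    # Delete everything except the ICC profile, then copy Orientation back
    # from the file's own original metadata (-tagsfromfile @).
    exiftool -q -all= --icc_profile:all -tagsfromfile @ -orientation \
        -overwrite_original "$copy" || return 1

    # Show the remaining tags with their group names.
    exiftool -a -G1 -s "$copy"

    rm -rf "$tmpdir"
}

# Usage: strip_preview PXL_20220116_223722991.jpg
```

If any GPS, device, or timestamp tags other than the filesystem ones still show up in the listing, the invocation needs another look before it goes into a hook.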

Let's put it all together into pre-commit: we want a [repo-local
hook][precommit-repolocal] because it's easier to manage, so something like
this as `.pre-commit-config.yaml`:

[precommit-repolocal]: https://pre-commit.com/index.html#repository-local-hooks

```yaml
repos:
  - repo: local
    hooks:
      - id: no-spicy-exif
        name: Ban spicy exif data
        description: Ensures that there is no sensitive exif data committed
        language: system
        entry: exiftool -all= --icc_profile:all -tagsfromfile @ -orientation -overwrite_original
        exclude_types: ["svg"]
        types: ["image"]
```

Check with `git add .pre-commit-config.yaml image-with-gps.jpg && pre-commit
run`, and it fails as expected, since pre-commit treats a hook that modifies
files as a failure. If we `git add` the file again, it will pass,
and the file is now devoid of problematic metadata. Success!
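
For extra reassurance, the result can be checked independently of the hook. The following is a rough sketch, assuming `exiftool` and `git` are on `PATH` and that tracked file names contain no newlines or characters Git would quote; both function names are made up for illustration:

```shell
# Hypothetical helper: succeed (exit 0) if the file still has GPS coordinates.
# -s3 prints bare values only, so empty output means neither tag is present.
has_gps() {
    [ -n "$(exiftool -s3 -GPSLatitude -GPSLongitude "$1" 2>/dev/null)" ]
}

# Hypothetical helper: print every tracked JPEG/PNG that still has GPS data.
audit_tracked_images() {
    git ls-files -- '*.jpg' '*.jpeg' '*.png' | while IFS= read -r f; do
        if has_gps "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage: run audit_tracked_images from the repository root;
# no output means no tracked image matched.
```

Running `pre-commit install` sets the hook up to run on every `git commit`, and `pre-commit run --all-files` applies it to everything already tracked rather than only to staged files.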