Blank page detection on the command line with ImageMagick

So you've got some scanned pages or images in a format like PDF or JPEG and you want to find all the blank ones.

You can use ImageMagick to do this. This solution should be able to handle variations in page color, page noise, etc.

Here is how to do it in the shell. You could do something similar with the ImageMagick bindings for whatever programming language you like.

Adjust values as needed.

 convert scanned_page.pdf -shave 1%x1% -resize 40% -fuzz 10% -trim +repage info: | grep ' 1x1 '

Read on to see what each part means.

convert scanned_page.pdf

This only covers a PDF file with a single page. I am not sure how multiple pages would behave; my workflow only involves single-page PDFs, so I have not looked into it. You can use something like pdftk to split a PDF into pages.

ImageMagick can read nearly any image format you throw at it. I just happen to be using a PDF file.
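As a rough sketch of that splitting step, pdftk's burst operation writes one file per page. The function name split_pages and the default output pattern page_%04d.pdf are made up for illustration; they are not part of my workflow.

```shell
# split_pages INPUT.pdf [PATTERN]
# Split a multi-page PDF into one file per page using pdftk's "burst"
# operation. PATTERN is a printf-style name for the output files; the
# default used here (page_%04d.pdf) is only an illustrative choice.
split_pages() {
    pdftk "$1" burst output "${2:-page_%04d.pdf}"
}
```

Calling split_pages scanned_book.pdf should leave page_0001.pdf, page_0002.pdf and so on in the current directory, and each of those can then be checked on its own.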

-shave 1%x1%

My scanner includes the edge of the paper in the scan, so let's not even consider the outer 1%.

-resize 40%

Resizing denoises the image and makes the trim faster (I assume).

-fuzz 10% -trim +repage

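Use the trim option with fuzz 10% to try to trim the page. The trim removes border area that matches the color at the edges of the image, and the fuzz lets colors within 10% of each other count as the same, so a slightly uneven background still gets trimmed. The +repage resets the page offsets left behind by the trim.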
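Since the values need adjusting for each scanner, something like the following might help when picking a fuzz level. It is only a sketch: the function name try_fuzz and the list of levels are my own choices, and it uses -format '%wx%h' with info: to print just the trimmed width and height. It assumes a single-page input, as above.

```shell
# try_fuzz FILE
# Print the size the page trims down to at several fuzz levels, so you
# can see where a known-blank page collapses to 1x1 and a page with
# content does not. The levels listed are arbitrary starting points.
try_fuzz() {
    for fuzz in 5 10 15 20; do
        size=$(convert "$1" -shave 1%x1% -resize 40% \
            -fuzz "${fuzz}%" -trim +repage -format '%wx%h' info:)
        printf 'fuzz %s%%: %s\n' "$fuzz" "$size"
    done
}
```

Run it once on a page you know is blank and once on a page with only a little text; a usable fuzz value is one where the first reports 1x1 and the second does not.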

info: | grep ' 1x1 '

The info: is an argument to convert that makes it output image information to stdout instead of writing an actual image. If the trim finds nothing but background (in other words, we have a blank page), then ImageMagick spits out a 1 by 1 pixel image. The grep will see that in the info: output. Don't forget the spaces inside the quotes; without them the pattern would also match sizes like 11x10.
There is probably a better way to do this.
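To run the check over a whole directory of scans, the one-liner can be wrapped in a small loop. This is a minimal sketch, assuming every file holds a single page; the function names is_blank and list_blank are invented for this example.

```shell
# is_blank FILE
# Succeed (exit status 0) if the page trims down to a 1x1 image,
# using the same convert command and values as above.
is_blank() {
    convert "$1" -shave 1%x1% -resize 40% -fuzz 10% -trim +repage info: \
        | grep -q ' 1x1 '
}

# list_blank FILE...
# Print the name of every file that looks blank, one per line.
list_blank() {
    for f in "$@"; do
        if is_blank "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, list_blank scans/*.pdf > blank_pages.txt would collect the names of the blank pages in a file for review before you delete anything.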

Links

Imagemagick documentation covering trim
