Language: Python

Extract JPG images from a PDF

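A command-line tool to extract JPEG images from PDF files. It scans the raw bytes of the PDF for streams whose data begins with a JPEG start-of-image marker and writes each one out unchanged, without decoding or re-encoding.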
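JPEG data is delimited by two byte markers: start-of-image (`FF D8`) and end-of-image (`FF D9`), and the script locates images by searching for exactly these markers. As a minimal sketch, a hypothetical helper `is_complete_jpeg` can confirm that a byte string, such as the contents of an extracted file, is framed by both markers. This is only a quick sanity check, not full JPEG validation.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def is_complete_jpeg(data: bytes) -> bool:
    """Return True if `data` starts with SOI and ends with EOI (hypothetical helper)."""
    return len(data) >= 4 and data.startswith(SOI) and data.endswith(EOI)
```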
```python
#!/usr/bin/env python3
import sys


def main():
    """Extract JPGs from PDFs.

    Usage: python extract.py filename.pdf

    Note: All extracted images are saved to the directory
    the script is run from.
    """
    try:
        with open(sys.argv[1], "rb") as f:
            pdf = f.read()
    except (IndexError, OSError):
        print("Usage: `python extract.py filename.pdf`")
        return

    # JPEG start-of-image and end-of-image markers.
    startmark, endmark = b"\xff\xd8", b"\xff\xd9"
    startfix, endfix, i, njpg = 0, 2, 0, 0
    while True:
        # Find the next PDF stream.
        istream = pdf.find(b"stream", i)
        if istream < 0:
            break
        # A JPEG stream should begin with SOI within 20 bytes of "stream".
        istart = pdf.find(startmark, istream, istream + 20)
        if istart < 0:
            i = istream + 20
            continue
        iend = pdf.find(b"endstream", istart)
        if iend < 0:
            raise Exception("Couldn't find end of stream.")
        # The EOI marker should lie shortly before "endstream"
        # (never search before the start of the image).
        iend = pdf.find(endmark, max(istart, iend - 20))
        if iend < 0:
            raise Exception("Couldn't find end of JPG.")
        istart += startfix
        iend += endfix  # include the 2-byte EOI marker
        jpg = pdf[istart:iend]
        with open("jpg%d.jpg" % njpg, "wb") as f:
            f.write(jpg)
        njpg += 1
        i = iend
    print("Extracted %d JPG files." % njpg)


if __name__ == "__main__":
    main()
```
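The scanning loop can also be written as a pure function over bytes, which makes the heuristic easy to test without writing files. This is a minimal sketch, not the original author's code: the name `extract_jpegs` and its `window` parameter are hypothetical, and it tightens one step by searching backward from `endstream` (with `bytes.rfind`) for the last end-of-image marker, so a marker belonging to an embedded thumbnail does not cut the image short. Like the script, it only finds JPEGs whose stream bytes are the raw JPEG data; a JPEG wrapped in additional compression (for example `FlateDecode`) does not start with `FF D8` and is skipped.

```python
def extract_jpegs(pdf: bytes, window: int = 20) -> list[bytes]:
    """Return the JPEG byte strings embedded in raw PDF data (heuristic sketch)."""
    soi, eoi = b"\xff\xd8", b"\xff\xd9"  # JPEG start/end-of-image markers
    images = []
    i = 0
    while True:
        # Locate the next "stream" keyword (this also matches inside "endstream").
        istream = pdf.find(b"stream", i)
        if istream < 0:
            break
        # Treat the stream as a JPEG only if SOI appears within `window` bytes.
        istart = pdf.find(soi, istream, istream + window)
        if istart < 0:
            i = istream + len(b"stream")
            continue
        estream = pdf.find(b"endstream", istart)
        if estream < 0:
            raise ValueError("Couldn't find end of stream.")
        # Take the last EOI before "endstream" so inner markers are kept.
        iend = pdf.rfind(eoi, istart, estream)
        if iend < 0:
            raise ValueError("Couldn't find end of JPG.")
        images.append(pdf[istart:iend + len(eoi)])
        # Resume scanning after this stream's "endstream" keyword.
        i = estream + len(b"endstream")
    return images
```

Passing the bytes of a PDF file returns a list of JPEG byte strings, which can then be written to `jpg0.jpg`, `jpg1.jpg`, and so on, as the script does.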