Using ChatGPT Code Interpreter to Extract Text from PDF Files
Happy 30th anniversary to Adobe Acrobat - born June 15, 1993.
PDF is, of course, a global standard and a useful format, but in my opinion the tools have always been cumbersome and multi-step. This morning I wanted to use OCR to extract some text from a scanned document that was sent to me in someone's very interesting handwriting. It was in ALL CAPS, so the OCR should have had an easy time with it, but I find the built-in Acrobat OCR scan and enhancement just gets a bit bogged down.
Using ChatGPT, you can do an OCR extract on a PDF file! As a plus, this gives you quick Python code to reuse! Simply upload the file and ask.
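I didn't save the exact code ChatGPT produced, so here is only a rough sketch of what reusable OCR code for a scanned PDF might look like. It assumes the third-party packages pdf2image and pytesseract are installed, along with the Poppler and Tesseract programs they rely on; the function names ocr_pdf and join_pages are my own, made up for illustration.

```python
def join_pages(page_texts):
    """Label each page's OCR output and join the pages into one string."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        # strip() removes the stray blank lines OCR often leaves at page edges
        parts.append(f"--- Page {number} ---\n{text.strip()}")
    return "\n\n".join(parts)


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """OCR every page of a scanned PDF and return the combined text.

    Assumption: pdf2image (needs Poppler) and pytesseract (needs the
    Tesseract binary) are installed. They are imported here so that
    join_pages can be used without the OCR stack.
    """
    from pdf2image import convert_from_path
    import pytesseract

    # Render each PDF page to an image, then run OCR on each image.
    images = convert_from_path(str(pdf_path), dpi=dpi)
    page_texts = [pytesseract.image_to_string(img, lang=lang) for img in images]
    return join_pages(page_texts)


# Hypothetical usage, with a file name of your own:
# print(ocr_pdf("scanned_letter.pdf"))
```

A higher dpi usually helps with handwriting at the cost of speed, but that is a rule of thumb rather than a guarantee.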
But wait, there's more! The OCR text, as I mentioned, came from some interesting ALL-CAPS writing, so the extraction was not perfect. But I was able to take the text and ask ChatGPT to rewrite it, giving its best guess as to what was there. And it did so masterfully.
For example, it was able to take this short paragraph:
—T WHEN DIO A BATON AND 4% TASER BEcows 4H PEnoiy Wewpon
and decipher it as:
Question: When did a baton and a "Taser" become a deadly weapon?
It did 9 pages of interpretation JUST LIKE THAT! Amazing.
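If you want to repeat the cleanup step on other documents, the request can be wrapped in a small helper. This is only a sketch: the wording of the instructions is my own assumption rather than the exact prompt I typed, and build_cleanup_prompt is a made-up name. It just builds the text you would paste into ChatGPT.

```python
def build_cleanup_prompt(ocr_text):
    """Build a request asking for a best-guess rewrite of noisy OCR text.

    Assumption: the instruction wording below is illustrative, not a
    prompt known to work best.
    """
    instructions = (
        "The text below came from OCR of ALL-CAPS handwriting and contains "
        "recognition errors. Rewrite it in normal sentence case, giving your "
        "best guess as to what was originally written. Keep the original "
        "order and do not add new content."
    )
    return f"{instructions}\n\nOCR text:\n{ocr_text.strip()}"


# Example with made-up noisy input:
# print(build_cleanup_prompt("WHEN DIO A BATON BEcows A WEAPON"))
```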