comp.lang.php » PDF extract text
Re: PDF extract text [message #185515 is a reply to message #185508] Mon, 07 April 2014 15:53
Michael Vilain
In article <lhtavi$dh$1(at)online(dot)de>,
Philipp Kraus <philipp(dot)kraus(at)flashpixx(dot)de> wrote:

> Hello,
>
> how can I extract text from a PDF file with PHP? Images and other
> structures can be ignored.
> We have a lot of LaTeX PDFs and PowerPoint PDFs and would like to
> extract only the text content
> to create a text analysis of the content, e.g. for LaTeX scripts we would
> like the chapter structure as well.
>
> Is there any solution to do this with built-in PHP functions?
>
> Thanks
>
> Phil

I tried a bunch of stuff to read some bank statements that were in PDF
format so I could convert them to CSV and import them. Didn't work out so
well. Adobe's OCR feature only works if the PDFs are unlocked to allow it.
I found an application that would unlock them, but the OCRed text was
unusable.

So, my question is "what's generating the PDF files?" Can you get
whoever produces them to give you plain text or some other format? If
they're encrypted images, then you've got a lot of work to do in order to
get any usable output. Maybe.

Good luck with this...

--
DeeDee, don't press that button! DeeDee! NO! Dee...
[I filter all Goggle Groups posts, so any reply may be automatically ignored]