FUDforum

Home » Imported messages » comp.lang.php » insert PDF table in database
insert PDF table in database [message #181540] Tue, 21 May 2013 07:04
sarika
Messages: 1
Registered: May 2013
Karma: 0
Junior Member
Hi All

What I want is to read the contents of a table in a PDF and convert it into either XML or an associative array, to be inserted into a database on the fly.
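If the table rows are already available as associative arrays, producing XML from them is mechanical. A minimal sketch using PHP's DOM extension is below; the helper name `rowsToXml()` and the `<table>`/`<row>` element names are hypothetical choices, and it assumes every column name is a valid XML element name.

```php
<?php
/**
 * Hypothetical helper: serialize associative rows as
 * <table><row><column>value</column>...</row>...</table>.
 * Column names become element names, so they must be valid XML names
 * (otherwise createElement() throws a DOMException).
 */
function rowsToXml(array $rows): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $table = $doc->appendChild($doc->createElement('table'));

    foreach ($rows as $row) {
        $rowEl = $table->appendChild($doc->createElement('row'));
        foreach ($row as $column => $value) {
            $cell = $doc->createElement((string) $column);
            // createTextNode() escapes &, < and > in the cell value.
            $cell->appendChild($doc->createTextNode((string) $value));
            $rowEl->appendChild($cell);
        }
    }
    return $doc->saveXML();
}

echo rowsToXml([
    ['invoice' => '1001', 'customer' => 'Smith & Sons'],
    ['invoice' => '1002', 'customer' => 'Jones Ltd'],
]);
```

The value is added with `createTextNode()` rather than as the second argument of `createElement()`, because the PHP manual notes that the latter does not escape the value, so a bare `&` would produce broken XML.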

I have gone through many libraries on the net that extract text from a PDF and convert it into an array, but that array does not seem to be useful: it's not an associative array, and the array indexing is not proper either.
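One common way to get keyed rows is to treat the first line of the extracted table as a header and key each following line with `array_combine()`. This sketch assumes the text has already been extracted as plain lines with columns separated by runs of two or more spaces (the kind of layout-preserving output that Poppler's `pdftotext -layout` aims for); the helper name `tableLinesToRows()` and the two-space rule are assumptions, and many real PDFs will not follow them.

```php
<?php
/**
 * Hypothetical helper: turn layout-preserved table text into associative rows.
 * Assumes the first non-blank line is the header and that columns are
 * separated by two or more whitespace characters.
 */
function tableLinesToRows(string $text): array
{
    // Split into lines and drop blank ones.
    $lines = array_values(array_filter(
        preg_split('/\R/', $text),
        fn(string $line): bool => trim($line) !== ''
    ));
    if (count($lines) < 2) {
        return []; // no header, or no data rows
    }

    // Split one line into cells on runs of 2+ whitespace characters.
    $split = fn(string $line): array => preg_split('/\s{2,}/', trim($line));

    $header = $split(array_shift($lines));
    $rows = [];
    foreach ($lines as $line) {
        $cells = $split($line);
        // Skip rows whose cell count doesn't match the header instead of
        // guessing which column a value belongs to.
        if (count($cells) !== count($header)) {
            continue;
        }
        $rows[] = array_combine($header, $cells);
    }
    return $rows;
}

$text = <<<TXT
Invoice    Date          Amount
1001       2013-05-01    250.00
1002       2013-05-03    99.50
TXT;

print_r(tableLinesToRows($text));
```

For the sample text this yields two rows keyed by `Invoice`, `Date` and `Amount`. Skipping mismatched rows is a deliberate choice: a wrapped cell or an empty column silently shifts values, so it is safer to flag those rows for manual review.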

Thanks in advance for any replies; I am really stuck on this major issue.
My project manager wants me to implement it as soon as possible.
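Once each table row is an associative array, inserting it can be done with a PDO prepared statement. The sketch below assumes all rows share the same keys and that those keys match the table's column names; the helper name `insertRows()` and the `invoices` table are hypothetical, and the usage part needs the `pdo_sqlite` extension for its in-memory database.

```php
<?php
/**
 * Hypothetical helper: insert associative rows into $table using one
 * prepared statement inside a transaction. Column names are taken from the
 * first row's keys and must match a simple identifier pattern, because
 * identifiers cannot be bound as parameters.
 */
function insertRows(PDO $pdo, string $table, array $rows): int
{
    if ($rows === []) {
        return 0;
    }
    $columns = array_keys($rows[0]);
    foreach (array_merge([$table], $columns) as $name) {
        if (!is_string($name) || !preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', array_map(fn(string $c): string => ":$c", $columns))
    );
    $stmt = $pdo->prepare($sql);

    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            // Bind each value to its column's named placeholder.
            $params = [];
            foreach ($columns as $c) {
                $params[":$c"] = $row[$c];
            }
            $stmt->execute($params);
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack(); // leave the table unchanged if any row fails
        throw $e;
    }
    return count($rows);
}

// Usage with an in-memory SQLite database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE invoices (invoice TEXT, issued TEXT, amount TEXT)');
$inserted = insertRows($pdo, 'invoices', [
    ['invoice' => '1001', 'issued' => '2013-05-01', 'amount' => '250.00'],
    ['invoice' => '1002', 'issued' => '2013-05-03', 'amount' => '99.50'],
]);
echo "Inserted $inserted rows\n";
```

The transaction matters more than it looks here: extracted PDF data is messy, and an all-or-nothing import is easier to retry than a half-loaded table.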
Re: insert PDF table in database [message #181541 is a reply to message #181540] Tue, 21 May 2013 08:48
Goran
Messages: 38
Registered: January 2011
Karma: 0
Member
On 21.5.2013 9:04, sarika wrote:
> My project manager wants me to implement as soon as possible.
>

Tell him "you do it" :)
Re: insert PDF table in database [message #181547 is a reply to message #181540] Tue, 21 May 2013 10:07
Jerry Stuckle
Messages: 2598
Registered: September 2010
Karma: 0
Senior Member
On 5/21/2013 3:04 AM, sarika wrote:
> Hi All
>
> What i want is to read content in PDF table and convert it into either XML or associative array to be inserted in database on the fly.
>
> I have gone through many libraries on net providing text extraction from PDF and converting in array but that array does not seem to be useful as its not associative array and array indexing is also not proper.
>
> Thanks in advance for the replies but i am really stuck with this major issue.
> My project manager wants me to implement as soon as possible.
>

It's impossible to help you when you don't show any code and don't tell
us *exactly* what's wrong with your code. "It's not an associative
array" and "array indexing is not proper" don't provide enough information.

If you want help, we need things like:

1) A sample input document
2) The code you're using
3) The expected output
4) The output you got

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: insert PDF table in database [message #181548 is a reply to message #181540] Tue, 21 May 2013 10:08
Arno Welzel
Messages: 317
Registered: October 2011
Karma: 0
Senior Member
sarika, 2013-05-21 09:04:

[...]
> My project manager wants me to implement as soon as possible.

And what is your manager willing to pay for a solution? ;-)


--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
Re: insert PDF table in database [message #181550 is a reply to message #181548] Tue, 21 May 2013 11:46
The Natural Philosoph
Messages: 993
Registered: September 2010
Karma: 0
Senior Member
On 21/05/13 11:08, Arno Welzel wrote:
> sarika, 2013-05-21 09:04:
>
> [...]
>> My project manager wants me to implement as soon as possible.
> And what is your manager willing to pay for a solution? ;-)
>
>
The problem is that two apparently identical PDFs can be totally
different internally.

Yea, even down to being simply bitmaps with no text that, at the intended
resolution, appear identical.

At best you can strip out most of the text and, with a bit of
intelligence, sometimes even get it in the right order.
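Getting text "in the right order" is often done by grouping positioned text fragments into rows by their vertical position and then sorting each row left to right. This sketch assumes the extraction step can report each fragment's x/y position in PDF user space (where y grows upward, so the top row has the largest y); the fragment array shape, the helper name `fragmentsToRows()` and the 2-unit tolerance are all hypothetical.

```php
<?php
/**
 * Hypothetical helper: rebuild reading order from positioned text fragments.
 * Each fragment is ['x' => float, 'y' => float, 'text' => string].
 * Fragments whose y lies within $tolerance of a row's first fragment are
 * treated as belonging to that row.
 */
function fragmentsToRows(array $fragments, float $tolerance = 2.0): array
{
    // Top to bottom: larger y first in PDF user space.
    usort($fragments, fn(array $a, array $b): int => $b['y'] <=> $a['y']);

    $rows = [];
    $current = [];
    $rowY = null;
    foreach ($fragments as $f) {
        if ($current !== [] && abs($rowY - $f['y']) > $tolerance) {
            $rows[] = $current; // vertical gap too large: start a new row
            $current = [];
        }
        if ($current === []) {
            $rowY = $f['y']; // first fragment anchors the row's baseline
        }
        $current[] = $f;
    }
    if ($current !== []) {
        $rows[] = $current;
    }

    // Within each row, order cells left to right and keep only the text.
    return array_map(function (array $row): array {
        usort($row, fn(array $a, array $b): int => $a['x'] <=> $b['x']);
        return array_column($row, 'text');
    }, $rows);
}

$fragments = [
    ['x' => 200.0, 'y' => 700.4, 'text' => '250.00'],
    ['x' => 50.0,  'y' => 680.0, 'text' => '1002'],
    ['x' => 50.0,  'y' => 700.0, 'text' => '1001'],
    ['x' => 200.0, 'y' => 680.2, 'text' => '99.50'],
];
print_r(fragmentsToRows($fragments));
```

A fixed tolerance is the weak point: superscripts, multi-line cells and rotated text will all defeat it, which is why this only works "sometimes".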


--
Ineptocracy

(in-ep-toc’-ra-cy) – a system of government where the least capable to lead are elected by the least capable of producing, and where the members of society least likely to sustain themselves or succeed, are rewarded with goods and services paid for by the confiscated wealth of a diminishing number of producers.
Re: insert PDF table in database [message #181555 is a reply to message #181540] Tue, 21 May 2013 16:27
Michael Vilain
Messages: 88
Registered: September 2010
Karma: 0
Member
In article <b6c1cfb3-1f8b-48c5-8822-25d10402d896(at)googlegroups(dot)com>,
sarika <sarikasoni12(at)gmail(dot)com> wrote:

> Hi All
>
> What i want is to read content in PDF table and convert it into either XML or
> associative array to be inserted in database on the fly.
>
> I have gone through many libraries on net providing text extraction from PDF
> and converting in array but that array does not seem to be useful as its not
> associative array and array indexing is also not proper.
>
> Thanks in advance for the replies but i am really stuck with this major
> issue.
> My project manager wants me to implement as soon as possible.

I ran across this problem with various bank statements that I downloaded
via my bank's personal web site. The PDFs were encrypted and set with
certain properties that didn't allow scanning of the text layer. Unless
you are able to decrypt and do OCR on the PDFs, you're wasting your time
here. The problem isn't as simple as your manager seems to think. At
best, you could offer a partial solution of being able to scan "some"
PDF files, but without libraries to decrypt and OCR the text, that's all
you can do.
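A script can at least triage files into those cases before attempting extraction. The sketch below is a rough byte-level heuristic, not a PDF parser: it looks for the `/Encrypt` key that an encrypted PDF's trailer (or cross-reference stream dictionary) uses to point at its encryption dictionary, and treats near-empty extracted text as a sign that the pages are images needing OCR. The function names and the 20-character threshold are assumptions for illustration.

```php
<?php
/**
 * Heuristic: the /Encrypt key lives in the trailer or cross-reference stream
 * dictionary, which is stored uncompressed, so a byte search usually finds
 * it. The \b keeps keys such as /EncryptMetadata from matching.
 */
function looksEncrypted(string $pdfBytes): bool
{
    return preg_match('#/Encrypt\b#', $pdfBytes) === 1;
}

/**
 * Heuristic: if extraction produced almost no non-whitespace characters,
 * the pages are probably scanned images. $minChars is an arbitrary guess.
 */
function probablyNeedsOcr(string $extractedText, int $minChars = 20): bool
{
    return strlen(preg_replace('/\s+/', '', $extractedText)) < $minChars;
}

// A hand-made trailer fragment for illustration, not a complete PDF file.
$sample = "%PDF-1.4\ntrailer\n<< /Size 10 /Root 1 0 R /Encrypt 9 0 R >>\n%%EOF";
var_dump(looksEncrypted($sample));
var_dump(probablyNeedsOcr("   \n  "));
```

In practice `$pdfBytes` would come from `file_get_contents()` on the uploaded file, and files flagged by either check would be routed to a decryption or OCR step rather than to table parsing.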

Those libraries are probably online somewhere for a fee. Buy the
solution if you're in a time crunch. Beating the fastest horse on your
team is poor project management and won't get him the code any
faster.

--
DeeDee, don't press that button! DeeDee! NO! Dee...
[I filter all Goggle Groups posts, so any reply may be automatically ignored]
Re: insert PDF table in database [message #181560 is a reply to message #181555] Tue, 21 May 2013 19:48
J.O. Aho
Messages: 194
Registered: September 2010
Karma: 0
Senior Member
On 21/05/13 18:27, Michael Vilain wrote:
> [...]
>
> Those libraries are probably on-line somewhere for a fee. Buy the
> solution if you're in a time crunch. Beating the fastest horse on your
> team is poor project management skills and won't get him the code any
> faster.
>

Most likely the company told their Western customer that they could do this;
then a manager gets the task of seeing to it that his team solves the problem,
and the work is pushed to a "shadow resource" who looks for solutions
online. If that person doesn't manage to solve the issue, there are always
hundreds of others to replace them with. At least, that is my experience of
how things work in India.

--

//Aho