in_array performance in unsorted vs sorted array [message #178204]
Wed, 23 May 2012 06:20
William Gill
Messages: 31 Registered: March 2011

I am reading transaction records from files. Each record has an
alphanumeric GUID, but the same record may be repeated in more than one
file (because of overlapping samples). I don't want to process duplicate
records, so I am considering a simple flat file to store the GUIDs of
previously processed records.

To keep things simple, I plan to use $done = file() to read the flat
file, and a simple in_array() check to see whether the current GUID has
already been processed; if not, process the current record and add its
GUID to $done.

Does anyone know whether sorting the array has any significant impact on
in_array(), or can I simply push values into $done?

Also, is there a better way than foreach() write() to get $done back into
the flat file?
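A minimal PHP sketch of the plan described above, assuming the flat file holds one GUID per line and each record is an array with a 'guid' element (both assumptions). The function names loadDone() and filterNew() are hypothetical. It uses in_array() exactly as proposed, with strict comparison:

```php
<?php
// Sketch of the plan above: the flat file holds one GUID per line, and
// records whose GUID is already listed are skipped. The record layout
// (an array with a 'guid' element) and the function names are hypothetical.

function loadDone(string $path): array
{
    if (!is_file($path)) {
        return [];                       // first run: nothing processed yet
    }
    // Strip the trailing newlines so the GUIDs compare equal to the
    // values taken from the records; skip blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

// Returns the records not seen before and pushes their GUIDs onto $done.
// in_array() is used as proposed; strict mode avoids loose "==" matches.
function filterNew(array $records, array &$done): array
{
    $fresh = [];
    foreach ($records as $record) {
        $guid = $record['guid'];
        if (!in_array($guid, $done, true)) {
            $fresh[] = $record;          // stand-in for "process the record"
            $done[] = $guid;             // plain push, no sorting
        }
    }
    return $fresh;
}
```

Typical use would be $done = loadDone('done.txt'); followed by filterNew() on each file's records, then writing $done back out.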

Re: in_array performance in unsorted vs sorted array [message #178205 is a reply to message #178204]
Wed, 23 May 2012 12:16
Jerry Stuckle
Messages: 2598 Registered: September 2010

On 5/23/2012 2:20 AM, William Gill wrote:
> I am reading transaction records from files. Each record has an
> alphanumeric GUID but that record may be repeated in more than one file
> (because of overlapping samples). I don't want to process duplicate
> records, so I am considering a simple flat file to store the GUIDs of
> previously processed records.
>
> To keep things simple I plan to use $done=file() to read the flat file,
> and a simple if in_array to see if the current GUID has already been
> processed, if not process the current record and add its GUID to $done.
>
> Does anyone know if sorting an array has any significant impact on
> in_array, or can I simply push values into $done?
>
> Also is there a better way than foreach() write() to get $done back into
> the flat file?
>
Arrays in PHP are associative; their keys are handled as hash values.
So I suspect it makes no difference whether the array is sorted or not.

Also, what's wrong with foreach() write()? That's how you get arrays
back into a file. Ensure you lock the file so that you don't have two
scripts running against it at the same time.

But it sounds like you should be using a database. That would solve a
lot of your problems.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
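A sketch of the foreach()/write approach with the locking suggested above, using fopen() mode 'c' together with flock(). The helper name saveDone() and the file path are hypothetical:

```php
<?php
// Rewrite the flat file from $done under an exclusive lock so that two
// scripts cannot write it at the same time. saveDone() is a hypothetical
// helper; error handling is kept minimal.

function saveDone(string $path, array $done): void
{
    // Mode "c" opens for writing without truncating, so the lock can be
    // taken before the old contents are discarded.
    $fh = fopen($path, 'c');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $path");
        }
        ftruncate($fh, 0);               // drop the old list
        rewind($fh);                     // write from the start of the file
        foreach ($done as $guid) {
            fwrite($fh, $guid . "\n");   // one GUID per line
        }
        fflush($fh);                     // push data out before unlocking
        flock($fh, LOCK_UN);
    } finally {
        fclose($fh);
    }
}
```

Note that this lock only covers the write. If two scripts can run at once, the lock would have to be held from reading the list through writing it back; otherwise both can read the same old list and one update is lost.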

Re: in_array performance in unsorted vs sorted array [message #178211 is a reply to message #178205]
Wed, 23 May 2012 14:57
William Gill
Messages: 31 Registered: March 2011

On 5/23/2012 8:16 AM, Jerry Stuckle wrote:
>
> Arrays in PHP are associative; their keys are handled as hash values. So
> I suspect it makes no difference whether the array is sorted or not.
>
OK, that seems to be in sync with what I'm (not) finding on Google.
> Also, what's wrong with foreach() write()? That's how you get arrays
> back into a file. Ensure you lock the file so that you don't have two
> scripts running against it at the same time.
>
Nothing. I just wanted to be sure I wasn't missing something better,
like an inverse of file(). I'm probably looking at fewer than 100k
records at any time, so I guess performance won't be a problem.
> But it sounds like you should be using a database. That would solve a
> lot of your problems.
>
Yes it does, and that's where this is heading. Right now it is just
summarizing some information, but eventually this will become a
pre-processor for a db.
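There is no function named as the inverse of file(), but file_put_contents() comes close: join the array with newlines and write it in one call, with the LOCK_EX flag for an exclusive lock during the write. A minimal sketch; saveDoneAtOnce() and the path are hypothetical:

```php
<?php
// Write $done back in one call: join with "\n" (plus a final newline)
// and let file_put_contents() hold an exclusive lock (LOCK_EX) while it
// writes. saveDoneAtOnce() is a hypothetical helper.

function saveDoneAtOnce(string $path, array $done): bool
{
    $data = $done === [] ? '' : implode("\n", $done) . "\n";
    return file_put_contents($path, $data, LOCK_EX) !== false;
}

// Round trip (path is hypothetical):
// saveDoneAtOnce('done.txt', $done);
// $done = file('done.txt', FILE_IGNORE_NEW_LINES);
```

Reading it back with file($path, FILE_IGNORE_NEW_LINES) gives the same array, so the pair behaves like file() and its inverse.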

Re: in_array performance in unsorted vs sorted array [message #178212 is a reply to message #178211]
Wed, 23 May 2012 15:01
William Gill
Messages: 31 Registered: March 2011

On 5/23/2012 10:57 AM, William Gill wrote:
> On 5/23/2012 8:16 AM, Jerry Stuckle wrote:
>>
>> Arrays in PHP are associative; their keys are handled as hash values. So
>> I suspect it makes no difference whether the array is sorted or not.
>>
> OK, that seems to be in sync with what I'm (not) finding on Google.
>
>> Also, what's wrong with foreach() write()? That's how you get arrays
>> back into a file. Ensure you lock the file so that you don't have two
>> scripts running against it at the same time.
>>
> Nothing. I just wanted to be sure I wasn't missing something better,
> like an inverse of file(). I'm probably looking at fewer than 100k
> records at any time, so I guess performance won't be a problem.
>
>> But it sounds like you should be using a database. That would solve a
>> lot of your problems.
>>
> Yes it does, and that's where this is heading. Right now it is just
> summarizing some information, but eventually this will become a
> pre-processor for a db.
Pulled the trigger too soon; forgot to say thanks.

Re: in_array performance in unsorted vs sorted array [message #178237 is a reply to message #178204]
Wed, 23 May 2012 09:36
Captain Paralytic
Messages: 204 Registered: September 2010

On May 23, 7:20 am, William Gill <nore...@domain.invalid> wrote:
> I am reading transaction records from files. Each record has an
> alphanumeric GUID but that record may be repeated in more than one file
> (because of overlapping samples). I don't want to process duplicate
> records, so I am considering a simple flat file to store the GUIDs of
> previously processed records.
>
> To keep things simple I plan to use $done=file() to read the flat file,
> and a simple if in_array to see if the current GUID has already been
> processed, if not process the current record and add its GUID to $done.
>
> Does anyone know if sorting an array has any significant impact on
> in_array, or can I simply push values into $done?
>
> Also is there a better way than foreach() write() to get $done back into
> the flat file?
Why not load the files into a database table that uses the GUID as its
primary key? Then you have exactly one record for each GUID.
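A minimal sketch of that suggestion using PDO with SQLite (an assumption; any database with a primary or unique key works the same way, and it requires the pdo_sqlite extension). The table and column names and the helpers openStore() and storeRecord() are hypothetical:

```php
<?php
// Let the database enforce uniqueness: with the GUID as PRIMARY KEY, a
// repeated GUID is simply ignored. SQLite via PDO is an assumption; the
// table, columns and helper names are hypothetical.

function openStore(string $dsn = 'sqlite::memory:'): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS records (
                   guid    TEXT PRIMARY KEY,
                   payload TEXT
               )');
    return $db;
}

// Inserts the record; returns true if the GUID was new, false if it was
// already stored (the duplicate row is silently skipped).
function storeRecord(PDO $db, string $guid, string $payload): bool
{
    // INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    $stmt = $db->prepare(
        'INSERT OR IGNORE INTO records (guid, payload) VALUES (?, ?)'
    );
    $stmt->execute([$guid, $payload]);
    return $stmt->rowCount() === 1;
}
```

The first copy of a record wins; later copies with the same GUID leave the stored row untouched, so no separate "done" list is needed.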

Re: in_array performance in unsorted vs sorted array [message #178248 is a reply to message #178205]
Thu, 24 May 2012 20:20
Thomas Mlynarczyk
Messages: 131 Registered: September 2010

Jerry Stuckle wrote:
>> [Performance of in_array()]
> Arrays in PHP are associative; their keys are handled as hash values. So
> I suspect it makes no difference whether the array is sorted or not.
The /keys/ are hashed, yes. But in_array() searches through the
/values/, not the keys, so I suppose (haven't tested it) the performance
is O(n) rather than O(1). On the other hand, I doubt PHP keeps track
of whether the array is sorted, so it probably makes no difference
indeed, as you said.
Greetings,
Thomas
--
Just because many of them are wrong doesn't make them right!
(Coluche)
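To make the O(n) vs O(1) point concrete: in_array() walks the values one by one whether or not they are sorted. A common PHP alternative (a swapped-in technique, not something proposed in the thread) is to store the GUIDs as array keys and test with isset(), which goes through the hash table. The helper names toLookup() and seenBefore() are hypothetical:

```php
<?php
// in_array() compares against each value in turn (O(n) per lookup) and
// does not use the sort order. Storing GUIDs as array *keys* instead
// gives hash-table lookups via isset() (O(1) on average).

// One-off conversion of a list loaded with file(): GUIDs become keys.
function toLookup(array $guids): array
{
    return array_fill_keys($guids, true);
}

function seenBefore(array $lookup, string $guid): bool
{
    return isset($lookup[$guid]);
}

$list   = ['C3', 'A1', 'B2'];          // unsorted, as pushed
$sorted = $list;
sort($sorted);

// Sorting does not change what in_array() finds.
var_dump(in_array('B2', $list, true) === in_array('B2', $sorted, true)); // bool(true)

$lookup = toLookup($list);
$lookup['D4'] = true;                  // "push" a newly processed GUID
var_dump(seenBefore($lookup, 'D4'));   // bool(true)
var_dump(seenBefore($lookup, 'Z9'));   // bool(false)

// Writing back: the GUIDs are the keys, in insertion order.
echo implode("\n", array_keys($lookup)), "\n";
```

One caveat: PHP converts decimal-integer-looking string keys such as "123" to integer keys, so array_keys() may return ints for purely numeric GUIDs; isset() lookups still work either way.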