Re: Record pagination with count of total rows [message #170494 is a reply to message #170480] |
Wed, 03 November 2010 06:31 |
Norman Peelman
Messages: 126 Registered: September 2010
Karma:
|
Senior Member |
|
|
matt wrote:
> On Nov 2, 4:51 am, Erwin Moller
> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>> On 11/1/2010 3:26 PM, matt wrote:
>>
>>> On Nov 1, 9:33 am, Erwin Moller
>>> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>>>> On 11/1/2010 1:42 PM, matt wrote:
>>>> > Hi all,
>>>> Hi Matt,
>> <snip>
>>
>>>> > I'm just curious to see how other people have addressed this problem
>>>> > and what you've found to be the up/down sides of your chosen solution.
>>>> The way I approach this:
>>>> 1) Run the query once, and store the ID's of something that will enable
>>>> you to fetch rest of the tuple based on that number.
>>>> (For example: store a userid, and you know you can easily find the rest
>>>> of the accompanying information based on a userid)
>>> An interesting concept. Unfortunately, it doesn't address the time of
>>> the first query. Although, I'm surprised I didn't think of it...
>> True. But the query must run anyway, I expect.
>> So if it takes 70 seconds, I don't see how you can eliminate that.
>> The best you can do is trying to make it run only once, I expect.
>
> Yeah...I was hoping to save some execution time using LIMIT, but I'm
> beginning to think no matter what I implement, it does need to run the
> full query up front.
>
> [snip]
>
>>> The synchronization problem occurs when I amend the previous list:
>>> 1.5 Other user deletes record #8.
>>> Now, my"first ten results should be 0,1,2,3,4,5,6,7,9,10. Page two
>>> should start on record #11 and my total count should be 19.
>>> Same problem occurs if new data is entered that should be included in
>>> the results after the user's initial query.
>>> I think that even with this approach, there needs to be a way of
>>> telling individual sessions "your results are no longer valid, you
>>> need to rerun the full query."
>> Yes and no.
>> You say two things: Record deleted and underlying data changed.
>> (Well, actually a deleted record is also 'data changed' of course, but
>> they can be treated separately I think.)
>>
>> 1) Record deleted
>> This is no problem. If you write your code right, you will end up with 9
>> results instead of 10. (Number 8 was deleted).
>> That is no big problem in most cases: pagination is more for convenience
>> in most cases. (Of course, again, I cannot make that decision: you and
>> your client must decide if that is acceptable.)
>
> I can hear it now: "I said 25 rows per page and I only got 24!!! The
> web site's broken!!!"
>
> I'm actually OK with re-running the full query if there is a set flag
> that says the data has been changed in any way.
>
>> Anyway: number 2 is the bigger problem:
>>
>> 2) Underlying data changed
>> This is tricky. I can think of three approaches:
>> a) re-query every time, like you did earlier, but that is unacceptable
>> slow.
>
> Exactly.
>
>> b) Place triggers on all relevant tables, that fill some other table
>> with the changed records and also timestamps.
>> Your process can then look up each invocation if there is a need to
>> requery and comes back with what you wrote: "your results are no longer
>> valid, you need to rerun the full query."
>
> That's interesting...I was thinking that a change in any JOIN'ed table
> would trigger a full refresh, but you're right--we may be able to
> implement something that's a little smarter and more granular.
>
>> The second solution (b) is by far the most elegant, but also the most
>> complex/bug-prone: you are actually building your own custom caching
>> system, that triggers a fresh query when needed.
>>
>> c) A hybrid: Do a full search ONLY on the IDs you have. So apply all
>> other searchcriteria but include the:
>> "WHERE pkTable IN (10, 11 ... 19)"
>> This will ensure for you that the results are maybe not the right number
>> (10 in our example), but they ARE guaranteed accurate.
>
> That's not bad either. I'd almost venture to say that we use
> temporary tables to flatten the lookups in conjunction with this as
> well. So, first execution populates a temp (memory?) table and then
> queries that table for the full result set. Subsequent page loads
> first check a flag that's set by the insert/update/delete logic that
> says either (a) this whole temporary table is invalid or (b) these
> rows are invalid and regenerates the requisite data into the temp
> table.
>
> Then, we could use Alvaro's suggestion to both limit rows in the query
> and provide full result counts much more efficiently (no joins).
>
> I doubt this is going to be implemented for the current app that's
> supposed to roll on Dec. 1, but this whole app suite is on a custom
> Framework, so this could easily be built in for all future reporting.
>
>> I have never done that last one (c), so here my experience ends. :-/
>> But it doesn't sound impossible at all.
>> Smart database engines can use that extra information to seriously
>> reduce the time needed to scan all included tables.
>> (And also: some databases, eg Postgresql, can accept hints.)
>
> Switching the RDBMS is a last-ditch option, but not one that's off the
> table. We'd probably be talking Oracle, but I'll keep pg in mind.
>
> [snip]
>
> Thanks again.
Other options to consider:
MySQL - make sure that the result (query) buffer is sufficiently large
enough to hold many such result sets so that the full scan queries run
once and your paginated queries pull from the buffer as MySQL doesn't go
back to disk if the result set is already buffered (unless something has
changed). If something has changed MySQL should go back to disk for the
result set - this should be handled by MySQL automatically.
Using 'LIMIT offset, numrows' during pagination should dramatically
reduce query time when the full result set is already in the buffer.
PHP - use APC (Alternative PHP Cache) to store result sets in shared
memory with reasonable time to live. Probably about the same as using a
session variable but faster/less resources when multiple people doing
same query (how often would that happen?). This could allow (with the
proper logic) users to store/bookmark queries and come back to them
without hitting the database again (good for queries of data that may
not change).
--
Norman
Registered Linux user #461062
-Have you been to www.php.net yet?-
|
|
|