Re: Record pagination with count of total rows [message #170480 is a reply to message #170474]
Tue, 02 November 2010 12:28
matt
Messages: 40   Registered: September 2010
On Nov 2, 4:51 am, Erwin Moller
<Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
> On 11/1/2010 3:26 PM, matt wrote:
>
>> On Nov 1, 9:33 am, Erwin Moller
>> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>>> On 11/1/2010 1:42 PM, matt wrote:
>
>>>> Hi all,
>
>>> Hi Matt,
>
> <snip>
>
>>>> I'm just curious to see how other people have addressed this problem
>>>> and what you've found to be the up/down sides of your chosen solution...
>
>>> The way I approach this:
>
>>> 1) Run the query once, and store the IDs of something that will enable
>>> you to fetch the rest of the tuple based on that number.
>>> (For example: store a userid, and you know you can easily find the rest
>>> of the accompanying information based on that userid.)
>
>> An interesting concept. Unfortunately, it doesn't address the time of
>> the first query. Though I'm surprised I didn't think of it...
>
> True. But the query must run anyway, I expect.
> So if it takes 70 seconds, I don't see how you can eliminate that.
> The best you can do is try to make it run only once, I expect.
Yeah...I was hoping to save some execution time using LIMIT, but I'm
beginning to think no matter what I implement, it does need to run the
full query up front.
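Just to make sure I follow, here's a MySQL-flavored sketch of that
"query once, keep the IDs" idea as I understand it. All the table and
column names are made up, and the join/criteria are stand-ins for our
real 70-second query:

  -- Hypothetical cache table: one row per hit, in display order.
  CREATE TABLE result_cache (
      position   INT AUTO_INCREMENT PRIMARY KEY,  -- rank in the result set
      session_id VARCHAR(64) NOT NULL,
      user_id    INT NOT NULL,
      KEY (session_id, position)
  );

  -- Run the expensive query ONCE, keeping only the matching IDs.
  INSERT INTO result_cache (session_id, user_id)
  SELECT 'sess-abc123', u.user_id
  FROM users u
  JOIN orders o ON o.user_id = u.user_id    -- stand-in for the real joins
  WHERE o.status = 'open'                   -- stand-in for the real criteria
  ORDER BY u.last_name;

  -- Page 2 (rows 11-20) becomes a cheap indexed lookup, no big joins.
  SELECT u.*
  FROM result_cache c
  JOIN users u ON u.user_id = c.user_id
  WHERE c.session_id = 'sess-abc123'
  ORDER BY c.position
  LIMIT 10 OFFSET 10;

  -- And the total count comes along for free:
  SELECT COUNT(*) FROM result_cache WHERE session_id = 'sess-abc123';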
[snip]
>> The synchronization problem occurs when I amend the previous list:
>
>> 1.5 Other user deletes record #8.
>
>> Now, my first ten results should be 0,1,2,3,4,5,6,7,9,10. Page two
>> should start on record #11 and my total count should be 19.
>
>> Same problem occurs if new data is entered that should be included in
>> the results after the user's initial query.
>
>> I think that even with this approach, there needs to be a way of
>> telling individual sessions "your results are no longer valid, you
>> need to rerun the full query."
>
> Yes and no.
> You say two things: Record deleted and underlying data changed.
> (Well, actually a deleted record is also 'data changed' of course, but
> they can be treated separately I think.)
>
> 1) Record deleted
> This is no problem. If you write your code right, you will end up with 9
> results instead of 10. (Number 8 was deleted).
> That is no big problem in most cases: pagination is mostly there for
> convenience. (Of course, again, I cannot make that decision: you and
> your client must decide if that is acceptable.)
I can hear it now: "I said 25 rows per page and I only got 24!!! The
web site's broken!!!"
I'm actually OK with re-running the full query if there is a set flag
that says the data has been changed in any way.
> Anyway: number 2 is the bigger problem:
>
> 2) Underlying data changed
> This is tricky. I can think of three approaches:
> a) re-query every time, like you did earlier, but that is unacceptably
> slow.
Exactly.
> b) Place triggers on all relevant tables that fill some other table
> with the changed records and timestamps.
> Your process can then check on each invocation whether there is a need
> to requery, and come back with what you wrote: "your results are no
> longer valid, you need to rerun the full query."
That's interesting...I was thinking that a change in any JOIN'ed table
would trigger a full refresh, but you're right--we may be able to
implement something that's a little smarter and more granular.
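If we go that route, I'd imagine the trigger side looking roughly like
this (MySQL-flavored again; the change_log table and the orders table
are placeholders for our real schema):

  -- Hypothetical change-log table shared by all relevant tables.
  CREATE TABLE change_log (
      table_name VARCHAR(64) NOT NULL,
      row_id     INT         NOT NULL,
      changed_at TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP
  );

  -- One trigger per table/operation; UPDATE shown here.
  DELIMITER //
  CREATE TRIGGER orders_after_update
  AFTER UPDATE ON orders
  FOR EACH ROW
  BEGIN
      INSERT INTO change_log (table_name, row_id)
      VALUES ('orders', NEW.order_id);
  END//
  DELIMITER ;

  -- Each page load: has anything changed since we built our cache?
  SELECT COUNT(*) AS stale
  FROM change_log
  WHERE changed_at > '2010-11-02 12:00:00';  -- time the cache was built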
> The second solution (b) is by far the most elegant, but also the most
> complex/bug-prone: you are actually building your own custom caching
> system, that triggers a fresh query when needed.
>
> c) A hybrid: do a full search ONLY on the IDs you have. So apply all
> the other search criteria, but include:
> "WHERE pkTable IN (10, 11 ... 19)"
> This way the results are maybe not the right number (10 in our
> example), but the rows you do get are guaranteed accurate.
That's not bad either. I'd almost venture to say we could use temporary
tables to flatten the lookups in conjunction with this as well. So, the
first execution populates a temp (memory?) table and then queries that
table for the full result set. Subsequent page loads first check a flag
set by the insert/update/delete logic that says either (a) this whole
temporary table is invalid or (b) these specific rows are invalid, and
regenerate the requisite data in the temp table.
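The per-page re-check from your option (c) would then be something like
this (same placeholder schema as above; whatever doesn't come back was
deleted or no longer matches, so its cached entry is invalid):

  -- Re-apply the full search criteria, but only to the IDs already on
  -- the current page.
  SELECT u.user_id
  FROM users u
  JOIN orders o ON o.user_id = u.user_id
  WHERE o.status = 'open'
    AND u.user_id IN (10, 11, 12, 13, 14, 15, 16, 17, 18, 19);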
Then, we could use Alvaro's suggestion to both limit rows in the query
and provide full result counts much more efficiently (no joins).
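Assuming Alvaro meant something like MySQL's SQL_CALC_FOUND_ROWS (my
assumption, going from memory), against the flattened temp table that
would look something like:

  -- Build the flattened lookup once per session; MEMORY keeps it fast.
  CREATE TEMPORARY TABLE flat_results ENGINE=MEMORY AS
  SELECT u.user_id, u.last_name, o.status
  FROM users u
  JOIN orders o ON o.user_id = u.user_id
  WHERE o.status = 'open';

  -- Each page load: one LIMITed query, no joins, plus the full count.
  SELECT SQL_CALC_FOUND_ROWS *
  FROM flat_results
  ORDER BY last_name
  LIMIT 10 OFFSET 10;

  SELECT FOUND_ROWS();  -- total rows the query would have returned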
I doubt this is going to be implemented for the current app that's
supposed to roll out on Dec. 1, but this whole app suite sits on a
custom framework, so this could easily be built in for all future
reporting.
> I have never done that last one (c), so here my experience ends. :-/
> But it doesn't sound impossible at all.
> Smart database engines can use that extra information to seriously
> reduce the time needed to scan all included tables.
> (And also: some databases, e.g. PostgreSQL, let you nudge the planner.)
Switching the RDBMS is a last-ditch option, but not one that's off the
table. We'd probably be talking Oracle, but I'll keep pg in mind.
[snip]
Thanks again.