Re: Record pagination with count of total rows [message #170474 is a reply to message #170432]
Tue, 02 November 2010 08:51
Erwin Moller
On 11/1/2010 3:26 PM, matt wrote:
> On Nov 1, 9:33 am, Erwin Moller
> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>> On 11/1/2010 1:42 PM, matt wrote:
>>
>>> Hi all,
>>
>> Hi Matt,
>>
<snip>
>>> I'm just curious to see how other people have addressed this problem
>>> and what you've found to be the up/down sides of your chosen solution.
>>
>> The way I approach this:
>>
>> 1) Run the query once, and store the ID's of something that will enable
>> you to fetch rest of the tuple based on that number.
>> (For example: store a userid, and you know you can easily find the rest
>> of the accompanying information based on a userid)
>
> An interesting concept. Unfortunately, it doesn't address the time of
> the first query. Although, I'm surprised I didn't think of it...
True. But the query must run at least once anyway, I expect.
So if it takes 70 seconds, I don't see how you can eliminate that;
the best you can do is make sure it runs only once.
>
>> 2) Store the id's in the session.
>
> ...because I'm already doing this--for another purpose. In a lot of
> reports, you can pop-up a detail page by clicking on a row. Then,
> from that page, you can navigate forward/back through your result set
> in the order of how you sorted your report. To do that, when you run
> the report, we populate an array of primary keys in the session.
>
> I never thought of using that for subsequent queries.
That is what newsgroups are good for sometimes: fresh angles.
:-)
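To make step 1 and 2 concrete: the thread is about PHP, but the idea of running the expensive query once and caching only the primary keys can be sketched in Python with sqlite3. The table, column, and session-key names here are made up for illustration; a plain dict stands in for PHP's $_SESSION.

```python
import sqlite3

# Hypothetical stand-in for PHP's $_SESSION.
session = {}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbluser (userid INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO tbluser VALUES (?, ?)",
                 [(i, "user%02d" % i) for i in range(1, 31)])

# Run the expensive report query ONCE, keep only the primary keys
# in the sort order of the report.
rows = conn.execute("SELECT userid FROM tbluser ORDER BY username").fetchall()
session["reportKeys"] = [r[0] for r in rows]

print(len(session["reportKeys"]))   # total row count for the pager
```

Subsequent page views then work from session["reportKeys"] instead of rerunning the full query.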
>
>> 3) write some (basic)code that will produce the pagination and number of
>> results per page.
>> If your visitors click>>next>> or something, you can use the userids
>> you stored in the session to fetch the needed results from the database.
>> You will use something like:
>>
>> SELECT userid, username FROM tbluser WHERE (userid IN (xx,xx,xx,xx,xx));
>>
>> Where the xx are userids you pull from the session.
>>
>> That way you will only fetch the rows you need from the backend (database).
>> Will such a query be slow?
>
> It should be faster than searching on fields in joined tables. We
> already have permission columns in each primary table which provide a
> fast way to limit the rows at the beginning of the query, but this
> would certainly be faster.
Good. Sounds like (part of) the solution, then.
>
>> (If the answer to that is YES, then this approach won't work, but in
>> most cases I encountered this worked because of the eliminated complex
>> table scans for each search.)
>
> The question is will this impact performance significantly enough.
> I'm not sure. There are also some synchronization issues to overcome.
Absolutely.
>
> Let's say I get 20 rows back:
>
> 1. Store all my main table PKs in a _SESSION array.
> 2. Click "next"
> 3. Display count($_SESSION['reportKeys']) as total row count.
> 4. Build query from array:
>
> SELECT [...] WHERE pkTable IN (10, 11 ... 19);
>
Yes.
> So, I'm not only automatically filtering by the previously entered
> search criteria, but I'm also only grabbing the specific subset of the
> session array that will be viewed.
Exactly.
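Your steps 2-4 can be sketched like this (Python/sqlite3 standing in for the PHP/MySQL setup; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbluser (userid INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO tbluser VALUES (?, ?)",
                 [(i, "user%02d" % i) for i in range(30)])

report_keys = list(range(30))    # step 1: PKs cached in the session
page, per_page = 2, 10           # step 2: user clicked "next"

total = len(report_keys)         # step 3: total row count, no query needed
page_keys = report_keys[(page - 1) * per_page : page * per_page]

# Step 4: build the IN (...) query with placeholders, not string-pasted IDs.
sql = ("SELECT userid, username FROM tbluser WHERE userid IN (%s)"
       % ",".join("?" * len(page_keys)))
rows = conn.execute(sql, page_keys).fetchall()

print(total, sorted(r[0] for r in rows))
```

Note the database only ever sees the ten keys for the page being viewed.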
>
> The synchronization problem occurs when I amend the previous list:
>
> 1.5 Other user deletes record #8.
>
> Now, my "first ten" results should be 0,1,2,3,4,5,6,7,9,10. Page two
> should start on record #11 and my total count should be 19.
>
> Same problem occurs if new data is entered that should be included in
> the results after the user's initial query.
>
> I think that even with this approach, there needs to be a way of
> telling individual sessions "your results are no longer valid, you
> need to rerun the full query."
Yes and no.
You say two things: Record deleted and underlying data changed.
(Well, actually a deleted record is also 'data changed' of course, but
they can be treated separately I think.)
1) Record deleted
This is no problem. If you write your code right, you will end up with 9
results instead of 10. (Number 8 was deleted).
That is no big problem in most cases: pagination is mostly a
convenience. (Of course, again, I cannot make that decision for you:
you and your client must decide if that is acceptable.)
Anyway: number 2 is the bigger problem:
2) Underlying data changed
This is tricky. I can think of three approaches:
a) Re-query every time, like you did earlier, but that is unacceptably
slow.
b) Place triggers on all relevant tables that fill some other table
with the changed records and timestamps.
Your process can then check on each invocation whether it needs to
requery, and come back with what you wrote: "your results are no longer
valid, you need to rerun the full query."
The second solution (b) is by far the most elegant, but also the most
complex/bug-prone: you are actually building your own custom caching
system, that triggers a fresh query when needed.
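Option (b) could look roughly like the following sketch (sqlite syntax; the changelog table and trigger names are invented for illustration): a trigger records a timestamp on every change, and each page view compares the latest change against the moment the snapshot was taken.

```python
import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbluser (userid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE tblchangelog (tbl TEXT, changed_at INTEGER);
CREATE TRIGGER trg_user_delete AFTER DELETE ON tbluser
BEGIN
    INSERT INTO tblchangelog VALUES ('tbluser', strftime('%s','now'));
END;
""")
conn.executemany("INSERT INTO tbluser VALUES (?, ?)",
                 [(i, "user%d" % i) for i in range(10)])

snapshot_time = int(time.time())   # moment the full report query ran

# Some other user deletes record #8; the trigger logs the change.
conn.execute("DELETE FROM tbluser WHERE userid = 8")

latest = conn.execute("SELECT MAX(changed_at) FROM tblchangelog").fetchone()[0]
stale = latest is not None and int(latest) >= snapshot_time
print(stale)
```

A real version would need triggers for INSERT and UPDATE as well, which is exactly the complexity that makes this approach bug-prone.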
c) A hybrid: do the full search ONLY on the IDs you have. So apply all
the other search criteria, but also include the:
"WHERE pkTable IN (10, 11 ... 19)"
This way the results may not be the expected number (10 in our
example), but the rows you do get ARE guaranteed accurate.
I have never done that last one (c), so here my experience ends. :-/
But it doesn't sound impossible at all.
Smart database engines can use that extra information to seriously
reduce the time needed to scan all included tables.
(And also: some database engines accept optimizer hints.)
I can imagine your client may accept that the data (s)he views is stale
(stale as in: a snapshot from the moment the query ran for the first
time).
To sum it up: If your client demands up-to-date information, I think I
would go for the above 2c approach. It is much simpler than 2b.
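A minimal sketch of that 2c approach (again Python/sqlite3 standing in for PHP/MySQL; the "active" flag is a made-up stand-in for your real search criteria):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "active" is a hypothetical search criterion standing in for the real filters.
conn.execute("CREATE TABLE tbluser (userid INTEGER PRIMARY KEY, active INTEGER)")
conn.executemany("INSERT INTO tbluser VALUES (?, 1)", [(i,) for i in range(20)])

page_keys = list(range(10, 20))   # cached keys for the page being viewed

# Meanwhile, another user removes record 15 from the result set.
conn.execute("UPDATE tbluser SET active = 0 WHERE userid = 15")

# Re-apply the FULL criteria, but only over this page's keys: cheap for
# the engine, and every row that does come back is guaranteed accurate.
sql = ("SELECT userid FROM tbluser WHERE active = 1 AND userid IN (%s)"
       % ",".join("?" * len(page_keys)))
rows = [r[0] for r in conn.execute(sql, page_keys)]
print(rows)   # 9 accurate rows: record 15 has dropped out
```

The page shows one row fewer than expected, which is the trade-off discussed above.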
>
>> Drawback:
>> - If you have a really huge resultset, your session will grow too of
>> course. But I consider that a lesser problem than repetitive needless
>> queries to the database from which only a few will be shown.
>
> Not to worry. We're barely making a dent in our hardware right now
> and our average session runs 500k-1MB. An array of 30k ints won't be
> noticeable.
Ah nice. :-)
>
>>> And, yes, the DBA is actively investigating why ndb is running much
>>> slower than the same queries on InnoDB or MyISAM. That's a separate
>>> post. For the purposes of this discussion, I'm interested in software
>>> solutions.
>>
>>> Thanks,
>>> Matt
>>
>> Good luck.
>
> Thanks a lot. That's a great tip.
You are welcome and thanks.
Good luck.
Regards,
Erwin Moller
--
"There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult."
-- C.A.R. Hoare