Re: Record pagination with count of total rows [message #170432 is a reply to message #170427]
Mon, 01 November 2010 14:26
matt[1]
Messages: 40 Registered: September 2010
On Nov 1, 9:33 am, Erwin Moller
<Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
> On 11/1/2010 1:42 PM, matt wrote:
>
>> Hi all,
>
> Hi Matt,
>
>
>
>> I'm struggling with a concept here, and I'd like to pick some brains.
>> I've got a series of report pages with complicated queries that return
>> row counts in the 10s of thousands. These are paginated, and after
>> accepting search and sort criteria from the user, I'm displaying a
>> header that looks like this:
>
>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>> Page [1] 2 3 ... 1186 1187 Next>>
>
>> To get record totals and then deduce page numbers, I'm pulling a full
>> recordset from MySQL and using PHP to display the appropriate rows.
>> The problem is that, in the example above, at least, that query takes
>> about 70 seconds to run. If I were to limit the search criteria so
>> that it only returns <100 rows, then it's lightning fast. We're using
>> ndb-cluster, so JOINs really have a performance impact.
>
> Clear (and common) problem.
>
>> This leads me to believe I could take advantage of LIMIT, however, I
>> don't see how I'd get my total row count anymore.
>
> That aside: if you rerun your whole query every invocation of that page,
> then you will probably still need the 70 seconds to produce the result
> (at the database), and only give back 10 to PHP.
Right, although since PHP is imposing the limit, the query looks
exactly the same to MySQL, and the query cache begins to assist us.
Also, we have a custom page-caching mechanism that looks for HTTP
request uniqueness, so any reloads, back buttons or navigations to
previously viewed pages are fed from an HTML cache source.
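For what it's worth, MySQL does offer a built-in way to combine LIMIT with a total count: SQL_CALC_FOUND_ROWS plus FOUND_ROWS(). The table and column names below are made up for illustration, and the caveat is that the server still has to find every matching row to produce the count, so it may not save much of that 70 seconds:

```sql
-- Run the complicated query, but ask MySQL to keep counting
-- past what LIMIT discards:
SELECT SQL_CALC_FOUND_ROWS t.pk, t.name, j.status
FROM main_table t
JOIN other_table j ON j.fk = t.pk
WHERE j.status = 'active'
ORDER BY t.name
LIMIT 25 OFFSET 0;

-- Immediately afterward, on the same connection:
SELECT FOUND_ROWS();  -- rows the query would have returned without LIMIT
```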
>> Usually these reports are one-to-one with a particular table, so I've
>> entertained the idea of running a separate COUNT query against that
>> table with no JOINs. This won't suit my purposes because in a lot of
>> cases, the search criteria are against fields in JOINed tables.
>
> ok.
>
>
>
>> I've also thought about using temporary tables that are periodically
>> updated (think Oracle's static views.) The problem there is that a
>> lot of times, our users will go directly from a data entry page to a
>> report and expect it to reflect the new data that was just entered.
>> If I use these "snapshot" tables, I would also need to implement a
>> mechanism that refreshed them whenever data is inserted/updated/
>> deleted elsewhere.
>
>> Finally, the last approach I've considered is what I call the
>> "PeopleSoft" method. That is, if you want a report, you enter all of
>> your criteria, choose a format (XLS, HTML, etc.) and the report goes
>> into a queue. You don't wait for your results, but instead go to a
>> process management page, which would likely be AJAX enabled to inform
>> you when the report was built and ready for viewing. I think this
>> would be the least convenient solution for our users.
>
>> I'm just curious to see how other people have addressed this problem
>> and what you've found to be the up/down sides of your chosen solution.
>
> The way I approach this:
>
> 1) Run the query once, and store the ID's of something that will enable
> you to fetch rest of the tuple based on that number.
> (For example: store a userid, and you know you can easily find the rest
> of the accompanying information based on a userid)
An interesting concept. Unfortunately, it doesn't address the time of
the first query. Though I'm surprised I didn't think of it...
> 2) Store the id's in the session.
...because I'm already doing this--for another purpose. In a lot of
reports, you can pop-up a detail page by clicking on a row. Then,
from that page, you can navigate forward/back through your result set
in the order of how you sorted your report. To do that, when you run
the report, we populate an array of primary keys in the session.
I never thought of using that for subsequent queries.
> 3) write some (basic)code that will produce the pagination and number of
> results per page.
> If your visitors click >>next>> or something, you can use the userids
> you stored in the session to fetch the needed results from the database.
> You will use something like:
>
> SELECT userid, username FROM tbluser WHERE (userid IN (xx,xx,xx,xx,xx));
>
> Where the xx are userids you pull from the session.
>
> That way you will only fetch the rows you need from the backend (database).
> Will such a query be slow?
It should be faster than searching on fields in joined tables. We
already have permission columns in each primary table which provide a
fast way to limit the rows at the beginning of the query, but this
would certainly be faster.
> (If the answer to that is YES, then this approach won't work, but in
> most cases I encountered this worked because of the eliminated complex
> table scans for each search.)
The question is whether this will improve performance significantly
enough. I'm not sure. There are also some synchronization issues to
overcome.
Let's say I get 20 rows back:
1. Store all my main table PKs in a _SESSION array.
2. Click "next"
3. Display count($_SESSION['reportKeys']) as total row count.
4. Build query from array:
SELECT [...] WHERE pkTable IN (10, 11 ... 19);
So, I'm not only automatically filtering by the previously entered
search criteria, but I'm also only grabbing the specific subset of the
session array that will be viewed.
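Sketched out, those steps might look something like this (the session key name and the helper function are made up for illustration, not from our actual codebase):

```php
<?php
// After the one expensive query, store every primary key in the session:
//   $_SESSION['reportKeys'] = $allPrimaryKeys;
//
// Each later page view then only needs the keys for that page.
function buildPageQuery(array $keys, int $page, int $perPage): array
{
    $total    = count($keys);                  // drives "29,657 records"
    $pages    = (int) ceil($total / $perPage); // drives "Page 1 ... 1187"
    $pageKeys = array_slice($keys, ($page - 1) * $perPage, $perPage);

    // Keys are integers from our own tables, so interpolation is safe
    // here; with anything user-supplied, use placeholders instead.
    $inList = implode(',', $pageKeys);
    $sql = "SELECT ... FROM main_table WHERE pk IN ($inList) "
         . "ORDER BY FIELD(pk, $inList)";

    return [$sql, $total, $pages];
}
```

The ORDER BY FIELD(...) is there because IN() alone doesn't guarantee row order; it re-imposes the sort order the report was originally run with.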
The synchronization problem occurs when I amend the previous list:
1.5 Other user deletes record #8.
Now, my first ten results should be 0, 1, 2, 3, 4, 5, 6, 7, 9, 10.
Page two should start on record #11, and my total count should be 19.
Same problem occurs if new data is entered that should be included in
the results after the user's initial query.
I think that even with this approach, there needs to be a way of
telling individual sessions "your results are no longer valid, you
need to rerun the full query."
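One possible shape for that invalidation (purely hypothetical, not something we have built): keep a version counter per report that every insert/update/delete bumps, record which version the session's keyset was built from, and rerun the full query when they diverge:

```php
<?php
// The session remembers which version its keyset was built from.
// $currentVersion would come from something like:
//   SELECT version FROM report_versions WHERE report = ?
function keysetIsFresh(array $session, int $currentVersion): bool
{
    return isset($session['reportVersion'])
        && $session['reportVersion'] === $currentVersion;
}

// On each page view:
//   if (!keysetIsFresh($_SESSION, $currentVersion)) {
//       rerun the full query, store fresh keys + version in the session
//   }
```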
> Drawback:
> - If you have a really huge resultset, your session will grow too of
> course. But I consider that a lesser problem than repetitive needless
> queries to the database from which only a few will be shown.
Not to worry. We're barely making a dent in our hardware right now
and our average session runs 500k-1MB. An array of 30k ints won't be
noticeable.
>> And, yes, the DBA is actively investigating why ndb is running much
>> slower than the same queries on InnoDB or MyISAM. That's a separate
>> post. For the purposes of this discussion, I'm interested in software
>> solutions.
>
>> Thanks,
>> Matt
>
> Good luck.
Thanks a lot. That's a great tip.