Record pagination with count of total rows [message #170423]
Mon, 01 November 2010 12:42
matt[1]
Hi all,
I'm struggling with a concept here, and I'd like to pick some brains.
I've got a series of report pages with complicated queries that return
row counts in the 10s of thousands. These are paginated, and after
accepting search and sort criteria from the user, I'm displaying a
header that looks like this:
Search contains 29,657 records. Records per page [select 25, 50, 100]
Page [1] 2 3 ... 1186 1187 Next >>
To get record totals and then deduce page numbers, I'm pulling a full
recordset from MySQL and using PHP to display the appropriate rows.
The problem is that, in the example above, at least, that query takes
about 70 seconds to run. If I were to limit the search criteria so
that it only returns <100 rows, then it's lightning fast. We're using
ndb-cluster, so JOINs really have a performance impact.
This leads me to believe I could take advantage of LIMIT, however, I
don't see how I'd get my total row count anymore.
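For context, here is roughly what the PHP side does now (simplified sketch only; the real query, connection handling, table names and variables differ):

<?php
// Current approach: run the full report query, let PHP pick out one page.
$perPage = 25;
$page    = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;

$result = mysql_query($fullReportSql);             // the ~70 second query
$total  = mysql_num_rows($result);                 // drives "Search contains N records"
$pages  = (int)ceil($total / $perPage);

mysql_data_seek($result, ($page - 1) * $perPage);  // skip to the requested page
$rows = array();
for ($i = 0; $i < $perPage && ($row = mysql_fetch_assoc($result)); $i++) {
    $rows[] = $row;
}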
Usually these reports are one-to-one with a particular table, so I've
entertained the idea of running a separate COUNT query against that
table with no JOINs. This won't suit my purposes because in a lot of
cases, the search criteria are against fields in JOINed tables.
I've also thought about using temporary tables that are periodically
updated (think Oracle's static views.) The problem there is that a
lot of times, our users will go directly from a data entry page to a
report and expect it to reflect the new data that was just entered.
If I use these "snapshot" tables, I would also need to implement a
mechanism that refreshed them whenever data is inserted/updated/
deleted elsewhere.
Finally, the last approach I've considered is what I call the
"PeopleSoft" method. That is, if you want a report, you enter all of
your criteria, choose a format (XLS, HTML, etc.) and the report goes
into a queue. You don't wait for your results, but instead go to a
process management page, which would likely be AJAX enabled to inform
you when the report was built and ready for viewing. I think this
would be the least convenient solution for our users.
I'm just curious to see how other people have addressed this problem
and what you've found to be the up/down sides of your chosen solution.
And, yes, the DBA is actively investigating why ndb is running much
slower than the same queries on InnoDB or MyISAM. That's a separate
post. For the purposes of this discussion, I'm interested in software
solutions.
Thanks,
Matt
Re: Record pagination with count of total rows [message #170425 is a reply to message #170423]
Mon, 01 November 2010 12:58
Jerry Stuckle
On 11/1/2010 8:42 AM, matt wrote:
> Hi all,
>
> I'm struggling with a concept here, and I'd like to pick some brains.
> I've got a series of report pages with complicated queries that return
> row counts in the 10s of thousands. These are paginated, and after
> accepting search and sort criteria from the user, I'm displaying a
> header that looks like this:
>
> Search contains 29,657 records. Records per page [select 25, 50, 100]
> Page [1] 2 3 ... 1186 1187 Next>>
>
> To get record totals and then deduce page numbers, I'm pulling a full
> recordset from MySQL and using PHP to display the appropriate rows.
> The problem is that, in the example above, at least, that query takes
> about 70 seconds to run. If I were to limit the search criteria so
> that it only returns<100 rows, then it's lightning fast. We're using
> ndb-cluster, so JOINs really have a performance impact.
>
> This leads me to believe I could take advantage of LIMIT, however, I
> don't see how I'd get my total row count anymore.
>
> Usually these reports are one-to-one with a particular table, so I've
> entertained the idea of running a separate COUNT query against that
> table with no JOINs. This won't suit my purposes because in a lot of
> cases, the search criteria are against fields in JOINed tables.
>
> I've also thought about using temporary tables that are periodically
> updated (think Oracle's static views.) The problem there is that a
> lot of times, our users will go directly from a data entry page to a
> report and expect it to reflect the new data that was just entered.
> If I use these "snapshot" tables, I would also need to implement a
> mechanism that refreshed them whenever data is inserted/updated/
> deleted elsewhere.
>
> Finally, the last approach I've considered is what I call the
> "PeopleSoft" method. That is, if you want a report, you enter all of
> your criteria, choose a format (XLS, HTML, etc.) and the report goes
> into a queue. You don't wait for your results, but instead go to a
> process management page, which would likely be AJAX enabled to inform
> you when the report was built and ready for viewing. I think this
> would be the least convenient solution for our users.
>
> I'm just curious to see how other people have addressed this problem
> and what you've found to be the up/down sides of your chosen solution.
>
> And, yes, the DBA is actively investigating why ndb is running much
> slower than the same queries on InnoDB or MyISAM. That's a separate
> post. For the purposes of this discussion, I'm interested in software
> solutions.
>
> Thanks,
> Matt
My first question would be - who's going to look at "10's of thousands
of rows" - especially if they have to go through thousands of pages to
read them.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: Record pagination with count of total rows [message #170427 is a reply to message #170423]
Mon, 01 November 2010 13:33
Erwin Moller
On 11/1/2010 1:42 PM, matt wrote:
> Hi all,
>
Hi Matt,
> I'm struggling with a concept here, and I'd like to pick some brains.
> I've got a series of report pages with complicated queries that return
> row counts in the 10s of thousands. These are paginated, and after
> accepting search and sort criteria from the user, I'm displaying a
> header that looks like this:
>
> Search contains 29,657 records. Records per page [select 25, 50, 100]
> Page [1] 2 3 ... 1186 1187 Next>>
>
> To get record totals and then deduce page numbers, I'm pulling a full
> recordset from MySQL and using PHP to display the appropriate rows.
> The problem is that, in the example above, at least, that query takes
> about 70 seconds to run. If I were to limit the search criteria so
> that it only returns<100 rows, then it's lightning fast. We're using
> ndb-cluster, so JOINs really have a performance impact.
>
Clear (and common) problem.
> This leads me to believe I could take advantage of LIMIT, however, I
> don't see how I'd get my total row count anymore.
That aside: if you rerun your whole query on every invocation of that page,
you will probably still need the 70 seconds to produce the result
(at the database), only to hand back 10 rows to PHP.
>
> Usually these reports are one-to-one with a particular table, so I've
> entertained the idea of running a separate COUNT query against that
> table with no JOINs. This won't suit my purposes because in a lot of
> cases, the search criteria are against fields in JOINed tables.
>
ok.
> I've also thought about using temporary tables that are periodically
> updated (think Oracle's static views.) The problem there is that a
> lot of times, our users will go directly from a data entry page to a
> report and expect it to reflect the new data that was just entered.
> If I use these "snapshot" tables, I would also need to implement a
> mechanism that refreshed them whenever data is inserted/updated/
> deleted elsewhere.
>
> Finally, the last approach I've considered is what I call the
> "PeopleSoft" method. That is, if you want a report, you enter all of
> your criteria, choose a format (XLS, HTML, etc.) and the report goes
> into a queue. You don't wait for your results, but instead go to a
> process management page, which would likely be AJAX enabled to inform
> you when the report was built and ready for viewing. I think this
> would be the least convenient solution for our users.
>
> I'm just curious to see how other people have addressed this problem
> and what you've found to be the up/down sides of your chosen solution.
>
The way I approach this:
1) Run the query once, and store the IDs of something that will enable
you to fetch the rest of the tuple based on that number.
(For example: store a userid, and you know you can easily find the rest
of the accompanying information based on that userid.)
2) Store the IDs in the session.
3) Write some (basic) code that will produce the pagination and the number
of results per page.
If your visitors click >>next>> or something, you can use the userids
you stored in the session to fetch the needed results from the database.
You will use something like:
SELECT userid, username FROM tbluser WHERE (userid IN (xx,xx,xx,xx,xx));
Where the xx are userids you pull from the session.
That way you will only fetch the rows you need from the backend (database).
Will such a query be slow?
(If the answer to that is YES, then this approach won't work, but in
most cases I encountered this worked because of the eliminated complex
table scans for each search.)
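A minimal PHP sketch of that idea (the $pdo connection and the session key name are assumptions for illustration, not from the thread):

<?php
session_start();

if (!isset($_SESSION['reportIds'])) {
    // The expensive query with all JOINs and search criteria runs only once.
    $stmt = $pdo->query('SELECT userid FROM tbluser /* ...JOINs and WHERE... */ ORDER BY username');
    $_SESSION['reportIds'] = $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$perPage = 25;
$page    = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$total   = count($_SESSION['reportIds']);                        // row count without re-querying
$ids     = array_slice($_SESSION['reportIds'], ($page - 1) * $perPage, $perPage);

if ($ids) {
    // Fetch only the rows for the current page.
    $in  = implode(',', array_fill(0, count($ids), '?'));
    $sth = $pdo->prepare("SELECT userid, username FROM tbluser WHERE userid IN ($in)");
    $sth->execute($ids);
    $rows = $sth->fetchAll(PDO::FETCH_ASSOC);
}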
Drawback:
- If you have a really huge result set, your session will grow too, of
course. But I consider that a lesser problem than repeated, needless
queries to the database of which only a few rows will ever be shown.
> And, yes, the DBA is actively investigating why ndb is running much
> slower than the same queries on InnoDB or MyISAM. That's a separate
> post. For the purposes of this discussion, I'm interested in software
> solutions.
>
> Thanks,
> Matt
Good luck.
Regards,
Erwin Moller
--
"There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult."
-- C.A.R. Hoare
Re: Record pagination with count of total rows [message #170428 is a reply to message #170425]
Mon, 01 November 2010 13:34
Erwin Moller
On 11/1/2010 1:58 PM, Jerry Stuckle wrote:
> On 11/1/2010 8:42 AM, matt wrote:
>> Hi all,
>>
>> I'm struggling with a concept here, and I'd like to pick some brains.
>> I've got a series of report pages with complicated queries that return
>> row counts in the 10s of thousands. These are paginated, and after
>> accepting search and sort criteria from the user, I'm displaying a
>> header that looks like this:
>>
>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>> Page [1] 2 3 ... 1186 1187 Next>>
>>
>> To get record totals and then deduce page numbers, I'm pulling a full
>> recordset from MySQL and using PHP to display the appropriate rows.
>> The problem is that, in the example above, at least, that query takes
>> about 70 seconds to run. If I were to limit the search criteria so
>> that it only returns<100 rows, then it's lightning fast. We're using
>> ndb-cluster, so JOINs really have a performance impact.
>>
>> This leads me to believe I could take advantage of LIMIT, however, I
>> don't see how I'd get my total row count anymore.
>>
>> Usually these reports are one-to-one with a particular table, so I've
>> entertained the idea of running a separate COUNT query against that
>> table with no JOINs. This won't suit my purposes because in a lot of
>> cases, the search criteria are against fields in JOINed tables.
>>
>> I've also thought about using temporary tables that are periodically
>> updated (think Oracle's static views.) The problem there is that a
>> lot of times, our users will go directly from a data entry page to a
>> report and expect it to reflect the new data that was just entered.
>> If I use these "snapshot" tables, I would also need to implement a
>> mechanism that refreshed them whenever data is inserted/updated/
>> deleted elsewhere.
>>
>> Finally, the last approach I've considered is what I call the
>> "PeopleSoft" method. That is, if you want a report, you enter all of
>> your criteria, choose a format (XLS, HTML, etc.) and the report goes
>> into a queue. You don't wait for your results, but instead go to a
>> process management page, which would likely be AJAX enabled to inform
>> you when the report was built and ready for viewing. I think this
>> would be the least convenient solution for our users.
>>
>> I'm just curious to see how other people have addressed this problem
>> and what you've found to be the up/down sides of your chosen solution.
>>
>> And, yes, the DBA is actively investigating why ndb is running much
>> slower than the same queries on InnoDB or MyISAM. That's a separate
>> post. For the purposes of this discussion, I'm interested in software
>> solutions.
>>
>> Thanks,
>> Matt
>
> My first question would be - who's going to look at "10's of thousands
> of rows" - especially if they have to go through thousands of pages to
> read them.
>
*That* is a good question.
I know I won't feel very motivated to do so.
Regards,
Erwin Moller
--
"There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult."
-- C.A.R. Hoare
Re: Record pagination with count of total rows [message #170429 is a reply to message #170425]
Mon, 01 November 2010 13:58
matt[1]
On Nov 1, 8:58 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> On 11/1/2010 8:42 AM, matt wrote:
>
>
>
>> Hi all,
>
>> I'm struggling with a concept here, and I'd like to pick some brains.
>> I've got a series of report pages with complicated queries that return
>> row counts in the 10s of thousands. These are paginated, and after
>> accepting search and sort criteria from the user, I'm displaying a
>> header that looks like this:
>
>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>> Page [1] 2 3 ... 1186 1187 Next>>
>
>> To get record totals and then deduce page numbers, I'm pulling a full
>> recordset from MySQL and using PHP to display the appropriate rows.
>> The problem is that, in the example above, at least, that query takes
>> about 70 seconds to run. If I were to limit the search criteria so
>> that it only returns<100 rows, then it's lightning fast. We're using
>> ndb-cluster, so JOINs really have a performance impact.
>
>> This leads me to believe I could take advantage of LIMIT, however, I
>> don't see how I'd get my total row count anymore.
>
>> Usually these reports are one-to-one with a particular table, so I've
>> entertained the idea of running a separate COUNT query against that
>> table with no JOINs. This won't suit my purposes because in a lot of
>> cases, the search criteria are against fields in JOINed tables.
>
>> I've also thought about using temporary tables that are periodically
>> updated (think Oracle's static views.) The problem there is that a
>> lot of times, our users will go directly from a data entry page to a
>> report and expect it to reflect the new data that was just entered.
>> If I use these "snapshot" tables, I would also need to implement a
>> mechanism that refreshed them whenever data is inserted/updated/
>> deleted elsewhere.
>
>> Finally, the last approach I've considered is what I call the
>> "PeopleSoft" method. That is, if you want a report, you enter all of
>> your criteria, choose a format (XLS, HTML, etc.) and the report goes
>> into a queue. You don't wait for your results, but instead go to a
>> process management page, which would likely be AJAX enabled to inform
>> you when the report was built and ready for viewing. I think this
>> would be the least convenient solution for our users.
>
>> I'm just curious to see how other people have addressed this problem
>> and what you've found to be the up/down sides of your chosen solution.
>
>> And, yes, the DBA is actively investigating why ndb is running much
>> slower than the same queries on InnoDB or MyISAM. That's a separate
>> post. For the purposes of this discussion, I'm interested in software
>> solutions.
>
>> Thanks,
>> Matt
>
> My first question would be - who's going to look at "10's of thousands
> of rows" - especially if they have to go through thousands of pages to
> read them.
Accountants.
Re: Record pagination with count of total rows [message #170432 is a reply to message #170427]
Mon, 01 November 2010 14:26
matt[1]
On Nov 1, 9:33 am, Erwin Moller
<Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
> On 11/1/2010 1:42 PM, matt wrote:
>
>> Hi all,
>
> Hi Matt,
>
>
>
>> I'm struggling with a concept here, and I'd like to pick some brains.
>> I've got a series of report pages with complicated queries that return
>> row counts in the 10s of thousands. These are paginated, and after
>> accepting search and sort criteria from the user, I'm displaying a
>> header that looks like this:
>
>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>> Page [1] 2 3 ... 1186 1187 Next>>
>
>> To get record totals and then deduce page numbers, I'm pulling a full
>> recordset from MySQL and using PHP to display the appropriate rows.
>> The problem is that, in the example above, at least, that query takes
>> about 70 seconds to run. If I were to limit the search criteria so
>> that it only returns<100 rows, then it's lightning fast. We're using
>> ndb-cluster, so JOINs really have a performance impact.
>
> Clear (and common) problem.
>
>> This leads me to believe I could take advantage of LIMIT, however, I
>> don't see how I'd get my total row count anymore.
>
> That aside: if you rerun your whole query every invocation of that page,
> then you will probably still need the 70 seconds to produce the result
> (at the database), and only give back 10 to PHP.
Right, although since PHP is imposing the limit, the query looks
exactly the same to MySQL and query_caching begins to assist us.
Also, we have a custom page-caching mechanism that looks for HTTP
request uniqueness, so any reloads, back buttons or navigations to
previously viewed pages are fed from an HTML cache source.
>> Usually these reports are one-to-one with a particular table, so I've
>> entertained the idea of running a separate COUNT query against that
>> table with no JOINs. This won't suit my purposes because in a lot of
>> cases, the search criteria are against fields in JOINed tables.
>
> ok.
>
>
>
>> I've also thought about using temporary tables that are periodically
>> updated (think Oracle's static views.) The problem there is that a
>> lot of times, our users will go directly from a data entry page to a
>> report and expect it to reflect the new data that was just entered.
>> If I use these "snapshot" tables, I would also need to implement a
>> mechanism that refreshed them whenever data is inserted/updated/
>> deleted elsewhere.
>
>> Finally, the last approach I've considered is what I call the
>> "PeopleSoft" method. That is, if you want a report, you enter all of
>> your criteria, choose a format (XLS, HTML, etc.) and the report goes
>> into a queue. You don't wait for your results, but instead go to a
>> process management page, which would likely be AJAX enabled to inform
>> you when the report was built and ready for viewing. I think this
>> would be the least convenient solution for our users.
>
>> I'm just curious to see how other people have addressed this problem
>> and what you've found to be the up/down sides of your chosen solution.
>
> The way I approach this:
>
> 1) Run the query once, and store the ID's of something that will enable
> you to fetch rest of the tuple based on that number.
> (For example: store a userid, and you know you can easily find the rest
> of the accompanying information based on a userid)
An interesting concept. Unfortunately, it doesn't address the time of
the first query. Although, I'm surprised I didn't think of it...
> 2) Store the id's in the session.
...because I'm already doing this--for another purpose. In a lot of
reports, you can pop-up a detail page by clicking on a row. Then,
from that page, you can navigate forward/back through your result set
in the order of how you sorted your report. To do that, when you run
the report, we populate an array of primary keys in the session.
I never thought of using that for subsequent queries.
> 3) write some (basic)code that will produce the pagination and number of
> results per page.
> If your visitors click >>next>> or something, you can use the userids
> you stored in the session to fetch the needed results from the database.
> You will use something like:
>
> SELECT userid, username FROM tbluser WHERE (userid IN (xx,xx,xx,xx,xx));
>
> Where the xx are userids you pull from the session.
>
> That way you will only fetch the rows you need from the backend (database).
> Will such a query be slow?
It should be faster than searching on fields in joined tables. We
already have permission columns in each primary table which provide a
fast way to limit the rows at the beginning of the query, but this
would certainly be faster.
> (If the answer to that is YES, then this approach won't work, but in
> most cases I encountered this worked because of the eliminated complex
> table scans for each search.)
The question is whether this will improve performance significantly enough.
I'm not sure. There are also some synchronization issues to overcome.
Let's say I get 20 rows back:
1. Store all my main table PKs in a _SESSION array.
2. Click "next"
3. Display count($_SESSION['reportKeys']) as total row count.
4. Build query from array:
SELECT [...] WHERE pkTable IN (10, 11 ... 19);
So, I'm not only automatically filtering by the previously entered
search criteria, but I'm also only grabbing the specific subset of the
session array that will be viewed.
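In rough code, with $page, $perPage and the table names as placeholders:

<?php
// Sketch of the steps above.
$offset  = ($page - 1) * $perPage;

$keys    = $_SESSION['reportKeys'];                        // step 1 populated this
$total   = count($keys);                                   // step 3: total row count
$pageIds = array_slice($keys, $offset, $perPage);          // only the keys being viewed

// step 4: build the query from the array
$sql = 'SELECT * FROM mainTable WHERE pkTable IN ('
     . implode(',', array_map('intval', $pageIds)) . ')';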
The synchronization problem occurs when I amend the previous list:
1.5 Other user deletes record #8.
Now, my"first ten results should be 0,1,2,3,4,5,6,7,9,10. Page two
should start on record #11 and my total count should be 19.
Same problem occurs if new data is entered that should be included in
the results after the user's initial query.
I think that even with this approach, there needs to be a way of
telling individual sessions "your results are no longer valid, you
need to rerun the full query."
> Drawback:
> - If you have a really huge resultset, your session will grow too of
> course. But I consider that a lesser problem than repetitive needless
> queries to the database from which only a few will be shown.
Not to worry. We're barely making a dent in our hardware right now
and our average session runs 500k-1MB. An array of 30k ints won't be
noticeable.
>> And, yes, the DBA is actively investigating why ndb is running much
>> slower than the same queries on InnoDB or MyISAM. That's a separate
>> post. For the purposes of this discussion, I'm interested in software
>> solutions.
>
>> Thanks,
>> Matt
>
> Good luck.
Thanks a lot. That's a great tip.
Re: Record pagination with count of total rows [message #170438 is a reply to message #170429]
Mon, 01 November 2010 15:08
Jerry Stuckle
On 11/1/2010 9:58 AM, matt wrote:
> On Nov 1, 8:58 am, Jerry Stuckle<jstuck...@attglobal.net> wrote:
>> On 11/1/2010 8:42 AM, matt wrote:
>>
>>
>>
>>> Hi all,
>>
>>> I'm struggling with a concept here, and I'd like to pick some brains.
>>> I've got a series of report pages with complicated queries that return
>>> row counts in the 10s of thousands. These are paginated, and after
>>> accepting search and sort criteria from the user, I'm displaying a
>>> header that looks like this:
>>
>>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>>> Page [1] 2 3 ... 1186 1187 Next>>
>>
>>> To get record totals and then deduce page numbers, I'm pulling a full
>>> recordset from MySQL and using PHP to display the appropriate rows.
>>> The problem is that, in the example above, at least, that query takes
>>> about 70 seconds to run. If I were to limit the search criteria so
>>> that it only returns<100 rows, then it's lightning fast. We're using
>>> ndb-cluster, so JOINs really have a performance impact.
>>
>>> This leads me to believe I could take advantage of LIMIT, however, I
>>> don't see how I'd get my total row count anymore.
>>
>>> Usually these reports are one-to-one with a particular table, so I've
>>> entertained the idea of running a separate COUNT query against that
>>> table with no JOINs. This won't suit my purposes because in a lot of
>>> cases, the search criteria are against fields in JOINed tables.
>>
>>> I've also thought about using temporary tables that are periodically
>>> updated (think Oracle's static views.) The problem there is that a
>>> lot of times, our users will go directly from a data entry page to a
>>> report and expect it to reflect the new data that was just entered.
>>> If I use these "snapshot" tables, I would also need to implement a
>>> mechanism that refreshed them whenever data is inserted/updated/
>>> deleted elsewhere.
>>
>>> Finally, the last approach I've considered is what I call the
>>> "PeopleSoft" method. That is, if you want a report, you enter all of
>>> your criteria, choose a format (XLS, HTML, etc.) and the report goes
>>> into a queue. You don't wait for your results, but instead go to a
>>> process management page, which would likely be AJAX enabled to inform
>>> you when the report was built and ready for viewing. I think this
>>> would be the least convenient solution for our users.
>>
>>> I'm just curious to see how other people have addressed this problem
>>> and what you've found to be the up/down sides of your chosen solution.
>>
>>> And, yes, the DBA is actively investigating why ndb is running much
>>> slower than the same queries on InnoDB or MyISAM. That's a separate
>>> post. For the purposes of this discussion, I'm interested in software
>>> solutions.
>>
>>> Thanks,
>>> Matt
>>
>> My first question would be - who's going to look at "10's of thousands
>> of rows" - especially if they have to go through thousands of pages to
>> read them.
>
> Accountants.
None that I know. They might go through that much on paper, where they
can make notes (even that would be suspect). But not on a web page.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: Record pagination with count of total rows [message #170440 is a reply to message #170438]
Mon, 01 November 2010 15:29
matt[1]
On Nov 1, 11:08 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> On 11/1/2010 9:58 AM, matt wrote:
>
>
>
>> On Nov 1, 8:58 am, Jerry Stuckle<jstuck...@attglobal.net> wrote:
>>> On 11/1/2010 8:42 AM, matt wrote:
>
>>>> Hi all,
>
>>>> I'm struggling with a concept here, and I'd like to pick some brains.
>>>> I've got a series of report pages with complicated queries that return
>>>> row counts in the 10s of thousands. These are paginated, and after
>>>> accepting search and sort criteria from the user, I'm displaying a
>>>> header that looks like this:
>
>>>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>>>> Page [1] 2 3 ... 1186 1187 Next>>
>
>>>> To get record totals and then deduce page numbers, I'm pulling a full
>>>> recordset from MySQL and using PHP to display the appropriate rows.
>>>> The problem is that, in the example above, at least, that query takes
>>>> about 70 seconds to run. If I were to limit the search criteria so
>>>> that it only returns<100 rows, then it's lightning fast. We're using
>>>> ndb-cluster, so JOINs really have a performance impact.
>
>>>> This leads me to believe I could take advantage of LIMIT, however, I
>>>> don't see how I'd get my total row count anymore.
>
>>>> Usually these reports are one-to-one with a particular table, so I've
>>>> entertained the idea of running a separate COUNT query against that
>>>> table with no JOINs. This won't suit my purposes because in a lot of
>>>> cases, the search criteria are against fields in JOINed tables.
>
>>>> I've also thought about using temporary tables that are periodically
>>>> updated (think Oracle's static views.) The problem there is that a
>>>> lot of times, our users will go directly from a data entry page to a
>>>> report and expect it to reflect the new data that was just entered.
>>>> If I use these "snapshot" tables, I would also need to implement a
>>>> mechanism that refreshed them whenever data is inserted/updated/
>>>> deleted elsewhere.
>
>>>> Finally, the last approach I've considered is what I call the
>>>> "PeopleSoft" method. That is, if you want a report, you enter all of
>>>> your criteria, choose a format (XLS, HTML, etc.) and the report goes
>>>> into a queue. You don't wait for your results, but instead go to a
>>>> process management page, which would likely be AJAX enabled to inform
>>>> you when the report was built and ready for viewing. I think this
>>>> would be the least convenient solution for our users.
>
>>>> I'm just curious to see how other people have addressed this problem
>>>> and what you've found to be the up/down sides of your chosen solution..
>
>>>> And, yes, the DBA is actively investigating why ndb is running much
>>>> slower than the same queries on InnoDB or MyISAM. That's a separate
>>>> post. For the purposes of this discussion, I'm interested in software
>>>> solutions.
>
>>>> Thanks,
>>>> Matt
>
>>> My first question would be - who's going to look at "10's of thousands
>>> of rows" - especially if they have to go through thousands of pages to
>>> read them.
>
>> Accountants.
>
> None that I know. They might go through that much on paper, where they
> can make notes (even that would be suspect). But not on web page.
Preaching to the choir. It's not like I want them pulling 50k
results, and I've tried to explain how to filter their searches down,
but they insist. I can only advise project management, but ultimately
the business rules are not mine to make.
Re: Record pagination with count of total rows [message #170452 is a reply to message #170423]
Mon, 01 November 2010 18:47
Thomas 'PointedEars'
matt wrote:
> I'm struggling with a concept here, and I'd like to pick some brains.
> I've got a series of report pages with complicated queries that return
> row counts in the 10s of thousands. These are paginated, and after
> accepting search and sort criteria from the user, I'm displaying a
> header that looks like this:
>
> Search contains 29,657 records. Records per page [select 25, 50, 100]
> Page [1] 2 3 ... 1186 1187 Next >>
>
> To get record totals and then deduce page numbers, I'm pulling a full
> recordset from MySQL and using PHP to display the appropriate rows.
> The problem is that, in the example above, at least, that query takes
> about 70 seconds to run. If I were to limit the search criteria so
> that it only returns <100 rows, then it's lightning fast. We're using
> ndb-cluster, so JOINs really have a performance impact.
>
> This leads me to believe I could take advantage of LIMIT, however, I
> don't see how I'd get my total row count anymore.
Temporary tables (or views) do not scale well. So I recommend making *two*
queries: one with LIMIT and all the fields that you need for displaying one
page of the table, and one without LIMIT (but with a computed COUNT(…)
field if you GROUP BY values for a WHERE filter) that selects only one
primary key field (since only one field per record is retrieved, getting
the row count should be a lot faster).
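A hedged sketch of those two queries (the schema, the filter and the $pdo handle are invented for the example, not taken from the thread):

<?php
$where = 'FROM orders o JOIN customers c ON c.id = o.customer_id WHERE c.region = ?';

// 1) one page of data, with all display fields
$pageStmt = $pdo->prepare("SELECT o.id, o.code, c.name $where ORDER BY o.created LIMIT 25 OFFSET 50");
$pageStmt->execute(array('EU'));
$rows = $pageStmt->fetchAll(PDO::FETCH_ASSOC);

// 2) same filter, no LIMIT, only a computed count
$cntStmt = $pdo->prepare("SELECT COUNT(o.id) AS total_rows $where");
$cntStmt->execute(array('EU'));
$total = (int)$cntStmt->fetchColumn();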
My approach was to pass a special, optional parameter to the retrieval
method of the model mapper class, so that it built a slightly different
query (very easy to do with Zend Framework) and then returned either the
number of rows in the result (without filter) or the value of the computed
field (with filter).
Depending on your application and the complexity of the query (the more you
JOIN, the slower it is going to be), it might be a good idea to
create/update flat tables when data is saved, and run the query for
displaying the data only on the flat table. Magento E-Commerce Shop (?AMP)
does that; it appears to work fine, and queries are quite fast considering
the complexity of the Magento database.
Evidently your question has nothing to do with PHP programming as such, and
is therefore off-topic here. Please read
<http://www.catb.org/esr/faqs/smart-questions.html> and adhere to it in the
future.
PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
Re: Record pagination with count of total rows [message #170453 is a reply to message #170452]
Mon, 01 November 2010 20:05
Hamish Campbell
On Nov 2, 7:47 am, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
> matt wrote:
>> I'm struggling with a concept here, and I'd like to pick some brains.
>> I've got a series of report pages with complicated queries that return
>> row counts in the 10s of thousands. These are paginated, and after
>> accepting search and sort criteria from the user, I'm displaying a
>> header that looks like this:
>
>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>> Page [1] 2 3 ... 1186 1187 Next >>
>
>> To get record totals and then deduce page numbers, I'm pulling a full
>> recordset from MySQL and using PHP to display the appropriate rows.
>> The problem is that, in the example above, at least, that query takes
>> about 70 seconds to run. If I were to limit the search criteria so
>> that it only returns <100 rows, then it's lightning fast. We're using
>> ndb-cluster, so JOINs really have a performance impact.
>
>> This leads me to believe I could take advantage of LIMIT, however, I
>> don't see how I'd get my total row count anymore.
>
> Temporary tables (or views) do not scale well. So I recommend to make *two*
> queries; one with LIMIT and all the fields that you need for displaying one
> page of the table, the other without LIMIT (but with a computed COUNT(…)
> field if you GROUP BY values for a WHERE filter) and only one primary key
> field (since it is only one field per record to retrieve, retrieval of the
> row count should be a lot faster).
>
> My approach was to pass a special, optional parameter to the retrieval
> method of the model mapper class, so that it built a slightly different
> query (very easy to do with Zend Framework), and returned either the number
> of rows in the result (without filter) or the value of the computed field
> (with filter) then.
>
> Depending on your application and the complexity of the query (the more you
> JOIN, the slower it is going to be), it might be a good idea to
> create/update flat tables when data is saved, and run the query for
> displaying the data only on the flat table. Magento E-Commerce Shop (?AMP)
> does that; it appears to work fine, and queries are quite fast considering
> the complexity of the Magento database.
+1 to P.E.'s suggestions.
For a complex model, de-normalization (flat tables) is probably the
best option after query plan improvements. Since you're using MySQL
you'll need to add the triggers to your ORM layer, but it is usually
straightforward to maintain a flat version that is kept in sync (or
nearly in sync) with the source data. PostgreSQL is particularly
excellent for setting this sort of behaviour up at a database level -
but then, PostgreSQL has views and very efficient query optimization
too :D
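For example, a very rough ORM-level sync hook (table and column names invented; a real ORM would wrap this in its save/delete events):

<?php
// Called after a source record is saved or deleted, to keep the flat table in sync.
function refreshFlatInvoice(PDO $pdo, $invoiceId)
{
    $pdo->prepare('DELETE FROM report_invoices_flat WHERE invoice_id = ?')
        ->execute(array($invoiceId));

    // Re-insert the flattened row (inserts nothing if the record was deleted).
    $pdo->prepare(
        'INSERT INTO report_invoices_flat (invoice_id, customer_name, total, status)
         SELECT i.id, c.name, i.total, i.status
         FROM invoices i JOIN customers c ON c.id = i.customer_id
         WHERE i.id = ?'
    )->execute(array($invoiceId));
}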
That said, 30,000 records is not a huge result set. I note that you've
got your DBA looking into the database performance, so I wouldn't make
any code changes until the DBA has done what (s)he can.
Hamish
Re: Record pagination with count of total rows [message #170454 is a reply to message #170452]
Mon, 01 November 2010 20:33
matt[1]
On Nov 1, 2:47 pm, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
> matt wrote:
>> I'm struggling with a concept here, and I'd like to pick some brains.
>> I've got a series of report pages with complicated queries that return
>> row counts in the 10s of thousands. These are paginated, and after
>> accepting search and sort criteria from the user, I'm displaying a
>> header that looks like this:
>
>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>> Page [1] 2 3 ... 1186 1187 Next >>
>
>> To get record totals and then deduce page numbers, I'm pulling a full
>> recordset from MySQL and using PHP to display the appropriate rows.
>> The problem is that, in the example above, at least, that query takes
>> about 70 seconds to run. If I were to limit the search criteria so
>> that it only returns <100 rows, then it's lightning fast. We're using
>> ndb-cluster, so JOINs really have a performance impact.
>
>> This leads me to believe I could take advantage of LIMIT, however, I
>> don't see how I'd get my total row count anymore.
>
> Temporary tables (or views) do not scale well. So I recommend to make *two*
> queries; one with LIMIT and all the fields that you need for displaying one
> page of the table, the other without LIMIT (but with a computed COUNT(…)
> field if you GROUP BY values for a WHERE filter) and only one primary key
> field (since it is only one field per record to retrieve, retrieval of the
> row count should be a lot faster).
We're actually using procedures. Views that use aggregate functions
won't take advantage of the query optimizer and instead will only be
processed after MySQL has built the full result set. And it scales
linearly. But now this is off topic.
> My approach was to pass a special, optional parameter to the retrieval
> method of the model mapper class, so that it built a slightly different
> query (very easy to do with Zend Framework), and returned either the number
> of rows in the result (without filter) or the value of the computed field
> (with filter) then.
We use a custom framework, not Zend. I'm looking for general
implementation methods as opposed to solutions implemented by specific
technologies.
> Depending on your application and the complexity of the query (the more you
> JOIN, the slower it is going to be), it might be a good idea to
> create/update flat tables when data is saved, and run the query for
> displaying the data only on the flat table. Magento E-Commerce Shop (?AMP)
> does that; it appears to work fine, and queries are quite fast considering
> the complexity of the Magento database.
I suggested this in my post. The problem is that they would need to
be refreshed very often. So often in fact that there may not be any
benefit to this.
> Evidently your question has nothing to do with PHP programming as such, and
> is therefore off-topic here. Please read
> <http://www.catb.org/esr/faqs/smart-questions.html> and adhere to it in the
> future.
Had I done that, I wouldn't have gotten an excellent idea from Erwin
regarding using $_SESSION to hold primary keys. This site is
implemented using PHP. I specifically addressed the fact that DB
solutions are being investigated on another path and that I was
looking specifically for software-based solutions. I went out of my
way, in fact, to demonstrate that this is on topic.
Since you also seem to have missed the part where I explained what
methodologies I had considered, and only quoted the first few
paragraphs of my question, I get the feeling you only read that far
before you replied and snipped everything else that might have given
you a little more context.
I appreciate your willingness to reply, but go ahead and spare me the
condescension next time.
Re: Record pagination with count of total rows [message #170474 is a reply to message #170432]
Tue, 02 November 2010 08:51
Erwin Moller
On 11/1/2010 3:26 PM, matt wrote:
> On Nov 1, 9:33 am, Erwin Moller
> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>> On 11/1/2010 1:42 PM, matt wrote:
>>
>>> Hi all,
>>
>> Hi Matt,
>>
<snip>
>>> I'm just curious to see how other people have addressed this problem
>>> and what you've found to be the up/down sides of your chosen solution.
>>
>> The way I approach this:
>>
>> 1) Run the query once, and store the ID's of something that will enable
>> you to fetch rest of the tuple based on that number.
>> (For example: store a userid, and you know you can easily find the rest
>> of the accompanying information based on a userid)
>
> An interesting concept. Unfortunately, it doesn't address the time of
> the first query. Although, I'm surprised I didn't think of it...
True. But the query must run anyway, I expect.
So if it takes 70 seconds, I don't see how you can eliminate that.
The best you can do is try to make it run only once.
>
>> 2) Store the id's in the session.
>
> ...because I'm already doing this--for another purpose. In a lot of
> reports, you can pop-up a detail page by clicking on a row. Then,
> from that page, you can navigate forward/back through your result set
> in the order of how you sorted your report. To do that, when you run
> the report, we populate an array of primary keys in the session.
>
> I never thought of using that for subsequent queries.
That is what newsgroups are good for sometimes: fresh angles.
:-)
>
>> 3) write some (basic)code that will produce the pagination and number of
>> results per page.
>> If your visitors click>>next>> or something, you can use the userids
>> you stored in the session to fetch the needed results from the database.
>> You will use something like:
>>
>> SELECT userid, username FROM tbluser WHERE (userid IN (xx,xx,xx,xx,xx));
>>
>> Where the xx are userids you pull from the session.
>>
>> That way you will only fetch the rows you need from the backend (database).
>> Will such a query be slow?
>
> It should be faster than searching on fields in joined tables. We
> already have permission columns in each primary table which provide a
> fast way to limit the rows at the beginning of the query, but this
> would certainly be faster.
Good. Sounds like (part of) the solution then.
>
>> (If the answer to that is YES, then this approach won't work, but in
>> most cases I encountered this worked because of the eliminated complex
>> table scans for each search.)
>
> The question is will this impact performance significantly enough.
> I'm not sure. There are also some synchronization issues to overcome.
Absolutely.
>
> Let's say I get 20 rows back:
>
> 1. Store all my main table PKs in a _SESSION array.
> 2. Click "next"
> 3. Display count($_SESSION['reportKeys']) as total row count.
> 4. Build query from array:
>
> SELECT [...] WHERE pkTable IN (10, 11 ... 19);
>
Yes.
> So, I'm not only automatically filtering by the previously entered
> search criteria, but I'm also only grabbing the specific subset of the
> session array that will be viewed.
exactly.
>
> The synchronization problem occurs when I amend the previous list:
>
> 1.5 Other user deletes record #8.
>
> Now, my"first ten results should be 0,1,2,3,4,5,6,7,9,10. Page two
> should start on record #11 and my total count should be 19.
>
> Same problem occurs if new data is entered that should be included in
> the results after the user's initial query.
>
> I think that even with this approach, there needs to be a way of
> telling individual sessions "your results are no longer valid, you
> need to rerun the full query."
Yes and no.
You say two things: Record deleted and underlying data changed.
(Well, actually a deleted record is also 'data changed' of course, but
they can be treated separately I think.)
1) Record deleted
This is no problem. If you write your code right, you will end up with 9
results instead of 10. (Number 8 was deleted).
That is no big problem in most cases: pagination is mostly a convenience.
(Of course, again, I cannot make that decision: you and
your client must decide whether that is acceptable.)
Anyway: number 2 is the bigger problem:
2) Underlying data changed
This is tricky. I can think of three approaches:
a) re-query every time, like you did earlier, but that is unacceptable
slow.
b) Place triggers on all relevant tables, that fill some other table
with the changed records and also timestamps.
Your process can then look up each invocation if there is a need to
requery and comes back with what you wrote: "your results are no longer
valid, you need to rerun the full query."
The second solution (b) is by far the most elegant, but also the most
complex/bug-prone: you are actually building your own custom caching
system, that triggers a fresh query when needed.
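For example, the check in (b) could look like this (report_changes and the session keys are invented names, just a sketch):

<?php
// Triggers on the relevant tables insert (table_name, pk, changed_at) rows
// into report_changes; the report page only checks whether anything changed
// since its ID list was built.
$stmt = $pdo->prepare('SELECT COUNT(*) FROM report_changes WHERE changed_at > ?');
$stmt->execute(array($_SESSION['reportBuiltAt']));

if ($stmt->fetchColumn() > 0) {
    // "your results are no longer valid, you need to rerun the full query"
    unset($_SESSION['reportKeys'], $_SESSION['reportBuiltAt']);
}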
c) A hybrid: do a full search ONLY on the IDs you have. So apply all the
other search criteria, but include the:
"WHERE pkTable IN (10, 11 ... 19)"
This way the results may not be the right number (10 in our example),
but they ARE guaranteed to be accurate.
I have never done that last one (c), so here my experience ends. :-/
But it doesn't sound impossible at all.
Smart database engines can use that extra information to seriously
reduce the time needed to scan all the included tables.
(And also: some databases, eg Postgresql, can accept hints.)
I can imagine your client agreeing to the fact that the data (s)he views
is stale (stale as in: a snapshot from the moment the query ran for the
first time).
To sum it up: If your client demands up-to-date information, I think I
would go for the above 2c approach. It is much simpler than 2b.
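A sketch of 2c (placeholder names again): keep the full search criteria, but restrict the scan to the IDs already stored in the session:

<?php
// Assumes $pageIds is non-empty and $searchValue holds the user's criterion.
$pageIds = array_map('intval', array_slice($_SESSION['reportKeys'], $offset, $perPage));

$sql = 'SELECT t.pkTable, t.someField, o.otherField
        FROM mainTable t
        JOIN otherTable o ON o.main_id = t.pkTable
        WHERE o.searchField = :crit
          AND t.pkTable IN (' . implode(',', $pageIds) . ')';

$stmt = $pdo->prepare($sql);
$stmt->execute(array(':crit' => $searchValue));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);   // accurate rows; may be fewer than a full page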
>
>> Drawback:
>> - If you have a really huge resultset, your session will grow too of
>> course. But I consider that a lesser problem than repetitive needless
>> queries to the database from which only a few will be shown.
>
> Not to worry. We're barely making a dent in our hardware right now
> and our average session runs 500k-1MB. An array of 30k ints won't be
> noticeable.
Ah nice. :-)
>
>>> And, yes, the DBA is actively investigating why ndb is running much
>>> slower than the same queries on InnoDB or MyISAM. That's a separate
>>> post. For the purposes of this discussion, I'm interested in software
>>> solutions.
>>
>>> Thanks,
>>> Matt
>>
>> Good luck.
>
> Thanks a lot. That's a great tip.
You are welcome and thanks.
Good luck.
Regards,
Erwin Moller
--
"There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult."
-- C.A.R. Hoare
Re: Record pagination with count of total rows [message #170475 is a reply to message #170423]
Tue, 02 November 2010 10:32
alvaro.NOSPAMTHANX
On 01/11/2010 13:42, matt wrote:
> I'm struggling with a concept here, and I'd like to pick some brains.
> I've got a series of report pages with complicated queries that return
> row counts in the 10s of thousands. These are paginated, and after
> accepting search and sort criteria from the user, I'm displaying a
> header that looks like this:
>
> Search contains 29,657 records. Records per page [select 25, 50, 100]
> Page [1] 2 3 ... 1186 1187 Next>>
>
> To get record totals and then deduce page numbers, I'm pulling a full
> recordset from MySQL and using PHP to display the appropriate rows.
> The problem is that, in the example above, at least, that query takes
> about 70 seconds to run. If I were to limit the search criteria so
> that it only returns<100 rows, then it's lightning fast. We're using
> ndb-cluster, so JOINs really have a performance impact.
>
> This leads me to believe I could take advantage of LIMIT, however, I
> don't see how I'd get my total row count anymore.
In MySQL, you can use the SQL_CALC_FOUND_ROWS option:
SELECT SQL_CALC_FOUND_ROWS foo
FROM bar
LIMIT 200,10;
SELECT FOUND_ROWS() AS total_rows;
For further reference:
http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_found-rows
In other DBMS, good old COUNT(*) tends to work well enough.
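From PHP it might look like this (a $mysqli connection is assumed; table and column names are from the example above):

<?php
$page = $mysqli->query('SELECT SQL_CALC_FOUND_ROWS foo FROM bar LIMIT 200,10');
while ($row = $page->fetch_assoc()) {
    // ... render one row of the current page ...
}
list($total) = $mysqli->query('SELECT FOUND_ROWS() AS total_rows')->fetch_row();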
>
> Usually these reports are one-to-one with a particular table, so I've
> entertained the idea of running a separate COUNT query against that
> table with no JOINs. This won't suit my purposes because in a lot of
> cases, the search criteria are against fields in JOINed tables.
>
> I've also thought about using temporary tables that are periodically
> updated (think Oracle's static views.) The problem there is that a
> lot of times, our users will go directly from a data entry page to a
> report and expect it to reflect the new data that was just entered.
> If I use these "snapshot" tables, I would also need to implement a
> mechanism that refreshed them whenever data is inserted/updated/
> deleted elsewhere.
>
> Finally, the last approach I've considered is what I call the
> "PeopleSoft" method. That is, if you want a report, you enter all of
> your criteria, choose a format (XLS, HTML, etc.) and the report goes
> into a queue. You don't wait for your results, but instead go to a
> process management page, which would likely be AJAX enabled to inform
> you when the report was built and ready for viewing. I think this
> would be the least convenient solution for our users.
>
> I'm just curious to see how other people have addressed this problem
> and what you've found to be the up/down sides of your chosen solution.
>
> And, yes, the DBA is actively investigating why ndb is running much
> slower than the same queries on InnoDB or MyISAM. That's a separate
> post. For the purposes of this discussion, I'm interested in software
> solutions.
You can also try MySQL's result cache:
http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
I don't know if it'll work in your set-up but you can have a look at it.
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My glossy-humour site: http://www.demogracia.com
--
Re: Record pagination with count of total rows [message #170478 is a reply to message #170475]
Tue, 02 November 2010 12:13
matt[1]
On Nov 2, 6:32 am, "Álvaro G. Vicario"
<alvaro.NOSPAMTH...@demogracia.com.invalid> wrote:
> El 01/11/2010 13:42, matt escribió/wrote:
>
>
>
>> I'm struggling with a concept here, and I'd like to pick some brains.
>> I've got a series of report pages with complicated queries that return
>> row counts in the 10s of thousands. These are paginated, and after
>> accepting search and sort criteria from the user, I'm displaying a
>> header that looks like this:
>
>> Search contains 29,657 records. Records per page [select 25, 50, 100]
>> Page [1] 2 3 ... 1186 1187 Next>>
>
>> To get record totals and then deduce page numbers, I'm pulling a full
>> recordset from MySQL and using PHP to display the appropriate rows.
>> The problem is that, in the example above, at least, that query takes
>> about 70 seconds to run. If I were to limit the search criteria so
>> that it only returns<100 rows, then it's lightning fast. We're using
>> ndb-cluster, so JOINs really have a performance impact.
>
>> This leads me to believe I could take advantage of LIMIT, however, I
>> don't see how I'd get my total row count anymore.
>
> In MySQL, you can use the SQL_CALC_FOUND_ROWS option:
>
> SELECT SQL_CALC_FOUND_ROWS foo
> FROM bar
> LIMIT 200,10;
>
> SELECT FOUND_ROWS() AS total_rows;
>
> For further reference:
>
> http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#fun...
Thanks for the tip. I wasn't aware of that functionality.
Unfortunately, it doesn't seem to be any faster than grabbing the full
result set.
mysql> SELECT field FROM [bunch of joined tables] LIMIT 10;
10 rows in set (0.01 sec)
mysql> SELECT SQL_CALC_FOUND_ROWS field FROM [bunch of joined tables]
LIMIT 10;
10 rows in set (7.55 sec)
mysql> SELECT field FROM [bunch of joined tables];
25000 rows in set (7.34 sec)
> In other DBMS, good old COUNT(*) tends to work well enough.
IMHO it works just as well in MySQL too; however, it still takes just as
much time.
>> Usually these reports are one-to-one with a particular table, so I've
>> entertained the idea of running a separate COUNT query against that
>> table with no JOINs. This won't suit my purposes because in a lot of
>> cases, the search criteria are against fields in JOINed tables.
>
>> I've also thought about using temporary tables that are periodically
>> updated (think Oracle's static views.) The problem there is that a
>> lot of times, our users will go directly from a data entry page to a
>> report and expect it to reflect the new data that was just entered.
>> If I use these "snapshot" tables, I would also need to implement a
>> mechanism that refreshed them whenever data is inserted/updated/
>> deleted elsewhere.
>
>> Finally, the last approach I've considered is what I call the
>> "PeopleSoft" method. That is, if you want a report, you enter all of
>> your criteria, choose a format (XLS, HTML, etc.) and the report goes
>> into a queue. You don't wait for your results, but instead go to a
>> process management page, which would likely be AJAX enabled to inform
>> you when the report was built and ready for viewing. I think this
>> would be the least convenient solution for our users.
>
>> I'm just curious to see how other people have addressed this problem
>> and what you've found to be the up/down sides of your chosen solution.
>
>> And, yes, the DBA is actively investigating why ndb is running much
>> slower than the same queries on InnoDB or MyISAM. That's a separate
>> post. For the purposes of this discussion, I'm interested in software
>> solutions.
>
> You can also try MySQL's result cache:
>
> http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
I think we've already done everything we can here. DBA's domain, not
mine.
This is really getting OT though...I didn't want to turn this thread
into a MySQL efficiency discussion.
> I don't know if it'll work in your set-up but you can have a look at it.
I appreciate the insight.
Re: Record pagination with count of total rows [message #170480 is a reply to message #170474]
Tue, 02 November 2010 12:28
matt[1]
On Nov 2, 4:51 am, Erwin Moller
<Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
> On 11/1/2010 3:26 PM, matt wrote:
>
>> On Nov 1, 9:33 am, Erwin Moller
>> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>>> On 11/1/2010 1:42 PM, matt wrote:
>
>>>> Hi all,
>
>>> Hi Matt,
>
> <snip>
>
>>>> I'm just curious to see how other people have addressed this problem
>>>> and what you've found to be the up/down sides of your chosen solution..
>
>>> The way I approach this:
>
>>> 1) Run the query once, and store the ID's of something that will enable
>>> you to fetch rest of the tuple based on that number.
>>> (For example: store a userid, and you know you can easily find the rest
>>> of the accompanying information based on a userid)
>
>> An interesting concept. Unfortunately, it doesn't address the time of
>> the first query. Although, I'm surprised I didn't think of it...
>
> True. But the query must run anyway, I expect.
> So if it takes 70 seconds, I don't see how you can eliminate that.
> The best you can do is trying to make it run only once, I expect.
Yeah...I was hoping to save some execution time using LIMIT, but I'm
beginning to think no matter what I implement, it does need to run the
full query up front.
[snip]
>> The synchronization problem occurs when I amend the previous list:
>
>> 1.5 Other user deletes record #8.
>
>> Now, my"first ten results should be 0,1,2,3,4,5,6,7,9,10. Page two
>> should start on record #11 and my total count should be 19.
>
>> Same problem occurs if new data is entered that should be included in
>> the results after the user's initial query.
>
>> I think that even with this approach, there needs to be a way of
>> telling individual sessions "your results are no longer valid, you
>> need to rerun the full query."
>
> Yes and no.
> You say two things: Record deleted and underlying data changed.
> (Well, actually a deleted record is also 'data changed' of course, but
> they can be treated separately I think.)
>
> 1) Record deleted
> This is no problem. If you write your code right, you will end up with 9
> results instead of 10. (Number 8 was deleted).
> That is no big problem in most cases: pagination is more for convenience
> in most cases. (Of course, again, I cannot make that decision: you and
> your client must decide if that is acceptable.)
I can hear it now: "I said 25 rows per page and I only got 24!!! The
web site's broken!!!"
I'm actually OK with re-running the full query if there is a set flag
that says the data has been changed in any way.
> Anyway: number 2 is the bigger problem:
>
> 2) Underlying data changed
> This is tricky. I can think of three approaches:
> a) re-query every time, like you did earlier, but that is unacceptable
> slow.
Exactly.
> b) Place triggers on all relevant tables, that fill some other table
> with the changed records and also timestamps.
> Your process can then look up each invocation if there is a need to
> requery and comes back with what you wrote: "your results are no longer
> valid, you need to rerun the full query."
That's interesting...I was thinking that a change in any JOIN'ed table
would trigger a full refresh, but you're right--we may be able to
implement something that's a little smarter and more granular.
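Just to make sure I'm picturing (b) right, something like a trigger-maintained "dirty" log is what comes to mind -- the table, trigger, and column names below are invented, not anything we actually have:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=reports', 'user', 'pass');

// One row per base table; the triggers below bump changed_at on every write.
$pdo->exec("CREATE TABLE IF NOT EXISTS report_dirty (
                table_name VARCHAR(64) PRIMARY KEY,
                changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
            )");
$pdo->exec("INSERT IGNORE INTO report_dirty (table_name) VALUES ('orders')");

// Single-statement trigger bodies, so no DELIMITER juggling is needed.
$pdo->exec("CREATE TRIGGER orders_ai AFTER INSERT ON orders FOR EACH ROW
            UPDATE report_dirty SET changed_at = NOW() WHERE table_name = 'orders'");
$pdo->exec("CREATE TRIGGER orders_au AFTER UPDATE ON orders FOR EACH ROW
            UPDATE report_dirty SET changed_at = NOW() WHERE table_name = 'orders'");
$pdo->exec("CREATE TRIGGER orders_ad AFTER DELETE ON orders FOR EACH ROW
            UPDATE report_dirty SET changed_at = NOW() WHERE table_name = 'orders'");

// Each session compares MAX(changed_at) for the tables its report touches
// against the timestamp of its cached result to decide whether to re-query.
$last = $pdo->query("SELECT MAX(changed_at) FROM report_dirty
                     WHERE table_name IN ('orders')")->fetchColumn();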
> The second solution (b) is by far the most elegant, but also the most
> complex/bug-prone: you are actually building your own custom caching
> system, that triggers a fresh query when needed.
>
> c) A hybrid: Do a full search ONLY on the IDs you have. So apply all
> other searchcriteria but include the:
> "WHERE pkTable IN (10, 11 ... 19)"
> This will ensure for you that the results are maybe not the right number
> (10 in our example), but they ARE guaranteed accurate.
That's not bad either. I'd almost venture to say that we use
temporary tables to flatten the lookups in conjunction with this as
well. So, first execution populates a temp (memory?) table and then
queries that table for the full result set. Subsequent page loads
first check a flag that's set by the insert/update/delete logic that
says either (a) this whole temporary table is invalid or (b) these
rows are invalid and regenerates the requisite data into the temp
table.
Then, we could use Alvaro's suggestion to both limit rows in the query
and provide full result counts much more efficiently (no joins).
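Concretely, once the flat table exists, both numbers the header needs become cheap queries (report_flat and the column name are placeholders; $pdo, $page and $perPage assumed set up as usual):

// Total row count for the header -- no joins, just the flattened table.
$total = (int)$pdo->query("SELECT COUNT(*) FROM report_flat")->fetchColumn();

// Only the current page's rows ever leave MySQL.
$stmt = $pdo->prepare("SELECT * FROM report_flat ORDER BY order_date DESC LIMIT :off, :num");
$stmt->bindValue(':off', ($page - 1) * $perPage, PDO::PARAM_INT);
$stmt->bindValue(':num', $perPage, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);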
I doubt this is going to be implemented for the current app that's
supposed to roll on Dec. 1, but this whole app suite is on a custom
Framework, so this could easily be built in for all future reporting.
> I have never done that last one (c), so here my experience ends. :-/
> But it doesn't sound impossible at all.
> Smart database engines can use that extra information to seriously
> reduce the time needed to scan all included tables.
> (And also: some databases, eg Postgresql, can accept hints.)
Switching the RDBMS is a last-ditch option, but not one that's off the
table. We'd probably be talking Oracle, but I'll keep pg in mind.
[snip]
Thanks again.
|
|
|
Re: Record pagination with count of total rows [message #170494 is a reply to message #170480] |
Wed, 03 November 2010 06:31 |
Norman Peelman
Messages: 126 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
matt wrote:
> On Nov 2, 4:51 am, Erwin Moller
> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>> On 11/1/2010 3:26 PM, matt wrote:
>>
>>> On Nov 1, 9:33 am, Erwin Moller
>>> <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>>>> On 11/1/2010 1:42 PM, matt wrote:
>>>> > Hi all,
>>>> Hi Matt,
>> <snip>
>>
>>>> > I'm just curious to see how other people have addressed this problem
>>>> > and what you've found to be the up/down sides of your chosen solution.
>>>> The way I approach this:
>>>> 1) Run the query once, and store the ID's of something that will enable
>>>> you to fetch rest of the tuple based on that number.
>>>> (For example: store a userid, and you know you can easily find the rest
>>>> of the accompanying information based on a userid)
>>> An interesting concept. Unfortunately, it doesn't address the time of
>>> the first query. Although, I'm surprised I didn't think of it...
>> True. But the query must run anyway, I expect.
>> So if it takes 70 seconds, I don't see how you can eliminate that.
>> The best you can do is trying to make it run only once, I expect.
>
> Yeah...I was hoping to save some execution time using LIMIT, but I'm
> beginning to think no matter what I implement, it does need to run the
> full query up front.
>
> [snip]
>
>>> The synchronization problem occurs when I amend the previous list:
>>> 1.5 Other user deletes record #8.
>>> Now, my "first ten" results should be 0,1,2,3,4,5,6,7,9,10. Page two
>>> should start on record #11 and my total count should be 19.
>>> Same problem occurs if new data is entered that should be included in
>>> the results after the user's initial query.
>>> I think that even with this approach, there needs to be a way of
>>> telling individual sessions "your results are no longer valid, you
>>> need to rerun the full query."
>> Yes and no.
>> You say two things: Record deleted and underlying data changed.
>> (Well, actually a deleted record is also 'data changed' of course, but
>> they can be treated separately I think.)
>>
>> 1) Record deleted
>> This is no problem. If you write your code right, you will end up with 9
>> results instead of 10. (Number 8 was deleted).
>> That is no big problem in most cases: pagination is more for convenience
>> in most cases. (Of course, again, I cannot make that decision: you and
>> your client must decide if that is acceptable.)
>
> I can hear it now: "I said 25 rows per page and I only got 24!!! The
> web site's broken!!!"
>
> I'm actually OK with re-running the full query if there is a set flag
> that says the data has been changed in any way.
>
>> Anyway: number 2 is the bigger problem:
>>
>> 2) Underlying data changed
>> This is tricky. I can think of three approaches:
>> a) re-query every time, like you did earlier, but that is unacceptable
>> slow.
>
> Exactly.
>
>> b) Place triggers on all relevant tables, that fill some other table
>> with the changed records and also timestamps.
>> Your process can then look up each invocation if there is a need to
>> requery and comes back with what you wrote: "your results are no longer
>> valid, you need to rerun the full query."
>
> That's interesting...I was thinking that a change in any JOIN'ed table
> would trigger a full refresh, but you're right--we may be able to
> implement something that's a little smarter and more granular.
>
>> The second solution (b) is by far the most elegant, but also the most
>> complex/bug-prone: you are actually building your own custom caching
>> system, that triggers a fresh query when needed.
>>
>> c) A hybrid: Do a full search ONLY on the IDs you have. So apply all
>> other searchcriteria but include the:
>> "WHERE pkTable IN (10, 11 ... 19)"
>> This will ensure for you that the results are maybe not the right number
>> (10 in our example), but they ARE guaranteed accurate.
>
> That's not bad either. I'd almost venture to say that we use
> temporary tables to flatten the lookups in conjunction with this as
> well. So, first execution populates a temp (memory?) table and then
> queries that table for the full result set. Subsequent page loads
> first check a flag that's set by the insert/update/delete logic that
> says either (a) this whole temporary table is invalid or (b) these
> rows are invalid and regenerates the requisite data into the temp
> table.
>
> Then, we could use Alvaro's suggestion to both limit rows in the query
> and provide full result counts much more efficiently (no joins).
>
> I doubt this is going to be implemented for the current app that's
> supposed to roll on Dec. 1, but this whole app suite is on a custom
> Framework, so this could easily be built in for all future reporting.
>
>> I have never done that last one (c), so here my experience ends. :-/
>> But it doesn't sound impossible at all.
>> Smart database engines can use that extra information to seriously
>> reduce the time needed to scan all included tables.
>> (And also: some databases, eg Postgresql, can accept hints.)
>
> Switching the RDBMS is a last-ditch option, but not one that's off the
> table. We'd probably be talking Oracle, but I'll keep pg in mind.
>
> [snip]
>
> Thanks again.
Other options to consider:
MySQL - make sure that the result (query) buffer is sufficiently large
enough to hold many such result sets so that the full scan queries run
once and your paginated queries pull from the buffer as MySQL doesn't go
back to disk if the result set is already buffered (unless something has
changed). If something has changed MySQL should go back to disk for the
result set - this should be handled by MySQL automatically.
Using 'LIMIT offset, numrows' during pagination should dramatically
reduce query time when the full result set is already in the buffer.
PHP - use APC (Alternative PHP Cache) to store result sets in shared
memory with reasonable time to live. Probably about the same as using a
session variable but faster/less resources when multiple people doing
same query (how often would that happen?). This could allow (with the
proper logic) users to store/bookmark queries and come back to them
without hitting the database again (good for queries of data that may
not change).
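A rough sketch of the APC idea, assuming the APC extension is loaded and PDO is in use; the cache key scheme and the 10-minute TTL are just examples:

<?php
// Cache key derived from the query text plus its parameters.
$sql    = "SELECT id, name, region FROM customers WHERE region = ? ORDER BY name";
$params = array('Nevada');
$key    = 'report_' . md5($sql . serialize($params));

$rows = apc_fetch($key, $hit);
if (!$hit) {
    $pdo  = new PDO('mysql:host=localhost;dbname=reports', 'user', 'pass');
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    apc_store($key, $rows, 600);            // keep the result set for 10 minutes
}

// Pagination and the total count are then just array math on the cached set.
$perPage  = 25;
$page     = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$total    = count($rows);
$pageRows = array_slice($rows, ($page - 1) * $perPage, $perPage);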
--
Norman
Registered Linux user #461062
-Have you been to www.php.net yet?-
|
|
|
Re: Record pagination with count of total rows [message #170501 is a reply to message #170494] |
Wed, 03 November 2010 12:14 |
Matthew Leonhardt
Messages: 9 Registered: November 2010
Karma: 0
|
Junior Member |
|
|
"Norman Peelman" <npeelman(at)cfl(dot)rr(dot)com> wrote in message
news:iaqvjv$l1s$1(at)news(dot)eternal-september(dot)org...
[snip]
> Other options to consider:
>
> MySQL - make sure that the result (query) buffer is sufficiently large
> enough to hold many such result sets so that the full scan queries run
> once and your paginated queries pull from the buffer as MySQL doesn't go
> back to disk if the result set is already buffered (unless something has
> changed). If something has changed MySQL should go back to disk for the
> result set - this should be handled by MySQL automatically.
> Using 'LIMIT offset, numrows' during pagination should dramatically
> reduce query time when the full result set is already in the buffer.
Like I said, DB server configuration is being explored in parallel. I don't
want this discussion to get off topic.
> PHP - use APC (Alternative PHP Cache) to store result sets in shared
> memory with reasonable time to live. Probably about the same as using a
> session variable but faster/less resources when multiple people doing same
> query (how often would that happen?). This could allow (with the proper
> logic) users to store/bookmark queries and come back to them without
> hitting the database again (good for queries of data that may not change).
We have our own caching mechanism in place, however, I'll check out APC. I
doubt this will help us much, however. Our permissions are very granular
and there are rarely two users that have the same permissions (and would see
the same data returned from the query). For this reason, also, MySQL's
buffers and cache don't do too much good with multiple-user scenarios--the
portion of the WHERE clause that filters results by user permissions is
almost guaranteed to be different with each user.
|
|
|
Re: Record pagination with count of total rows [message #170502 is a reply to message #170501] |
Wed, 03 November 2010 13:22 |
Jerry Stuckle
Messages: 2598 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On 11/3/2010 8:14 AM, Matthew Leonhardt wrote:
> "Norman Peelman"<npeelman(at)cfl(dot)rr(dot)com> wrote in message
> news:iaqvjv$l1s$1(at)news(dot)eternal-september(dot)org...
>
> [snip]
>
>> Other options to consider:
>>
>> MySQL - make sure that the result (query) buffer is sufficiently large
>> enough to hold many such result sets so that the full scan queries run
>> once and your paginated queries pull from the buffer as MySQL doesn't go
>> back to disk if the result set is already buffered (unless something has
>> changed). If something has changed MySQL should go back to disk for the
>> result set - this should be handled by MySQL automatically.
>> Using 'LIMIT offset, numrows' during pagination should dramatically
>> reduce query time when the full result set is already in the buffer.
>
> Like I said, DB server configuration is being explored in parallel. I don't
> want this discussion to get off topic.
>
>> PHP - use APC (Alternative PHP Cache) to store result sets in shared
>> memory with reasonable time to live. Probably about the same as using a
>> session variable but faster/less resources when multiple people doing same
>> query (how often would that happen?). This could allow (with the proper
>> logic) users to store/bookmark queries and come back to them without
>> hitting the database again (good for queries of data that may not change).
>
> We have our own caching mechanism in place, however, I'll check out APC. I
> doubt this will help us much, however. Our permissions are very granular
> and there are rarely two users that have the same permissions (and would see
> the same data returned from the query). For this reason, also, MySQL's
> buffers and cache don't do too much good with multiple-user scenarios--the
> portion of the WHERE clause that filters results by user permissions is
> almost guaranteed to be different with each user.
>
>
You missed the point. When the user displays the second page, the same
query would be run - and if the db buffer is big enough, the data will
already be there, making the query very quick.
And if you have multiple people running concurrently, increase the db
buffer accordingly.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
|
|
|
Re: Record pagination with count of total rows [message #170503 is a reply to message #170501] |
Wed, 03 November 2010 13:55 |
Peter H. Coffin
Messages: 245 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On Wed, 3 Nov 2010 08:14:00 -0400, Matthew Leonhardt wrote:
>
> "Norman Peelman" <npeelman(at)cfl(dot)rr(dot)com> wrote in message
> news:iaqvjv$l1s$1(at)news(dot)eternal-september(dot)org...
>
> [snip]
>
>> Other options to consider:
>>
>> MySQL - make sure that the result (query) buffer is sufficiently large
>> enough to hold many such result sets so that the full scan queries run
>> once and your paginated queries pull from the buffer as MySQL doesn't go
>> back to disk if the result set is already buffered (unless something has
>> changed). If something has changed MySQL should go back to disk for the
>> result set - this should be handled by MySQL automatically.
>> Using 'LIMIT offset, numrows' during pagination should dramatically
>> reduce query time when the full result set is already in the buffer.
>
> Like I said, DB server configuration is being explored in parallel. I don't
> want this discussion to get off topic.
I can understand that, but you've got one problem, not separate problems of
code, of db layout, and of db server config. What you do in one may/will
affect what you do for the others.
>> PHP - use APC (Alternative PHP Cache) to store result sets in shared
>> memory with reasonable time to live. Probably about the same as using a
>> session variable but faster/less resources when multiple people doing same
>> query (how often would that happen?). This could allow (with the proper
>> logic) users to store/bookmark queries and come back to them without
>> hitting the database again (good for queries of data that may not change).
>
> We have our own caching mechanism in place, however, I'll check out APC. I
> doubt this will help us much, however. Our permissions are very granular
> and there are rarely two users that have the same permissions (and would see
> the same data returned from the query).
This smells like trouble too. At least such that the security that it
may afford is likely to adversely affect performance. But that'd require
a lot of detail about precisely what you're limiting from whom and with
which and by how. Mostly, people are unwilling to share that kind of
information.
> For this reason, also, MySQL's buffers and cache don't do too much
> good with multiple-user scenarios--the portion of the WHERE clause
> that filters results by user permissions is almost guaranteed to be
> different with each user.
It depends how and where you've flattened the data, and how it's
indexed. [hypothetical] If your access is determined by a record code
that links to a table of values that are used as a bitwise array to
check against specific flags in a user-record bitwise array, then yes,
it's going to be horrible. If you're using a float-value to describe the
"sensitivity" of a record and comparing that to another float value,
that's going to suck too. If you've got a massive intersection table
between your data and your user records, that caches poorly if the whole
intersection table doesn't fit in the cache and you've got multiple
users fussing over the table at the same time, but may work just fine if
there's usually only one person grovelling over the list of records at a
time. Similarly if you've got nearly unique views over the data for each
user, limiting which records they can see, and also have more than one
working at the same time. Some of those scenarios may sound silly, but
every one of them is a highly-granular permissions scheme that's been
proposed to me at one point or another in the past. ... And, of course,
there's the factor that most of the time, tracing back the development
of the very granular user permissions ended up showing that the problem
and requirements were very different from the scheme under
consideration.
--
When you have a thermic lance, everything looks like hours of fun.
-- Christian Wagner in a.s.r
|
|
|
Re: Record pagination with count of total rows [message #170504 is a reply to message #170502] |
Wed, 03 November 2010 14:09 |
Matthew Leonhardt
Messages: 9 Registered: November 2010
Karma: 0
|
Junior Member |
|
|
"Jerry Stuckle" <jstucklex(at)attglobal(dot)net> wrote in message
news:iarnmg$rar$1(at)news(dot)eternal-september(dot)org...
> On 11/3/2010 8:14 AM, Matthew Leonhardt wrote:
>> "Norman Peelman"<npeelman(at)cfl(dot)rr(dot)com> wrote in message
>> news:iaqvjv$l1s$1(at)news(dot)eternal-september(dot)org...
>>
>> [snip]
>>
>>> Other options to consider:
>>>
>>> MySQL - make sure that the result (query) buffer is sufficiently large
>>> enough to hold many such result sets so that the full scan queries run
>>> once and your paginated queries pull from the buffer as MySQL doesn't go
>>> back to disk if the result set is already buffered (unless something has
>>> changed). If something has changed MySQL should go back to disk for the
>>> result set - this should be handled by MySQL automatically.
>>> Using 'LIMIT offset, numrows' during pagination should dramatically
>>> reduce query time when the full result set is already in the buffer.
>>
>> Like I said, DB server configuration is being explored in parallel. I
>> don't
>> want this discussion to get off topic.
>>
>>> PHP - use APC (Alternative PHP Cache) to store result sets in shared
>>> memory with reasonable time to live. Probably about the same as using a
>>> session variable but faster/less resources when multiple people doing
>>> same
>>> query (how often would that happen?). This could allow (with the proper
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Perhaps I should have clarified. Not often.
>>> logic) users to store/bookmark queries and come back to them without
>>> hitting the database again (good for queries of data that may not
>>> change).
>>
>> We have our own caching mechanism in place, however, I'll check out APC.
>> I
>> doubt this will help us much, however. Our permissions are very granular
>> and there are rarely two users that have the same permissions (and would
>> see
>> the same data returned from the query). For this reason, also, MySQL's
>> buffers and cache don't do too much good with multiple-user
>> scenarios--the
>> portion of the WHERE clause that filters results by user permissions is
>> almost guaranteed to be different with each user.
>>
>>
>
> You missed the point. When the user displays the second page, the same
> query would be run - and if the db buffer is big enough, the data will
> already be there, making the query very quick.
No I didn't. I was specifically addressing his question about multiple
users executing identical queries.
> And if you have multiple people running concurrently, increase the db
> buffer accordingly.
Agreed, but OT.
|
|
|
Re: Record pagination with count of total rows [message #170505 is a reply to message #170503] |
Wed, 03 November 2010 14:26 |
Matthew Leonhardt
Messages: 9 Registered: November 2010
Karma: 0
|
Junior Member |
|
|
"Peter H. Coffin" <hellsop(at)ninehells(dot)com> wrote in message
news:slrnid2pkh(dot)1g0(dot)hellsop(at)abyss(dot)ninehells(dot)com...
> On Wed, 3 Nov 2010 08:14:00 -0400, Matthew Leonhardt wrote:
>>
>> "Norman Peelman" <npeelman(at)cfl(dot)rr(dot)com> wrote in message
>> news:iaqvjv$l1s$1(at)news(dot)eternal-september(dot)org...
>>
>> [snip]
>>
>>> Other options to consider:
>>>
>>> MySQL - make sure that the result (query) buffer is sufficiently large
>>> enough to hold many such result sets so that the full scan queries run
>>> once and your paginated queries pull from the buffer as MySQL doesn't go
>>> back to disk if the result set is already buffered (unless something has
>>> changed). If something has changed MySQL should go back to disk for the
>>> result set - this should be handled by MySQL automatically.
>>> Using 'LIMIT offset, numrows' during pagination should dramatically
>>> reduce query time when the full result set is already in the buffer.
>>
>> Like I said, DB server configuration is being explored in parallel. I
>> don't
>> want this discussion to get off topic.
>
> I can understand that, but you've got one problem, not a problem of
> code, a problem of db layout, and a problem of db server config that are
> separate. What you do in one may/will affect what you do for the others.
I'll grant you that--I also don't want to anger the c.l.p topicality police
:)
>>> PHP - use APC (Alternative PHP Cache) to store result sets in shared
>>> memory with reasonable time to live. Probably about the same as using a
>>> session variable but faster/less resources when multiple people doing
>>> same
>>> query (how often would that happen?). This could allow (with the proper
>>> logic) users to store/bookmark queries and come back to them without
>>> hitting the database again (good for queries of data that may not
>>> change).
>>
>> We have our own caching mechanism in place, however, I'll check out APC.
>> I
>> doubt this will help us much, however. Our permissions are very granular
>> and there are rarely two users that have the same permissions (and would
>> see
>> the same data returned from the query).
>
> This smells like trouble too. At least such that the security that it
> may afford is likely to adversely affect performance. But that'd require
> a lot of detail about precisely what you're limiting from whom and with
> which and by how. Mostly, people are unwilling to share that kind of
> information.
Basically, we've added three fields to each primary table that we search
against. These fields are fk's to our permission table. Users carry a data
structure enveloping these permission keys in their session as well, so a
typical query looks like this:
SELECT ... FROM ...
WHERE fkPermission_area IN (...)
AND fkPermission_dept IN (...)
AND fkPermission_bus IN (...)
Where permissions are divided into three categories: geographical area,
department, and business type.
Without getting into the specifics of our business, a hypothetical query
(with values substituted for integer keys...) would look like this:
WHERE area IN ('Nevada', 'New Mexico')
AND dept IN ('Sales', 'Purchasing')
AND bus IN ('Retail')
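The PHP side that builds that WHERE clause is nothing exotic -- roughly this, with the session layout simplified and the table name a stand-in:

<?php
// $_SESSION['perms'] carries the integer permission keys per category, e.g.
//   array('area' => array(3, 7), 'dept' => array(2, 5), 'bus' => array(1))
$perms  = $_SESSION['perms'];
$sql    = "SELECT r.* FROM report_rows r WHERE 1=1";   // table name is a stand-in
$params = array();

foreach (array('area', 'dept', 'bus') as $cat) {
    $in     = implode(',', array_fill(0, count($perms[$cat]), '?'));
    $sql   .= " AND r.fkPermission_{$cat} IN ($in)";
    $params = array_merge($params, $perms[$cat]);
}

$stmt = $pdo->prepare($sql);    // $pdo connected elsewhere
$stmt->execute($params);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);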
>> For this reason, also, MySQL's buffers and cache don't do too much
>> good with multiple-user scenarios--the portion of the WHERE clause
>> that filters results by user permissions is almost guaranteed to be
>> different with each user.
>
> It depends how and where you've flattened the data, and how it's
> indexed.
Right now we're basically fully normalized and well indexed.
> [hypothetical] If your access is determined by a record code
> that links to a table of values that are used as a bitwise array to
> check against specific flags in a user-record bitwise array, then yes,
> it's going to be horrible. If you're using a float-value to describe the
> "sensitivity" of a record and comparing that to another float value,
> that's going to suck too. If you've got a massive intersection table
> between your data and your user records, that caches poorly if the whole
> intersection table doesn't fit in the cache and you've got multiple
> users fussing over the table at the same time, but may work just fine if
> there's usually only one person grovelling over the list of records at a
> time.
These are not representative of our situation, however...
> Similarly if you've got nearly unique views over the data for each
> user, limiting which records they can see, and also have more than one
> working at the same time.
This is something I'm actually considering (rough sketch after the list):
1. User requests a report
2. If not exists, flattened memory table gets created just for that user
3. Check validity of temp table data (set by insert/update/delete logic)
4. If temp table data still valid, feed results from there
5. If temp table data not valid, drop temp table, go to step 2.
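In code, that flow would look something like this -- report_dirty is the trigger-fed log table floated earlier in the thread, and every table and column name here is a placeholder:

<?php
session_start();
$pdo  = new PDO('mysql:host=localhost;dbname=reports', 'user', 'pass');
$flat = 'report_flat_u' . (int)$_SESSION['user_id'];    // one flattened table per user

// Step 3: is this user's snapshot older than the last write to the base tables?
$lastWrite = strtotime($pdo->query("SELECT MAX(changed_at) FROM report_dirty")->fetchColumn());
$builtAt   = isset($_SESSION['flat_built_at']) ? $_SESSION['flat_built_at'] : 0;

if (!$builtAt || $lastWrite > $builtAt) {
    // Steps 2 and 5: (re)build the flattened table just for this user.
    // ENGINE=MEMORY keeps it in RAM, but it can't hold TEXT/BLOB columns.
    $pdo->exec("DROP TABLE IF EXISTS $flat");
    $pdo->exec("CREATE TABLE $flat ENGINE=MEMORY AS
                SELECT o.id, o.order_date, c.name
                FROM orders o
                JOIN customers c ON c.id = o.customer_id
                WHERE c.fkPermission_area IN (3, 7)     /* this user's permission filter */");
    $_SESSION['flat_built_at'] = time();
}

// Step 4: the row count and the page itself then come straight off $flat
// (a COUNT(*) plus a LIMIT query, as in the earlier sketches), no joins involved.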
> Some of those scenarios may sound silly, but
> every one of them is a highly-granular permissions scheme that's been
> proposed to me at one point or another in the past. ... And, of course,
> there's the factor that most of the time, tracing back the development
> of the very granular user permissions ended up showing that the problem
> and requirements were very different from the scheme under
> consideration.
We've been building apps in this suite for a couple of years now. We're
pretty solid on permission scheme requirements. We've finally settled on
something that offers the flexibility that we need. The drawback of course
is that few queries are fed out of cache, although I don't have hard numbers
on query_cache_hits and Com_select offhand.
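(They're quick enough to pull when I get a minute -- the status counter for cache hits is actually spelled Qcache_hits:)

<?php
$pdo = new PDO('mysql:host=localhost;dbname=reports', 'user', 'pass');
foreach (array('Qcache_hits', 'Com_select') as $var) {
    list($name, $value) = $pdo->query("SHOW GLOBAL STATUS LIKE '$var'")->fetch(PDO::FETCH_NUM);
    echo "$name = $value\n";
}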
|
|
|
Re: Record pagination with count of total rows [message #170506 is a reply to message #170504] |
Wed, 03 November 2010 16:48 |
Jerry Stuckle
Messages: 2598 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On 11/3/2010 10:09 AM, Matthew Leonhardt wrote:
> "Jerry Stuckle"<jstucklex(at)attglobal(dot)net> wrote in message
> news:iarnmg$rar$1(at)news(dot)eternal-september(dot)org...
>> On 11/3/2010 8:14 AM, Matthew Leonhardt wrote:
>>> "Norman Peelman"<npeelman(at)cfl(dot)rr(dot)com> wrote in message
>>> news:iaqvjv$l1s$1(at)news(dot)eternal-september(dot)org...
>>>
>>> [snip]
>>>
>>>> Other options to consider:
>>>>
>>>> MySQL - make sure that the result (query) buffer is sufficiently large
>>>> enough to hold many such result sets so that the full scan queries run
>>>> once and your paginated queries pull from the buffer as MySQL doesn't go
>>>> back to disk if the result set is already buffered (unless something has
>>>> changed). If something has changed MySQL should go back to disk for the
>>>> result set - this should be handled by MySQL automatically.
>>>> Using 'LIMIT offset, numrows' during pagination should dramatically
>>>> reduce query time when the full result set is already in the buffer.
>>>
>>> Like I said, DB server configuration is being explored in parallel. I
>>> don't
>>> want this discussion to get off topic.
>>>
>>>> PHP - use APC (Alternative PHP Cache) to store result sets in shared
>>>> memory with reasonable time to live. Probably about the same as using a
>>>> session variable but faster/less resources when multiple people doing
>>>> same
>>>> query (how often would that happen?). This could allow (with the proper
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Perhaps I should have clarified. Not often.
>
>>>> logic) users to store/bookmark queries and come back to them without
>>>> hitting the database again (good for queries of data that may not
>>>> change).
>>>
>>> We have our own caching mechanism in place, however, I'll check out APC.
>>> I
>>> doubt this will help us much, however. Our permissions are very granular
>>> and there are rarely two users that have the same permissions (and would
>>> see
>>> the same data returned from the query). For this reason, also, MySQL's
>>> buffers and cache don't do too much good with multiple-user
>>> scenarios--the
>>> portion of the WHERE clause that filters results by user permissions is
>>> almost guaranteed to be different with each user.
>>>
>>>
>>
>> You missed the point. When the user displays the second page, the same
>> query would be run - and if the db buffer is big enough, the data will
>> already be there, making the query very quick.
>
> No I didn't. I was specifically addressing his question about multiple
> users executing identical queries.
>
>> And if you have multiple people running concurrently, increase the db
>> buffer accordingly.
>
> Agreed, but OT.
>
>
>
Yes, but then you have a performance problem involving both PHP and
MySQL. The two work together to give you your result, and your solution
will most probably involve both - so you must look at them together, not
one at a time.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
|
|
|