I have just finished a round of throughput checks for LWP performance between versions 12.5.4 and 15.7 (the latter in both threaded and process kernel modes). The comparison is a bit “unfair”, since I compare 12.5.4 with the statement cache disabled (the feature was in its infancy back then) against 15.7 with the statement cache enabled and the functionality group enabled as well. What interests me is not so much comparing apples to apples, but seeing how the same code performs in what was “the optimal” configuration for 12.5.4 versus what would be “the optimal” configuration for 15.7. An arguable premise, but at least the intention is clear.
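For reference, the 15.7 side of the comparison corresponds roughly to the following server settings. This is a hedged sketch: the cache size below is an illustrative placeholder, not the value used in these tests, and dynamic_prepare itself is a client-side connection property (e.g. DYNAMIC_PREPARE in jConnect/CT-Library), not a server option:

```sql
-- 15.7 test configuration, sketched (values illustrative, not the actual test sizing)
sp_configure 'statement cache size', 51200   -- units are memory pages; size to your workload
go
sp_configure 'enable functionality group', 1 -- switches on the 15.x optimization feature group
go
-- The 12.5.4 baseline simply had the cache off:
-- sp_configure 'statement cache size', 0
```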
I’d summarize the findings in this manner: if the 15.7 ASE succeeds in utilizing all of its statement-reuse arsenal (plan sharing, LWP reuse) and the statement cache is effective, it will indeed yield a performance boost. If it does not, it will under-perform 12.5.4 with the statement cache disabled, with the threaded kernel faring better than process mode.
Below are the figures:
12.5.4 – statement cache set to off, dynamic_prepare set to on:
15.7 – statement cache set to on, dynamic_prepare set to on:
The first graph is easier to read: what you get is 80-90% average engine load (10-engine ASE running 15 LWPs); the load results in ~8000 DYNP calls per 30 seconds; the client runs ~180 code loops every 4 minutes; and the number of newly generated LWPs is around one per second.
I.e., for 12.5.4 the numbers are: ENG = 10, CPU = 90, DYNP/SEC = 270, NEWLWP/SEC = 1, THRP = 45 (code loops per minute: 180 loops / 4 minutes).
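As a sanity check, the per-second and per-minute summary figures follow directly from the raw graph readings:

```sql
-- Deriving the summary metrics from the raw graph readings quoted above
select
    dynp_per_sec = 8000.0 / 30,  -- ~8000 DYNP calls per 30 seconds => ~270/sec
    thrp_per_min = 180.0 / 4     -- ~180 code loops per 4 minutes   => THRP = 45
```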
For 15.7 the situation is more complex – the data in the graph represent both the threaded and process kernel modes, different statement cache sizings, different engine/thread counts, &c. In general, though, it may be summarized thus:
ENG = 10, KERN = process, CPU = 80, DYNP/SEC = 270, NEWLWP/SEC = 150, THRP = 38.
ENG = 20, KERN = process, CPU = 45, DYNP/SEC = 200, NEWLWP/SEC = 240, THRP = 35.
ENG = 10, KERN = threaded, CPU = 90, DYNP/SEC = 270, NEWLWP/SEC = 180, THRP = 32.
ENG = 20, KERN = threaded, CPU = 50, DYNP/SEC = 270, NEWLWP/SEC = 180, THRP = 28.
But if 15.7 succeeds in reusing the statements effectively – which is what most of the statement cache optimizations of the latest releases were aimed at – then we get better throughput:
ENG = 20, KERN = threaded, CPU = 40, DYNP/SEC = 270, NEWLWP/SEC = 1, THRP = 50.
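Putting the runs side by side against the 12.5.4 baseline (THRP = 45), the relative deltas work out as follows — this is just the arithmetic on the numbers above:

```sql
-- Throughput of each 15.7 run relative to the 12.5.4 baseline (THRP = 45)
select
    cfg,
    delta_pct = (thrp - 45) * 100.0 / 45
from (
    select 'process, 10 eng'             as cfg, 38 as thrp union all
    select 'process, 20 eng',                    35         union all
    select 'threaded, 10 eng',                   32         union all
    select 'threaded, 20 eng',                   28         union all
    select 'threaded, 20 eng, LWP reuse',        50
) t
-- i.e. roughly -16%, -22%, -29%, -38% without reuse, and +11% with effective reuse
```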
I will perform more sanitized and systematic tests on this in the following days (on different platforms). What is obvious by now is the following list of facts:
 A properly sized and effectively used statement cache brings a throughput boost that was unavailable in previous releases of ASE. “Properly sized and effectively used” means that the cache has a good hit ratio and that the rate of new LWP generation is fairly low (both may be monitored by watching monStatementCache and monCachedProcedures, inter alia).
 Threaded kernel mode is far more stable than process kernel mode – even if under certain types of stress process mode may yield better throughput (see the previous post). It is also easier to tune (compare dynamically configuring the number of online engines in threaded versus process mode and see the difference).
 An undersized or ineffectively used statement cache is a problem. It will bring the throughput down and cause ASE to perform somewhat worse than its earlier rock-solid release (12.5.4). It is strongly recommended to test the performance of the statement cache thoroughly and to adjust its usage.
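To monitor the cache along the lines above, something like the following against the MDA tables gives the hit ratio and the LWP footprint. This is a sketch: column availability varies somewhat between 15.x versions, so check your monStatementCache/monCachedProcedures layout, and the '*ss%' name filter for cached-statement LWPs is an assumption about the naming convention:

```sql
-- Statement cache hit ratio: HitCount vs. NumSearches
select
    HitCount,
    NumSearches,
    hit_ratio_pct = HitCount * 100.0 / NumSearches
from master..monStatementCache
where NumSearches > 0

-- How many LWPs currently sit in cache, and how much memory they occupy
select
    lwp_count    = count(*),
    total_mem_kb = sum(MemUsageKB)
from master..monCachedProcedures
where ObjectName like '*ss%'   -- assumption: cached-statement LWP naming convention
```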
To be continued (systematized)…