Sunday, August 2, 2009

Hey, everybody! Let's analyze the performance of a three tier system with PDQ-R!

This post is a continuation of the post that I did earlier this week, carrying on with the solution for Case Study IV: An E-Business Service from the book Performance by Design: Computer Capacity Planning By Example.

In the previous post I discussed how to take the transition probability matrix and work backwards to the original series of linear equations for solving the number of visits to each page in a series of web pages. In this case study there are actually two types of visitors, which results in two transition probability matrices that must be utilized: 25% of the visitors are Type A and 75% are Type B.
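
To recap the key identity: if P is the transition probability matrix and b is the vector with a 1 in the Enter row and zeros elsewhere, the visit counts V satisfy V = PᵀV + b, which rearranges to (I − Pᵀ)V = b. That is what the VisitsByTransitionMatrix() function in the listing below computes; in base R it boils down to solve(diag(nrow(P)) - t(P), b).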

Each tier of the hypothetical e-biz service is made up of a single CPU and a single disk drive. A matrix supplies the total service demand on each component for each page that visitors hit.

While it is simple to write some code that analyzes web logs and generates the transition probability matrix from customer traffic, it is very difficult to isolate the total demand at each component under chaotic customer traffic. That is why we have load testing tools. In a pseudo-production environment we can simulate customer traffic against one page at a time and calculate the total demand on each component. In this particular case only the CPUs and disk drives are being modeled, but for a real service we'd want to model the CPUs, disk drives, memory system, network, and so on.

After running simulated customer traffic against isolated page hits we can generate a similar demand matrix for the components and use it for what-if analysis.
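
For example, suppose a load test drives just the Search page at a steady X = 10 hits per second while device utilizations are sampled. The Utilization Law, D = U/X, then yields Search's service demand at each component. A minimal sketch in R (the measurements are hypothetical, picked so that they reproduce the Search column of the demand matrix used below):

# Hypothetical steady-state measurements from a load test that
# exercises only the Search page.
X <- 10.0; # observed throughput, hits/sec
U <- c(WS_CPU=0.09, WS_Disk=0.10, AS_CPU=0.30,
       AS_Disk=0.08, DS_CPU=0.10, DS_Disk=0.35); # mean utilizations (0..1)
D <- U / X; # Utilization Law: service demand, seconds per hit
print(round(D, 3));

Repeat that for each page and the demand matrix fills in one column at a time.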

I went ahead and kept my solution in R even though I suspect that a perl solution would be more elegant (did I just use perl and elegant in the same sentence?). I designed my solution as two separate programs: one spits out the numbers, and the other generates graphs of page response times and component utilization. Both pieces of code make use of PDQ-R and allow for a variable number of web servers, application servers and database servers.




# Solution parameters

gamma <- 10.96; # Rate into system
numWS <- 1; # Number of Web Servers
numAS <- 1; # Number of Application Servers
numDS <- 1; # Number of Database Servers

# external library
library("pdq");

# Constants #

E <- 1;
H <- 2;
S <- 3;
V <- 4;
G <- 5;
C <- 6;
B <- 7;
X <- 8;

PAGE_NAMES <- c("Enter", "HomePage", "Search", "ViewBids", "Login", "CreateAuction", "PlaceBid", "Exit");
COMPONENTS <- c("CPU", "Disk");
SERVER_TYPES <- c("WS", "AS", "DS");

WS_CPU <- 1;
WS_DISK <- 2;
AS_CPU <- 3;
AS_DISK <- 4;
DS_CPU <- 5;
DS_DISK <- 6;

# Functions used in solution

# Solve for the expected number of visits to each page: with transition
# matrix M and entry vector B, the visit counts V satisfy (I - t(M)) V = B.
VisitsByTransitionMatrix <- function(M, B) {
  A <- -1 * t(M);
  for (i in 1:nrow(A)) {
    A[i,i] <- A[i,i] + 1;
  };
  return(solve(A, B));
};

# Arrival rate for a page: the system arrival rate times the visit
# counts of the two visitor types, weighted by their population mix.
CalculateLambda <- function(gamma, f_a, f_b, V_a, V_b, index) {
  return(gamma * ((f_a * V_a[index]) + (f_b * V_b[index])));
};


f_a <- 0.25; # Fraction of TypeA users
f_b <- 1 - f_a; # Fraction of TypeB users

lambda <- 1:X; # Array of lambda for each page

SystemInput <- matrix(c(1,0,0,0,0,0,0,0), nrow=8, ncol=1); # 8.3, Figure 8.2, page 208

# Rows and columns of the transition matrices follow PAGE_NAMES order:
# Enter, HomePage, Search, ViewBids, Login, CreateAuction, PlaceBid, Exit
TypeA <- matrix(c(0, 1, 0,    0,    0,    0,   0,    0,
                  0, 0, 0.7,  0,    0.1,  0,   0,    0.2,
                  0, 0, 0.4,  0.2,  0.15, 0,   0,    0.25,
                  0, 0, 0,    0,    0.65, 0,   0,    0.35,
                  0, 0, 0,    0,    0,    0.3, 0.6,  0.1,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    0),
                ncol=8, nrow=8, byrow=TRUE); # 8.4, Table 8.1, page 209

TypeB <- matrix(c(0, 1, 0,    0,    0,    0,   0,    0,
                  0, 0, 0.7,  0,    0.1,  0,   0,    0.2,
                  0, 0, 0.45, 0.15, 0.1,  0,   0,    0.3,
                  0, 0, 0,    0,    0.4,  0,   0,    0.6,
                  0, 0, 0,    0,    0,    0.3, 0.55, 0.15,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    0),
                nrow=8, ncol=8, byrow=TRUE); # 8.4, Table 8.2, page 210

# Rows are WS_CPU, WS_Disk, AS_CPU, AS_Disk, DS_CPU, DS_Disk;
# columns are pages in PAGE_NAMES order. Demands are in seconds per hit.
DemandTable <- matrix(c(0, 0.008, 0.009, 0.011, 0.06,  0.012, 0.015, 0,
                        0, 0.03,  0.01,  0.01,  0.01,  0.01,  0.01,  0,
                        0, 0,     0.03,  0.035, 0.025, 0.045, 0.04,  0,
                        0, 0,     0.008, 0.08,  0.009, 0.011, 0.012, 0,
                        0, 0,     0.01,  0.009, 0.015, 0.07,  0.045, 0,
                        0, 0,     0.035, 0.018, 0.05,  0.08,  0.09,  0),
                      ncol=8, nrow=6, byrow=TRUE); # 8.4, Table 8.4, page 212 (with modifications)

VisitsA <- VisitsByTransitionMatrix(TypeA, SystemInput);
VisitsB <- VisitsByTransitionMatrix(TypeB, SystemInput);

lambda[E] <- 0; # Not used in calculations
lambda[H] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, H);
lambda[S] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, S);
lambda[V] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, V);
lambda[G] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, G);
lambda[C] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, C);
lambda[B] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, B);
lambda[X] <- 0; # Not used in calculations

Init("e_biz_service");

# Define workstreams

for (n in H:B) {
  workStreamName <- sprintf("%s", PAGE_NAMES[n]);
  CreateOpen(workStreamName, lambda[n]);
};

# Define Web Server Queues

for (i in 1:numWS) {
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
    CreateNode(nodeName, CEN, FCFS);
  };
};

# Define Application Server Queues

for (i in 1:numAS) {
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
    CreateNode(nodeName, CEN, FCFS);
  };
};

# Define Database Server Queues

for (i in 1:numDS) {
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
    CreateNode(nodeName, CEN, FCFS);
  };
};

# Set Demand for the Web Servers

for (i in 1:numWS) {
  demandIndex <- WS_CPU;
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- sprintf("%s", PAGE_NAMES[k]);
      SetDemand(nodeName, workStreamName, (DemandTable[demandIndex + (j-1), k])/numWS);
    };
  };
};

# Set Demand for the App Servers

for (i in 1:numAS) {
  demandIndex <- AS_CPU;
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- sprintf("%s", PAGE_NAMES[k]);
      SetDemand(nodeName, workStreamName, (DemandTable[demandIndex + (j-1), k])/numAS);
    };
  };
};

# Set Demand for the Database Servers

for (i in 1:numDS) {
  demandIndex <- DS_CPU;
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- sprintf("%s", PAGE_NAMES[k]);
      SetDemand(nodeName, workStreamName, (DemandTable[demandIndex + (j-1), k])/numDS);
    };
  };
};

SetWUnit("Trans");
SetTUnit("Second");

Solve(CANON);

print("Arrival Rates for each page:");

for (i in H:B) {
  print(sprintf("%s = %f", PAGE_NAMES[i], lambda[i]));
};

print("[-------------------------------------------------]");

print("Page Response Times");

for (i in H:B) {
  workStreamName <- sprintf("%s", PAGE_NAMES[i]);
  print(sprintf("%s = %f seconds.", PAGE_NAMES[i], GetResponse(TRANS, workStreamName)));
};

print("[-------------------------------------------------]");

print("Component Utilizations");

for (i in 1:numWS) {
  for (j in 1:length(COMPONENTS)) {
    totalUtilization <- 0;
    nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- sprintf("%s", PAGE_NAMES[k]);
      totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
    };
    print(sprintf("%s = %3.2f %%", nodeName, totalUtilization * 100));
  };
};

for (i in 1:numAS) {
  for (j in 1:length(COMPONENTS)) {
    totalUtilization <- 0;
    nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- sprintf("%s", PAGE_NAMES[k]);
      totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
    };
    print(sprintf("%s = %3.2f %%", nodeName, totalUtilization * 100));
  };
};

for (i in 1:numDS) {
  for (j in 1:length(COMPONENTS)) {
    totalUtilization <- 0;
    nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- sprintf("%s", PAGE_NAMES[k]);
      totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
    };
    print(sprintf("%s = %3.2f %%", nodeName, totalUtilization * 100));
  };
};



Here is a bit of sample output with 10.96 users entering the system per second:




[1] "Arrival Rates for each page:"
[1] "HomePage = 10.960000"
[1] "Search = 13.658485"
[1] "ViewBids = 2.208606"
[1] "Login = 3.664958"
[1] "CreateAuction = 1.099487"
[1] "PlaceBid = 2.074180"
[1] "[-------------------------------------------------]"
[1] "Page Response Times"
[1] "HomePage = 0.083517 seconds."
[1] "Search = 1.612366 seconds."
[1] "ViewBids = 1.044683 seconds."
[1] "Login = 2.323417 seconds."
[1] "CreateAuction = 3.622690 seconds."
[1] "PlaceBid = 3.983755 seconds."
[1] "[-------------------------------------------------]"
[1] "Component Utilizations"
[1] "WS_1_CPU = 49.91 %"
[1] "WS_1_Disk = 55.59 %"
[1] "AS_1_CPU = 71.11 %"
[1] "AS_1_Disk = 35.59 %"
[1] "DS_1_CPU = 38.17 %"
[1] "DS_1_Disk = 97.57 %"


Take a look at that database server disk utilization. Almost 100%! That isn't any good. Let's run the model with two database servers just to ease up on the poor drives!
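
That 97.57% can be checked with the Utilization Law, U = Σ (λ_page × D_page), applied to the DS disk row of the demand matrix: 13.658 × 0.035 + 2.209 × 0.018 + 3.665 × 0.05 + 1.099 × 0.08 + 2.074 × 0.09 ≈ 0.9757, or 97.57% busy.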

I modify the line that reads "numDS <- 1; # Number of Database Servers" to read "numDS <- 2; # Number of Database Servers" and let 'er rip:




[1] "Arrival Rates for each page:"
[1] "HomePage = 10.960000"
[1] "Search = 13.658485"
[1] "ViewBids = 2.208606"
[1] "Login = 3.664958"
[1] "CreateAuction = 1.099487"
[1] "PlaceBid = 2.074180"
[1] "[-------------------------------------------------]"
[1] "Page Response Times"
[1] "HomePage = 0.083517 seconds."
[1] "Search = 0.237452 seconds."
[1] "ViewBids = 0.336113 seconds."
[1] "Login = 0.358981 seconds."
[1] "CreateAuction = 0.462042 seconds."
[1] "PlaceBid = 0.440903 seconds."
[1] "[-------------------------------------------------]"
[1] "Component Utilizations"
[1] "WS_1_CPU = 49.91 %"
[1] "WS_1_Disk = 55.59 %"
[1] "AS_1_CPU = 71.11 %"
[1] "AS_1_Disk = 35.59 %"
[1] "DS_1_CPU = 19.09 %"
[1] "DS_1_Disk = 48.78 %"
[1] "DS_2_CPU = 19.09 %"
[1] "DS_2_Disk = 48.78 %"


That's better. There is a significant reduction in page response time to boot:




Page            1 DS (sec)   2 DS (sec)   Diff (sec)
HomePage          0.083517     0.083517     0.000000
Search            1.612366     0.237452    -1.374914
ViewBids          1.044683     0.336113    -0.708570
Login             2.323417     0.358981    -1.964436
CreateAuction     3.622690     0.462042    -3.160648
PlaceBid          3.983755     0.440903    -3.542852
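
The halving of the disk utilization is no accident: each page's database demand gets split across the two servers (that's the /numDS in the SetDemand() calls), so each disk sees half the work. The response time improvement is much bigger than a factor of two, though, because queueing delay is nonlinear in utilization. For an open queue the residence time per visit is D/(1 − U); for PlaceBid at the database disk that drops from 0.09/(1 − 0.9757) ≈ 3.70 seconds to 0.045/(1 − 0.4878) ≈ 0.088 seconds.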


Holy smoke! Adding that second database server makes a heck of a difference, doesn't it?

Being able to pull up numbers is great, but it doesn't have the impact that a good graph does. I love graphs! They are so great for conveying information to non-technical folks.

So, to do that I took my original solution and did a code spin, fold and mutilation to allow the generation of graphs. Here is the code that I wrote to generate the graphs.




# Solution parameters

maxGamma <- 11.2; # Maximum rate into system
steppingValue <- 0.1;

numWS <- 1; # Number of Web Servers
numAS <- 1; # Number of Application Servers
numDS <- 1; # Number of Database Servers

# external library
library("pdq");

# Constants

E <- 1;
H <- 2;
S <- 3;
V <- 4;
G <- 5;
C <- 6;
B <- 7;
X <- 8;

PAGE_NAMES <- c("Enter", "HomePage", "Search", "ViewBids", "Login", "CreateAuction", "PlaceBid", "Exit");
COMPONENTS <- c("CPU", "Disk");
SERVER_TYPES <- c("WS", "AS", "DS");

WS_CPU <- 1;
WS_DISK <- 2;
AS_CPU <- 3;
AS_DISK <- 4;
DS_CPU <- 5;
DS_DISK <- 6;

# Functions used in solution

# Solve for the expected number of visits to each page: with transition
# matrix M and entry vector B, the visit counts V satisfy (I - t(M)) V = B.
VisitsByTransitionMatrix <- function(M, B) {
  A <- -1 * t(M);
  for (i in 1:nrow(A)) {
    A[i,i] <- A[i,i] + 1;
  };
  return(solve(A, B));
};

# Arrival rate for a page: the system arrival rate times the visit
# counts of the two visitor types, weighted by their population mix.
CalculateLambda <- function(gamma, f_a, f_b, V_a, V_b, index) {
  return(gamma * ((f_a * V_a[index]) + (f_b * V_b[index])));
};


f_a <- 0.25; # Fraction of TypeA users
f_b <- 1 - f_a; # Fraction of TypeB users

lambda <- 1:X; # Array of lambda for each page

SystemInput <- matrix(c(1,0,0,0,0,0,0,0), nrow=8, ncol=1); # 8.3, Figure 8.2, page 208

# Rows and columns of the transition matrices follow PAGE_NAMES order:
# Enter, HomePage, Search, ViewBids, Login, CreateAuction, PlaceBid, Exit
TypeA <- matrix(c(0, 1, 0,    0,    0,    0,   0,    0,
                  0, 0, 0.7,  0,    0.1,  0,   0,    0.2,
                  0, 0, 0.4,  0.2,  0.15, 0,   0,    0.25,
                  0, 0, 0,    0,    0.65, 0,   0,    0.35,
                  0, 0, 0,    0,    0,    0.3, 0.6,  0.1,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    0),
                ncol=8, nrow=8, byrow=TRUE); # 8.4, Table 8.1, page 209

TypeB <- matrix(c(0, 1, 0,    0,    0,    0,   0,    0,
                  0, 0, 0.7,  0,    0.1,  0,   0,    0.2,
                  0, 0, 0.45, 0.15, 0.1,  0,   0,    0.3,
                  0, 0, 0,    0,    0.4,  0,   0,    0.6,
                  0, 0, 0,    0,    0,    0.3, 0.55, 0.15,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    1,
                  0, 0, 0,    0,    0,    0,   0,    0),
                nrow=8, ncol=8, byrow=TRUE); # 8.4, Table 8.2, page 210

# Rows are WS_CPU, WS_Disk, AS_CPU, AS_Disk, DS_CPU, DS_Disk;
# columns are pages in PAGE_NAMES order. Demands are in seconds per hit.
DemandTable <- matrix(c(0, 0.008, 0.009, 0.011, 0.06,  0.012, 0.015, 0,
                        0, 0.03,  0.01,  0.01,  0.01,  0.01,  0.01,  0,
                        0, 0,     0.03,  0.035, 0.025, 0.045, 0.04,  0,
                        0, 0,     0.008, 0.08,  0.009, 0.011, 0.012, 0,
                        0, 0,     0.01,  0.009, 0.015, 0.07,  0.045, 0,
                        0, 0,     0.035, 0.018, 0.05,  0.08,  0.09,  0),
                      ncol=8, nrow=6, byrow=TRUE); # 8.4, Table 8.4, page 212 (with modifications)

VisitsA <- VisitsByTransitionMatrix(TypeA, SystemInput);
VisitsB <- VisitsByTransitionMatrix(TypeB, SystemInput);

# Derive the step count from the sequence itself to avoid a
# floating-point off-by-one in (maxGamma/steppingValue)+1.
steppingArray <- seq(0, maxGamma, steppingValue);
numSteps <- length(steppingArray);
numElements <- numSteps * X;
numUElements <- numSteps * (3 * length(COMPONENTS));

# One row per page (response times) and one row per monitored
# component (utilizations), one column per arrival-rate step.
responseArray <- matrix(0, nrow=X, ncol=numSteps);
utilArray <- matrix(0, nrow=3*length(COMPONENTS), ncol=numSteps);

# Human-readable labels for the utilization graph legend.
componentList <- character(length(COMPONENTS) * length(SERVER_TYPES));

entryNumber <- 1;
for (serverType in SERVER_TYPES) {
  for (serverComponent in COMPONENTS) {
    componentList[entryNumber] <- sprintf("%s %s", serverType, serverComponent);
    entryNumber <- entryNumber + 1;
  };
};

loopCount <- 1;

for (gamma in seq(0, maxGamma, steppingValue)) {

  steppingArray[loopCount] <- gamma;

  lambda[E] <- 0; # Not used in calculations
  lambda[H] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, H);
  lambda[S] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, S);
  lambda[V] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, V);
  lambda[G] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, G);
  lambda[C] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, C);
  lambda[B] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, B);
  lambda[X] <- 0; # Not used in calculations

  Init("e_biz_service");

  # Define workstreams

  for (n in H:B) {
    workStreamName <- sprintf("%s", PAGE_NAMES[n]);
    CreateOpen(workStreamName, lambda[n]);
  };

  # Define Web Server Queues

  for (i in 1:numWS) {
    for (j in 1:length(COMPONENTS)) {
      nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
      CreateNode(nodeName, CEN, FCFS);
    };
  };

  # Define Application Server Queues

  for (i in 1:numAS) {
    for (j in 1:length(COMPONENTS)) {
      nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
      CreateNode(nodeName, CEN, FCFS);
    };
  };

  # Define Database Server Queues

  for (i in 1:numDS) {
    for (j in 1:length(COMPONENTS)) {
      nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
      CreateNode(nodeName, CEN, FCFS);
    };
  };

  # Set Demand for the Web Servers

  for (i in 1:numWS) {
    demandIndex <- WS_CPU;
    for (j in 1:length(COMPONENTS)) {
      nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
      for (k in H:B) {
        workStreamName <- sprintf("%s", PAGE_NAMES[k]);
        SetDemand(nodeName, workStreamName, (DemandTable[demandIndex + (j-1), k])/numWS);
      };
    };
  };

  # Set Demand for the App Servers

  for (i in 1:numAS) {
    demandIndex <- AS_CPU;
    for (j in 1:length(COMPONENTS)) {
      nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
      for (k in H:B) {
        workStreamName <- sprintf("%s", PAGE_NAMES[k]);
        SetDemand(nodeName, workStreamName, (DemandTable[demandIndex + (j-1), k])/numAS);
      };
    };
  };

  # Set Demand for the Database Servers

  for (i in 1:numDS) {
    demandIndex <- DS_CPU;
    for (j in 1:length(COMPONENTS)) {
      nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
      for (k in H:B) {
        workStreamName <- sprintf("%s", PAGE_NAMES[k]);
        SetDemand(nodeName, workStreamName, (DemandTable[demandIndex + (j-1), k])/numDS);
      };
    };
  };

  Solve(CANON);

  for (i in H:B) {
    workStreamName <- sprintf("%s", PAGE_NAMES[i]);
    responseArray[i, loopCount] <- GetResponse(TRANS, workStreamName);
  };

  # Record utilization for the first server of each type (all servers
  # of a type carry identical load); rows 1-2 are WS CPU/Disk, rows 3-4
  # are AS CPU/Disk, rows 5-6 are DS CPU/Disk.
  uArrayEntries <- 0;
  for (i in 1:numWS) {
    for (j in 1:length(COMPONENTS)) {
      totalUtilization <- 0;
      nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
      for (k in H:B) {
        workStreamName <- sprintf("%s", PAGE_NAMES[k]);
        totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
      };
      if (i == 1) {
        utilArray[uArrayEntries+j, loopCount] <- totalUtilization * 100;
      };
    };
  };

  uArrayEntries <- 2;
  for (i in 1:numAS) {
    for (j in 1:length(COMPONENTS)) {
      totalUtilization <- 0;
      nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
      for (k in H:B) {
        workStreamName <- sprintf("%s", PAGE_NAMES[k]);
        totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
      };
      if (i == 1) {
        utilArray[uArrayEntries+j, loopCount] <- totalUtilization * 100;
      };
    };
  };

  uArrayEntries <- 4;
  for (i in 1:numDS) {
    for (j in 1:length(COMPONENTS)) {
      totalUtilization <- 0;
      nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
      for (k in H:B) {
        workStreamName <- sprintf("%s", PAGE_NAMES[k]);
        totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
      };
      if (i == 1) {
        utilArray[uArrayEntries+j, loopCount] <- totalUtilization * 100;
      };
    };
  };

  loopCount <- loopCount + 1;

};

# Generate Response Time Graph

loopCount <- 0;
for (i in H:B) {
  arrayLength <- numElements/X;
  if (loopCount == 0) {
    jpeg(file=sprintf("response_time_%d_WS_%d_AS_%d_DS.jpg", numWS, numAS, numDS), height=768, width=1024, quality=100);
    plot(steppingArray,
         responseArray[i, 1:arrayLength],
         xlab="Hits Per Second into System",
         ylab="Response Time",
         col=i,
         ylim=c(0, ceiling(max(responseArray))),
         type="l",
         pch=i,
         lwd=4);
    title(main=sprintf("Page Response Times for E-Biz with %d WS, %d AS and %d DS", numWS, numAS, numDS), font.main=2);
  } else {
    lines(steppingArray, responseArray[i, 1:arrayLength], col=i, pch=i, lwd=4);
  };

  loopCount <- loopCount + 1;
};
legend(1, ceiling(max(responseArray)), PAGE_NAMES[H:B], col=H:B, lty=1, lwd=4, cex=1.2);
dev.off();

# Graph component utilization

loopCount <- 0;
for (i in 1:length(componentList)) {
  arrayLength <- numSteps;
  if (loopCount == 0) {
    jpeg(file=sprintf("resource_utilization_%d_WS_%d_AS_%d_DS.jpg", numWS, numAS, numDS), height=768, width=1024, quality=100);
    plot(steppingArray,
         utilArray[i, 1:arrayLength],
         xlab="Hits Per Second into System",
         ylab="Resource Utilization, %",
         col=i,
         ylim=c(0, 100),
         type="l",
         pch=i,
         lwd=4);
    title(main=sprintf("Resource Utilization for E-Biz with %d WS, %d AS and %d DS", numWS, numAS, numDS), font.main=2);
  } else {
    lines(steppingArray, utilArray[i, 1:arrayLength], col=i, pch=i, lwd=4);
  };
  loopCount <- loopCount + 1;
};
legend(0, 100, componentList, col=1:length(componentList), lty=1, lwd=4, cex=1.2);
dev.off();


The code isn't what I would call elegant at this point, as a lot of it is still hardcoded for this specific solution, but it is a step in the right direction.
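
One obvious improvement would be to pull the hardcoded parameters out to the command line instead of editing the script each time. A quick sketch of how the top of the script might look when run with Rscript (the script name and argument order here are made up for illustration):

# Hypothetical invocation: Rscript ebiz_graphs.R 11.2 1 1 2
args <- commandArgs(trailingOnly=TRUE);
if (length(args) == 4) {
  maxGamma <- as.numeric(args[1]); # maximum rate into the system
  numWS <- as.integer(args[2]);    # number of Web Servers
  numAS <- as.integer(args[3]);    # number of Application Servers
  numDS <- as.integer(args[4]);    # number of Database Servers
};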

And let's take a look at the response times with the single database server versus the dual load-bearing database servers:

[Graph: page response times vs. hits per second into the system, 1 WS / 1 AS / 1 DS]

Take a look at the effect that the overworked database disk drive has on the response times. At 11.2 customers per second into the site, the response time has shot up to the 30 second mark. Definitely not something you want your customers to have to suffer through.
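
That hockey stick is classic open-queueing behavior: the residence time at a device grows as D/(1 − U), so it explodes as the bottleneck device saturates. At 11.2 customers per second the database disk is about 99.7% busy, and PlaceBid's single visit to it alone costs roughly 0.09/(1 − 0.997) ≈ 30 seconds.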

Here is the graph generated with the dual load-bearing database server solution:

[Graph: page response times vs. hits per second into the system, 1 WS / 1 AS / 2 DS]

Holy guacamole, is that a heck of an improvement or what? Any management type can look at these two graphs and immediately see the impact on the system and the need for the extra equipment.

And just for good measure, here is the resource utilization with both the single and the dual database server setups:

[Graph: resource utilization vs. hits per second into the system, 1 WS / 1 AS / 1 DS]

[Graph: resource utilization vs. hits per second into the system, 1 WS / 1 AS / 2 DS]

Even an MBA grad can look at those graphs and realize that there is a problem that needs to be solved. w00t!

Ain't heuristic analysis fun?

Applying PDQ-R to this case study was a great exercise and I'm glad I undertook it. I can't wait to apply this to a real system and compare the results to see how well it works "in the real world." One thing that this type of heuristic analysis won't show is negative performance interactions between pages. In some circumstances I have seen performance problems between pages that were thought to be unrelated: when Page A is put under duress, Page B slows down even though Page A supposedly has nothing to do with Page B. Often it turns out that, somewhere in a spaghetti line of object dependencies, the two pages were related after all. Analyzing the service demand of pages also lets us single out the pages with unusually high service demand.
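
To be fair, the model does capture one kind of coupling, namely shared queues: if pages A and B both visit the same disk, then U_disk = λ_A × D_A + λ_B × D_B and B's residence time there is D_B/(1 − U_disk), so driving up A's traffic slows B down even though B's own demand never changed. What it cannot capture is a dependency that isn't in the demand matrix at all, which is exactly the spaghetti case above.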

In the future, along with page response times and the normal metric reporting, I think I'll report service demand as well, so that when pages do start to perform poorly the change in service demand gives the developers a starting point for the root cause analysis of the performance problem.
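
A sketch of the idea, assuming hourly samples of device utilization and per-page throughput are already being collected (all of the numbers here are hypothetical):

# Hourly samples for one device/page pair (hypothetical numbers)
U <- c(0.30, 0.31, 0.30, 0.44); # DS disk utilization
X <- c(8.5, 8.7, 8.6, 8.6);     # Search page hits/sec over the same hours
D <- U / X;                     # Utilization Law: service demand per hit
print(round(D, 4));             # the jump in the last sample flags the DS disk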

Dr. Gunther has more examples of applying Perl::PDQ to a variety of systems in his book, Analyzing Computer System Performance with Perl::PDQ. The source for the solutions in that book can be found in the PDQ download. But developing this solution from the ground up really drove some points home for me that will no doubt be useful in my future endeavors.
