Saturday, August 8, 2009

Manual model building: a TeamQuest Model gotcha

A while back I started to play around with Perl::PDQ and decided that, to help me learn the R language, I'd stick with PDQ-R for now. At work I don't yet have a system in place where I can measure actual response times and compare them against PDQ-R results.

I did the next best thing, which was to compare results against a case study. As with some of my previous blog entries, I used case studies from Performance By Design.

For my first attempt at PDQ-R coding I used the baseline from Chapter 5: Case Study I: A Database Service. This chapter is pretty nifty because it covers cluster analysis for workload analysis with three classes of workload, which led me to learn the basics of cluster analysis with R.
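
Just to give a flavor of that technique, here is a rough sketch in R. The data below is made up purely for illustration (it is not the book's dataset): each row is a transaction described by its CPU and disk service demands, and kmeans() groups the transactions into three workload classes whose centroids could then serve as the per-class service demands.


# Illustrative only: fabricated per-transaction service demands (seconds)
set.seed(42);
txns <- data.frame(
    cpu  = c(rnorm(50, 0.10, 0.02), rnorm(50, 0.60, 0.05), rnorm(50, 0.20, 0.03)),
    disk = c(rnorm(50, 0.10, 0.02), rnorm(50, 0.70, 0.05), rnorm(50, 0.75, 0.05))
);

# k-means with k = 3 to recover three workload classes
fit <- kmeans(txns, centers = 3);

# The cluster centroids would become the per-class service demands
print(fit$centers);
print(table(fit$cluster));
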

Here is my solution for the baseline found in section 5.4 of Performance By Design.




library("pdq");

# Total req/sec into the open queue model
lambda_into_system <- 1.33;

# Split up the req/sec into each class based upon
# previously supplied ratios

coeff <- c(0.33/1.33, 0.53/1.33, 0.47/1.33);

lambda <- coeff * lambda_into_system;

# Class1,Class2,Class3
cpu_demand <- c(0.096, 0.615, 0.193);
disk1_demand <- c(0.088, 0.683, 0.763);
disk2_demand <- c(0.119, 0.795, 0.400);

# Pre-allocate a vector for the workload stream names
workStreamName <- character(3);


Init("Chapter 5.4");

for (n in 1:3) {
    workStreamName[n] <- sprintf("class_%d", n);
    CreateOpen(workStreamName[n], lambda[n]);
};

CreateNode("CPU", CEN, FCFS);
CreateNode("Disk1", CEN, FCFS);
CreateNode("Disk2", CEN, FCFS);

for (n in 1:3) {
    SetDemand("CPU",   workStreamName[n], cpu_demand[n]);
    SetDemand("Disk1", workStreamName[n], disk1_demand[n]);
    SetDemand("Disk2", workStreamName[n], disk2_demand[n]);
};

SetWUnit("Trans");
SetTUnit("Second");

Solve(CANON);
#Report();

# Pre-allocate a vector for the per-class response times
response <- numeric(3);

for (n in 1:3) {
    response[n] <- GetResponse(TRANS, workStreamName[n]);
};

for (n in 1:3) {
    print(sprintf(" PDQ-R computation shows response time for class %d is %f seconds", n, response[n]));
};


Running the script yields these results:



> source('baseline.R')
[1] " PDQ-R computation shows response time for class 1 is 0.864179 seconds"
[1] " PDQ-R computation shows response time for class 2 is 6.105397 seconds"
[1] " PDQ-R computation shows response time for class 3 is 4.535833 seconds"


These values match the results in the book and the available Excel spreadsheet the authors developed in support of the book.
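
Those response times can also be sanity-checked by hand. In an open, multiclass queueing model each device's utilization is the sum over classes of arrival rate times service demand, and each class's response time is the sum over devices of D/(1 - U). A few lines of R with the same inputs used in the script above:


lambda <- c(0.33, 0.53, 0.47);                   # arrivals/sec per class (coeff * 1.33)

# Service demands in seconds: rows are devices, columns are classes
demand <- rbind(CPU   = c(0.096, 0.615, 0.193),
                Disk1 = c(0.088, 0.683, 0.763),
                Disk2 = c(0.119, 0.795, 0.400));

U <- as.vector(demand %*% lambda);               # utilization of each device
R <- colSums(demand / (1 - U));                  # per-class response times

print(round(R, 6));                              # matches the PDQ-R output above
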

After I coded my PDQ-R solution I started to look into TeamQuest Model for capacity analysis and decided to do the same case study and see if the numbers agreed between all three sources.

I developed my TeamQuest Model and set up the visits and service time based upon the computed service demand for all three classes of workload:



However, the results did not match at all!



I contacted the TeamQuest folks, showed them my initial work with both PDQ-R and hand computations, and asked what was causing the difference in values, figuring I was doing something wrong with TeamQuest Model.

TeamQuest tech support finally got back to me with the solution. It appears that TeamQuest Model expects the service time to be set to 0.001 seconds and the number of visits adjusted to produce the desired service demand. I must've overlooked that in the TeamQuest Model tutorial, but sure enough, it works. After sending some e-mails back and forth with TeamQuest, it appears that the 0.001 second service time convention applies only to CPUs; I am told that it is OK to use the actual visits and service times with AR IO's, but I haven't tested that yet.
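
In other words, the service demand gets expressed almost entirely through the visit count. A quick back-of-the-envelope in R, assuming the fixed 0.001 second CPU service time (this is just my own arithmetic, not TeamQuest output):


cpu_demand   <- c(0.096, 0.615, 0.193);  # per-class CPU service demand (seconds)
service_time <- 0.001;                   # fixed CPU service time TeamQuest Model expects

visits <- cpu_demand / service_time;     # visits chosen so that demand = visits * service time
print(visits);                           # 96 615 193
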



Results after making that change:



Apparently, if the TeamQuest agent is used to automatically create models based upon hardware and system utilization, this is set automatically and is a non-issue. It only shows up when building a model from scratch. It's an easy fix, but an unexpected issue.
