Several weeks ago I purchased the O'Reilly eBook "The Art of Capacity Planning." I've always thought that load testing and capacity planning go hand in hand. Neither is a replacement for the other, but each can assist the other. Load testing helps capacity planning by applying load to pseudo-production systems, and capacity planning helps load testing by verifying load test results against real-world systems.
I finished "The Art of Capacity Planning," wanted to read more on the subject, and picked up a copy of "Guerrilla Capacity Planning," which has a lot more math. One of its concepts is the Universal Scaling Law, which is based on Amdahl's Law. Dr. Neil J. Gunther is a smart cookie. He even has a Ph.D. in Theoretical Physics, which makes him closer to Gordon Freeman than I'll ever be! (Side question: Do Ph.D.s in Theoretical Physics get crowbars at graduation?)
Anyhoo, in section 5.6.1 one of the methods in the book is to use Excel to do a second-degree polynomial regression to calculate the two necessary coefficients, sigma and kappa. But when I tried it in Excel I got a negative value for sigma, and one of the rules of the Universal Scaling Law is that the coefficients can never, ever, ever be negative. I figured that I had fat-fingered something and tried again, and once again got mismatching results.
I scratched my noggin, tried to figure out where I had erred, did some Googling, and came across this entry on Dr. Gunther's blog:
Negative Scalability Coefficients in Excel
Because in Excel (and some other packages, like my TI-89) you can't put a lower bound on the coefficients, you might from time to time get negative ones. But from reading the blog entry I see that other people are using R with success.
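For comparison, here is a rough R sketch of that unconstrained quadratic approach as I understand it. It assumes the usual transformation x = p - 1 and y = p/c - 1, which turns the USL into y = kappa*x^2 + (sigma + kappa)*x with no intercept. The function name usl_quadratic_fit and the argument name cap are made up for this illustration; they are not from the book.

```r
# Sketch only: an unconstrained quadratic fit of the USL.
# Assumed model: cap = p / (1 + sigma*(p-1) + kappa*p*(p-1))
# With x = p - 1 and y = p/cap - 1 this becomes
#   y = a*x^2 + b*x, where a = kappa and b = sigma + kappa.
usl_quadratic_fit <- function(p, cap) {
  x <- p - 1
  y <- p / cap - 1
  fit <- lm(y ~ 0 + x + I(x^2))          # "0 +" forces a zero intercept
  b <- unname(coef(fit)["x"])
  a <- unname(coef(fit)["I(x^2)"])
  c(sigma = b - a, kappa = a)            # nothing keeps these non-negative
}

# The ray tracing data used later in this post
p <- c(1, 4, 8, 12, 16, 20, 24, 28, 32, 48, 64)
c <- c(1.0, 3.9, 6.5, 8.5, 9.5, 10.0, 10.5, 11.5, 13.0, 14.0, 15.5)
usl_quadratic_fit(p, c)
```

Ordinary least squares is free to return whatever values minimize the error, so with noisy measurements either coefficient can dip below zero. That is the gap a bounded fit closes.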
This is the first time that I've ever messed around with R for statistical purposes. In the past I've written some stat routines (years ago!) in C# for comparing before/after load testing results.
Here is how I used R from start to finish to gonkulate the coefficients.
Using the data from Section 5.3 I did the following in R:
First I defined my p array, which in the book is the number of processors used for ray tracing:
p <- c(1, 4, 8, 12, 16, 20, 24, 28, 32, 48, 64)
Then I defined my c array, which is the relative capacity for the number of processors used for ray tracing:
c <- c(1.0, 3.9, 6.5, 8.5, 9.5, 10.0, 10.5, 11.5, 13.0, 14.0, 15.5)
I combined both arrays into a data frame for later use.
df <- data.frame(p, c)
And when I check out the contents of df I get:
df
    p    c
1   1  1.0
2   4  3.9
3   8  6.5
4  12  8.5
5  16  9.5
6  20 10.0
7  24 10.5
8  28 11.5
9  32 13.0
10 48 14.0
11 64 15.5
I can now use a non-linear regression routine with my data frame that I entered above.
usl <- nls(c ~ p/(1+sigma*(p-1)+kappa*p*(p-1)), df, algorithm="port", start=c(sigma=0.0, kappa=0.0), lower=c(0,0))
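Written out as math, the formula handed to nls is the Universal Scaling Law itself, with C(p) the relative capacity at p processors. The lower=c(0,0) argument is what enforces the non-negativity rule. Setting kappa to zero leaves the familiar Amdahl's Law form.

```latex
C(p) = \frac{p}{1 + \sigma\,(p - 1) + \kappa\,p\,(p - 1)},
\qquad \sigma \ge 0,\quad \kappa \ge 0

% Special case kappa = 0 (Amdahl's Law):
C(p) = \frac{p}{1 + \sigma\,(p - 1)}
```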
I can then access the coefficients by named index:
sigma <- coef(usl)["sigma"]
kappa <- coef(usl)["kappa"]

sigma
    sigma
0.0497973

kappa
       kappa
1.143404e-05
Huzzah!
I can now interpolate the relative capacity based upon the USL and the coefficients that were previously gonkulated and add that to my current data frame, df, that I defined earlier. I do have to note that I was a slackard and did not apply the significant digits rules as outlined in Chapter 3 of "Guerrilla Capacity Planning."
df$proj_c <- p/(1 + sigma * (p - 1) + kappa * p * (p - 1))
Here are the projected relative capacities. Yay!
df
    p    c    proj_c
1   1  1.0  1.000000
2   4  3.9  3.479686
3   8  6.5  5.929346
4  12  8.5  7.745536
5  16  9.5  9.144406
6  20 10.0 10.253815
7  24 10.5 11.154233
8  28 11.5 11.898837
9  32 13.0 12.524174
10 48 14.0 14.259114
11 64 15.5 15.298811
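The same fitted model can also project capacity at processor counts that were never measured. Here is a self-contained sketch that refits the model and calls predict() on it; the counts 2, 40 and 56 are arbitrary values picked for illustration, not figures from the book.

```r
# Refit the USL on the ray tracing data from above
p <- c(1, 4, 8, 12, 16, 20, 24, 28, 32, 48, 64)
c <- c(1.0, 3.9, 6.5, 8.5, 9.5, 10.0, 10.5, 11.5, 13.0, 14.0, 15.5)
df <- data.frame(p, c)

usl <- nls(c ~ p / (1 + sigma * (p - 1) + kappa * p * (p - 1)),
           data = df, algorithm = "port",
           start = c(sigma = 0, kappa = 0), lower = c(0, 0))

# Hypothetical processor counts that were not part of the measurements
new_p <- data.frame(p = c(2, 40, 56))
new_p$proj_c <- predict(usl, newdata = new_p)
new_p
```

Using predict() means the formula lives in one place, the nls call, instead of being typed out again for every projection.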
And here I will make a simple little graph of the actual versus projected relative capacity:
plot(p, c)
lines(p, df$proj_c)
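If you want a smoother curve than straight segments between the measured points, one option is to evaluate the USL on a finer grid. This sketch hardcodes the coefficients printed earlier so that it runs on its own; the helper name usl_capacity and the axis labels are my own choices.

```r
# Coefficients as printed by the nls fit earlier in this post
sigma <- 0.0497973
kappa <- 1.143404e-05

p <- c(1, 4, 8, 12, 16, 20, 24, 28, 32, 48, 64)
c <- c(1.0, 3.9, 6.5, 8.5, 9.5, 10.0, 10.5, 11.5, 13.0, 14.0, 15.5)

# Hypothetical helper: USL relative capacity at n processors
usl_capacity <- function(n) n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

grid <- seq(1, 64, by = 1)               # every processor count from 1 to 64
plot(p, c, xlab = "Processors (p)", ylab = "Relative capacity")
lines(grid, usl_capacity(grid))          # fitted curve on the fine grid
legend("bottomright", legend = c("measured", "USL fit"),
       pch = c(1, NA), lty = c(NA, 1))
```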
And here is the graph that is generated:
Kinda nifty, eh?
I can see myself using R more in the future. I'd rather write routines for automagic analysis of data with R than write my own routines from the ground up.