Thursday, January 28, 2010

Using R to Plot Universal Scaling Law Curve Automagically

For the past two days I have been benchmarking a system that might be used to stub out back-end systems, and I wanted to generate a USL curve from the benchmark data I was producing.

In previous posts I showed how to use R to generate the kappa and sigma coefficients for use with the USL equation.
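
For reference, here is the USL equation in the same form as the nls() formula used later in this post. C(p) is the relative capacity at p concurrent threads, sigma is the contention coefficient and kappa is the coherency coefficient:

```latex
C(p) = \frac{p}{1 + \sigma\,(p - 1) + \kappa\,p\,(p - 1)}
```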

I start with the data I collected from the system, which records the measured throughput for each number of concurrent threads:



    p,x
    1,1
    3,2.998108449
    5,4.995428752
    7,6.928278689
    10,7.539249685
    13,11.50488651
    15,11.79476671
    20,15.65936318
    30,13.61349306
    40,16.50567465
    50,14.83716898


The number of threads is represented by "p" and the throughput by "x".

I wrote the following R function to automagically plot the data points from the file above, draw the fitted USL curve through them, and highlight the theoretical maximum of the USL:



    # Example usage:
    #   plotUSL("c:/benchmark/benchmark.csv", "Benchmark data with USL curve")
    #
    # The CSV file must have two columns with the header "p,x". The first
    # data row is assumed to be the p = 1 measurement, because it is used
    # to normalize the throughput.
    # Example:
    #   p,x
    #   1,1
    #   2,1.5
    #   3,2

    plotUSL <- function(dataFile, graphTitle) {
      uslData <- read.csv(dataFile, header = TRUE)

      # Relative capacity: throughput divided by the first row's throughput
      uslData$c <- uslData$x / uslData$x[1]

      # Fit sigma and kappa, both constrained to be non-negative
      usl <- nls(c ~ p / (1 + sigma * (p - 1) + kappa * p * (p - 1)),
                 uslData,
                 algorithm = "port",
                 start = c(sigma = 0.0, kappa = 0.0),
                 lower = c(0, 0))
      sigma <- coef(usl)["sigma"]
      kappa <- coef(usl)["kappa"]

      # Evaluate the fitted curve out to 1.75x the largest measured p
      p <- 1:round(1.75 * max(uslData$p))
      Relative_Capacity <- p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
      plot(p, Relative_Capacity, type = "l",
           ylim = c(0, round(max(Relative_Capacity, uslData$c) * 1.1)))
      points(uslData$p, uslData$c, pch = 20)

      # Mark the highest point of the curve within the plotted range
      indexValue <- which.max(Relative_Capacity)
      points(p[indexValue], Relative_Capacity[indexValue], col = 2, pch = 13, cex = 2.0)

      title(main = graphTitle, col.main = "black", font.main = 4, cex.main = 1.5)
      title(sub = sprintf("USL max at p = %d with Relative_Capacity = %f",
                          p[indexValue],
                          Relative_Capacity[indexValue]),
            col.sub = "black",
            font.sub = 4,
            cex.sub = 1.5)
    }
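
The function marks the highest point among the integer values of p that it plots. As a cross-check, setting the derivative of the USL equation to zero gives a closed-form peak at p* = sqrt((1 - sigma) / kappa). Here is a minimal sketch of that calculation. The helper names uslCapacity and uslPeak are my own and not part of the function above, and the coefficients in the example call are made up for illustration:

```r
# Sketch: closed-form location of the USL peak.
# uslCapacity and uslPeak are hypothetical helper names.
uslCapacity <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

uslPeak <- function(sigma, kappa) {
  # Assumes coefficients in the usual USL range
  stopifnot(sigma >= 0, sigma < 1, kappa >= 0)
  if (kappa == 0) {
    # Without a coherency penalty the curve never turns over:
    # it approaches 1/sigma, or grows without bound if sigma is also 0
    return(list(p = Inf, capacity = if (sigma > 0) 1 / sigma else Inf))
  }
  pStar <- sqrt((1 - sigma) / kappa)
  list(p = pStar, capacity = uslCapacity(pStar, sigma, kappa))
}

# Made-up coefficients, for illustration only
peak <- uslPeak(sigma = 0.02, kappa = 0.0005)
print(peak)
```

Because p* is generally not a whole number, the integer p highlighted on the plot should be one of its two neighbours whenever the peak falls inside the plotted range.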



And voilà, we get quick-and-dirty results after invoking the function!

> plotUSL("g:/usl/benchmark.csv", "Benchmark data with USL curve")



This function could easily be modified to accept a third parameter and save off the image to a JPG, PNG or even PDF. But I leave that as an exercise for the reader.
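
For the impatient, here is one possible starting point. Rather than adding a third parameter to plotUSL, this sketch wraps any plotting call in a graphics device chosen by the output file's extension. The helper name saveToFile and the image sizes are assumptions of mine, so treat it as a sketch and not a finished solution:

```r
# Sketch only: saveToFile is a hypothetical helper name.
# It opens a graphics device based on the file extension, runs the
# supplied plotting function, and closes the device afterwards.
saveToFile <- function(outFile, plotFn, ...) {
  ext <- tolower(tools::file_ext(outFile))
  switch(ext,
         png  = png(outFile, width = 800, height = 600),
         jpg  = ,
         jpeg = jpeg(outFile, width = 800, height = 600),
         pdf  = pdf(outFile, width = 8, height = 6),
         stop("unsupported image type: ", ext))
  on.exit(dev.off())   # always close the device, even if plotting fails
  plotFn(...)
  invisible(outFile)
}

# Stand-alone demo with a trivial plot written to a temporary PDF
demoFile <- saveToFile(file.path(tempdir(), "usl-demo.pdf"),
                       function() plot(1:10, type = "l"))

# With plotUSL from this post loaded, the call would look like:
# saveToFile("usl.png", plotUSL, "g:/usl/benchmark.csv",
#            "Benchmark data with USL curve")
```

The png() and jpeg() devices depend on how R was built, while pdf() is always available, which is why the demo writes a PDF.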