Friday, August 28, 2009

Hey, everybody! Let's use SimPy to simulate performance of a three tier eBiz solution!

Back in this blog post, I wrote about using PDQ-R to analyze the performance of a three tier eBiz solution.

I had been learning how to simulate queuing networks with SimPy, and this week I got my three tier SimPy solution up and running and crunching numbers.

The SimPy solution was MUCH more involved than the PDQ solution and it took me a bit to wrap my head around all the goodies I needed to know before I started getting good results.

I started by looking at the jackson.py code and got an idea of how to write the code that was needed to get the job done.

Like the jackson.py code, I also had three computers, but unlike the jackson.py code I had two nodes per computer that represented the CPU and Disk. Getting that set up was the really tricky part and I had to bang my head against the SimPy brick wall for a while until I broke through and figured the solution out.

Like the jackson.py code, I use a Process to control the rate at which I hit web pages. It invokes another process that sets up the hits to the individual components, which are also processes. One of the big lessons that I learned is that if the service demand for a component is zero, don't make a yield request call for the component resource.

You'd think that I would have figured that out earlier but it totally slipped past me and it was really kicking my buttocks. Most of the page response times were looking good, but in the PDQ analysis the Home page didn't have a long response time, because it only has service demand on the Web CPU and Disk, and I wasn't seeing that in my SimPy solution.

Here is what I was seeing in my SimPy solution for page response time for the Home page:



I had some other chores to do and as I was walking past my machine I noticed some code out of the corner of my eye and it hit me like a ton of bricks. I was making a yield request for each component even if there was no service demand for the component, and that yield request was a blocking operation. Well, duh! A few if statements fixed that problem and a quick test showed that the Home page response time was no longer diverging from the PDQ analysis like in the above graph!



Wow, what a difference!

The biggest things that I got out of doing the SimPy solution were that all component requests need to be in their own Processes, and that if there is no service demand for a component then it should not be called at all.
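
The zero-demand rule can be sketched without SimPy at all. The snippet below reuses the service demand table from my listing further down; the helper name `components_to_visit` and the `NODE_NAMES` list are made up for illustration and are not part of that listing. It returns only the components a page actually touches, which is the set that should get a process and a yield request.

```python
# Rows are components, columns are pages (values copied from the
# ServiceDemand table in the listing). NODE_NAMES and the helper below
# are hypothetical names used only for this sketch.
NODE_NAMES = ["WS_CPU", "WS_DISK", "AS_CPU", "AS_DISK", "DS_CPU", "DS_DISK"]
PAGE_NAMES = ["Entry", "Home", "Search", "View", "Login", "Create", "Bid", "Exit"]

SERVICE_DEMAND = [
    [0.000, 0.008, 0.009, 0.011, 0.060, 0.012, 0.015, 0.000],  # WS_CPU
    [0.000, 0.030, 0.010, 0.010, 0.010, 0.010, 0.010, 0.000],  # WS_DISK
    [0.000, 0.000, 0.030, 0.035, 0.025, 0.045, 0.040, 0.000],  # AS_CPU
    [0.000, 0.000, 0.008, 0.080, 0.009, 0.011, 0.012, 0.000],  # AS_DISK
    [0.000, 0.000, 0.010, 0.009, 0.015, 0.070, 0.045, 0.000],  # DS_CPU
    [0.000, 0.000, 0.035, 0.018, 0.050, 0.080, 0.090, 0.000],  # DS_DISK
]


def components_to_visit(page_name):
    """Return (node name, demand) pairs for components with non-zero demand."""
    page = PAGE_NAMES.index(page_name)
    return [(node, row[page])
            for node, row in zip(NODE_NAMES, SERVICE_DEMAND)
            if row[page] > 0.0]


if __name__ == "__main__":
    for name in PAGE_NAMES:
        print(name, components_to_visit(name))
```

For the Home page this leaves just the Web CPU and Disk, so nothing ever waits in line at the application or database tier for a page that does no work there.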

Perhaps next week I'll add the queuing output and component utilization into my SimPy code for more left/right comparisons against PDQ. Overall, I'm fairly happy with my solution, but there is some code that needs to be cleaned up.

Down below is my code and the comparisons of my SimPy solution versus my PDQ solution.



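#!/usr/bin/env python

# random.Random, sys.stderr and sys.argv are used below, so import the modules explicitly
import random
import sys

from SimPy.Simulation import *
from random import Random, expovariate, uniform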

class Metrics:

    metrics = dict()

    def Add(self, metricName, frameNumber, value):
        if self.metrics.has_key(metricName):
            if self.metrics[metricName].has_key(frameNumber):
                self.metrics[metricName][frameNumber].append(value)
            else:
                self.metrics[metricName][frameNumber] = list()
                self.metrics[metricName][frameNumber].append(value)
        else:
            self.metrics[metricName] = dict()
            self.metrics[metricName][frameNumber] = list()
            self.metrics[metricName][frameNumber].append(value)

    def Keys(self):
        return self.metrics.keys()

    def Mean(self, metricName):
        valueArray = list()
        if self.metrics.has_key(metricName):
            for frame in self.metrics[metricName].keys():
                for values in range(len(self.metrics[metricName][frame])):
                    valueArray.append(self.metrics[metricName][frame][values])

            sum = 0.0
            for i in range(len(valueArray)):
                sum += valueArray[i]

            if len(self.metrics[metricName][frame]) != 0:
                return sum/len(self.metrics[metricName])
            else:
                return 0 # Need to learn python throwing exceptions
        else:
            return 0

class RandomPath:

    def RowSum(self, Vector):
        rowSum = 0.0
        for i in range(len(Vector)):
            rowSum += Vector[i]
        return rowSum

    def NextPage(self, T, i):
        rowSum = self.RowSum(T[i])
        randomValue = G.Rnd.uniform(0, rowSum)

        sumT = 0.0

        for j in range(len(T[i])):
            sumT += T[i][j]
            if randomValue < sumT:
                break

        return j

class G:

    numWS = 1
    numAS = 1
    numDS = 2

    Rnd = random.Random(12345)

    PageNames = ["Entry", "Home", "Search", "View", "Login", "Create", "Bid", "Exit" ]

    Entry = 0
    Home = 1
    Search = 2
    View = 3
    Login = 4
    Create = 5
    Bid = 6
    Exit = 7

    WS = 0
    AS = 1
    DS = 2

    CPU = 0
    DISK = 1

    WS_CPU = 0
    WS_DISK = 1
    AS_CPU = 2
    AS_DISK = 3
    DS_CPU = 4
    DS_DISK = 5

    metrics = Metrics()

    #           e  h  s  v  l  c  b  e
    HitCount = [0, 0, 0, 0, 0, 0, 0, 0]

    Resources = [[ Resource(1), Resource(1) ], # WS CPU and DISK
                 [ Resource(1), Resource(1) ], # AS CPU and DISK
                 [ Resource(1), Resource(1) ]] # DS CPU and DISK

    #                  Enter  Home   Search View   Login  Create Bid    Exit
    ServiceDemand = [ [0.000, 0.008, 0.009, 0.011, 0.060, 0.012, 0.015, 0.000], # WS_CPU
                      [0.000, 0.030, 0.010, 0.010, 0.010, 0.010, 0.010, 0.000], # WS_DISK
                      [0.000, 0.000, 0.030, 0.035, 0.025, 0.045, 0.040, 0.000], # AS_CPU
                      [0.000, 0.000, 0.008, 0.080, 0.009, 0.011, 0.012, 0.000], # AS_DISK
                      [0.000, 0.000, 0.010, 0.009, 0.015, 0.070, 0.045, 0.000], # DS_CPU
                      [0.000, 0.000, 0.035, 0.018, 0.050, 0.080, 0.090, 0.000] ] # DS_DISK

    # Type B shopper
    #                     0     1     2     3     4     5     6     7
    TransitionMatrix = [ [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00], # 0
                         [0.00, 0.00, 0.70, 0.00, 0.10, 0.00, 0.00, 0.20], # 1
                         [0.00, 0.00, 0.45, 0.15, 0.10, 0.00, 0.00, 0.30], # 2
                         [0.00, 0.00, 0.00, 0.00, 0.40, 0.00, 0.00, 0.60], # 3
                         [0.00, 0.00, 0.00, 0.00, 0.00, 0.30, 0.55, 0.15], # 4
                         [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00], # 5
                         [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00], # 6
                         [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00] ] # 7

class DoWork(Process):

    def __init__(self, i, resource, serviceDemand, nodeName, pageName):
        Process.__init__(self)
        self.frame = i
        self.resource = resource
        self.serviceDemand = serviceDemand
        self.nodeName = nodeName
        self.pageName = pageName

    def execute(self):
        StartUpTime = now()
        yield request, self, self.resource
        yield hold, self, self.serviceDemand
        yield release, self, self.resource
        R = now() - StartUpTime

        G.metrics.Add(self.pageName, self.frame, R)

class CallPage(Process):

    def __init__(self, i, node, pageName):
        Process.__init__(self)
        self.frame = i
        self.StartUpTime = 0.0
        self.currentPage = node
        self.pageName = pageName

    def execute(self):

        if self.currentPage != G.Exit:

            print >> sys.stderr, "Working on Frame # ", self.frame, " @ ", now() , " for page ", self.pageName

            self.StartUpTime = now()

            if G.ServiceDemand[G.WS_CPU][self.currentPage] > 0.0:
                wsCPU = DoWork(self.frame, G.Resources[G.WS][G.CPU], G.ServiceDemand[G.WS_CPU][self.currentPage]/G.numWS, "wsCPU", self.pageName)
                activate(wsCPU, wsCPU.execute())

            if G.ServiceDemand[G.WS_DISK][self.currentPage] > 0.0:
                wsDISK = DoWork(self.frame, G.Resources[G.WS][G.DISK], G.ServiceDemand[G.WS_DISK][self.currentPage]/G.numWS, "wsDISK", self.pageName)
                activate(wsDISK, wsDISK.execute())

            if G.ServiceDemand[G.AS_CPU][self.currentPage] > 0.0:
                asCPU = DoWork(self.frame, G.Resources[G.AS][G.CPU], G.ServiceDemand[G.AS_CPU][self.currentPage]/G.numAS, "asCPU", self.pageName)
                activate(asCPU, asCPU.execute())

            if G.ServiceDemand[G.AS_DISK][self.currentPage] > 0.0:
                asDISK = DoWork(self.frame, G.Resources[G.AS][G.DISK], G.ServiceDemand[G.AS_DISK][self.currentPage]/G.numAS, "asDISK", self.pageName)
                activate(asDISK, asDISK.execute())

            if G.ServiceDemand[G.DS_CPU][self.currentPage] > 0.0:
                dsCPU = DoWork(self.frame, G.Resources[G.DS][G.CPU], G.ServiceDemand[G.DS_CPU][self.currentPage]/G.numDS, "dsCPU", self.pageName)
                activate(dsCPU, dsCPU.execute())

            if G.ServiceDemand[G.DS_DISK][self.currentPage] > 0.0:
                dsDISK = DoWork(self.frame, G.Resources[G.DS][G.DISK], G.ServiceDemand[G.DS_DISK][self.currentPage]/G.numDS, "dsDISK", self.pageName)
                activate(dsDISK, dsDISK.execute())

            G.HitCount[self.currentPage] += 1

        yield hold, self, 0.00001 # Needed to prevent an error. Doesn't add any blocking to the six queues above


class Generator(Process):
    def __init__(self, rate, maxT, maxN):
        Process.__init__(self)
        self.name = "Generator"
        self.rate = rate
        self.maxN = maxN
        self.maxT = maxT
        self.g = Random(11335577)
        self.i = 0
        self.currentPage = G.Home

    def execute(self):
        while (now() < self.maxT):
            self.i+=1
            p = CallPage(self.i,self.currentPage,G.PageNames[self.currentPage])
            activate(p,p.execute())
            yield hold,self,self.g.expovariate(self.rate)
            randomPath = RandomPath()

            if self.currentPage == G.Exit:
                self.currentPage = G.Home
            else:
                self.currentPage = randomPath.NextPage(G.TransitionMatrix, self.currentPage)

def main():

    maxWorkLoad = 10000
    Lambda = 4.026*float(sys.argv[1])
    maxSimTime = float(sys.argv[2])

    initialize()
    g = Generator(Lambda, maxSimTime, maxWorkLoad)
    activate(g,g.execute())

    simulate(until=maxSimTime)

    print >> sys.stderr, "Simulated Seconds : ", maxSimTime

    print >> sys.stderr, "Page Hits :"
    for i in range(len(G.PageNames)):
        print >> sys.stderr, "\t", G.PageNames[i], " = ", G.HitCount[i]

    print >> sys.stderr, "Throughput : "
    for i in range(len(G.PageNames)):
        print >> sys.stderr, "\t", G.PageNames[i], " = ", G.HitCount[i]/maxSimTime

    print >> sys.stderr, "Mean Response Times:"

    for i in G.metrics.Keys():
        print >> sys.stderr, "\t", i, " = ", G.metrics.Mean(i)

    print G.HitCount[G.Home]/maxSimTime, ",", G.metrics.Mean("Home"), ",", G.metrics.Mean("View"), ",", G.metrics.Mean("Search"), ",", G.metrics.Mean("Login"), ",", G.metrics.Mean("Create"), ",", G.metrics.Mean("Bid")

if __name__ == '__main__': main()
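
To see what the Generator and RandomPath classes do together without installing SimPy, here is a minimal sketch of the same workload logic in plain Python: exponentially distributed gaps between hits, and a walk through the transition matrix to pick each next page. The function names `next_page` and `generate_hits` are made up for this sketch; the seeds and the matrix are copied from the listing above.

```python
import random

# Type B shopper transition matrix, copied from the listing above.
TRANSITION_MATRIX = [
    [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],  # 0 Entry
    [0.00, 0.00, 0.70, 0.00, 0.10, 0.00, 0.00, 0.20],  # 1 Home
    [0.00, 0.00, 0.45, 0.15, 0.10, 0.00, 0.00, 0.30],  # 2 Search
    [0.00, 0.00, 0.00, 0.00, 0.40, 0.00, 0.00, 0.60],  # 3 View
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.30, 0.55, 0.15],  # 4 Login
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # 5 Create
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # 6 Bid
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],  # 7 Exit
]
HOME, EXIT = 1, 7


def next_page(matrix, current, rng):
    """Pick the next page by walking the cumulative sum of one matrix row."""
    row = matrix[current]
    threshold = rng.uniform(0.0, sum(row))
    cumulative = 0.0
    for page, probability in enumerate(row):
        cumulative += probability
        if threshold < cumulative:
            return page
    # Reached only for an all-zero row (Exit) or a rounding edge case.
    return len(row) - 1


def generate_hits(rate, max_time, seed=11335577):
    """Return (arrival time, page index) pairs for hits before max_time."""
    arrivals = random.Random(seed)  # gaps between hits
    paths = random.Random(12345)    # page choices
    hits = []
    clock = 0.0
    page = HOME
    while clock < max_time:
        hits.append((clock, page))
        clock += arrivals.expovariate(rate)  # exponential gap, mean 1/rate
        if page == EXIT:
            page = HOME  # start a new visit after an Exit
        else:
            page = next_page(TRANSITION_MATRIX, page, paths)
    return hits


if __name__ == "__main__":
    for when, page in generate_hits(rate=4.026, max_time=2.0):
        print(round(when, 4), page)
```

Note that this is a single stream of hits following one path through the site, which is how the listing works too; it is not a population of independent shoppers.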









Like my previous comparison of SimPy versus PDQ for a single queue, the PDQ results are slightly higher, so no major surprises there.
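
For a rough idea of what the analytic side of the comparison looks like, here is a minimal sketch of the textbook open-network formula, where the time a page spends at a node is D / (1 - U) and U is the node's total utilization from all pages. This is my assumption about the kind of calculation involved, not PDQ's actual code, and it ignores the server counts (numWS, numAS, numDS) from the listing. The function name and the arrival rates in the example are made up for illustration; they are not the rates from my PDQ model.

```python
def open_network_response_times(arrival_rates, service_demand):
    """Per-page response times for an open product-form queueing network.

    arrival_rates[c]     : hits per unit time for page c
    service_demand[k][c] : service demand of page c at node k
    """
    # Utilization of each node is the sum of rate * demand over all pages.
    utilization = [sum(rate * demand for rate, demand in zip(arrival_rates, row))
                   for row in service_demand]
    if any(u >= 1.0 for u in utilization):
        raise ValueError("a node is saturated; the open model has no steady state")
    # Response time of a page is the sum of its residence times D / (1 - U).
    return [sum(row[c] / (1.0 - u) for row, u in zip(service_demand, utilization))
            for c in range(len(arrival_rates))]


# Service demands copied from the listing (rows: nodes, columns: pages).
SERVICE_DEMAND = [
    [0.000, 0.008, 0.009, 0.011, 0.060, 0.012, 0.015, 0.000],  # WS_CPU
    [0.000, 0.030, 0.010, 0.010, 0.010, 0.010, 0.010, 0.000],  # WS_DISK
    [0.000, 0.000, 0.030, 0.035, 0.025, 0.045, 0.040, 0.000],  # AS_CPU
    [0.000, 0.000, 0.008, 0.080, 0.009, 0.011, 0.012, 0.000],  # AS_DISK
    [0.000, 0.000, 0.010, 0.009, 0.015, 0.070, 0.045, 0.000],  # DS_CPU
    [0.000, 0.000, 0.035, 0.018, 0.050, 0.080, 0.090, 0.000],  # DS_DISK
]

# Hypothetical per-page arrival rates, for illustration only.
EXAMPLE_RATES = [0.0, 1.0, 1.0, 0.5, 0.3, 0.1, 0.1, 0.0]

if __name__ == "__main__":
    pages = ["Entry", "Home", "Search", "View", "Login", "Create", "Bid", "Exit"]
    for name, r in zip(pages, open_network_response_times(EXAMPLE_RATES, SERVICE_DEMAND)):
        print(name, r)
```

One thing worth noting: the listing holds each resource for exactly the service demand rather than for a random time, while this formula assumes exponentially distributed service. That may be part of why the simulated numbers come out a little lower, but treat that as a guess, not a verified explanation.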
