
Performance Measurements of ALTQ/CBQ: 
	date: 97/08/15
	machine: Intel P6 200MHz, FX440 chipset  (x 3)
	os: FreeBSD-2.2-2 + ALTQ/CBQ-0.3.2
	test program: netperf

See <http://www.csl.sony.co.jp/person/kjc/cbq/perf.html> for the 
graphs of these results.

System Configuration:

	(src)	   (router)    (dest)
	linus  -->  gemini  --> tipo
		          ^CBQ

	      ATM(155M)    ATM(155M):ENI ATM-155
			   or 10baseT(10M):Intel EtherExpress Pro/100B
	                   or 100baseT(100M):Intel EtherExpress Pro/100B

			   *note: 100baseT is used in full-duplex mode.

Throughput Test:

This test measures the throughput of a single TCP flow over the local
loopback, 155M ATM, 10baseT, and 100baseT.  The results suggest that
current PCs have enough CPU power to handle multiple 100Mbps
interfaces and that the overhead of CBQ is negligible.

	The test measures the minimum overhead of CBQ.  Although the
	configuration has only a few classes and one active flow, the
	measured overhead includes flow extraction, timestamping,
	packet classification, and scheduling.

lo0: Local loop (P6-200)

	% ./netperf -H 127.0.0.1 -l 20 -- -s 56K -S 56K -m 8K
	TCP STREAM TEST to localhost
	Recv   Send    Send                          
	Socket Socket  Message  Elapsed              
	Size   Size    Size     Time     Throughput  
	bytes  bytes   bytes    secs.    10^6bits/sec  

cbq off	 57344  57344   8192    20.01     326.58
cbq on	 57344  57344   8192    20.00     302.30

en0: 155M ATM (P6-200 --> P6-200 --> P6-200)
		   en0  en1    en0 en0
		               ^CBQ

	% ./netperf -H tipo -l 20 -- -s 56K -S 56K -m 8K

cbq off	 57344  57344   8192    20.01     133.20
cbq on	 57344  57344   8192    20.00     133.31

fxp0: 10baseT (P6-200 --> P6-200 --> P6-200)
	            en0  en0  fxp0  fxp0
		               ^CBQ

cbq off	 32768  32768   8192    20.02       6.39
cbq on	 32768  32768   8192    20.05       6.46


fxp0: 100baseT (P6-200 --> P6-200 --> P6-200)
	            en0  en0  fxp0  fxp0
                                ^CBQ

	% ./netperf -H tipo -l 20 -- -s 32K -S 32K -m 8K

cbq off	 32768  32768   8192    20.01      93.04
cbq on	 32768  32768   8192    20.01      92.89 
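
The "negligible overhead" claim can be checked directly from the
numbers above.  The following is a minimal sketch, not part of the
original measurements: it computes the throughput lost with CBQ
enabled as a percentage of the cbq-off rate.  The function and
variable names are hypothetical; the figures are copied from the
tables in this section.

```python
# Throughput figures (10^6 bits/sec) copied from the tables above:
# label -> (cbq off, cbq on)
RESULTS = {
    "lo0 local loop": (326.58, 302.30),
    "en0 155M ATM": (133.20, 133.31),
    "fxp0 10baseT": (6.39, 6.46),
    "fxp0 100baseT": (93.04, 92.89),
}


def overhead_percent(cbq_off, cbq_on):
    """Throughput lost with CBQ on, as a percentage of the cbq-off rate.

    A negative value means the cbq-on run was slightly faster.
    """
    return (cbq_off - cbq_on) / cbq_off * 100.0


if __name__ == "__main__":
    for name, (off, on) in RESULTS.items():
        print(f"{name:16s} {overhead_percent(off, on):6.2f}%")
```

With these figures the loss is about 7.4% on the local loop, and the
three forwarded paths stay within roughly 1% in either direction.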

Latency Test:

This test measures the latency overhead of CBQ when sending
request/response style messages.

en0: 155M ATM (P6-200 --> P6-200 --> P6-200)
		   en0  en1    en0 en0
                               ^CBQ

	% ./netperf -H tipo -t UDP_RR -l 10 -- -r 1,1

	UDP REQUEST/RESPONSE TEST to tipo
	Local /Remote
	Socket Size   Request  Resp.   Elapsed  Trans.
	Send   Recv   Size     Size    Time     Rate         
	bytes  Bytes  bytes    bytes   secs.    per sec   

request:1 byte/response 1 byte:
cbq off	9216   41600  1        1       10.00    2821.89   
	9216   41600

cbq on	9216   41600  1        1       10.01    2744.03   
	9216   41600 

request:64 bytes/response 64 bytes:
cbq off	9216   41600  64       64      10.01    2301.06   
	9216   41600

cbq on	9216   41600  64       64      10.00    2243.14   
	9216   41600

request:1K bytes/response 64 bytes:
cbq off	9216   41600  1024     64      10.00    1476.31   
	9216   41600

cbq on	9216   41600  1024     64      10.00    1454.39   
	9216   41600

request:8K bytes/response 64 bytes:
cbq off	9216   41600  8192     64      10.00     394.59   
	9216   41600

cbq on	9216   41600  8192     64      10.00     392.76   
	9216   41600

fxp0: 10baseT (P6-200 --> P6-200 --> P6-200)
	            en0  en0  fxp0  fxp0
                               ^CBQ

	% ./netperf -H tipo -t UDP_RR -l 10 -- -r 1,1

	UDP REQUEST/RESPONSE TEST to tipo
	Local /Remote
	Socket Size   Request  Resp.   Elapsed  Trans.
	Send   Recv   Size     Size    Time     Rate         
	bytes  Bytes  bytes    bytes   secs.    per sec   

request:1 byte/response 1 byte:
cbq off	9216   41600  1        1       10.00    2277.37   
	9216   41600

cbq on	9216   41600  1        1       10.01    2234.13   
	9216   41600

request:64 bytes/response 64 bytes:
cbq off	9216   41600  64       64      10.00    1800.75   
	9216   41600

cbq on	9216   41600  64       64      10.00    1768.02   
	9216   41600

request:1K bytes/response 64 bytes:
cbq off	9216   41600  1024     64      10.00     681.05   
	9216   41600

cbq on	9216   41600  1024     64      10.01     676.45   
	9216   41600

request:8K bytes/response 64 bytes:
cbq off	9216   41600  8192     64      10.01     116.64   
	9216   41600

cbq on	9216   41600  8192     64      10.00     116.67   
	9216   41600
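
netperf reports UDP_RR results as transactions per second, so the
latency overhead has to be derived.  A minimal sketch, assuming one
transaction is outstanding at a time so that the time per transaction
is 1/rate; the names are hypothetical and the rates are copied from
the tables above.

```python
# Transaction rates (per sec) copied from the UDP_RR tables above:
# (request bytes, response bytes) -> (cbq off, cbq on)
ATM_155M = {
    (1, 1): (2821.89, 2744.03),
    (64, 64): (2301.06, 2243.14),
    (1024, 64): (1476.31, 1454.39),
    (8192, 64): (394.59, 392.76),
}
ETHER_10BASET = {
    (1, 1): (2277.37, 2234.13),
    (64, 64): (1800.75, 1768.02),
    (1024, 64): (681.05, 676.45),
    (8192, 64): (116.64, 116.67),
}


def added_latency_us(rate_off, rate_on):
    """Extra time per transaction with CBQ on, in microseconds.

    Assumes time per transaction = 1 / transaction rate.
    """
    return (1.0 / rate_on - 1.0 / rate_off) * 1e6


if __name__ == "__main__":
    for label, table in (("155M ATM", ATM_155M), ("10baseT", ETHER_10BASET)):
        for (req, resp), (off, on) in table.items():
            print(f"{label:9s} {req:5d}/{resp:<3d} {added_latency_us(off, on):7.2f} us")
```

For the 1-byte case over ATM this comes to roughly 10 microseconds
per transaction.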


Bandwidth Allocation:

This test measures the throughput of a TCP flow when its class is
regulated to (N x 5)% of the interface bandwidth, for N = 1 to 19.
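
The cbq.conf files in this section differ only in the interface name,
the interface bandwidth, and the pbandwidth of tcp_class.  As a rough
sketch, the sweep could be generated as follows.  The helper names
cbq_conf and sweep are hypothetical, and the line templates are
copied from the configurations shown in this section; how each file
is loaded on the router is not shown in this document and is left out.

```python
def cbq_conf(interface, bandwidth_bps, percent):
    """Return cbq.conf text that regulates TCP to `percent` of the link.

    The five lines mirror the [cbq.conf] fragments in this section;
    only the interface, bandwidth and tcp_class pbandwidth vary.
    """
    lines = [
        f"interface {interface} bandwidth {bandwidth_bps} cbq",
        f"class cbq {interface} root_class NULL priority 0 "
        "admission none pbandwidth 100",
        f"class cbq {interface} def_class root_class borrow root_class "
        "priority 2 pbandwidth 98 default",
        f"class cbq {interface} tcp_class def_class priority 3 "
        f"pbandwidth {percent}",
        f"filter {interface} tcp_class 0 0 0 0 6",
    ]
    return "\n".join(lines) + "\n"


def sweep(interface, bandwidth_bps):
    """Yield (percent, conf_text) for 5%, 10%, ..., 95%."""
    for n in range(1, 20):
        yield n * 5, cbq_conf(interface, bandwidth_bps, n * 5)


if __name__ == "__main__":
    # Example: print the 10% configuration for 10baseT.
    print(cbq_conf("fxp0", 10000000, 10), end="")
```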

en0: 155M ATM (P6-200 --> P6-200 --> P6-200)
		   en0  en1    en0 en0
                               ^CBQ
[cbq.conf]
interface en0 bandwidth 134000000 cbq
class cbq en0 root_class NULL priority 0 admission none pbandwidth 100
class cbq en0 def_class root_class borrow root_class priority 2 pbandwidth 98 default
class cbq en0 tcp_class def_class priority 3 pbandwidth 5 
filter en0 tcp_class 0 0 0 0 6

	% ./netperf -H tipo -t TCP_STREAM -l 20 -- -s 56K -S 56K -m 8K
	TCP STREAM TEST to tipo
	Recv   Send    Send                          
	Socket Socket  Message  Elapsed              
	Size   Size    Size     Time     Throughput  
	bytes  bytes   bytes    secs.    10^6bits/sec  

off	 57344  57344   8192    20.01     133.58

 5%	 57344  57344   8192    20.08       6.38
10%	 57344  57344   8192    20.02      13.06
15%	 57344  57344   8192    20.03      18.90
20%	 57344  57344   8192    20.02      24.73
25%	 57344  57344   8192    20.02      30.35
30%	 57344  57344   8192    20.01      35.90
35%	 57344  57344   8192    20.02      42.58
40%	 57344  57344   8192    20.02      47.92
45%	 57344  57344   8192    20.02      59.44
50%	 57344  57344   8192    20.02      65.83
55%	 57344  57344   8192    20.02      72.64
60%	 57344  57344   8192    20.02      79.23
65%	 57344  57344   8192    20.03      86.39
70%	 57344  57344   8192    20.00      92.30
75%	 57344  57344   8192    20.01      98.26
80%	 57344  57344   8192    20.01     106.64
85%	 57344  57344   8192    20.01     113.00
90%	 57344  57344   8192    20.00     119.40
95%	 57344  57344   8192    20.01     126.50 


fxp0: 10baseT (P6-200 --> P6-200 --> P6-200)
	            en0  en0  fxp0  fxp0
                               ^CBQ
[cbq.conf]
interface fxp0 bandwidth 10000000 cbq
class cbq fxp0 root_class NULL priority 0 admission none pbandwidth 100
class cbq fxp0 def_class root_class borrow root_class priority 2 pbandwidth 98 default
class cbq fxp0 tcp_class def_class priority 3 pbandwidth 10
filter fxp0 tcp_class 0 0 0 0 6

	% ./netperf -H tipo -l 20 -- -m 8K -s 32K -S 32K

	TCP STREAM TEST to tipo
	Recv   Send    Send                          
	Socket Socket  Message  Elapsed              
	Size   Size    Size     Time     Throughput  
	bytes  bytes   bytes    secs.    10^6bits/sec  

off	 32768  32768   8192    20.01       6.49  
 5%	 32768  32768   8192    20.59       0.47
10%	 32768  32768   8192    20.26       0.93
15%	 32768  32768   8192    20.16       1.37
20%	 32768  32768   8192    20.14       1.80
25%	 32768  32768   8192    20.12       2.29
30%	 32768  32768   8192    20.10       2.76
35%	 32768  32768   8192    20.09       3.19
40%	 32768  32768   8192    20.07       3.63
45%	 32768  32768   8192    20.07       4.07
50%	 32768  32768   8192    20.06       4.51 
55%	 32768  32768   8192    20.06       4.92
60%	 32768  32768   8192    20.05       5.31
65%	 32768  32768   8192    20.06       5.59
70%	 32768  32768   8192    20.05       5.95
75%	 32768  32768   8192    20.05       6.17
80%	 32768  32768   8192    20.06       6.38 
85%	 32768  32768   8192    20.06       7.16 
90%	 32768  32768   8192    20.04       7.17
95%	 32768  32768   8192    20.01       7.03


fxp0: 100baseT (P6-200 --> P6-200 --> P6-200)
	            en0  en0  fxp0  fxp0
                                ^CBQ
[cbq.conf]
interface fxp0 bandwidth 100000000 cbq
class cbq fxp0 root_class NULL priority 0 admission none pbandwidth 100
class cbq fxp0 def_class root_class borrow root_class priority 2 pbandwidth 98 default
class cbq fxp0 tcp_class def_class priority 3 pbandwidth 10            
filter fxp0 tcp_class 0 0 0 0 6

	% ./netperf -H tipo -l 20 -- -m 8K -s 32K -S 32K

off	 32768  32768   8192    20.01      92.44
 5%	 32768  32768   8192    20.07       4.37
10%	 32768  32768   8192    20.05       7.94
15%	 32768  32768   8192    20.02       9.73
20%	 32768  32768   8192    20.04       9.79 
25%	 32768  32768   8192    20.02       9.78
30%	 32768  32768   8192    20.02      10.17
35%	 32768  32768   8192    20.02      12.15
40%	 32768  32768   8192    20.01      19.17
45%	 32768  32768   8192    20.04      24.05
50%	 32768  32768   8192    20.04      30.26
55%	 32768  32768   8192    20.02      38.40
60%	 32768  32768   8192    20.05      55.95 
65%	 32768  32768   8192    20.04      60.52
70%	 32768  32768   8192    20.03      65.61
75%	 32768  32768   8192    20.03      70.74 
80%	 32768  32768   8192    20.02      74.86 
85%	 32768  32768   8192    20.03      79.86
90%	 32768  32768   8192    20.02      85.47
95%	 32768  32768   8192    20.02      89.00

The measured throughput does not follow the configured rate well for
100baseT, most visibly for settings between 15% and 55%.
This is caused by the kernel timer granularity and the small MTU of
100baseT.

CBQ uses the kernel timer to control the bandwidth.  If no other
event triggers CBQ, it can send at most (maxburst * avg_pkt_size)
bytes within one kernel timer interval.

If we assume the following conditions,
	maxburst: 16
	avg_pkt_size: MTU
	timer interval: 10ms  (20ms for CBQ, which uses every other timer tick)
the upper limit (maxburst * avg_pkt_size * 8 bits / 20ms) can be
calculated as
	about 10Mbps (9.6Mbps) for MTU=1500
	52.4Mbps for MTU=8192
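
The arithmetic behind these limits can be reproduced with a short
sketch.  It assumes avg_pkt_size is in bytes and that CBQ runs on
every other timer tick, as stated above; the function name and the
ticks_per_cbq_run parameter are hypothetical names used only for
illustration.

```python
def timer_limit_mbps(maxburst, avg_pkt_size, hz, ticks_per_cbq_run=2):
    """Upper bound on the rate when only the kernel timer triggers CBQ.

    maxburst           packets sent per timer-driven run
    avg_pkt_size       bytes per packet (the MTU in the text above)
    hz                 kernel timer frequency (100 gives a 10ms tick)
    ticks_per_cbq_run  CBQ uses every other tick, so 2 (assumption
                       taken from the "20ms for CBQ" note above)
    """
    interval_sec = ticks_per_cbq_run / hz
    bits_per_run = maxburst * avg_pkt_size * 8
    return bits_per_run / interval_sec / 1e6


if __name__ == "__main__":
    for mtu in (1500, 8192):
        print(f"HZ=100  MTU={mtu}: {timer_limit_mbps(16, mtu, 100):.1f} Mbps")
    # Same formula for a 1000Hz timer.
    print(f"HZ=1000 MTU=1500: {timer_limit_mbps(16, 1500, 1000):.1f} Mbps")
```

With a 100Hz timer this gives 9.6Mbps for MTU=1500 and 52.4Mbps for
MTU=8192.  The same formula with a 1000Hz timer gives 96Mbps for
MTU=1500.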

However, relying solely on the kernel timer is the worst case.  In my
experience, CBQ controls the rate of a flow fairly well even on a
high-speed network, because other events (sending or receiving
packets) also trigger it.
(See cbq-howto.txt for more details.)

The following data was obtained with a modified kernel using 1000Hz
timer granularity instead of 100Hz.  (This kernel should be able to
handle up to 100Mbps relying solely on timers.)

off	 32768  32768   8192    20.01      92.81
 5%	 32768  32768   8192    20.07       4.61
10%	 32768  32768   8192    20.03       9.16
15%	 32768  32768   8192    20.02      13.71
20%	 32768  32768   8192    20.02      17.94
25%	 32768  32768   8192    20.01      22.84
30%	 32768  32768   8192    20.02      27.78
35%	 32768  32768   8192    20.01      32.23
40%	 32768  32768   8192    20.01      37.08
45%	 32768  32768   8192    20.01      42.07
50%	 32768  32768   8192    20.01      46.78
55%	 32768  32768   8192    20.02      51.56
60%	 32768  32768   8192    20.01      56.14
65%	 32768  32768   8192    20.01      60.45
70%	 32768  32768   8192    20.01      65.48
75%	 32768  32768   8192    20.01      70.85
80%	 32768  32768   8192    20.01      75.02
85%	 32768  32768   8192    20.01      79.58
90%	 32768  32768   8192    20.01      85.08
95%	 32768  32768   8192    20.01      89.29


