
	Brief Introduction to How CBQ works+

Here's a brief explanation of the CBQ mechanism.

Please read Sally's papers and notes for more details.
<http://www-nrg.ee.lbl.gov/floyd/cbq.html>

CBQ consists of
	classifier	  -- classifies packets to the pre-defined classes
	estimator	  -- estimates bandwidth usage of each class
	packet scheduler  -- selects the next class to send a packet

Classifier:
	Classifier works in the way similar to packet-filtering.
	It extracts flow information (dst_addr, dst_port, src_addr,
	src_port, proto for an IP flow) from a packet, then tries to
	find a corresponding class.  If no class is found, the default
	class is used.

Estimator:
	CBQ measures the average packet interval for each class,
	and estimates whether the class is underlimit or overlimit.

Packet Scheduler:
	priority-based as long as higher priority class is underlimit.
	weighted-round robin within the same priority.
	a class may borrow bandwidth from its ancestors if specified so.


IMPLEMENTATION:

Rate-Control:

CBQ does rate-control, which means packets could remain in the queue
even when the interface is idle. This is called a non-work conserving
queue.

To send packets remaining in the queue, the queueing system needs a
trigger to start sending.  A trigger could be:
	queueing another packet
	transmit or receive interrupt from the interface device 
	timer

If there are other flows, queueing or interrupt occurs.
A TCP flow receives acks which can be a trigger.  Especially in a
steady state, TCP gets an ack every other packet.
However, among these triggers, only the timer is guaranteed to happen.
Unfortunately, in most systems, the timer granularity is too coarse
for packet scheduling.  So, the strategy is to design the packet
scheduling to work with the coarse timer, but use other triggers, if
available, to do better.

CBQ uses the generic kernel timer.  In most Unix systems, the
granularity of the timer is 10 msec.  CBQ uses minimum 20 msec when
setting a kernel timer, since the next tick might be arbitrary short.

CBQ measures the average interpacket time, and controls the rate of a flow.

target packet_interval is calculated as follows:

	f: fraction of the bandwidth given to the class
	b: bandwidth of the interface
	av_pkt_size: average packet size for the class

	packet_interval = av_pkt_size / (b * f)

maxburst:
	After being idle for a while, a class can send no more than
	maxburst packets. 

minburst:
offtime:
	When the flow of a class becomes steady, it repeats the cycle
	of sending the minburst packets and idle for the offtime.
	(offtime is calculated from minburst)

But to be idle for the offtime period, CBQ uses a kernel timer 	and
offtime is rounded up to the next time tick. 

This leads to the following rule:

if packet_time * maxburst < 20 msec
then the packet scheduler can send the bandwidth worth of packets by
the timer.

This also suggests:
there is a cycle of N * 10 msec, within this cycle, n packets (< maxburst) 
can be sent at the beginning of the cycle and the class is idle for
the rest of the cycle.  This becomes a steady state.

For example,  consider a flow with average packet size 1K bytes and
its rate is 10 % of 10Mbps Ethernet.

	packet_interval = 1K * 8 / (10M * 0.1)
		        = 8 ms

To achieve the target rate of 1Mbps, the system should send 2.5
packets every 20 msec timer interval.
If the scheduler can send 5 packets in 40 msec cycle, it becomes a
steady cycle.  To make a 40 msec cycle, maxburst should be bigger than 5.

The approach using the kernel timer does not scale well for high-speed 
network.  For example, if you want to achieve 50% of 100Mbps Ethernet
with 1KB packet size, you have to send 125 packets within 20 msec. 
(which means you have to set maxburst to 125.)

But relying on the kernel timer is a worst-case scenario.  My
experience shows that CBQ can control the rate of a flow pretty well
even in a high-speed network because of other triggers.

With some combination of link speed and pbandwidth (especially,
pbandwidth is more than 70%), the allocation doesn't work as good as
other settings.
If bandwidth allocation is far off, try to set "minburst" to 4 or 6.
This often works.

However, when you measure the maximum throughput, you have to be
careful and take these facts into account, especially when there's
only one UDP flow.  I recommend to use TCP when you measure the
performance.

Packet Size:

As you can see from the explanation above, packet size affects CBQ.
Though the CBQ parameters are designed not to be sensitive to packet
size (see Sally's note for more details), setting appropriate average
packet size is important.

If the average packet size is set too large, a class will not be able
to achieve its target rate.  On the other hand, if the average packet
size is set too small, it will use more than its target rate.

Note that average packet size means the average packet size when the
class is at its limit.  Even if a program sends many small packets,
programs tends to send large packets at their peak rate.

Don't forget the fact that a large packet can be fragmented to MTU of
the interface.  Also, TCP tries to send the packet size of path MTU.

Thus, it is safe to set the MTU size to the average packet size, and
it is default.

Link-Sharing:

Classes can share the bandwidth by borrowing from their ancestors.
However, the mechanism of borrowing is somewhat parameter sensitive.

"efficient" parameter in the interface command makes borrowable
classes work-conserving, that is, even when there is no underlimit
class CBQ schedules a class.  Note that "efficient" doesn't affect
classes with no borrow class.

Maxdelay:

Note that even if you specify a small maxdelay, it gives no priority
over other classes.
When maxdelay is specified, it is just reflected to the max queue size
of the class.  Which means if you specify a small maxdelay, the max
queue size of the class becomes small.

Priority:
A class with the highest priority is scheduled first as long as it is
underlimit.  Weighted-round robin is used within the same priority.

Weighted Round Robin:
Each class gets its allotment each time its turn comes around.
If a class sends a small packets, then the class gets the
next turn as well, as long as its allotment remains.
If a class sends a large packet, then the class gets deficit and will
be skipped in the next round as long as it has deficit.

CBQ Accounting: (experimental)
CBQ can be used just for accounting with the original FIFO queueing.

