The BeOS Networking Environment

There have been a fair number of questions recently in BeDevTalk and BeUserTalk
regarding the networking rewrite.  I thought I'd use my shift in the newsletter
sweatshop to describe the new architecture and give some details on its status.

This article is an attempt to roughly describe the stack internals and present
developers who might wish to create new protocols or NIC drivers with some
tasty tidbits of info.  For sane, normal developers who have no desire to work
on networking internals: rest assured that the sockets API will be just as it is
on your favorite BSD clone, and skip down to the $ales Pitch.


Overview

BeOS networking is being completely replaced by a new architecture, called
the BeOS Networking Environment, or BONE.  None of the existing R4.x networking will
survive the change;  it is either being ported over to the new architecture (in the
case of drivers), or being discarded completely (in the case of the net server, net
kit, netdev kit/net server addon architecture, PPP, Netscript, etc.)  The new
architecture focuses on performance, scalability, maintainability, and extensibility,
in no specific order.  It is simpler than the current net_server, yet far more
flexible.

The BONE architecture is a modular design that allows for easy removal or replacement
of any of the individual parts, by users or by Be.  In this regard, BONE is an API
specification for a networking architecture, and a description of how those modules
interoperate.  Users can replace any part of the implementation Be will ship at
will, provided the replacement adheres to the specification.


Obligatory ASCII Diagram


                            _______________
                           |               |
                           | libsocket.so  |
                           |_______________|
user land                          |
- - - - - - - - - - - - - - - - - -+- - - - - - - - - - - - - - - - - - -
kernel land                        |
                           ________|________
                          |                 |
    ______________        |  net api driver |
   |              |       |_________________|
   |   bone_util  |                |
   |______________|        ________|________
                          |                 |
transport layer           | protocol module |     (e.g. udp)
                          |_________________|
                                   |
                           ________|________
                          |                 |
network layer             | protocol module |     (e.g. ipv4)
                          |_________________|
                                   |
                           ________|________
                          |                 |
data link layer           | datalink module |     (contains routing, ARP, etc)
                          |_________________|
                             /           \ 
               _____________/___         _\_______________
              |     loopback    |       |     802.3       |
              | framing module  |       |  framing module |   (and more (HDLC, etc)...)
              |_________________|       |_________________|
                       |                         |
physical layer         |                         |
               ________|________         ________|________
              |                 |       |                 |
              | loopback driver |       | ethernet driver |   (and more (PPP, etc)...)
              |_________________|       |_________________|


As the diagram shows, BONE consists of a library in user space which is linked
with user programs, and a driver and several modules in kernel space which
implement the networking services.  The protocol and framing modules are
structures that extend the module_info structure to provide the standard API
used by each module type.  Put another way, each module in the above diagram is
a concrete instance of an abstract C "class" representing that networking
module type.
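The "abstract C class" pattern can be sketched as follows.  This is illustrative only: everything except the general shape of module_info is hypothetical, and a real BONE module would extend the kernel's actual module_info with the function table its module type requires.

```c
/* Sketch of "subclassing" module_info with a table of function pointers.
 * The struct layouts and names here are made up for illustration. */
#include <stddef.h>

/* simplified stand-in for the kernel's module_info */
typedef struct module_info {
    const char *name;
    unsigned    flags;
} module_info;

/* an "abstract class": the base struct followed by the operations
 * every module of this type must provide */
typedef struct proto_module_info {
    module_info  info;                    /* "superclass" comes first */
    int        (*send)(const char *data, size_t len);
} proto_module_info;

/* a concrete "instance" for a made-up null protocol */
static int null_send(const char *data, size_t len)
{
    (void)data;
    return (int)len;                      /* pretend everything was sent */
}

static proto_module_info null_proto = {
    { "network/protocol/null/v1", 0 },
    null_send,
};

/* callers operate through the base pointer, as the stack would */
int module_send(proto_module_info *m, const char *data, size_t len)
{
    return m->send(data, len);
}
```

The point is that the stack only ever sees the "abstract" function table, so any module supplying that table can be dropped in.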

Let's look at each driver and module type in the architecture.


*  libsocket/Net API driver/kernel sockets module

All networking functionality visible to user programs is provided by libsocket,
which is a very thin library that opens a driver (which provides the socket
"file descriptor") and communicates with it via ioctls to provide the networking
API.  The net API driver instantiates the internal data structures associated with
a socket (the bone_endpoint_t), sets up the protocol stack for each socket, and
handles all communication between the socket and the stack.

Other networking APIs besides the BSD sockets interface could be implemented to
talk to the net API driver using the same ioctls that libsocket does.

BONE also provides a libnet emulation library which allows programs linked
against the R4.x libnet.so to continue to function, ensuring binary compatibility.

And finally, for the truly ambitious amongst you who are developing networked
file systems and such, you will be happy to hear that there is a kernel module
interface to the sockets API so you will be able to use networking from kernel land.


*  BONE utilities module

The bone_util module contains functionality that the other modules need or that
doesn't fit elsewhere.  bone_data (see below) manipulation, benaphores, fifos,
masked data copying, and other "generic" utilities are provided here.  All parts of
the BONE system use this important module.  It defines operations for several data
types.

A bone_data_t is a data type that is used in BONE as a container for transient
networking data.  While it fulfills the same requirements as mbufs do under a BSD
networking architecture, bone_data_t are quite different from mbufs and suffer
from none of mbufs' limitations or problems.

Central to the efficiency of a networking stack is reducing the number of data
copies.  Unlike mbufs, bone_data_t are containers of lists of iovecs.  A bone_data_t
contains two such lists:  a "freelist", which contains pointers to actual memory
addresses which need to be freed, and a "datalist", which contains a virtual "view"
of networking memory that can not only be accessed very efficiently, but also
easily modified.
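A rough sketch of the two-list idea, assuming made-up names and fixed-size arrays (the real bone_data_t layout is certainly different):

```c
/* Illustrative sketch of a two-list iovec container.  The names,
 * fixed-size arrays, and error handling are all assumptions. */
#include <stddef.h>
#include <sys/uio.h>          /* struct iovec */

#define BD_MAX_VECS 16

typedef struct bone_data {
    struct iovec datalist[BD_MAX_VECS];  /* logical view of the packet */
    size_t       data_count;
    struct iovec freelist[BD_MAX_VECS];  /* buffers that must be freed */
    size_t       free_count;
} bone_data;

/* prepend a header by shifting the view; the payload is never copied */
int bone_data_prepend(bone_data *d, void *base, size_t len)
{
    size_t i;
    if (d->data_count == BD_MAX_VECS)
        return -1;
    for (i = d->data_count; i > 0; i--)
        d->datalist[i] = d->datalist[i - 1];
    d->datalist[0].iov_base = base;
    d->datalist[0].iov_len  = len;
    d->data_count++;
    return 0;
}

/* total length of the logical view */
size_t bone_data_length(const bone_data *d)
{
    size_t i, total = 0;
    for (i = 0; i < d->data_count; i++)
        total += d->datalist[i].iov_len;
    return total;
}
```

Prepending a UDP header to a 2000-byte payload is then one iovec shuffle, exactly the datalist transition shown in the scenario below.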

Consider the following scenario.  A user calls "sendto" with a buffer containing a
udp datagram that is 2000 bytes long.  This results in a bone_data_t with the
following layout:

bone_data_t {
	datalist: {iov_base = &buffer, iov_len = 2000}
	freelist: {&buffer, 2000}*
}

(* actually this wouldn't be here in this case since on datagram sends BONE is
zero-copy and would pass the user's buffer directly to the NIC driver rather than
allocating a new buffer that would later need freeing.  But we'll leave it here
for demonstration purposes.)

The udp layer would then add a header to the data.  This is very easily done by
simply adding an iovec to the chain:

bone_data_t {
	datalist: {&udp_header, 8}, {&buffer, 2000}
	freelist: {&udp_header, 8}, {&buffer, 2000}
}

(again, the udp_header would not *really* be added to the free list, since the udp
layer would be using a local buffer for it that would not need freeing, but we'll
use it as an example, as with the IP header below)

Now, suppose the interface it is being sent on has an MTU of 1500 bytes.  IP would
need to fragment the data and add an IP header to each fragment.

On other systems (especially BSD-based systems that use mbufs), there would need
to be multiple copies done here.  BONE simply manipulates iovecs in their lists:

bone_data_t {
	datalist: {&ip_header, 20},{&udp_header, 8}, {&buffer, 1472},
						{&ip_header_2, 20}, {&buffer + 1472, 528}
	freelist: {&ip_header_2, 20},{&ip_header, 20},{&udp_header, 8}, {&buffer, 2000}
}

By manipulating the logical view of the data rather than copying it, BONE will see a
big scalability and performance win when using large datagrams (such as during
bulk data transfer of things like large image files).
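The fragmentation step above can be sketched as pure iovec bookkeeping.  This is an illustrative reimplementation, not BONE's code; the names, the fixed vector limit, and the omission of per-fragment IP headers are all simplifications.

```c
/* Carve a datalist into fragment-sized views without copying payload.
 * Names and limits are made up for illustration. */
#include <stddef.h>
#include <sys/uio.h>

#define FRAG_MAX_VECS 8

typedef struct frag_view {
    struct iovec vecs[FRAG_MAX_VECS];   /* view into the original buffers */
    size_t       count;
    size_t       length;
} frag_view;

/* Split the iovec list `src` (n entries) into fragments holding at most
 * `mtu_payload` bytes each; returns the number of fragments produced.
 * (A sketch: assumes each fragment needs at most FRAG_MAX_VECS vecs.) */
size_t fragment_views(const struct iovec *src, size_t n,
                      size_t mtu_payload, frag_view *out, size_t max_out)
{
    size_t nfrags = 0, i;
    frag_view *f = NULL;

    for (i = 0; i < n; i++) {
        char  *base = src[i].iov_base;
        size_t left = src[i].iov_len;

        while (left > 0) {
            size_t room, take;

            if (f == NULL || f->length == mtu_payload) {
                if (nfrags == max_out)
                    return nfrags;        /* out of room; sketch only */
                f = &out[nfrags++];
                f->count = f->length = 0;
            }
            room = mtu_payload - f->length;
            take = left < room ? left : room;

            /* the fragment points into the ORIGINAL buffer */
            f->vecs[f->count].iov_base = base;
            f->vecs[f->count].iov_len  = take;
            f->count++;
            f->length += take;

            base += take;
            left -= take;
        }
    }
    return nfrags;
}
```

With an 8-byte UDP header and a 2000-byte buffer against 1480 bytes of payload per fragment, this yields one view of {header, first 1472 bytes} and one view of the remaining 528 bytes, with no payload bytes moved.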



*   bone_proto_info_t

All of the protocols are implemented as instances of bone_proto_info_t.  These
are chained together as appropriate in structures called bone_proto_node_t for
each networking endpoint instance when it is created.  A driver_settings
configuration file specifies which protocols to put in a socket's stack when
the socket is created.

(If you are worried about reading a config file for efficiency reasons, don't be;
the bone_util module contains optimized functions for reading the BONE settings.
On average, opening a socket under BONE takes on the order of 300 microseconds.)

When networking operations occur, the net API driver calls the appropriate function
in the bone_proto_info_t module at the top of its protocol stack.  The protocol then
performs all necessary protocol-specific operations and calls the next protocol
in the chain, on down to the network layer protocol, which passes the final data
on to the datalink layer.

To add a new protocol to BONE, one essentially creates a bone_proto_info "subclass"
for the protocol and adds entries for it to the BONE configuration file.  It will
be loaded at runtime by either the API driver (for new sockets) or the datalink layer
(for inbound data).
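A minimal sketch of the chaining mechanism, assuming a made-up proto_node type in place of the real bone_proto_node_t (which carries far more state than a header length):

```c
/* Sketch: each protocol layer does its work, then calls the next
 * layer down the chain.  Types and names are hypothetical. */
#include <stddef.h>

typedef struct proto_node proto_node;

struct proto_node {
    const char  *name;
    size_t       header_len;
    proto_node  *next;                    /* next protocol down the stack */
    size_t     (*send_data)(proto_node *self, size_t payload_len);
};

/* generic behavior: account for this layer's header, then pass down */
static size_t generic_send(proto_node *self, size_t payload_len)
{
    size_t len = payload_len + self->header_len;
    if (self->next != NULL)
        return self->next->send_data(self->next, len);
    return len;   /* bottom of the chain: hand off to the datalink layer */
}
```

Building a udp-over-ipv4 chain is then just linking two nodes; sending 1000 bytes through it yields 1000 + 8 + 20 bytes at the datalink layer.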


*  bone_datalink

The datalink module is the center of the BONE architecture.

The datalink module handles things like routing, ARP, interface management, and
link-level framing.  The first thing the datalink module does is load the network
interface driver modules.  Each of these scans its hardware, does its hoodoo
magic, and calls back into the datalink module to register an ifnet_t structure for
each instance of the networking card that it finds.  The modules can re-register at
any time they need to, responding to things like new CardBus cards being inserted,
new USB interfaces being logically added, etc.

Each time an interface is brought up (via ifconfig, etc), the datalink module spawns
off a thread which blocks on the interface module's receive method.  When new data
arrives on the interface, it is read by that thread, demuxed and pushed up the
appropriate protocol stack to the receive queue of the appropriate bone_endpoint_t.

The fact that each interface has its own reader thread associated with it, in
addition to the fact that multiple user-level threads will be pushing data
simultaneously through the system, should provide BONE with greater
scalability than other systems, particularly in the area of stack latency.
Multiple-interface BeOS systems perform quite well under BONE.
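The per-interface reader thread can be sketched with POSIX threads (BeOS has its own kernel threading primitives; the queue below is a stand-in for the interface module's blocking receive method, and all names are made up):

```c
/* Sketch: one reader thread per interface, blocked on receive,
 * pushing inbound frames up the stack.  Entirely illustrative. */
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_SIZE 64

typedef struct iface {
    pthread_mutex_t lock;
    pthread_cond_t  avail;
    int             queue[QUEUE_SIZE];  /* stand-in for received frames */
    size_t          head, tail;         /* indices, wrapped mod QUEUE_SIZE */
    int             down;               /* set when interface is taken down */
    size_t          delivered;          /* frames "pushed up the stack" */
    pthread_t       reader;
} iface;

/* stand-in for the interface module's blocking receive method */
static int iface_receive(iface *ifp, int *pkt)
{
    pthread_mutex_lock(&ifp->lock);
    while (ifp->head == ifp->tail && !ifp->down)
        pthread_cond_wait(&ifp->avail, &ifp->lock);
    if (ifp->head == ifp->tail) {       /* queue drained and we're down */
        pthread_mutex_unlock(&ifp->lock);
        return -1;
    }
    *pkt = ifp->queue[ifp->head++ % QUEUE_SIZE];
    pthread_mutex_unlock(&ifp->lock);
    return 0;
}

/* the reader thread spawned when the interface is brought up */
static void *reader_loop(void *arg)
{
    iface *ifp = arg;
    int pkt;
    while (iface_receive(ifp, &pkt) == 0)
        ifp->delivered++;    /* real code would demux and push upward */
    return NULL;
}

void iface_up(iface *ifp)
{
    memset(ifp, 0, sizeof(*ifp));
    pthread_mutex_init(&ifp->lock, NULL);
    pthread_cond_init(&ifp->avail, NULL);
    pthread_create(&ifp->reader, NULL, reader_loop, ifp);
}

/* called from "driver" context when a frame arrives */
void iface_input(iface *ifp, int pkt)
{
    pthread_mutex_lock(&ifp->lock);
    ifp->queue[ifp->tail++ % QUEUE_SIZE] = pkt;
    pthread_cond_signal(&ifp->avail);
    pthread_mutex_unlock(&ifp->lock);
}

void iface_down(iface *ifp)
{
    pthread_mutex_lock(&ifp->lock);
    ifp->down = 1;
    pthread_cond_broadcast(&ifp->avail);
    pthread_mutex_unlock(&ifp->lock);
    pthread_join(ifp->reader, NULL);
}
```

Because each interface owns its own reader, inbound traffic on one interface never serializes behind another, which is where the multiple-interface scalability comes from.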

Networking interfaces are represented using the traditional BSD struct ifnet
data structure, modified for BeOS.  This structure contains a great deal of
information about an interface, including the various addresses associated with
it, volatile statistics, the bone_interface_info_t module to use for the
interface, and the bone_frame_info_t module to use for framing the data.


*	bone_frame_info_t

Since many different interfaces use the same link-level framing types, these were
isolated out into modules to facilitate reuse.  For example, any number of ethernet
card driver modules can load the single bone_802.3 module for their framing needs.

Similarly, by decoupling framing from the rest of the link layer, a single NIC
driver module can use different types of framing.  For example, a HiPPI interface
could be configured to use HiPPI physical layer framing rather than its logical
layer framing, or an ethernet interface could send jumbograms rather than
1500-byte ethernet frames.
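The shared-framing-module idea, sketched with a made-up frame_info type (the real bone_frame_info_t differs, and the 14-byte header below is a simplification of 802.3-style framing):

```c
/* Sketch: framing isolated into a module any NIC driver can share.
 * Struct layout, names, and the header contents are illustrative. */
#include <stddef.h>
#include <string.h>

typedef struct frame_info {
    const char *name;
    size_t      header_len;
    /* write the link-level header into buf, return total frame length */
    size_t    (*frame)(unsigned char *buf, size_t payload_len);
} frame_info;

/* simplified ethernet-style framing: dst, src, type, then payload */
static size_t enet_frame(unsigned char *buf, size_t payload_len)
{
    memset(buf, 0xff, 6);              /* dst: broadcast, for the sketch */
    memset(buf + 6, 0x00, 6);          /* src: left zeroed here */
    buf[12] = 0x08; buf[13] = 0x00;    /* type: IPv4 */
    return 14 + payload_len;
}

static frame_info enet_framing = { "802.3-style framing (sketch)", 14,
                                   enet_frame };

/* any number of NIC driver modules could hold a pointer to the same
 * framing module and call through it */
size_t frame_with(const frame_info *f, unsigned char *buf,
                  size_t payload_len)
{
    return f->frame(buf, payload_len);
}
```

Swapping framing types then means pointing the interface at a different frame_info, with no change to the NIC driver itself.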


*  bone_interface_info_t

BONE adds a networking-oriented interface to device drivers, to be used in
writing NIC drivers.  If desired, a traditional device driver can also export a
bone_interface_info_t module interface, which makes porting existing drivers easy.


Sample Code

In the way of sample code, I have included the current snapshot of bone_proto.h
and bone_interface.h, the two headers most useful to the majority of you who will
be writing BONE modules.  I have also included a snapshot of the bone_util.h
BONE utilities header file, since the other files use it so much.  Finally,
I have included the source code to the BONE loopback interface module to
illustrate how to write a network interface module.

Note that these files should be considered alpha-level software.  They are likely
to change in the future.  The loopback module is (purposely) nonoptimized and
provided as an illustration; real loopback operations are heavily optimized in BONE
and bypass this module entirely.

While these files aren't everything you need to start developing for BONE, they
should give you an idea of the directions you should be heading.


Massively Cool Features (the $ales Pitch)

In addition to the traditional BeOS GUI-based tools, all of your favorite unix
networking utilities are either already ported or will port readily.  Examples
include:

	BIND 8.2 tools:
		addr, dnsquery, irpd, named-bootconf, nslookup, dig, host, mkservdb,
		named-xfer, nsupdate, dnskeygen, named, ndc

	Configuration Tools:
		route, ifconfig, etc

	Utilities:	
		telnet, ping, ftp, traceroute, tcpdump, libpcap, etc       

	and many more.

Almost every feature that BeOS net developers have been asking for is there;  sockets
are file descriptors, the sockets API is much more compliant, raw sockets are there,
it is relatively easy to add new protocols, there is a kernel networking interface,
and so forth.

Net performance has improved massively.  There are no hard numbers yet (and we
aren't done optimizing), but our benchmarks put BONE at around twenty times (2000%)
the speed of the current net_server; BONE is in the same league as Linux and FreeBSD,
though not fully competitive with their speed yet.  Yet.  :-)


Schedule

OK, I realize that the biggest question all of you are asking is "when?".  In
traditional Be style, I can only say "soon".  The new stack is almost ready for
beta.  That's all I can say for now.  
