Writing Video Drivers for BeOS 
By Andrew Kimpton
 
Like many engineers, a lot of my motivation comes from having
something new on my desk: a gadget, a widget (or a toy, as my
wife prefers). I recently bought a new Sony laptop which,
like most on the market, uses a Neomagic graphics controller.
BeOS runs just fine on this notebook, and mine even came
with two 4GB partitions on the 8GB drive - it looks like it
was made for multiple OSes!
 
BeOS supports the Neomagic family of graphics controllers
from the old Neomagic 128 (aka 2070) through to the newer
Neomagic 256AV (aka 2200) and the variants in between.
Unfortunately that support is achieved through the 'old'
style of app-server add-ons rather than the newer (and
blessed) app-server accelerants. Whilst it works, it's not
'current' and it doesn't do some things I'd like, such as
centering the display when the resolution is less than the
size of the LCD panel, or supporting DPMS to turn the
display off when not in use. So, given hardware and
motivation, here's an example of writing an app-server
accelerant for the Neomagic chips.

Oh yes... one last 'wrinkle' - there's no public
documentation available on these chips, so our information
is derived from the sources contained within the XFree86
X-Windows server for Un*x, and from previous experience with
programming VGA controllers (the sort of stuff I'm afraid
you can't easily find in a single book, you have to get
'apprenticed' to a master and learn it that way 8-) 
 
This is a fairly large project, so it'll be split up into
a few newsletter articles. Initially we'll look at the
significant parts of the kernel level hardware driver, then
the parts of the app-server accelerant necessary for basic
framebuffer access. We'll also cover a couple of bells &
whistles. Subsequent articles will deal with adding support
for a hardware cursor, and handling hardware acceleration
of 2D operations.
 
The Kernel Driver 
 
For video cards the kernel driver is a fairly simple beast
(in most cases), providing a mechanism to establish that a
card is installed and to map sections of the card's memory
into the address space for others to access. The driver may
also need to provide a couple of convenience routines for
accessing VGA registers, and possibly an interrupt handler
if you need to do something special such as catching the
vertical blanking moment.

Taking a look through the source for the driver (driver.c),
init_hardware() simply walks the list of PCI devices to
establish whether something significant to this driver is
installed. init_driver() is a little more interesting - it
allocates some per-driver storage and creates a lock that
we can use to serialize access to the driver. Once again it
looks for installed hardware, this time by calling
probe_devices(). Finally it optionally adds some extra
commands to the kernel debugger. This is a useful trick that
lets us print out significant information from within the
kernel debugger, and it's perfectly feasible to use this
technique from any driver. If the hardware is not installed
init_driver() will never be called and the commands never
added to the kernel debugger. probe_devices() again walks
the list of installed hardware, building a /dev/graphics
entry for each piece of hardware that is significant to this
driver. The name of the entry follows a set of rules defined
by Trey Boudreau which allows the output of ls /dev/graphics
to be decoded by humans to determine the installed hardware
and its location on the PCI bus.
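
To make the probing step a little more concrete, here is a
minimal sketch of the matching and naming logic, written as
plain C that runs outside the kernel. Everything prefixed
sketch_ is hypothetical and is not taken from driver.c. The
vendor and device IDs are assumptions recalled from the
XFree86 sources and should be checked before use, and the
name layout (vendor, device, then bus/device/function, all
in hex) is an assumed reading of the naming rules rather
than a quotation of them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the few PCI fields the probe looks at (hypothetical type;
 * the real record is filled in by the kernel's PCI bus support). */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  bus, device, function;
} sketch_pci_entry;

/* ASSUMPTION: IDs as recalled from the XFree86 sources; verify before use.
 * The variants in between are omitted here. */
#define SKETCH_NM_VENDOR 0x10c8
static const uint16_t sketch_nm_devices[] = {
    0x0001, /* Neomagic 128 (2070)   */
    0x0005  /* Neomagic 256AV (2200) */
};

/* Returns 1 if the entry is one of the chips listed above, 0 otherwise. */
static int sketch_is_supported(const sketch_pci_entry *e)
{
    size_t i;
    if (e->vendor_id != SKETCH_NM_VENDOR)
        return 0;
    for (i = 0; i < sizeof(sketch_nm_devices) / sizeof(sketch_nm_devices[0]); i++)
        if (sketch_nm_devices[i] == e->device_id)
            return 1;
    return 0;
}

/* Walks the device list and builds one name (relative to /dev) per
 * supported chip. Returns the number of names written. */
static size_t sketch_probe(const sketch_pci_entry *list, size_t count,
                           char names[][32], size_t max_names)
{
    size_t found = 0, i;
    for (i = 0; i < count && found < max_names; i++) {
        const sketch_pci_entry *e = &list[i];
        if (!sketch_is_supported(e))
            continue;
        /* ASSUMED layout: vendor_device_busdevfn, all in hex. */
        snprintf(names[found], 32, "graphics/%04X_%04X_%02X%02X%02X",
                 (unsigned)e->vendor_id, (unsigned)e->device_id,
                 (unsigned)e->bus, (unsigned)e->device,
                 (unsigned)e->function);
        found++;
    }
    return found;
}
```

The array of names filled in here plays the role of the
array that publish_devices() hands back to the OS.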

publish_devices() simply uses the array created during
probe_devices() to tell the OS what should be presented in
/dev/graphics. Only two of the remaining functions deserve
any special attention: the open and control functions. The
first time this device is opened (generally by the
app-server) nm_open() will call first_open().
first_open() allocates an area that is shared between the
driver and the accelerant. It contains pointers to certain
parts of the card and other useful pieces of configuration
information - the card_info structure, globally referenced
as 'ci' in both the driver and the accelerant.
first_open() also calls map_physical_memory() to map the
device's framebuffer RAM and memory mapped registers into
accessible memory so that the app-server and its
accelerant can write to them. It's also inside first_open()
that you would set up and install an interrupt handler if
you needed one.

nm_control() is where most of the work during operation is
performed. This function provides one standard ioctl()
selector and three private ones. The standard selector,
required for all graphics device drivers, is
B_GET_ACCELERANT_SIGNATURE. During startup the app-server
scans and opens each entry in /dev/graphics and then
calls ioctl() with B_GET_ACCELERANT_SIGNATURE for the
opened device. Graphics devices should then return a string
with the name of the accelerant for this device. The
app-server will then load this accelerant and continue the
initialization process through the accelerant. The three
private ioctl() selectors in our driver are actually quite
standard and are probably going to be needed by any driver.
The first returns the ID of the area that holds the
card_info structure. Once the ID of that area is known, the
area can be cloned and then shared by multiple applications.
It's this mechanism that allows the accelerant to share data
with the driver. Lastly we need to be able to read and write
VGA registers. These registers live at low addresses in the
legacy I/O space (at locations such as 0x3d4) and can only
be written to or read from by kernel software, so we provide
two ioctl() selectors that allow arbitrary byte reads &
writes in this area.
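
As a rough illustration of that dispatch, the sketch below
models a control function in plain C. It is not the code
from driver.c: the selector values are stand-ins (a real
driver takes B_GET_ACCELERANT_SIGNATURE from the system
headers), the names of the two VGA selectors and the
accelerant file name are hypothetical, and the I/O ports are
simulated with an array so the example can run anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in selector values for illustration only. */
enum {
    SKETCH_GET_ACCELERANT_SIGNATURE = 1,
    SKETCH_GETGLOBALS,   /* hand back the ID of the shared area     */
    SKETCH_READ_VGA,     /* hypothetical name: read one byte        */
    SKETCH_WRITE_VGA     /* hypothetical name: write one byte       */
};

#define SKETCH_OK     0
#define SKETCH_ERROR (-1)

typedef struct { int32_t shared_area; } sketch_globals;
typedef struct { uint16_t port; uint8_t value; } sketch_vga_io;

static uint8_t sketch_ports[0x400];     /* simulated I/O space           */
static int32_t sketch_shared_area = 42; /* would be set in first_open()  */

/* Dispatches one ioctl() request. buf/len describe the caller's buffer. */
static int sketch_control(uint32_t op, void *buf, size_t len)
{
    switch (op) {
    case SKETCH_GET_ACCELERANT_SIGNATURE: {
        static const char sig[] = "neomagic.accelerant"; /* hypothetical */
        if (buf == NULL || len < sizeof(sig))
            return SKETCH_ERROR;
        memcpy(buf, sig, sizeof(sig));
        return SKETCH_OK;
    }
    case SKETCH_GETGLOBALS:
        if (buf == NULL || len < sizeof(sketch_globals))
            return SKETCH_ERROR;
        ((sketch_globals *)buf)->shared_area = sketch_shared_area;
        return SKETCH_OK;
    case SKETCH_READ_VGA:
    case SKETCH_WRITE_VGA: {
        sketch_vga_io *io = (sketch_vga_io *)buf;
        if (io == NULL || len < sizeof(*io) || io->port >= sizeof(sketch_ports))
            return SKETCH_ERROR;
        if (op == SKETCH_WRITE_VGA)
            sketch_ports[io->port] = io->value;  /* real driver: port write */
        else
            io->value = sketch_ports[io->port];  /* real driver: port read  */
        return SKETCH_OK;
    }
    default:
        return SKETCH_ERROR;  /* unknown selector */
    }
}
```

Keeping the private selectors this small means the
accelerant does nearly all of the real work, and the driver
only exposes what user-level code cannot reach on its own.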
 
The remainder of the kernel driver is largely standard or
even unimplemented. Reading & writing to the graphics
device doesn't make as much sense as you might at first
think. Other functions simply reverse the work of their
partners (nm_open() & nm_close()). For more information on
general driver 'things', Todd Thomas's recent article on USB
drivers is a good source too.
 
And now on to the app-server accelerant. 
 
An accelerant has one primary entry point,
get_accelerant_hook(), which returns the addresses of other
functions as requested by the app-server. The functions fall
into five groups:
 
* Accelerant initialization and 'cloning' - used mostly by
  the Game Kit for BWindowScreen 
* Mode configuration - determining supported screen
  resolutions and depths, setting a given mode, handling the
  palette for 8bit (256 color) modes, and handling Display
  Power Management System (DPMS). 
* Cursor management - setting the cursor shape and mask, and
  moving the location of the cursor onscreen. 
* Synchronization - reporting which of the app-server's BLIT
  requests have completed.
* 2D Acceleration - carrying out BLIT requests to use
  hardware features (if available) to perform fast fills or
  copying of areas of the screen. 
 
In the early stages of accelerant development only the init
function and the mode configuration functions are needed to
actually see the desktop on screen. If the clone functions
are not implemented BWindowScreen will not work, and without
an implementation of the hardware cursor functions
BDirectWindow will not work fully. There is also a
significant performance penalty for not implementing the 2D
acceleration features, since the system CPU will have to do
work that could be offloaded onto the graphics chip.
But hey! You'll see 'something'!
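
The 'ask for a function by number' pattern can be sketched
as follows. This is an illustration only: the feature
constants and the sketch_ functions are hypothetical
stand-ins, and the lookup here returns a generic function
pointer, which may not match the exact signature of the real
get_accelerant_hook(). Unimplemented features simply return
NULL, which is how the cursor and 2D groups can be left out
at first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature numbers; a real accelerant uses the constants
 * from the system's accelerant header. */
enum {
    SKETCH_FEATURE_INIT = 0,
    SKETCH_FEATURE_MODE_COUNT,
    SKETCH_FEATURE_GET_MODE_LIST,
    SKETCH_FEATURE_SET_DISPLAY_MODE,
    SKETCH_FEATURE_MOVE_CURSOR,
    SKETCH_FEATURE_FILL_RECTANGLE
};

/* Generic function pointer; the caller casts it back to the real type. */
typedef void (*sketch_hook)(void);

/* Placeholder implementations so the sketch is self-contained. */
static int      sketch_init(int fd)                { (void)fd; return 0; }
static uint32_t sketch_mode_count(void)            { return 1; }
static int      sketch_get_mode_list(void *modes)  { (void)modes; return 0; }
static int      sketch_set_display_mode(void *mode){ (void)mode; return 0; }

/* Returns the function for a feature, or NULL if it is not implemented. */
static sketch_hook sketch_get_hook(uint32_t feature)
{
    switch (feature) {
    case SKETCH_FEATURE_INIT:
        return (sketch_hook)sketch_init;
    case SKETCH_FEATURE_MODE_COUNT:
        return (sketch_hook)sketch_mode_count;
    case SKETCH_FEATURE_GET_MODE_LIST:
        return (sketch_hook)sketch_get_mode_list;
    case SKETCH_FEATURE_SET_DISPLAY_MODE:
        return (sketch_hook)sketch_set_display_mode;
    default:
        /* Cursor and 2D acceleration are not implemented yet. */
        return NULL;
    }
}
```

Adding a hardware cursor later would then only mean writing
the function and adding one more case to the switch.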
 
So let's plough on with the work. Our init() function uses
the GETGLOBALS ioctl selector to retrieve the area_id of
the card_info structure that the hardware driver set up. We
then call clone_area() with that area_id so that we now have
a shared area of memory that the driver and accelerant can
communicate through. We then set some basic information in
that structure, such as the memory size. We also build a
list of available display modes that we can use later during
the mode setting process.
 
Mode configuration can often seem the most complicated part
of the process, and can also be the most frustrating, since
problems at this stage will nearly always result in either
no display at all or an unreadable display. The app-server
calls four functions to handle mode configuration.
_get_accelerant_mode_count() finds how many different
modes are available, and _get_mode_list() returns a complete
list of all available modes. _propose_display_mode() is
used when small adjustments have been made to a previously
chosen mode. These adjustments are the sort of thing that
would result from using the slider in the Screen
Preferences panel to adjust the refresh rate, and the call
can actually be ignored - as is the case in this driver.
Finally _set_display_mode() does all the real work.
In this sample driver _set_display_mode() ultimately calls
SetupCRTC() to get everything done. Let's walk through
this function and look at what it does.
 
Much of current graphics chip programming still carries lots
of legacy from VGA (and even earlier) display standards.
Our timing values (in particular the horizontal timing
values) need to be converted from 'pixel' values to
'character' values. This means dividing by 8, since
characters are 8 pixels wide. For simplicity we'll also
extract the values for vertical timing into local variables
to make the code a little easier to read. In order to
have the electron beam of a CRT paint an image, the beam (a
pretty analog device) needs a certain amount of setup time
before and after the actual drawing area. This could be
considered to be when the beam is scanning in space outside
of the edge of the picture tube. There is also a certain
amount of time required for the beam to retrace from the
right edge of the screen to the left edge and from the
bottom to the top. So hsyncstart & hsyncend correspond to
'edges' of the area. hdisp is the actual time the beam is
'painting' visible data, and htotal is the total time
required to draw one line and retrace back to the beginning
of the next. Driving all of this is a clock that needs to be
correctly programmed to give the appropriate pulse rate to
drive the whole analog system. Fortunately for us, in this
example we're driving an LCD screen directly (as opposed to
driving an external LCD panel through a standard VGA
connector) and so programming the clock is unnecessary
(we'll cover it in another article dealing with
simultaneous display on the LCD and an external monitor).
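
The pixel-to-character conversion described above can be
sketched in a few lines. The struct and function names are
hypothetical and are not those used in SetupCRTC(). The
sketch shows only the divide-by-8 step; any further
per-register adjustments a particular chip needs are
deliberately left out.

```c
#include <stdint.h>

/* Mode timing as handed to the accelerant: horizontal values in pixels,
 * vertical values in lines (hypothetical stand-in type). */
typedef struct {
    uint16_t h_display, h_sync_start, h_sync_end, h_total;
    uint16_t v_display, v_sync_start, v_sync_end, v_total;
} sketch_timing;

/* The same timing in the units the CRT controller works with. */
typedef struct {
    uint16_t hdisp, hsyncstart, hsyncend, htotal;  /* character clocks */
    uint16_t vdisp, vsyncstart, vsyncend, vtotal;  /* lines            */
} sketch_crtc_values;

/* Converts horizontal timing from pixels to 8-pixel characters and
 * copies the vertical timing unchanged. */
static sketch_crtc_values sketch_convert(const sketch_timing *t)
{
    sketch_crtc_values c;

    c.hdisp      = t->h_display    / 8;
    c.hsyncstart = t->h_sync_start / 8;
    c.hsyncend   = t->h_sync_end   / 8;
    c.htotal     = t->h_total      / 8;

    c.vdisp      = t->v_display;
    c.vsyncstart = t->v_sync_start;
    c.vsyncend   = t->v_sync_end;
    c.vtotal     = t->v_total;

    return c;
}
```

For a line with 1024 visible pixels, hdisp comes out as 128
characters. The difference between htotal and hdisp is the
time spent outside the visible area, including the retrace.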
 
The code should be self explanatory, but there are some
points worth noting. The Neomagic chip uses the basic
standard VGA registers and adds some extensions. VGA
registers are programmed by writing an index value to a
particular location and then writing the one byte data
value to the location after the index. After the data has
been written the index is assumed to have incremented by one,
so writing to the data location a second time will write
to the next register in the table. There are four main
groups of registers. The attribute and sequence registers
generally contain the standard values seen in this sample.
The CRT controller and extension registers need to be
programmed on a per mode basis in many cases, and it is
these registers that are often extended for additional
features (as can be seen in the Neomagic chip).
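
Here is a small sketch of the index/data access pattern,
using the CRT controller's index location 0x3d4 and the data
location after it. The port accesses are simulated with a
small register file so the code runs anywhere, and all the
sketch_ names are hypothetical. In the accelerant, each
simulated port access would be one of the driver's VGA
read/write ioctl() calls. Note that this sketch plays safe
and selects the index explicitly before every access rather
than relying on the index having moved on by itself.

```c
#include <stdint.h>

#define SKETCH_CRTC_INDEX 0x3d4  /* index location                 */
#define SKETCH_CRTC_DATA  0x3d5  /* data location, one byte later  */

/* Simulated hardware: the currently selected index and 256 registers. */
static uint8_t sketch_selected;
static uint8_t sketch_regs[256];

/* Simulated byte write to a port. */
static void sketch_out8(uint16_t port, uint8_t value)
{
    if (port == SKETCH_CRTC_INDEX)
        sketch_selected = value;
    else if (port == SKETCH_CRTC_DATA)
        sketch_regs[sketch_selected] = value;
}

/* Simulated byte read from a port. */
static uint8_t sketch_in8(uint16_t port)
{
    if (port == SKETCH_CRTC_INDEX)
        return sketch_selected;
    if (port == SKETCH_CRTC_DATA)
        return sketch_regs[sketch_selected];
    return 0xff;
}

/* Select a register by index, then write its value. */
static void sketch_crtc_write(uint8_t index, uint8_t value)
{
    sketch_out8(SKETCH_CRTC_INDEX, index);
    sketch_out8(SKETCH_CRTC_DATA, value);
}

/* Select a register by index, then read its value. */
static uint8_t sketch_crtc_read(uint8_t index)
{
    sketch_out8(SKETCH_CRTC_INDEX, index);
    return sketch_in8(SKETCH_CRTC_DATA);
}

/* Change only the bits in mask, leaving the rest of the register alone. */
static void sketch_crtc_modify(uint8_t index, uint8_t mask, uint8_t bits)
{
    uint8_t old = sketch_crtc_read(index);
    sketch_crtc_write(index, (uint8_t)((old & ~mask) | (bits & mask)));
}
```

The read-modify-write helper is handy for extended registers
where only a few bits belong to the feature being programmed.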
 
Finally there are the DAC and palette registers. One section
of code in the driver appears to read and discard a value
from the DAC register four times before writing 0 to the
register. This apparent waste of reads is necessary to
'uncover' a particular DAC register before writing a value
to it. The palette of the Neomagic chips is 6 bits each for
Red, Green & Blue (except in 24bpp mode when it's 8 bits).
The size of the per 'gun' values of a palette varies from
chip to chip. However, if your image seems too dark, washed
out, or has a particular colour cast to it, you're probably
not shifting the palette values appropriately before
writing them.
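
The shifting mentioned above amounts to dropping the low
bits of each 8 bit colour component so that it fits the
width of the DAC. A minimal sketch, with hypothetical
function names and assuming the palette arrives as 8 bit
R, G, B triples:

```c
#include <stddef.h>
#include <stdint.h>

/* Scales one 8 bit component down to a DAC that is dac_bits wide.
 * dac_bits must be between 1 and 8; 6 for the case described above. */
static uint8_t sketch_to_dac(uint8_t value8, unsigned dac_bits)
{
    return (uint8_t)(value8 >> (8u - dac_bits));
}

/* Converts count R,G,B triples (3 bytes each) from 8 bit components
 * into DAC-width components, ready to be written to the palette. */
static void sketch_convert_palette(const uint8_t *rgb8, uint8_t *out,
                                   size_t count, unsigned dac_bits)
{
    size_t i;
    for (i = 0; i < count * 3; i++)
        out[i] = sketch_to_dac(rgb8[i], dac_bits);
}
```

With a 6 bit DAC, full intensity (255) becomes 63. Writing
the unshifted 255 to a register that only keeps 6 bits would
give unpredictable colours, which is the sort of cast or
washed out look described above.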
 
Having talked extensively about the indexed style of VGA
registers, it is worth noting that some vendors have switched
to more 'regular' memory-mapped registers for their PCI/AGP
video controllers (thankfully). 3Dfx is one good example of
this, and is a company that has published its register
specifications.
 
Our example driver is at this point functional: it displays
a picture, allows you to choose different display modes, and
even supports DPMS. However it's not fast, and doesn't have
a hardware cursor. Those are both topics for a future
article in the next couple of weeks.
 
 
 
 

