Auto-Discovery Daemon
---------------------

1. Introduction

The auto-discovery daemon is a user-level program that allows multiple hosts on
a single network (for now) to become aware of each other.  When a host running 
auto-discovery becomes aware of another host's existence, it can inform the 
openMosix kernel.  The openMosix kernel can then add it to its internal map of 
existing machines.

Currently the daemon is compiled to run in alpha mode, which means that it does
not interact with or affect the openMosix kernel.  Normally it would perform
I/O on /proc/hpc/admin/mospe and /proc/hpc/admin/config.  Instead it writes to
two /tmp files, /tmp/om_mospe and /tmp/om_config, in much the same way as it 
would to the /proc files.

These /tmp files are binary.  A small utility called showmap is included.  By
running showmap, the values that would have been written to /proc can be seen
in a human-readable format.  For example, the "mosix.map" that would have been 
passed to the kernel can be viewed.  This is done so the daemon can be tested
in a few environments before it is used in a potentially harmful way.  (When
not compiled in ALPHA mode, this utility is similar to running "setpe -r".)


2. Requirements

Your kernel must be configured to support IP multicast (CONFIG_IP_MULTICAST
kernel option).  This is probably configured by default.


3. Building and Running

% make clean
% make
% ./omdiscd -n

The "-n" or "--nodaemon" option causes omdiscd to run in the foreground,
sending messages and debugging output to standard error.  Without this option, 
all output goes to syslog, which can make testing confusing.

By default, the daemon will allow the kernel to choose an interface to use
for multicast communication.  A specific interface can be used by adding the
"-i <interface>" option.
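
The difference between the kernel default and an explicit interface can be
sketched as follows.  This is a minimal illustration, not the daemon's own
code: the helper name od_select_mcast_if() is hypothetical, and it assumes
Linux, where the IP_MULTICAST_IF socket option accepts a struct ip_mreqn
carrying an interface index.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for struct ip_mreqn on glibc */
#endif
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not taken from the daemon's source.
 *   ifname == NULL: leave the socket untouched so the kernel picks the
 *                   outgoing multicast interface.
 *   otherwise:      send multicast through the named interface.
 * Returns 0 on success, -1 on failure.
 * ASSUMPTION: Linux, where IP_MULTICAST_IF accepts a struct ip_mreqn.
 */
int od_select_mcast_if(int sock, const char *ifname)
{
    struct ip_mreqn req;
    unsigned int idx;

    if (ifname == NULL)
        return 0;                   /* kernel default */

    idx = if_nametoindex(ifname);
    if (idx == 0)
        return -1;                  /* no such interface */

    memset(&req, 0, sizeof(req));
    req.imr_ifindex = (int)idx;
    return setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, &req, sizeof(req));
}
```

With a UDP socket in hand, od_select_mcast_if(sock, "eth1") would correspond
to running with "-i eth1", and od_select_mcast_if(sock, NULL) to running
without "-i".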

On my cluster, "ariette", I run (at least) three copies for testing.  The 
gateway "ariette" specifies the "-i" option because it has multiple interfaces,
only one residing in the cluster:

ariette# ./omdiscd -n -i eth1,eth0
node1# ./omdiscd -n
node2# ./omdiscd -n
[...]

node1 and node2 are on the same network as eth1 on ariette.  Specifying two
interfaces (and no multicast TTL) means that auto-discovery will route messages
between the network connected to eth1 on ariette and the network connected
to eth0 on ariette.  In addition, node1 and node2 will each configure their map
with one of ariette's interfaces as an alias entry.

To run showmap:

% ./showmap

A set of tests validates some internals of the daemon.  The tests can be built
and run with the commands below.  Note that building for testing produces
different binaries, so a "make clean" is necessary before and after.

To build and run tests:

% make clean
% make -f Makefile.test
% ./test

To run the daemon in live mode, remove (or macro out) "#define ALPHA" from
openmosix.c and showmap.c.  Then run "make clean" followed by "make".


4. Limitations

4.1 Node-id Generation

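Node identifiers are generated by taking the last two octets of the IP address 
of a given machine.  The obvious problem with this is potential node-id 
collisions: two hosts whose addresses share the last two octets (for example,
10.1.0.5 and 10.2.0.5) would get the same node-id.  I think this will not
arise in most clusters.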
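
The scheme can be sketched in a few lines of C.  This is only an illustration:
the function name node_id_from_ipv4() is hypothetical, and the way the two
octets are combined (third octet as the high byte, fourth as the low byte) is
an assumption, since the text above only says that the last two octets are
used.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive a node-id from the last two octets of a
 * dotted-quad IPv4 address.  Returns 0 if the string cannot be parsed.
 * ASSUMPTION: the third octet forms the high byte and the fourth octet
 * the low byte; the daemon may combine them differently.
 */
uint16_t node_id_from_ipv4(const char *dotted)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return 0;

    /* ntohl() yields a.b.c.d as 0xAABBCCDD; keep the low 16 bits (c.d). */
    return (uint16_t)(ntohl(addr.s_addr) & 0xFFFFu);
}
```

Under that assumption, 192.168.1.20 and 10.0.1.20 both map to the same
node-id, which is exactly the collision case described above.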

4.2 Routing

When auto-discovery routes messages between networks, there is no routing loop
detection.  In fact, the route that the auto-discovery messages take may differ
from the route taken by traffic between the openMosix nodes.  For anything but
a very simple network, use of real multicast routing (e.g. mrouted) is
recommended.


5. Command Line Options

  --interface or -i <interface>[,<interface>...]:
    The interface option can be used to specify between one and six interfaces
    which will be used in the openMosix cluster.  It is specified as a
    comma-separated list.  Each interface listed will receive and send multicast 
    notifications, unless the "-m" or "--multicast-ttl" option is specified.
    In that case, only the first interface listed will send and receive 
    messages, and others will be configured as openMosix aliases.  When this 
    option is not specified, auto-discovery will allow the kernel to select a 
    default interface.  This option is not necessary when a host only has one 
    configured interface.

  --nodaemon or -n:
    The nodaemon option causes auto-discovery to run in the foreground, and 
    not as a daemon.  All output will go to standard error, as opposed to 
    syslog when running as a daemon.

  --multicast-ttl or -m <value>:
    When this option is specified, the value passed is used as the 
    time-to-live (TTL) for multicast.  This option assumes that multicast 
    routing is configured on gateways connecting clusters.  When it is 
    specified, auto-discovery configured with multiple interfaces will only 
    send and receive notifications on one interface.

  --help or -h:
    Displays basic usage.
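
For illustration, the options above could be parsed with getopt_long() roughly
as follows.  This is a sketch, not the daemon's actual parser: struct
od_options and od_parse_options() are hypothetical names, and only the option
names and the six-interface limit are taken from the descriptions above.

```c
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IFACES 6   /* the text above allows between one and six */

/* Hypothetical container for parsed options; not the daemon's own type. */
struct od_options {
    int   nodaemon;              /* 1 if -n / --nodaemon was given     */
    int   ttl;                   /* -1 if -m / --multicast-ttl absent  */
    int   help;                  /* 1 if -h / --help was given         */
    int   n_ifaces;              /* number of entries used in ifaces   */
    char *ifaces[MAX_IFACES];    /* point into the caller's argv       */
};

/* Returns 0 on success, -1 on an unknown option or too many interfaces. */
int od_parse_options(int argc, char **argv, struct od_options *o)
{
    static const struct option longopts[] = {
        { "interface",     required_argument, NULL, 'i' },
        { "nodaemon",      no_argument,       NULL, 'n' },
        { "multicast-ttl", required_argument, NULL, 'm' },
        { "help",          no_argument,       NULL, 'h' },
        { NULL, 0, NULL, 0 }
    };
    char *tok;
    int c;

    memset(o, 0, sizeof(*o));
    o->ttl = -1;
    optind = 1;                  /* start scanning at the first argument */

    while ((c = getopt_long(argc, argv, "i:nm:h", longopts, NULL)) != -1) {
        switch (c) {
        case 'i':
            /* Split the comma-separated list in place (modifies argv). */
            for (tok = strtok(optarg, ","); tok; tok = strtok(NULL, ",")) {
                if (o->n_ifaces == MAX_IFACES)
                    return -1;
                o->ifaces[o->n_ifaces++] = tok;
            }
            break;
        case 'n': o->nodaemon = 1; break;
        case 'm': o->ttl = (int)strtol(optarg, NULL, 10); break;
        case 'h': o->help = 1; break;
        default:  return -1;
        }
    }
    return 0;
}
```

In this sketch the first entry of ifaces[] is the one that would keep sending
and receiving when a TTL is given; how the remaining entries become openMosix
aliases is left out.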


6. Debug Messages

main() calls log_set_debug() to enable debug messages for various features.
See log.h for a list of parameters.  Full debug messaging can be enabled on a
running daemon by sending it a SIGUSR1 signal.  Each SIGUSR1 toggles the
daemon between the compile-time default and full messaging.
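
The toggle behaviour can be sketched with a plain flag and a signal handler.
This is an illustration only: full_debug, install_debug_toggle() and
debug_is_full() are hypothetical names, and the real daemon goes through
log_set_debug(), whose parameters are listed in log.h and are not reproduced
here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() under strict -std= */
#endif
#include <signal.h>
#include <string.h>

/*
 * Hypothetical flag: 0 = compile-time default, 1 = full debug messaging.
 * The daemon itself configures debugging through log_set_debug().
 */
static volatile sig_atomic_t full_debug = 0;

/* Flip the flag; nothing else is done here so the handler stays
 * async-signal-safe. */
static void on_sigusr1(int signo)
{
    (void)signo;
    full_debug = !full_debug;
}

/* Install the SIGUSR1 handler.  Returns 0 on success, -1 on failure. */
int install_debug_toggle(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;     /* restart interrupted system calls */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Logging code would consult this before emitting a debug message. */
int debug_is_full(void)
{
    return full_debug != 0;
}
```

From a shell, the signal can be delivered to a running daemon with
"kill -USR1 <pid>"; sending it a second time returns to the default level.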

7. TODO List

Below is a group of things which still need to be done to the daemon.

To do:
  + net.c/openmosix.c: add basic gateway discovery.
  + general: add ability for nodes to leave the cluster.
  + general: write man page.
  + event.c: add event handling code if/when finally needed.
  + general: add pid file
  + sys.c: add omdiscd.pid file to /var/...
  + general: clean up command line options (add --route, etc).

Performance/Optimizations:
  + net.c: if more than one message is waiting to be received, meaning
    multiple nodes will join, place messages in a queue and process them
    in batch.

Things to decide:
  + should multiple join messages be sent?  perhaps one per hour?
  + should nodes be removed from the kernel if they send a leave message?
  + IPv6 support here and in openMosix
