
4. PHISH Library

This section documents the API of the PHISH library that PHISH minnows call. In PHISH lingo, a "minnow" is a stand-alone application that makes calls to the PHISH library.

The APIs for the MPI and ZMQ (socket) versions of the PHISH library are identical.

A general discussion of how and when minnows call PHISH library functions is given in the Minnows section of the manual.

The PHISH library has a C-style API, so it is easy to write minnows in any language, e.g. C, C++, Fortran, Python. A true C++-style API is also provided, which means a C++ program can use either the C or C++ API. A Python wrapper on the C-style API is also provided, which has a slightly different syntax for some functions. The doc pages for individual library functions document all 3 APIs. See the section below entitled C vs C++ vs Python interface for a quick overview.

PHISH minnows communicate with other minnows by sending and receiving datums. Before looking at individual library calls, it may be helpful to understand how data is stored internally in a datum by the PHISH library. This topic is discussed below, in the section entitled Format of a datum.



4.1 List of library functions

The PHISH library is not large; there are only a handful of calls, which can be grouped into the following categories. Follow the links to see a doc page for each library call.

  1. Library calls for initialization
  2. Library calls for shutdown
  3. Library calls for receiving datums
  4. Library calls for sending datums
  5. Library calls for queueing datums
  6. Miscellaneous library calls

4.2 Building the PHISH library

Two versions of the PHISH library can be built: one that calls message-passing functions from the MPI library, and one that calls socket functions from the ZMQ library. In either case, the library should typically be built as a shared library so it can be loaded at run-time by each minnow. This is required if the minnow is written in Python.

The easiest way to build all of PHISH, including the PHISH libraries, is to use the cross-platform CMake build system. We recommend building PHISH with a separate build directory:

$ tar xzvf phish.tar.gz -C ~/src
$ mkdir ~/build/phish
$ cd ~/build/phish
$ ccmake ~/src/phish-14sep12 

Then, in the CMake curses interface, configure the build, generate makefiles, and build phish:

$ make 

Alternatively, you can build either version from the src directory of the distribution by typing one of these lines:

make -f Makefile.machine mpi
make -f Makefile.machine zmq 

where "machine" is the name of one of the Makefiles in the directory. These should produce the file libphish-mpi.so or libphish-zmq.so.

If none of the provided Makefiles matches your machine, you can use one of them as a template for creating your own, e.g. Makefile.foo. Note that only the top section for compiler/linker settings needs to be edited. This is where you should specify your compiler and any switches it uses. The MPI_INC setting is only needed if you are building the MPI version of the library and the compiler needs to know where to find the mpi.h file. Likewise, the ZMQ_INC setting is only needed if you are building the ZMQ version of the library and the compiler needs to know where to find the zmq.h file. The MPI_LIB and ZMQ_LIB settings are for the MPI and ZMQ libraries themselves and any other auxiliary libraries they require.

If the build is successful, a libphish-mpi.so or libphish-zmq.so file is produced.

You can also type

make -f Makefile.machine clean 

to remove *.o and lib*.so files from the directory.
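
As a rough illustration of what "loaded at run-time" means for a Python minnow, the sketch below uses the standard ctypes module to attempt to open one of the shared libraries named above. The helper names phish_library_name and try_load_phish and the directory argument are hypothetical, and this is not the code used by the PHISH Python wrapper; it only shows the general mechanism.

```python
import ctypes
import os


def phish_library_name(backend):
    """Return the shared-library file name for the 'mpi' or 'zmq' backend."""
    if backend not in ("mpi", "zmq"):
        raise ValueError("backend must be 'mpi' or 'zmq'")
    return "libphish-%s.so" % backend


def try_load_phish(backend="mpi", directory=None):
    """Try to open the shared library; return None if it cannot be loaded.

    'directory' is a hypothetical convenience argument. When it is None,
    the system's normal library search path is used.
    """
    name = phish_library_name(backend)
    path = name if directory is None else os.path.join(directory, name)
    try:
        return ctypes.CDLL(path)
    except OSError:
        # Library not built, not on the search path, or missing dependencies.
        return None


if __name__ == "__main__":
    lib = try_load_phish("mpi")
    print("loaded" if lib is not None else "libphish-mpi.so not found")
```

If the library cannot be found, a common cause is that its directory is not on the run-time library search path of the machine.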


4.3 C vs C++ vs Python interface

As noted above, the APIs to the PHISH library for C versus C++ versus Python are very similar. A C++ program can use either the C or C++ API.

To use the C interface, a C or C++ program includes the file src/phish.h and makes calls to functions as follows:

#include "phish.h"
phish_error("My error"); 

The C++ interface in src/phish.hpp encloses the PHISH API in the namespace "phish", so functions can be invoked as

#include "phish.hpp"
phish::error("My error"); 

or as

#include "phish.hpp"
using namespace phish;
error("My error"); 

To use the Python interface, see this section of the manual for details. A Python program can invoke a library function as

import phish
phish.error("My error") 

or

from phish import *
error("My error") 

4.4 Format of a datum

The chief function of the PHISH library is to facilitate the exchange of data between minnows. This is done through datums, which contain one or more fields. Each field is a fundamental data type such as a "32-bit integer" or a "vector of doubles" or a NULL-terminated character string.

The PHISH library defines a specific explicit type for each fundamental data type it recognizes, such as "int32" for 32-bit signed integers, or "uint64" for 64-bit unsigned integers, or "double" for a double-precision value. This is so that the format of the datum, at the byte level, is identical on different machines, and datums can thus be exchanged between minnows running on machines with different word lengths or between minnows written in different languages (e.g. C vs Fortran vs Python).

IMPORTANT NOTE: Different endian ordering of fundamental numeric data types on different machines breaks this model. We may address this at some future point within the PHISH library.
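
To see why endian ordering matters, the following small sketch uses only the Python standard library (not the PHISH library) to pack the same 32-bit integer in both byte orders and then read the little-endian bytes as if they were big-endian.

```python
import struct
import sys

value = 1

# The same 32-bit signed integer in the two byte orders.
little = struct.pack("<i", value)  # b'\x01\x00\x00\x00'
big = struct.pack(">i", value)     # b'\x00\x00\x00\x01'

# A big-endian reader that receives little-endian bytes sees another number.
misread = struct.unpack(">i", little)[0]

if __name__ == "__main__":
    print("native byte order:", sys.byteorder)
    print("byte strings differ:", little != big)
    print("value 1 misread as:", misread)
```

The byte strings differ, so a datum packed on one kind of machine would be misinterpreted on the other unless the bytes were swapped.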

This is the byte-level format of datums that are sent and received by minnows via the PHISH library:

Integer flags are interleaved with the fundamental data types and the flags themselves are all 32-bit signed integers. This allows minnows that call the phish_pack and phish_unpack functions to use the usual C "int" data type as function arguments, instead of the int32_t types defined in the function prototypes. The compiler will only give an error if the native "int" on a machine is not a 32-bit integer. See the doc pages for phish_pack and phish_unpack for details.
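
A quick way to check whether the native C "int" on a machine is 32 bits wide is sketched below using Python's standard ctypes module. The function names are hypothetical and are not part of the PHISH API.

```python
import ctypes


def native_int_bits():
    """Width in bits of the platform's C 'int'."""
    return ctypes.sizeof(ctypes.c_int) * 8


def int_matches_int32():
    """True if C 'int' has the same size as a 32-bit integer."""
    return ctypes.sizeof(ctypes.c_int) == ctypes.sizeof(ctypes.c_int32)


if __name__ == "__main__":
    print("native int is", native_int_bits(), "bits")
    print("matches int32_t:", int_matches_int32())
```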

The "type" values are one of these settings, as defined in src/phish.h:

PHISH_CHAR, PHISH_INT*, PHISH_UINT*, PHISH_FLOAT, and PHISH_DOUBLE are, respectively, a single character, a signed integer (of length 8, 16, 32, or 64 bits), an unsigned integer (of length 8, 16, 32, or 64 bits), a float (typically 4 bytes), and a double (typically 8 bytes).

PHISH_RAW is a string of raw bytes which minnows can format in any manner, e.g. a C data structure containing a collection of various C primitive data types. PHISH_STRING is a standard NULL-terminated C-style string. The terminating NULL is included in the field.

The ARRAY types are contiguous sequences of int*, uint*, float, or double values, packed one after the other.

PHISH_PICKLE is an option available when using the Python wrapper on the PHISH library to encode arbitrary Python objects in pickled form as a string of bytes.

The "size" values are only included for PHISH_RAW (# of bytes), PHISH_STRING (# of bytes including NULL), the ARRAY types (# of values), and PHISH_PICKLE (# of bytes).

The field data is packed into the datum in a contiguous manner. This means that no attention is paid to alignment of integer or floating point values.
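
The following sketch illustrates the per-field layout described above: a 32-bit type flag, an optional 32-bit size, then the data, all packed contiguously with no padding. It uses Python's standard struct module rather than the PHISH library. The numeric type codes are hypothetical (the real values are defined in src/phish.h), little-endian order is assumed for reproducibility, and any header that precedes the fields in a real datum is omitted.

```python
import struct

# Hypothetical numeric codes; the real values are defined in src/phish.h.
PHISH_INT32 = 1
PHISH_DOUBLE = 2
PHISH_STRING = 3
PHISH_INT32_ARRAY = 4


def pack_int32(value):
    # type flag + 4 bytes of data; no size value for scalar types
    return struct.pack("<ii", PHISH_INT32, value)


def pack_double(value):
    # type flag + 8 bytes of data; "<" disables alignment padding
    return struct.pack("<id", PHISH_DOUBLE, value)


def pack_string(text):
    # type flag + size in bytes (including the NULL) + the bytes themselves
    data = text.encode("ascii") + b"\x00"
    return struct.pack("<ii", PHISH_STRING, len(data)) + data


def pack_int32_array(values):
    # type flag + number of values + the values, one after the other
    header = struct.pack("<ii", PHISH_INT32_ARRAY, len(values))
    return header + struct.pack("<%di" % len(values), *values)


# Four fields packed back to back.
fields = (pack_int32(7) + pack_string("hi")
          + pack_double(1.5) + pack_int32_array([1, 2, 3]))

if __name__ == "__main__":
    print(len(fields), "bytes")
```

In this example the 8-byte double begins at byte offset 23, which is not a multiple of 8. A minnow reading such data in C would typically copy the bytes out (e.g. with memcpy) rather than dereference a cast pointer.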

The maximum allowed size of an entire datum (in bytes) defaults to 1024 bytes (1 Kbyte). This can be overridden via the set memory command in a PHISH input script or the "--set memory" command-line option.
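
A minnow author may want to check a payload against this limit before handing it to the library. The sketch below is a hypothetical helper, not part of the PHISH API. It assumes the limit applies to the total byte count of the datum, and it makes no claim about how the library itself reports an oversized datum.

```python
DEFAULT_MAX_DATUM_BYTES = 1024  # default limit stated above


def check_datum_size(datum, max_bytes=DEFAULT_MAX_DATUM_BYTES):
    """Return the datum length, or raise ValueError if it exceeds max_bytes."""
    size = len(datum)
    if size > max_bytes:
        raise ValueError(
            "datum is %d bytes, limit is %d bytes" % (size, max_bytes))
    return size


if __name__ == "__main__":
    print(check_datum_size(b"x" * 100))
```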

When a datum is sent to another minnow via the MPI version of the PHISH library, MPI flags the message with an MPI "tag". This tag encodes the receiving minnow's input port and also a "done" flag. Specifically, if the datum is not a done message, the tag is the receiver's input port (0 to Nport-1). For a done message, a value of MAXPORT is added to the port number to form the tag. See the discussion of MAXPORT in this section of the manual.
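
The tag arithmetic just described can be sketched as follows. The value of MAXPORT used here is an assumption for illustration only, and encode_tag and decode_tag are hypothetical names, not PHISH library functions.

```python
MAXPORT = 16  # assumed value for illustration; see the manual for the real one


def encode_tag(port, done=False):
    """Tag is the input port, plus MAXPORT for a 'done' message."""
    if not 0 <= port < MAXPORT:
        raise ValueError("port must be in the range 0 to MAXPORT-1")
    return port + MAXPORT if done else port


def decode_tag(tag):
    """Recover (port, done) from a tag produced by encode_tag."""
    done = tag >= MAXPORT
    port = tag - MAXPORT if done else tag
    return port, done


if __name__ == "__main__":
    print(encode_tag(3), encode_tag(3, done=True), decode_tag(19))
```

Because valid ports are always smaller than MAXPORT, the receiver can distinguish the two cases with a single comparison.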

Similarly, the ZMQ version of the PHISH library prepends a "done" flag and port number to each datum.

See the phish_input doc page for a discussion of ports. See the shutdown section of the Minnows doc page for a discussion of "done" messages.