(c) Software Lab. Alexander Burger
This document describes how to call C functions in shared object files
(libraries) from PicoLisp, using the built-in native function - possibly with the help of
the struct and lisp functions.
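native calls a C function in a shared library. It tries to find the library by
name, find the function by name in that library, convert the function's
arguments from Lisp to C data structures, call the function's C code, and
convert the function's return values from C to Lisp data structures.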
The direct return value of native is the Lisp representation of
the C function's return value. Further values, returned by reference from the C
function, are available in Lisp variables (symbol values).
struct is a helper function,
which can be used to manipulate C data structures in memory. It may take a
scalar (a numeric representation of a C value) to convert it to a Lisp item, or
(more typically) a pointer to a memory area to build and extract data
structures. lisp allows you to install callback functions, callable
from C code, written in Lisp.
%@ is a convenience function,
simplifying the most common use case of native.
In combination, these functions can interface PicoLisp to almost any C function.
The above steps are fully dynamic; native doesn't have (and
doesn't require) a priori knowledge about the library, the function or the
involved data. No need to write any glue code, interfaces or include files. All
functions can even be called interactively from the REPL.
The arguments to native are a library, a function, a return value specification, and optional arguments for the C function.
The simplest form is a call to a function without return value and without arguments. If we assume a library "lib.so", containing a function with the prototype
void fun(void);
then we can call it as
(native "lib.so" "fun")
The first argument to native specifies the library. It is either
the name of a library (a symbol), or the handle of a previously
found library (a number).
As a special case, a transient symbol "@" can be passed for the
library name. It then refers to the current main program (instead of an external
library), and can be used for standard functions like "malloc" or
"free".
Because this is needed so often,
(%@ "fun" ...)
can be used instead of
(native "@" "fun" ...)
native uses dlopen(3) internally to find and open
the library, and to obtain the handle. If the name contains a slash ('/'), then
it is interpreted as a (relative or absolute) pathname. Otherwise, the dynamic
linker searches for the library according to the system's environment and
directories. See the man page of dlopen(3) for further details.
If called with a symbolic argument, native automatically caches
the handle of the found library in the value of that symbol. The most natural
way is to pass the library name as a transient
symbol ("lib.so" above): The initial value of a transient symbol is
that symbol itself, so that native receives the library name upon
the first call. After successfully finding and opening the library,
native stores the handle of that library in the value of the passed
symbol ("lib.so"). As native evaluates its arguments
in the normal way, subsequent calls within the same transient scope will receive
the numeric value (the handle), and don't need to open and search the library
again.
The same rule applies to the second argument, the function. When called with
a symbol, native stores the function handle in its value, so that
subsequent calls evaluate to that handle, and native can directly
jump to the function.
native uses dlsym(3) internally to obtain the
function pointer. See the man page of dlsym(3) for further details.
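The lookup-and-cache behaviour described above can be sketched in C. This is only an illustration of the idea, not native's actual implementation: the struct cached_fun and the function resolve are hypothetical names, the single struct stands in for the value cells of the two transient symbols, and the dlopen flags are an assumption.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical cache slot, standing in for the values of the library
   symbol and the function symbol. */
struct cached_fun {
    const char *lib;    /* library name, or NULL for the main program */
    const char *name;   /* function name */
    void *lib_handle;   /* filled in on first use */
    void *fun_ptr;      /* filled in on first use */
};

/* Resolve the function on the first call; later calls reuse the
   stored pointer and skip dlopen() and dlsym() entirely. */
void *resolve(struct cached_fun *c)
{
    if (c->fun_ptr == NULL) {
        if (c->lib_handle == NULL)
            c->lib_handle = dlopen(c->lib, RTLD_LAZY | RTLD_GLOBAL);
        if (c->lib_handle == NULL)
            return NULL;    /* library not found */
        c->fun_ptr = dlsym(c->lib_handle, c->name);
    }
    return c->fun_ptr;
}
```

Passing NULL as the library name makes dlopen(3) return a handle for the main program, which corresponds to the "@" special case. On older C libraries the program may need to be linked with -ldl.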
In most cases a program will call more than one function from a given library. If we keep the code within the same transient scope (i.e. in the same source file), each library will be opened - and each function searched - only once.
(native "lib.so" "fun1")
(native "lib.so" "fun2")
(native "lib.so" "fun3")
After "fun1" was called, "lib.so" will be open, and
won't be re-opened for "fun2" and "fun3". Consider
the definition of helper functions:
(de fun1 ()
   (native "lib.so" "fun1") )

(de fun2 ()
   (native "lib.so" "fun2") )

(de fun3 ()
   (native "lib.so" "fun3") )
After any one of fun1, fun2 or fun3
was called, the symbol "lib.so" will hold the library handle. And
each function "fun1", "fun2" and "fun3"
will be searched only when called the first time.
Note that the function handle points to a structure in memory, which is
automatically allocated. This implies that a memory leak may occur if the
transient symbol holding the function handle goes out of scope (e.g. by
repeatedly (re)loading the library after executing its functions).
Warning: Avoid putting more than one library into a single transient scope if there is a chance that two different functions with the same name will be called in two different libraries. Because of the function handle caching, the second call would otherwise (wrongly) go to the first function.
The (optional) third argument to native specifies the return
value. A C function can return many types of values, like integer or floating
point numbers, string pointers, or pointers to structures which in turn consist
of those types, and even other structures or pointers to structures.
native tries to cover most of them.
As described in the result specification,
the third argument should consist of a pattern which tells native
how to extract the proper value.
In the simplest case, the result specification is NIL like in
the examples so far. This means that either the C function returns
void, or that we are not interested in the value. The return value
of native will be NIL in that case.
If the result specification is one of the symbols B,
I or N, an integer number is returned, by interpreting
the result as a char (8 bit unsigned byte), int (32
bit signed integer), or long number (64 bit signed integer),
respectively. Other numbers (signed or unsigned, and of different sizes) can be
produced from these types with logical and arithmetic operations if necessary.
If the result specification is the symbol C, the result is
interpreted as a 16 bit number, and a single-char transient symbol (string) is
returned.
A specification of S tells native to interpret the
result as a pointer to a C string (null terminated), and to return a transient
symbol (string).
If the result specification is a number, it will be used as a scale to
convert a returned double (if the number is positive) or
float (if the number is negative) to a scaled fixpoint number.
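The scaling step can be pictured with a small C sketch. The helper names to_fixpoint and from_fixpoint are hypothetical, and rounding to the nearest integer is an assumption; native's exact rounding behaviour may differ. The scale is given here as a plain positive factor (e.g. 100 for two decimal digits); the sign that selects double or float is left out.

```c
/* Convert a C double to a scaled fixpoint integer.
   Assumption: round half away from zero. */
long long to_fixpoint(double d, long long scale)
{
    double scaled = d * (double)scale;
    return (long long)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* The reverse direction, as used for floating point arguments:
   the fixpoint number divided by its scale. */
double from_fixpoint(long long n, long long scale)
{
    return (double)n / (double)scale;
}
```

With a scale of 100, a returned double of 0.11 would thus arrive in Lisp as the integer 11.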
Examples for function calls, with their corresponding C prototypes:
(native "lib.so" "fun" 'I)             # int fun(void);
(native "lib.so" "fun" 'N)             # long fun(void);
(native "lib.so" "fun" 'P)             # void *fun(void);
(native "lib.so" "fun" 'S)             # char *fun(void);
(native "lib.so" "fun" 1.0)            # double fun(void);
If the result specification is a list, it means that the C function returned a pointer to an array, or an arbitrary memory structure. The specification list should then consist of either the above primitive specifications (symbols or numbers), or of cons pairs of a primitive specification and a repeat count, to denote arrays of the given type.
Examples for function calls, with their corresponding pseudo C prototypes:
(native "lib.so" "fun" '(I . 8))       # int *fun(void);  // 8 integers
(native "lib.so" "fun" '(B . 16))      # unsigned char *fun(void);  // 16 bytes
(native "lib.so" "fun" '(I I))         # struct {int i; int j;} *fun(void);
(native "lib.so" "fun" '(I . 4))       # struct {int i[4];} *fun(void);
(native "lib.so" "fun" '(I (B . 4)))   # struct {
                                       #    int i;
                                       #    unsigned char c[4];
                                       # } *fun(void);
(native "lib.so" "fun"                 # struct {
   '(((B . 4) I) (S . 12) (N . 8)) )   #    struct {unsigned char c[4]; int i;}
                                       #    char *names[12];
                                       #    long num[8];
                                       # } *fun(void);
If a returned structure has an element which is a pointer to some
other structure (i.e. not an embedded structure like in the last example above),
this pointer must first be obtained with an N pattern, which can
then be passed to struct for further
extraction.
The (optional) fourth and following arguments to native specify
the arguments to the C function.
Integer arguments (up to 64 bits, signed or unsigned char,
short, int or long) can be passed as they
are, as numbers.
(native "lib.so" "fun" NIL 123)        # void fun(int);
(native "lib.so" "fun" NIL 1 2 3)      # void fun(int, long, short);
String arguments can be specified as symbols. native allocates
memory for each string on the stack, passes the pointer to the C function, and
cleans up the stack when done.
(native "lib.so" "fun" NIL "abc")      # void fun(char*);
(native "lib.so" "fun" NIL 3 "def")    # void fun(int, char*);
Note that the allocated string memory is released after the return
value is extracted. This allows a C function to return the argument string
pointer, perhaps after modifying the data in-place, and receive the new string
as the return value (with the S specification).
(native "lib.so" "fun" 'S "abc") # char *fun(char*);
Also note that specifying NIL as an argument passes an empty
string ("", which also reads as NIL in PicoLisp) to the C function.
Physically, this is a pointer to a NULL-byte, and is not a NULL-pointer.
Be sure to pass 0 (the number zero) if a NULL-pointer is desired.
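To make the difference visible from the C side, here is a minimal sketch of a function that reports what it received. The function describe is hypothetical and exists only for illustration.

```c
#include <stddef.h>

/* Hypothetical C function: tells the three cases apart. */
const char *describe(const char *s)
{
    if (s == NULL)
        return "null pointer";      /* the case when 0 is passed */
    if (s[0] == '\0')
        return "empty string";      /* the case when NIL is passed */
    return "non-empty string";
}
```

Assuming this function lives in "lib.so", the rules above mean that (native "lib.so" "describe" 'S NIL) reaches the second branch, while (native "lib.so" "describe" 'S 0) reaches the first.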
Floating point arguments are specified as cons pairs, where the value is in
the CAR, and the CDR holds the fixpoint scale. If the scale is positive, the
number is passed as a double, otherwise as a float.
(native "lib.so" "fun" NIL             # void fun(double, float);
   (12.3 . 1.0) (4.56 . -1.0) )
Composite arguments are specified as nested list structures.
native allocates memory for each array or structure (with
malloc(3)), passes the pointer to the C function, and releases the
memory (with free(3)) when done.
This implies that such an argument can be both an input and an output value to a C function (pass by reference).
The CAR of the argument specification can be NIL (then it is an
input-only argument). Otherwise, it should be a variable which receives the
returned structure data.
The CADR of the argument specification must be a cons pair with the total size of the structure in its CAR. The CDR is ignored for input-only arguments; otherwise it should contain a result specification for the output value to be stored in the variable.
For example, a minimal case is a function that takes an integer reference, and stores the number '123' in that location:
void fun(int *i) {
   *i = 123;
}
We call native with a variable X in the CAR of the
argument specification, a size of 4 (i.e. sizeof(int)), and
I for the result specification. The stored value is then available
in the variable X:
: (native "lib.so" "fun" NIL '(X (4 . I)))
-> NIL
: X
-> 123
The rest (CDDR) of the argument specification may contain initialization data, if the C function expects input values in the structure. It should be a list of initialization items, optionally with a fill-byte value in the CDR of the last cell.
If there are no initialization items and just the final fill-byte, then the whole buffer is filled with that byte. For example, to pass a buffer of 20 bytes, initialized to zero:
: (native "lib.so" "fun" NIL '(NIL (20) . 0))
A buffer of 20 bytes, with the first 4 bytes initialized to 1, 2, 3, and 4, and the rest filled with zero:
: (native "lib.so" "fun" NIL '(NIL (20) 1 2 3 4 . 0))
and the same, where the buffer contents are returned as a list of bytes in
the variable X:
: (native "lib.so" "fun" NIL '(X (20 B . 20) 1 2 3 4 . 0))
For a more extensive example, let's use the following definitions:
typedef struct value {
   int x, y;
   double a, b, c;
   int z;
   char nm[4];
} value;
void fun(value *val) {
   printf("%d %d\n", val->x, val->y);
   val->x = 3;
   val->y = 4;
   strcpy(val->nm, "OK");
}
We call this function with a structure of 40 bytes, requesting the returned
data in V, with two integers (I . 2), three doubles
(100 . 3) with a scale of 2 (1.0 = 100), another integer
I and four characters (C . 4). If the structure gets
initialized with two integers 7 and 6, three doubles 0.11, 0.22 and 0.33, and
another integer 5 while the rest of the 40 bytes is cleared to zero
: (native "lib.so" "fun" NIL
   '(V (40 (I . 2) (100 . 3) I (C . 4)) -7 -6 (100 11 22 33) -5 . 0) )
then it will print the integers 7 and 6, and V will contain the
returned list
((3 4) (11 22 33) 5 ("O" "K" NIL NIL))
i.e. the original integer values 7 and 6 replaced with 3 and 4.
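The size of 40 bytes and the order of the items in the result specification follow from the memory layout of the C structure. The following sketch repeats the definitions from above and adds a hypothetical helper print_layout to check that layout. The offsets in the comments assume a 4-byte int and an 8-byte double, as on common 64-bit platforms; other ABIs may differ.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct value {
    int x, y;          /* (I . 2)    offsets 0 and 4     */
    double a, b, c;    /* (100 . 3)  offsets 8, 16, 24   */
    int z;             /* I          offset 32           */
    char nm[4];        /* (C . 4)    offset 36           */
} value;

void fun(value *val)
{
    printf("%d %d\n", val->x, val->y);
    val->x = 3;
    val->y = 4;
    strcpy(val->nm, "OK");
}

/* Hypothetical helper: print size and offsets, so that the numbers
   used in the argument specification can be verified. */
void print_layout(void)
{
    printf("size=%zu a=%zu z=%zu nm=%zu\n", sizeof(value),
           offsetof(value, a), offsetof(value, z), offsetof(value, nm));
}
```

Running print_layout on the target platform before writing the specification is a cheap way to catch padding or alignment surprises.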
Note that the allocated structure memory is released after the return value is extracted. This allows a C function to return the argument structure pointer, perhaps after modifying the data in-place, and receive the new structure as the return value - instead of (or even in addition to) the direct return via the argument reference.
The preceding Arguments section mentions that
native implicitly allocates and releases memory for strings, arrays
and structures.
Technically, this mimics automatic variables in C.
For a simple example, let's assume that we want to call read(2)
directly, to fetch a 4-byte integer from a given file descriptor. This could be
done with the following C function:
int read4bytes(int fd) {
   char buf[4];
   read(fd, buf, 4);
   return *(int*)buf;
}
buf is an automatic variable, allocated on the stack, which
disappears when the function returns. A corresponding native call
would be:
(%@ "read" 'I Fd '(Buf (4 . I)) 4)
The structure argument (Buf (4 . I)) says that a space of 4
bytes should be allocated and passed to read, then an integer
I returned in the variable Buf (the return value of
native itself is the integer returned by read). The
memory space is released after that.
(Note that we can call %@ here, as read resides in
the main program.)
Instead of a single integer, we might want a list of four bytes to be
returned from native:
(%@ "read" 'I Fd '(Buf (4 B . 4)) 4)
The difference is that we wrote (B . 4) (a list of 4 bytes)
instead of I (a single integer) for the result specification (see the Arrays and Structures section).
Let's see what happens if we extend this example. We'll write the four bytes to another file descriptor, after reading them from the first one:
void copy4bytes(int fd1, int fd2) {
   char buf[4];
   read(fd1, buf, 4);
   write(fd2, buf, 4);
}
Again, buf is an automatic variable. It is passed to both
read and write. A direct translation would be:
(%@ "read" 'I Fd '(Buf (4 B . 4)) 4)
(%@ "write" 'I Fd2 (cons NIL (4) Buf) 4)
This works as expected. read returns a list of four bytes in
Buf. The call to cons builds the structure
(NIL (4) 1 2 3 4)
i.e. no return variable, a four-byte memory area, filled with the four bytes
(assuming that read returned 1, 2, 3 and 4). Then this structure is
passed to write.
But: This solution induces quite some overhead. The four-byte buffer is
allocated before the call to read and released after that, then
allocated and released again for write. Also, the bytes are
converted to a list to be stored in Buf, then that list is extended
for the structure argument to write, and converted again back to
the raw byte array. The data in the list itself are never used.
If the above operation is to be used more than once, it is better to allocate the buffer manually, use it for both reading and writing, and then release it. This also avoids all intermediate list conversions.
(let Buf (%@ "malloc" 'P 4)            # Allocate memory
   (%@ "read" 'I Fd Buf 4)             # (Possibly repeat this several times)
   (%@ "write" 'I Fd2 Buf 4)
   (%@ "free" NIL Buf) )               # Release memory
To allocate such a buffer locally on the stack (just like a C function would
do), buf can be used. Equivalent to the
above is:
(buf Buf 4                             # Allocate local memory
   (%@ "read" 'I Fd Buf 4)
   (%@ "write" 'I Fd2 Buf 4) )
For a more typical example, we might call the Fast Fourier Transform using the library from the FFTW package. With the example code for calculating Complex One-Dimensional DFTs:
#include <fftw3.h>
...
{
   fftw_complex *in, *out;
   fftw_plan p;
   ...
   in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
   out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
   p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
   ...
   fftw_execute(p); /* repeat as needed */
   ...
   fftw_destroy_plan(p);
   fftw_free(in); fftw_free(out);
}
we can build the following equivalent:
(load "@lib/math.l")
(de FFTW_FORWARD . -1)
(de FFTW_ESTIMATE . 64)
(de fft (Lst)
   (let
      (Len (length Lst)
         In (native "libfftw3.so" "fftw_malloc" 'P (* Len 16))
         Out (native "libfftw3.so" "fftw_malloc" 'P (* Len 16))
         P (native "libfftw3.so" "fftw_plan_dft_1d" 'N
            Len In Out FFTW_FORWARD FFTW_ESTIMATE ) )
      (struct In NIL (cons 1.0 (apply append Lst)))
      (native "libfftw3.so" "fftw_execute" NIL P)
      (prog1
         (struct Out (make (do Len (link (1.0 . 2)))))
         (native "libfftw3.so" "fftw_destroy_plan" NIL P)
         (native "libfftw3.so" "fftw_free" NIL Out)
         (native "libfftw3.so" "fftw_free" NIL In) ) ) )
This assumes that the argument list Lst is passed as a list
of complex numbers, each as a list of two numbers for the real and imaginary
part, like
(fft '((1.0 0) (1.0 0) (1.0 0) (1.0 0) (0 0) (0 0) (0 0) (0 0)))
The above translation to Lisp is quite straightforward. After the two buffers
are allocated, and a plan is created, struct is called to store the argument list
in the In structure as a list of double numbers (according to the
1.0 initialization item). Then
fftw_execute is called, and struct is called again to
retrieve the result from Out and return it from fft
via the prog1. Finally, all memory is
released.
If such allocated data (strings, arrays or structures passed to
native) are constant during the lifetime of a program, it makes
sense to allocate them only once, before their first use. A typical candidate is
the format string of a printf call. Consider a function which
prints a floating point number in scientific notation:
(load "@lib/math.l")

: (de prf (Flt)
   (%@ "printf" NIL "%e\n" (cons Flt 1.0)) )
-> prf

: (prf (exp 12.3))
2.196960e+05
Sometimes it is necessary to do the reverse: Call Lisp code from C code.
This mechanism uses the Lisp-level function lisp. No C source code access is
required.
lisp returns a function pointer, which can be passed to
C functions via native. When this function pointer is
dereferenced and called from the C code, the corresponding Lisp function
is invoked. Only five numeric arguments and a numeric return value can
be used, and other data types must be handled by the Lisp function with
struct and memory management
operations.
Callbacks are often used in user interface libraries, to handle key-, mouse-
and other events. Examples can be found in "@lib/openGl.l". The
following function mouseFunc takes a Lisp function, installs it
under the tag mouseFunc (any other tag would be all right too) as a
callback, and passes the resulting function pointer to the OpenGL
glutMouseFunc() function, to set it as a callback for the current
window:
(de mouseFunc (Fun)
   (native `*GlutLib "glutMouseFunc" NIL (lisp 'mouseFunc Fun)) )
(The global *GlutLib holds the library
"/usr/lib/libglut.so". The backquote (`) is important
here, so that the transient symbol with the library name (and not the global
*GlutLib) is evaluated by native, resulting in the
proper library handle at runtime).
A program using OpenGL may then use mouseFunc to install a
function
(mouseFunc
   '((Btn State X Y)
      (do-something-with Btn State X Y) ) )
so that future clicks into the window will pass the button, state and coordinates to that function.
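Seen from the C side, a callback is just a function pointer that the library stores and calls later. The sketch below shows that pattern with hypothetical names (set_mouse_func, dispatch_click, record_click); it is not GLUT's actual implementation, only an illustration of what a library does with the pointer returned by lisp.

```c
#include <stddef.h>

/* Hypothetical library-side code, not taken from GLUT. */
typedef void (*mouse_cb)(int button, int state, int x, int y);

static mouse_cb current_cb = NULL;

/* Counterpart of a function like glutMouseFunc():
   remember the function pointer for later. */
void set_mouse_func(mouse_cb cb)
{
    current_cb = cb;
}

/* Would be called from the library's event loop.
   Returns 1 if a callback was invoked, 0 if none is installed. */
int dispatch_click(int button, int state, int x, int y)
{
    if (current_cb == NULL)
        return 0;
    current_cb(button, state, x, y);
    return 1;
}

/* Stand-in for the Lisp function: records the last event. */
static int last_event[4];

void record_click(int button, int state, int x, int y)
{
    last_event[0] = button;
    last_event[1] = state;
    last_event[2] = x;
    last_event[3] = y;
}
```

All four parameters are plain integers, which fits the restriction that callbacks receive only numeric arguments.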