
A (Possibly) Safer Interface to the Ctypes FFI


Matt Walker

· 7 min read

Hello again,

While discussing some issues with Ctypes, I came up with a pretty radical new interface for it. I think it would provide safety guarantees that Ctypes cannot offer at the moment. I will explain this interface here; this blog post is likely to be of interest to anyone who has had to use Ctypes, or would like to in the future.

The Problem to Solve

Consider the following (unsafe) call to C using Ctypes (credit: @yallop):

let open Ctypes in
let strchr = Foreign.foreign "strchr" (ptr char @-> char @-> returning (ptr char)) in
let p = CArray.of_string "abc" in
let q = strchr (CArray.start p) 'a' in
(* let () = Gc.compact () in *)
Printf.printf "%c\n" !@q

For those who don’t know, strchr takes a null-terminated string and a character, and returns a pointer into that same string, pointing at the first occurrence of the character (or NULL if there is none). In particular, if the CArray p were freed before q is dereferenced, this would lead to undefined behaviour. In fact, we can trigger this UB ourselves by simply uncommenting the Gc.compact () line: p is not referenced anywhere after that point, so from the GC’s point of view it will (or at the very least, can) be collected.

The issue is that the GC doesn’t know about the dependency between p and q, namely that p must remain alive for q to remain valid. We shall write this as q => p. (The order can be remembered by analogy with implication: q => p reads “if q is alive, then p must be alive too.”)

The Interface of living.ml

(* living.ml *)
type dep = Dep : 'a -> dep

(** This is exposed, but do not access unsafe_value unless you know what you're doing. *)
type 'a t = { unsafe_value: 'a; dependencies : dep list}

(** Monadic bind *)
let bind : ('a -> 'b t) -> 'a t -> 'b t =
  fun f x ->
    let y = f x.unsafe_value in
    { y with dependencies = x.dependencies @ y.dependencies}

(** Monadic return *)
let return : 'a -> 'a t =
  fun x -> { unsafe_value = x; dependencies = [Dep x]}

(** Functor map *)
let map : ('a -> 'b) -> 'a t -> 'b t =
  fun f x -> { x with unsafe_value = f x.unsafe_value }

(** Encode dependence between two raw values *)
let (=>) x y = { unsafe_value = x; dependencies = [Dep x; Dep y]}

module Let_syntax = struct
  let (let*) x f = x |> bind f

  let (let+) x f = x |> map f
end

The first type, Living.dep, is a GADT that homogenizes values of any type to dep, so that they can be placed in a single list. Because the payload’s type becomes existential, a dep is essentially useless except for one key fact: as long as the dep is reachable, the garbage collector must still consider the wrapped value alive.
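To make the “homogenizing” concrete, here is a small pure-OCaml sketch (no Ctypes needed). It puts values of unrelated types into one dep list, and uses Gc.finalise as a probe to check that a buffer reachable only through a Dep survives a full major collection. The names mixed and survives_full_major are hypothetical, chosen just for this illustration.

```ocaml
(* The GADT from living.ml: [Dep] hides the payload's type behind an existential. *)
type dep = Dep : 'a -> dep

(* Values of unrelated types can now live in the same list. *)
let mixed : dep list = [ Dep 42; Dep "abc"; Dep [| 1.0; 2.0 |]; Dep (fun x -> x + 1) ]

(* Hypothetical probe: returns true if a buffer whose only remaining reference
   sits inside a [Dep] is still uncollected after a full major GC. *)
let survives_full_major () =
  let collected = ref false in
  let deps =
    let buf = Bytes.create 64 in
    (* The finaliser can only run once [buf] has become unreachable. *)
    Gc.finalise (fun _ -> collected := true) buf;
    [ Dep buf ]
  in
  Gc.full_major ();
  let result = not !collected in
  (* [deps] is used after the collection, so it, and the buffer it wraps,
     was still reachable while the GC ran. *)
  ignore (Sys.opaque_identity deps);
  result
```

Nothing can be done with the hidden buffer once it is wrapped, but that is exactly the point: the list exists only to keep things reachable.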

A Living.t is best thought of as a value (unsafe_value) together with all the values it depends on (dependencies). Using this, we can define a monadic-bind interface that composes operations on dependent values in a simple and natural way. However, you don’t need to understand monads to use this interface; I detail how below.
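The following sketch shows what bind does with the dependency lists. It inlines a condensed copy of the Living module from above so that it runs on its own, and it uses two OCaml byte strings, outer and inner, as hypothetical stand-ins for a foreign buffer and a value derived from it. Each let* appends the dependencies of the value it unwraps, so the final result still carries the Dep on outer even though outer is never mentioned again.

```ocaml
(* Condensed copy of living.ml from the post, so this block is self-contained. *)
module Living = struct
  type dep = Dep : 'a -> dep
  type 'a t = { unsafe_value : 'a; dependencies : dep list }

  let bind f x =
    let y = f x.unsafe_value in
    { y with dependencies = x.dependencies @ y.dependencies }

  let return x = { unsafe_value = x; dependencies = [ Dep x ] }
  let map f x = { x with unsafe_value = f x.unsafe_value }
  let ( => ) x y = { unsafe_value = x; dependencies = [ Dep x; Dep y ] }

  module Let_syntax = struct
    let ( let* ) x f = bind f x
    let ( let+ ) x f = map f x
  end
end

open Living.Let_syntax

(* Hypothetical stand-ins: [outer] plays the role of p, [inner] the role of q. *)
let outer = Bytes.of_string "abc"
let inner = Bytes.sub outer 0 1

let program : int Living.t =
  let* q = Living.(inner => outer) in        (* deps: [Dep inner; Dep outer] *)
  let* n = Living.return (Bytes.length q) in (* deps: [Dep 1] *)
  Living.return (n + 1)                      (* deps: [Dep 2] *)
```

Here program.unsafe_value is 2, and its dependencies list holds all four entries collected along the way.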

Here is what the code would look like in a hypothetical, Living-enhanced version of Ctypes:

let open Ctypes in
let open Living.Let_syntax in
let strchr_unsafe = Foreign.foreign "strchr" (ptr char @-> char @-> returning (ptr char)) in
let strchr : char ptr -> char -> char ptr Living.t =
  fun s c ->
    let c_ptr = strchr_unsafe s c in
    (* The returned pointer points into [s], so it depends on [s] *)
    Living.(c_ptr => s) in
let p = CArray.of_string "abc" in
let* q = strchr (CArray.start p) 'a' in
(* let () = Gc.compact () in *)
let () = Printf.printf "%c\n" !@q in
Living.return ()

We first wrap the strchr function with plain Ctypes, marking it as unsafe. Then we define our safe strchr, which encodes the dependency of the return value on the argument. This is the important part, so make sure you understand it: c_ptr being alive implies that s must remain alive, so we write c_ptr => s. This wraps the return value in a Living.t, which bundles the two together.
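Since every wrapper of this kind has the same shape, the dependency could be factored into a small combinator. The sketch below is my own assumption, not part of the post or of Ctypes: a hypothetical depends_on_first that calls a raw two-argument function and records that its result depends on the first argument. It uses a trimmed copy of Living, and String.index stands in for strchr_unsafe so the block runs without Ctypes.

```ocaml
(* Trimmed copy of living.ml from the post: only the parts this sketch uses. *)
module Living = struct
  type dep = Dep : 'a -> dep
  type 'a t = { unsafe_value : 'a; dependencies : dep list }

  let ( => ) x y = { unsafe_value = x; dependencies = [ Dep x; Dep y ] }
end

(* Hypothetical combinator: the result of [f a b] stays valid only while [a]
   is alive, so record that with [=>], exactly as the safe strchr does. *)
let depends_on_first : ('a -> 'b -> 'c) -> 'a -> 'b -> 'c Living.t =
  fun f a b ->
    let result = f a b in
    Living.(result => a)

(* Pure-OCaml stand-in for strchr_unsafe: index of the first [c] in [s]. *)
let find_char = depends_on_first String.index

let found = find_char "abc" 'b'
```

With Ctypes in scope, the safe strchr above would then become let strchr = depends_on_first strchr_unsafe, though whether that reads better than the explicit version is a matter of taste.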

Finally, we come to using it. CArray.of_string does not need to be wrapped, since it copies the contents of the OCaml string "abc" into a C-compatible array. However, we now must use let* syntax (or Living.bind directly, if you prefer) to unwrap q in the body below. We dereference q using the plain Ctypes (!@) operator, and now must explicitly return something wrapped in a Living.t, so we just return () via Living.return. Note, however, that the returned value depends on q and on CArray.start p (the pointer into p’s memory), so neither can be collected, even if we uncomment the Gc.compact (), and even though p does not appear explicitly after the call to Gc.compact ().

And that’s it! Our code is now safe!

Discussion

There are a few “kinds” of functions you’d want to be able to write with Living. The first kind lifts 'a -> 'b functions to 'a Living.t -> 'b Living.t functions. This is achieved with Living.map, which ensures, in particular, that the underlying 'a won’t be collected. This is because Living.(=>) and Living.return both include all their arguments in the dependencies, so there is no (safe) way to construct a Living.t that doesn’t depend on its own value.

The second kind is ordinary dependency-creating functions, of type 'a -> 'b Living.t. These are useful, for example, for making traversal of a C linked list safe. (Note that this example is not null-pointer safe, but it is garbage-collection safe, which is the only safety Living is trying to provide.) You can define:
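A short pure-OCaml sketch of the first kind, again with a trimmed copy of Living and byte strings as hypothetical stand-ins for foreign data: lifting Bytes.length with let+ (that is, Living.map) changes the value but hands back the very same dependency list, so whatever view depended on is still kept alive by len.

```ocaml
(* Trimmed copy of living.ml from the post: only the parts this sketch uses. *)
module Living = struct
  type dep = Dep : 'a -> dep
  type 'a t = { unsafe_value : 'a; dependencies : dep list }

  let map f x = { x with unsafe_value = f x.unsafe_value }
  let ( => ) x y = { unsafe_value = x; dependencies = [ Dep x; Dep y ] }

  module Let_syntax = struct
    let ( let+ ) x f = map f x
  end
end

open Living.Let_syntax

(* Hypothetical stand-ins: [view] is derived from, and depends on, [owner]. *)
let owner = Bytes.of_string "hello, world"
let view : bytes Living.t = Living.(Bytes.sub owner 0 5 => owner)

(* Lift the plain function [Bytes.length] into the Living.t context. *)
let len : int Living.t =
  let+ v = view in
  Bytes.length v
```

Because map only replaces unsafe_value, len.dependencies is physically the same list as view.dependencies.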

open Ctypes

(** Define a simple linked list *)
module List = struct
  type t
  let s : t structure typ = structure "linked_list"
  let data_f = field s "data" int
  let next_f = field s "next" (ptr s)
  let () = seal s
end

(** Assumes for simplicity that the [List.t structure ptr] argument is not null! *)
let data_ptr : List.t structure ptr -> int ptr Living.t =
  (* Note that [p |-> f] returns a pointer to field [f] of the structure [p] points to *)
  fun xs -> Living.((xs |-> List.data_f) => xs)

(** Assumes for simplicity that the [List.t structure ptr] argument is not null! *)
let next : List.t structure ptr -> List.t structure ptr Living.t =
  (* Read the [next] field itself; the pointer we get back depends on [xs] *)
  fun xs -> Living.(!@(xs |-> List.next_f) => xs)

(** Assumes for simplicity that the [List.t structure ptr] argument is not null! *)
let rec get_last : List.t structure ptr -> int ptr Living.t =
  fun xs ->
    let open Living.Let_syntax in
    let* next_ptr = next xs in
    if is_null next_ptr
    then data_ptr xs
    else get_last next_ptr

You can see something somewhat remarkable here: only data_ptr and next need to be implemented carefully. The code for get_last just works by bind composition. In general, only the functions that call Ctypes operators (like (|->) or (!@)) need care. This localizes the danger of making a mistake with the dependencies. That is, dependency-creating functions come in two flavours: ones you need to think about and ones you don’t! That’s a pretty big win in my book, since previously you had to think carefully about every use of FFI-allocated values (and whether they were still alive). Indeed, if a library provided a wrapped version of (|->) that encoded the dependency for you, you wouldn’t have to think at all!
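To see the composition at work without a C toolchain, here is a pure-OCaml model of the same traversal. The cell record is a hypothetical stand-in for the C struct (an OCaml record does not actually need Living to stay alive, so this only illustrates the plumbing), and Living is again a trimmed copy of the post’s module. Only data_ptr and next use =>; get_last never mentions dependencies.

```ocaml
(* Trimmed copy of living.ml from the post: only the parts this sketch uses. *)
module Living = struct
  type dep = Dep : 'a -> dep
  type 'a t = { unsafe_value : 'a; dependencies : dep list }

  let bind f x =
    let y = f x.unsafe_value in
    { y with dependencies = x.dependencies @ y.dependencies }

  let ( => ) x y = { unsafe_value = x; dependencies = [ Dep x; Dep y ] }

  module Let_syntax = struct
    let ( let* ) x f = bind f x
  end
end

open Living.Let_syntax

(* Hypothetical stand-in for the C list: [next = None] plays the role of NULL. *)
type cell = { data : int; next : cell option }

(* The only two functions that have to think about dependencies. *)
let data_ptr (xs : cell) : int Living.t =
  let d = xs.data in
  Living.(d => xs)

let next (xs : cell) : cell option Living.t =
  let n = xs.next in
  Living.(n => xs)

(* Composed purely with let*, mirroring get_last from the post. *)
let rec get_last (xs : cell) : int Living.t =
  let* next_ptr = next xs in
  match next_ptr with
  | None -> data_ptr xs
  | Some n -> get_last n

let three_cells = { data = 1; next = Some { data = 2; next = Some { data = 3; next = None } } }
let last = get_last three_cells
```

Each call to next contributes two entries and the final data_ptr two more, so for the three-cell list last.dependencies has 2 × 3 + 2 = 8 entries.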

The final kind of function is dependency-removing functions, of the form 'a Living.t -> 'b. These are quite tricky, and normally work by copying the data you need out of the foreign heap. In particular, the 'b must not depend on the liveness of the 'a or any of its dependencies to be valid. An example could be:

let unsafe_deref : 'a ptr Living.t -> 'a =
  fun x ->
    let open Living.Let_syntax in
    (let+ x' = x in !@ x').Living.unsafe_value

Notice the idiom used here: we do the computation we want inside the Living.t context, and only then read unsafe_value from the result of the whole expression. Writing !@ x.unsafe_value instead might not be safe: once the raw pointer has been pulled out, nothing keeps x (and hence its dependencies) reachable, and !@ can (at least theoretically) call the garbage collector before dereferencing the actual pointer. For this reason, you should write all of these 'a Living.t -> 'b functions in the style of first mapping 'a to 'b, and only then calling unsafe_value on the result.
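Here is the same idiom in a pure-OCaml sketch, with a byte string standing in for foreign memory and a trimmed copy of Living. copy_out is a hypothetical name; the point is the order of operations: the copy happens inside let+, and unsafe_value is read only from the finished result, which by then no longer refers to the buffer.

```ocaml
(* Trimmed copy of living.ml from the post: only the parts this sketch uses. *)
module Living = struct
  type dep = Dep : 'a -> dep
  type 'a t = { unsafe_value : 'a; dependencies : dep list }

  let return x = { unsafe_value = x; dependencies = [ Dep x ] }
  let map f x = { x with unsafe_value = f x.unsafe_value }

  module Let_syntax = struct
    let ( let+ ) x f = map f x
  end
end

open Living.Let_syntax

(* Hypothetical dependency-removing function: Bytes.to_string makes a fresh
   copy, so the returned string needs nothing in [dependencies] to stay valid. *)
let copy_out : bytes Living.t -> string =
  fun b ->
    (* Do the work inside the Living.t context first ... *)
    let copied = (let+ raw = b in Bytes.to_string raw) in
    (* ... and only then read unsafe_value from the finished result. *)
    copied.Living.unsafe_value

let buffer = Bytes.of_string "abc"
let snapshot = copy_out (Living.return buffer)

(* Mutating the source afterwards does not affect the copy. *)
let () = Bytes.set buffer 0 'z'
```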

Conclusion

If there is interest, I can provide an opam package for Living.t, and perhaps even a wrapped version of Ctypes that is “safer” in the above sense. Thanks for reading, and I hope this helps in some way, even if you don’t end up using this particular implementation. Please comment on the thread on the OCaml Discourse with your thoughts, positive or negative.

Best,

Matt

#open-source #ocaml #ffi

About Matt Walker

Matt Walker is a software engineer with a love for all things Functional, DevOps, and Typed, currently residing in Toronto, Canada.