RuDy: Ruby native extensions in the D programming language

03 Mar 2009 – Warsaw

There are two facts about Ruby.

Well, at least two I want to focus on.

First: the “official” Matz Ruby Interpreter (MRI) is here to stay and will remain the mainstream Ruby interpreter. That holds despite its technical flaws and despite JRuby gaining popularity in some very specific applications (Mike Lee used it pretty creatively with GreaseSpoon in his “Test By Proxy” method), all the more so with Rubinius development practically stalled because of the recent EngineYard layoffs.

Second: Ruby 1.9 / 2.0, with all the performance improvements it brings, is (and will be) still “slow” at CPU-intensive operations compared to compiled languages, whenever performance really matters.

This arbitrary selection of unverifiable hypotheses (almost axioms) leads to a simple conclusion: native extensions are here to stay, whenever and wherever Ruby’s performance is simply not enough.

That would be digestible, except that extensions have to be written in plain old raw C. But I don’t want to write in C.

Don’t get me wrong. I love C for its simplicity (no feature bloat) and for being close to the metal, while still offering some pretty big possibilities. And for all the programs written in it (when you try to say “Linux” with a smile and a wide “i”, like in “Leenux”, it always ends up as a menacing grin). It’s just that I hate C for its simplicity (feature-poor) and for being close to the metal, combined with plenty of possibilities to shoot yourself in the foot.

It’s not as bad as C++. But it’s still a language I’d gladly leave behind.

I’d love to write native extensions in a fast language compiled to machine code, yet a modern and pro-programmer one. C-family syntax would be a plus because of familiarity and the ability to copy-paste library-agnostic code (mostly number crunching, pretty common in numerical simulations) with minimal modifications.

And there is one like that, which I fell in love with after reading some of it and writing more (a whole semester of the “Computer Methods of Simulation” programming lab). It’s called the D Programming Language, and it really deserves more attention than it gets.

There are many aspects of D I love (and some I don’t like), most of them outlined in my talk about D at the January WRUG meeting. For the purposes of this blog post, the key point is that D, with all its awesomeness and pragmatism, can interface with C code, applications and libraries both ways. So yes, you can – at least theoretically – write Ruby native extensions using Ruby’s C API, but in D.

There are a few gotchas, though.

Many D and C datatypes are different. Differences can be “trivial”, i.e. ones that shoot you in the foot only after moving to a different platform, like constant-size numeric types (int in D is always 32-bit, no matter what insane hardware you’re targeting). They can come from “pragmatic” syntactic sugar (arrays in D are actually structs holding an integer length and a pointer to the data). Or they can be non-trivial, like struct padding (or structs in general), and these are fortunately (for the most part) handled by the compiler, which passes arguments the C way to functions declared as extern (C) int some_func(...).
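To make the array and int differences concrete, here is what a D int[] looks like from the C side. A minimal C sketch, assuming the layout the D ABI specifies for dynamic arrays (a size_t length followed by a data pointer); since D’s int is always 32 bits, the matching C element type is int32_t rather than C’s int. IntSlice and slice_sum are hypothetical names for illustration, not part of any D or Ruby API, and the sketch only mirrors the layout: it does not claim how a particular D compiler passes arrays by value to extern (C) functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of a D `int[]`: the D ABI lays out a dynamic
 * array as its length followed by a pointer to the data. D's `int` is
 * always 32 bits, so the element type is int32_t, not C's `int`. */
typedef struct {
    size_t   length;
    int32_t *ptr;
} IntSlice;

/* Sum the elements of a slice. The length travels with the pointer,
 * so no terminator or separate size argument is needed. */
int64_t slice_sum(IntSlice s)
{
    int64_t total = 0;
    assert(s.ptr != NULL || s.length == 0);
    for (size_t i = 0; i < s.length; i++)
        total += s.ptr[i];
    return total;
}
```

Usage: with int32_t data[] = {1, 2, 3, 4}, slice_sum((IntSlice){4, data}) returns 10.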

These differences pair up with a second problem: to interface with C functions, D needs their declarations. These usually live in .h C header files, which D compilers don’t read – they want the declarations in .d source files. For small APIs, or a small subset of API functions that you want to use, it’s fine to copy-paste these declarations into a .d file and adjust their parameter types slightly (if necessary, which is rare). For bigger APIs this daunting task can be performed with htod (Windows-only, bleargh!) or bcd.gen.

And remember the differences in datatypes. Where a C function accepts “a string”, which in the C world means a zero-terminated array of (pointer to) chars, in D you have to pass (d_string ~ \0).ptr, because D strings carry their length and have no terminator. Fortunately that’s the only really crappy part of the C vs D incompatibilities – the rest of them can be circumvented or boiled down to the lowest common denominator.
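What the concatenation trick achieves can be spelled out in C terms: the string’s bytes are copied into a buffer one byte longer and a '\0' is appended, which is what a C function reading “a string” relies on. A minimal sketch; to_cstring is a hypothetical helper name, and unlike the D version (where the garbage collector owns the new array) the caller here owns the returned buffer and must free() it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy `length` bytes of a non-terminated string (as a D string is)
 * into a fresh buffer and append the '\0' that C APIs expect.
 * This mirrors the D-side concatenation, which also allocates a new
 * array one element longer than the original.
 * Returns NULL if allocation fails; the caller must free() the result. */
char *to_cstring(const char *ptr, size_t length)
{
    assert(ptr != NULL || length == 0);
    char *buf = malloc(length + 1);
    if (buf == NULL)
        return NULL;
    if (length > 0)              /* avoid memcpy with a NULL source */
        memcpy(buf, ptr, length);
    buf[length] = '\0';
    return buf;
}
```

Usage: to_cstring("RuDy!!", 4) returns a freshly allocated "RuDy", which can then be handed to any C function expecting a zero-terminated string.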

All these difficulties are still nothing compared to the awesomeness (productivity, features, design) of the D programming language. Don’t take my word for it, try it yourself. Hell, it’s one position below Ruby in the TIOBE Index (and the language is damn young!).

Let’s get back to Ruby. So I want to write my native extensions in D. Ruby has a pretty big C API which doesn’t work with D code out of the box, even after converting the .h files to .d with bcd.gen. And getting the Ruby API to work in D is just the beginning, as I would also like a nicer, D-like (object-oriented, using D’s metaprogramming and reflection capabilities) API for Ruby native extensions: wrapping Ruby objects into convenient D classes, or automagic class and method definition, i.e. a selected class with its methods would be accessible from Ruby after writing a single line of code, without rb_define_class followed by tens of rb_define_method calls. And I want my extconf to recognize a D extension and prepare proper makefiles, so I can distribute it in a gem to other people, who would only see “building native extensions” on gem install without a clue that the thing is written in D.
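For comparison, one way to trim the rb_define_method repetition in plain C is to describe the methods in a table and register them in a loop; the goal described above is for D’s reflection to produce the equivalent automatically. A minimal sketch under loud assumptions: so that it compiles without ruby.h, Value, define_class and define_method are hypothetical stand-ins that only record their calls, in place of Ruby’s VALUE, rb_define_class and rb_define_method. MethodEntry, register_class and the vec_* methods are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Ruby's VALUE so this compiles without ruby.h. */
typedef unsigned long Value;
typedef Value (*MethodFn)(Value self);

/* One row per method: what would otherwise be one rb_define_method call.
 * `arity` plays the role of rb_define_method's argc (0 = self only). */
typedef struct {
    const char *name;
    MethodFn    fn;
    int         arity;
} MethodEntry;

/* Hypothetical stubs that only record what was registered. In a real
 * extension these would be rb_define_class / rb_define_method. */
const char *registered[16];
size_t registered_count = 0;

Value define_class(const char *name)
{
    (void)name;
    return 1; /* fake class handle */
}

void define_method(Value klass, const char *name, MethodFn fn, int arity)
{
    (void)klass; (void)fn; (void)arity;
    if (registered_count < sizeof registered / sizeof registered[0])
        registered[registered_count++] = name;
}

/* Define a class and every method in its table with a single call. */
Value register_class(const char *class_name,
                     const MethodEntry *methods, size_t count)
{
    Value klass = define_class(class_name);
    for (size_t i = 0; i < count; i++)
        define_method(klass, methods[i].name, methods[i].fn, methods[i].arity);
    return klass;
}

/* Hypothetical example methods; real ones would do actual work. */
Value vec_length(Value self)    { return self; }
Value vec_normalize(Value self) { return self; }

const MethodEntry vector_methods[] = {
    { "length",    vec_length,    0 },
    { "normalize", vec_normalize, 0 },
};
```

Usage: register_class("Vector", vector_methods, 2) records both method names, so registered_count becomes 2. Adding a method is one new table row instead of one more hand-written call; generating that table from a D class is the part RuDy aims to automate.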

It is all possible: D bindings to the Ruby API, a nice D API for Ruby extensions, a D-aware extconf etc. etc. Hell, Pythonistas already have that, so even the hard D part of the API has already been figured out by someone smart.

I’d love to have something like PyD, but for Ruby of course.

So I started it, and called it RuDy, a name inspired by a conversation between Ray C. Horn and Kirk McDonald (author of PyD – the man knows a damn lot about D and interfacing it with interpreted languages). I want to make it something really usable, so fellow Rubyists can write even their native extensions in a modern language and won’t have to delve into the cold and unforgiving depths of the raw C world, threatened by memory leaks and segfaults.

For now it consists only of the “D bindings to the C API”, and even those are an incomplete, though already pretty usable, set. On the plus side, by “set” I mean functions with full test coverage in the dexter application (part of the RuDy project, meant to cover the whole API binding with unit tests), not all the shit bcd.gen threw at me. So there’s a pretty good chance that many other functions in this binding (I’d say most) work even without explicit tests, but that’s not good enough for me unless they’re covered by unit tests and pass them.

So, all you fellow Ruby enthusiasts willing to get your hands dirty on some open source code: RuDy needs you! I’ve developed it to the point of having some working bindings to the Ruby API (the most important ones, in my opinion), so you can basically write your extensions in D already, and I will keep working on it, but there’s a lot of work ahead to make it a great project.

Developers are needed in all three fields, which are pretty much orthogonal (i.e. you can work in your field without following the others’ development):

  • D bindings to Ruby C API: this is what I’ve already started and partly completed; requires some knowledge of C, willingness to learn D (including the differences and similarities between the two languages) and basic Ruby (for unit tests and checking “if it works”)
  • D-aware mkmf (or other build system): requires some Ruby skill (to read and copy/modify mkmf code) and at least basic makefile-fu (to understand the extconf code being modified)
  • D object-oriented Ruby API: requires willingness to learn (a lot of D, plus some Python to read how it’s done in PyD)

Just think about it: an open-source project that doesn’t require very sophisticated programming skills or knowledge (just the ability and willingness to learn – but hey, we’re programmers!), just some work, and it looks awesome on your resume (Ruby, D, connecting two languages, open source project). In?

If you want to contact me, use one of the contact methods listed on the About page.