RSS feed blog search engine
 

Vidar Hokstad V2.0  
Released:  3/7/2009 6:31:07 PM  
RSS Link:  http://www.hokstad.com/index.rdf  
Last View 2/11/2012 9:51:53 AM  
Last Refresh 2/11/2012 9:51:53 AM  
Page Views 273  
Comments:  Read user comments (0)  
Report violation Report a violation or adult content
Save It  



Description:



Vidar Hokstad's blog posts


Contents:

Writing a (Ruby) compiler in Ruby bottom up - step 25

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

First of all, here's the biggest reason I've been so excrutiatingly slow with getting this part together (at least that's my story, I guess I've had other things on my plate too...):

Tristan is 13 months now, and a real menace to my laptop (pulling off keys and drooling on the screen) and anything else within reach, and a real attention seeker...

So, whenever I'm slow at posting a new part, I'll blame him. I have no part in the delays at all, of course. None. I'm completely faultless...

Towards define_method via closures

For starters, here's the commit that covers most of this

To support define_method we need to support blocks with arguments. Really, full closures.

If you haven't already, you should read my post on closures

We want this:


    define_method(:foo) do |a,b|
       # Do something with a,b
    end

to be mostly equivalent to:


    def foo a,b
       # Do something with a,b
    end
    

For our attr_* implementation we actually want more:


    def attr_reader sym
      define_method sym do
        %s(ivar self sym) # FIXME: Create the "ivar" s-exp directive.
      end
    end

(or, as it turns out, not really that, as I realize the above can be simplified - if we have a lookup function to lookup the symbol to instance var slot, we can just use our index primitive to get the address; we'll return to that, but the above is the current state of attr_reader)

What that calls for is actually proper closures. One step harder.

There's the other issue that the above will be really inefficient. As in, requiring a hashtable lookup on every call inefficient, when we can statically determine the offset for any instance variable we know about at compile time. We'll get back to that at some point too, but if you've been following this series you know I care about getting things working first, efficient second.

For now attr_reader and friends still serve as a useful test case for basic closure support.

Baby steps

The first step towards closures is actually easy (we've already done it). We have our :lambda primitive that creates an anonymous function (which isn't really anonymous, it's just that the name is quitely created in the compiler and not accessible to the programmer).

Adding an environment

But an anonymous function is only of relatively limited use if you can't bind variables to it - it's not much more than a function pointer in that case.

So how do we let "sym" hang around, and how do we take advantage of that for define_method?

First, let us consider how you can call a lambda in Ruby:

  • yield
  • Proc#call

We could implement Proc#call via yield by passing the block as an argument to a method, if we wanted to, but that would probably be more inefficient than implementing both in terms of a primitive.

But this means those are the only things that has to explicitly know how to deal with the environment.

That gives us an interesting option. We could turn:


    c,d = somevalues
    do |a,b|
       ... uses c,d 
    end
    

into


    class SomePrivateUniqueClassName
        def initialize c,d
           @c,@d = c,d
        end
        
        def call(a,b)
            ... code here with access to c,d rewritten
        end
    end
    

And we could have it inherit Proc. yield in that case is just rewritten to block.call

This does however have the troubling implication of messing with self, and as this irb session shows, we'd have to fix that:

>> lambda { p self }.call
main
=> nil
>> 

The other alternative is to create a lightweight environment binding of sorts. Oh wait, we already have that. It's called an object. The problem is not the class above as such, but creating a full class for each block. We could do something like this instead:


    class Proc
      def initialize & block
         # So Proc#initialize takes a block, but we're using this
         # to create blocks... Uh oh, I sense endless recursion.
         # We need to make it work so we can initialize a Proc
         # both from a block and from a raw function pointer
      end
      
      def call *args
         %s(call @block_func (res args))
      end
    end

Our code needs to be rewritten, so that blocks get wrapped in code to create the appropriate Proc object. Additionally, if the method uses variables from the surrounding scope, we need to alter the surrounding method and the block to refer to instance variables in this object, and if any of those variables are arguments to the surrounding scope, we need to copy them.

An example

Here's a simple example using a closure. Note the use of s-expressions here because we're now starting to get bitten by the changes we've done to turn Ruby code by default into method calls (the way it should be) without having put in the pre-requisite work to implement the number classes and Kernel methods (such as print to replace the libc printf). Ignore that for now.


    def mkcounter step
      cnt = 0
      lambda do
        %s(assign cnt (add cnt step))
        %s(printf "cnt: %d
" cnt)
      end
    end
    
    cnt = mkcounter(5)
    cnt.call
    puts "Calling again..."
    cnt.call
    puts "Done"

And here's the expected output:

cnt: 5
Calling again...
cnt: 10
Done

And here's the syntax tree after we've done the required transformations (more about that in a second):


    [:do,
     [:defm,
      :mkcounter,
      [:step],
      [:let,
       [:__env__, :__tmp_proc],
       [:sexp, [:assign, :__env__, [:call, :malloc, [8]]]],
       [:assign, [:index, :__env__, 1], :step],
       [:assign, [:index, :__env__, 0], 0],
       [:do,
        [:assign,
         :__tmp_proc,
         [:defun,
          ".L2",
          [:self, :__closure__, :__env__],
          [:let,
           [],
           [:sexp,
            [:assign,
             [:index, :__env__, 0],
             [:add, [:index, :__env__, 0], [:index, :__env__, 1]]]],
           [:sexp, [:printf, "cnt: %dn", [:index, :__env__, 0]]]]]],
        [:sexp, [:call, :__new_proc, [:__tmp_proc, :__env__]]]]]],
     [:assign, :cnt, [:call, :mkcounter, 5]],
     [:callm, :cnt, :call],
     [:call, :puts, [[:sexp, [:call, :__get_string, :".L0"]]]],
     [:callm, :cnt, :call],
     [:call, :puts, [[:sexp, [:call, :__get_string, :".L1"]]]]]

That's a bit of a mouthful, so lets go through it the new bits


    [:let,
       [:__env__, :__tmp_proc],

This is added to hold a pointer to the environment we create to hold cnt and step, and to hold a pointer to the function we use to create the Proc respectively.


       [:sexp, [:assign, :__env__, [:call, :malloc, [8]]]],
       [:assign, [:index, :__env__, 1], :step],
       [:assign, [:index, :__env__, 0], 0],

Then we allocate the environment. We copy the argument, and from now on we use (index env 1) to refer to step. We then carry out the cnt = 0 statement. cnt and step are both moved into the environment because they are used inside the closure.

(You may have already picked up that we currently won't handle nested closures - lets leave some fun for later)


        [:assign,
         :__tmp_proc,
         [:defun,
          ".L2",
          [:self, :__closure__, :__env__],

Then we create a function and assign the address of the function to __tmp_proc. As you can see there are tree synthetic arguments to it:

self,__closure__ and __env__. The first is, as you'd expect, the object the method is called on. The second is to be used when passing blocks to a method, and the third is used to refer to the environment when inside.

(Something I just realized: We're begging for a name clash, if there's a closure defined inside that needs a separate environment. Obnoxious details; later)


            [:assign,
             [:index, :__env__, 0],
             [:add, [:index, :__env__, 0], [:index, :__env__, 1]]]],

And this monstrosity is what cnt = cnt + step was turned into.

Now, to make this work, we obviously need this __new_proc function, and a Proc#call that's sensible. Here's our current Proc:


    class Proc
      # FIXME: Add support for handling arguments (and blocks...)
      def call
        %s(call (index self 1) (self 0 (index self 2)))
      end
    end
    
    # We can't create a Proc from a raw function directly, so we get
    # nasty. The advantage of this rather ugly method is that we
    # don't in any way expose a constructor that takes raw functions
    # to normal Ruby
    #
    %s(defun __new_proc (addr env)
    (let (p)
     # Assuming 3 pointers for the instance size. Need a better way for this
       (assign p (malloc 12))
       (assign (index p 0) Proc)
       (assign (index p 1) addr)
       (assign (index p 2) env)
       p
    ))

Very simplistic and rather ugly, but it does the work. As you can see, the environment pointer is stored in the Proc object, and Proc#call hides the ugliness.

You may notice that this is inefficient - an extra level of indirection. Indeed it is - we can in theory make the environment part of the Proc object and save an indexed lookup, and there's a bunch of other tricks waiting. But again, that's optimizations we wont deal with now.

For now, lets move on to how to do the appropriate changes to get the example output above.

Some refactoring, and a few rewrites

Almost all the changes for this is in the Compiler class, and the code in question is now in the transform.rb file. There are some other small changes, but I don't think they need to be covered in detail.

Specifically, transform.rb represents the start of a bit of refactoring.

Some constructs in the Compiler class can be built easily on top of more primitive operations.

Home  


 
 




Privacy Policy