In Files

Marshal

The marshaling library converts collections of Ruby objects into a byte stream, allowing them to be stored outside the currently active script. This data may subsequently be read and the original objects reconstituted.

Marshaled data has major and minor version numbers stored along with the object information. In normal use, marshaling can only load data written with the same major version number and an equal or lower minor version number. If Ruby’s “verbose” flag is set (normally using -d, -v, -w, or –verbose) the major and minor numbers must match exactly. Marshal versioning is independent of Ruby’s version numbers. You can extract the version by reading the first two bytes of marshaled data.

str = Marshal.dump("thing")
RUBY_VERSION   #=> "1.9.0"
str[0].ord     #=> 4
str[1].ord     #=> 8

Some objects cannot be dumped: if the objects to be dumped include bindings, procedure or method objects, instances of class IO, or singleton objects, a TypeError will be raised.

If your class has special serialization needs (for example, if you want to serialize in some specific format), or if it contains objects that would otherwise not be serializable, you can implement your own serialization strategy.

There are two methods of doing this, your object can define either marshal_dump and marshal_load or _dump and _load. marshal_dump will take precedence over _dump if both are defined. marshal_dump may result in smaller Marshal strings.

marshal_dump and marshal_load

When dumping an object the method marshal_dump will be called. marshal_dump must return a result containing the information necessary for marshal_load to reconstitute the object. The result can be any object.

When loading an object dumped using marshal_dump the object is first allocated then marshal_load is called with the result from marshal_dump. marshal_load must recreate the object from the information in the result.

Example:

class MyObj
  def initialize name, version, data
    @name    = name
    @version = version
    @data    = data
  end

  def marshal_dump
    [@name, @version]
  end

  def marshal_load array
    @name, @version = array
  end
end

_dump and _load

Use _dump and _load when you need to allocate the object you’re restoring yourself.

When dumping an object the instance method _dump is called with an Integer which indicates the maximum depth of objects to dump (a value of -1 implies that you should disable depth checking). _dump must return a String containing the information necessary to reconstitute the object.

The class method _load should take a String and use it to return an object of the same class.

Example:

class MyObj
  def initialize name, version, data
    @name    = name
    @version = version
    @data    = data
  end

  def _dump level
    [@name, @version].join ':'
  end

  def self._load args
    new(*args.split(':'))
  end
end

Since Marhsal.dump outputs a string you can have _dump return a Marshal string which is Marshal.loaded in _load for complex objects.

Public Class Methods

dump( obj [, anIO] , limit=-1 ) → anIO click to toggle source

Serializes obj and all descendant objects. If anIO is specified, the serialized data will be written to it, otherwise the data will be returned as a String. If limit is specified, the traversal of subobjects will be limited to that depth. If limit is negative, no checking of depth will be performed.

class Klass
  def initialize(str)
    @str = str
  end
  def say_hello
    @str
  end
end

(produces no output)

o = Klass.new("hello\n")
data = Marshal.dump(o)
obj = Marshal.load(data)
obj.say_hello  #=> "hello\n"

Marshal can't dump following objects:

 
               static VALUE
marshal_dump(int argc, VALUE *argv)
{
    VALUE obj, port, a1, a2;
    int limit = -1;
    struct dump_arg *arg;
    volatile VALUE wrapper;

    port = Qnil;
    rb_scan_args(argc, argv, "12", &obj, &a1, &a2);
    if (argc == 3) {
        if (!NIL_P(a2)) limit = NUM2INT(a2);
        if (NIL_P(a1)) goto type_error;
        port = a1;
    }
    else if (argc == 2) {
        if (FIXNUM_P(a1)) limit = FIX2INT(a1);
        else if (NIL_P(a1)) goto type_error;
        else port = a1;
    }
    wrapper = TypedData_Make_Struct(rb_cData, struct dump_arg, &dump_arg_data, arg);
    arg->dest = 0;
    arg->symbols = st_init_numtable();
    arg->data    = st_init_numtable();
    arg->infection = 0;
    arg->compat_tbl = st_init_numtable();
    arg->encodings = 0;
    arg->str = rb_str_buf_new(0);
    if (!NIL_P(port)) {
        if (!rb_respond_to(port, s_write)) {
          type_error:
            rb_raise(rb_eTypeError, "instance of IO needed");
        }
        arg->dest = port;
        if (rb_respond_to(port, s_binmode)) {
            rb_funcall2(port, s_binmode, 0, 0);
            check_dump_arg(arg, s_binmode);
        }
    }
    else {
        port = arg->str;
    }

    w_byte(MARSHAL_MAJOR, arg);
    w_byte(MARSHAL_MINOR, arg);

    w_object(obj, arg, limit);
    if (arg->dest) {
        rb_io_write(arg->dest, arg->str);
        rb_str_resize(arg->str, 0);
    }
    clear_dump_arg(arg);
    RB_GC_GUARD(wrapper);

    return port;
}
            
load( source [, proc] ) → obj click to toggle source
restore( source [, proc] ) → obj

Returns the result of converting the serialized data in source into a Ruby object (possibly with associated subordinate objects). source may be either an instance of IO or an object that responds to to_str. If proc is specified, it will be passed each object as it is deserialized.

 
               static VALUE
marshal_load(int argc, VALUE *argv)
{
    VALUE port, proc;
    int major, minor, infection = 0;
    VALUE v;
    volatile VALUE wrapper;
    struct load_arg *arg;

    rb_scan_args(argc, argv, "11", &port, &proc);
    v = rb_check_string_type(port);
    if (!NIL_P(v)) {
        infection = (int)FL_TEST(port, MARSHAL_INFECTION); /* original taintedness */
        port = v;
    }
    else if (rb_respond_to(port, s_getbyte) && rb_respond_to(port, s_read)) {
        if (rb_respond_to(port, s_binmode)) {
            rb_funcall2(port, s_binmode, 0, 0);
        }
        infection = (int)(FL_TAINT | FL_TEST(port, FL_UNTRUSTED));
    }
    else {
        rb_raise(rb_eTypeError, "instance of IO needed");
    }
    wrapper = TypedData_Make_Struct(rb_cData, struct load_arg, &load_arg_data, arg);
    arg->infection = infection;
    arg->src = port;
    arg->offset = 0;
    arg->symbols = st_init_numtable();
    arg->data    = st_init_numtable();
    arg->compat_tbl = st_init_numtable();
    arg->proc = 0;

    major = r_byte(arg);
    minor = r_byte(arg);
    if (major != MARSHAL_MAJOR || minor > MARSHAL_MINOR) {
        clear_load_arg(arg);
        rb_raise(rb_eTypeError, "incompatible marshal file format (can't be read)\n\
\tformat version %d.%d required; %d.%d given",
                 MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
    }
    if (RTEST(ruby_verbose) && minor != MARSHAL_MINOR) {
        rb_warn("incompatible marshal file format (can be read)\n\
\tformat version %d.%d required; %d.%d given",
                MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
    }

    if (!NIL_P(proc)) arg->proc = proc;
    v = r_object(arg);
    clear_load_arg(arg);
    RB_GC_GUARD(wrapper);

    return v;
}
            
load( source [, proc] ) → obj click to toggle source
restore( source [, proc] ) → obj

Returns the result of converting the serialized data in source into a Ruby object (possibly with associated subordinate objects). source may be either an instance of IO or an object that responds to to_str. If proc is specified, it will be passed each object as it is deserialized.

 
               static VALUE
marshal_load(int argc, VALUE *argv)
{
    VALUE port, proc;
    int major, minor, infection = 0;
    VALUE v;
    volatile VALUE wrapper;
    struct load_arg *arg;

    rb_scan_args(argc, argv, "11", &port, &proc);
    v = rb_check_string_type(port);
    if (!NIL_P(v)) {
        infection = (int)FL_TEST(port, MARSHAL_INFECTION); /* original taintedness */
        port = v;
    }
    else if (rb_respond_to(port, s_getbyte) && rb_respond_to(port, s_read)) {
        if (rb_respond_to(port, s_binmode)) {
            rb_funcall2(port, s_binmode, 0, 0);
        }
        infection = (int)(FL_TAINT | FL_TEST(port, FL_UNTRUSTED));
    }
    else {
        rb_raise(rb_eTypeError, "instance of IO needed");
    }
    wrapper = TypedData_Make_Struct(rb_cData, struct load_arg, &load_arg_data, arg);
    arg->infection = infection;
    arg->src = port;
    arg->offset = 0;
    arg->symbols = st_init_numtable();
    arg->data    = st_init_numtable();
    arg->compat_tbl = st_init_numtable();
    arg->proc = 0;

    major = r_byte(arg);
    minor = r_byte(arg);
    if (major != MARSHAL_MAJOR || minor > MARSHAL_MINOR) {
        clear_load_arg(arg);
        rb_raise(rb_eTypeError, "incompatible marshal file format (can't be read)\n\
\tformat version %d.%d required; %d.%d given",
                 MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
    }
    if (RTEST(ruby_verbose) && minor != MARSHAL_MINOR) {
        rb_warn("incompatible marshal file format (can be read)\n\
\tformat version %d.%d required; %d.%d given",
                MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
    }

    if (!NIL_P(proc)) arg->proc = proc;
    v = r_object(arg);
    clear_load_arg(arg);
    RB_GC_GUARD(wrapper);

    return v;
}
            

Commenting is here to help enhance the documentation. For example, sample code, or clarification of the documentation.

If you have questions about Ruby or the documentation, please post to one of the Ruby mailing lists. You will get better, faster, help that way.

If you wish to post a correction of the docs, please do so, but also file bug report so that it can be corrected for the next release. Thank you.

blog comments powered by Disqus