Posts Tagged ‘Ruby’

When open-uri can’t convert Hash into String — another time it happens

Short answer: OpenURI doesn’t support the “feed://” pseudo protocol and if you try it with a hash of header options it gives you the same error as if, like some dumb muppet, you hadn’t required the library in the first place. In other words, it falls through to Kernel#open and leaves you scratching your head.

Long answer: Read on, code fiends. Read on.

Tonight I decided to earn some HusbandPoints™ by helping my wife get a large number of tagged photos off Picasa for a project that she’s working on. Downloading them by hand would’ve been a pain in the ass timewise and a big opportunity cost as well, since she would’ve had to take time out from the main body of the project (a homemade cookbook for a friend’s wedding) to do a dumb photo-by-photo clickfest through the entire large Picasa album she’d assembled with her friends. Plus it gave me a reason to mess around w/ the Google APIs some, knowledge that would almost certainly come in handy later.

Now, the easiest way to go about scripting this w/ Ruby involves using open-uri to pass in the authorization token from Google into every request, per their ClientLogin authentication method. You do that with a piece of code like this:

require 'open-uri'
require 'hpricot'

# Assuming that @auth_token is set by a login method
def http_header
  {"Authorization" => "GoogleLogin auth=#{@auth_token}"}
end

# HTTP GET a Google content feed (Atom)
def get(url)
  response = open(url, http_header){ |f| f.read() }
  Hpricot.XML(response)
end

Here we’re getting the content from Google (which will come as an Atom feed, as all of their various pieces of content do) and then parsing the result with Hpricot. We pass the http_header Hash to OpenURI’s open method to specify a set of HTTP header variables. This is supposed to be easy, but tonight it wasn’t, and my wife was treated to the inelegant sounds of me cursing at my laptop screen for 10 or 15 minutes until I figured out what the problem was.

‘feed://’ don’t go ’round here

The problem turned out to be the “feed://” pseudo protocol. Safari likes it (because it fancies itself a feed reader), and decided to make the RSS link provided by Google for the tag set my wife wanted to download into a “feed://” URL. Of course, there’s no such protocol, and “feed://” itself is pretty lame. People have been bitching about its lameness for a long, long time. It’s almost as lame as me not catching it.

But the lamest thing of all (which was causing the cursing) is how OpenURI handles this:

TypeError: can't convert Hash into String

method initialize   in open-uri.rb at line 32
method open_uri_original_open   in open-uri.rb at line 32
method open in open-uri.rb at line 32
method get  in picasa.rb at line 62

This is the same thing you get when you try to use open on a URL with a hash of header arguments and you’ve forgotten to require the OpenURI library in the first place.

The problem here seems to be with this part:

def open(name, *rest, &block) # :doc:
  if name.respond_to?(:open)
    name.open(*rest, &block)
  elsif name.respond_to?(:to_str) &&
        %r{\A[A-Za-z][A-Za-z0-9+\-\.]*://} =~ name &&
        (uri = URI.parse(name)).respond_to?(:open)
    uri.open(*rest, &block)
  else
    open_uri_original_open(name, *rest, &block)
  end
end

It’s not taking the branch you might expect: the one that checks whether the name can be converted to a string and matches a loose URI regex pattern. Instead it falls through to the original version of open, the one that the Kernel module provides so you can easily open files (but without all the tasty options given you by OpenURI). That original open is what throws this error, exactly as it does when you call open with a hash outside the context of OpenURI (as this guy points out).

Since we can tell that a URL that starts with “feed://” should pass the first of the two tests in the “elsif” clause (the regex pattern), that means that it’s not passing some part of the URI.parse test. Here’s what that URI.parse method looks like:

def self.parse(uri)
  scheme, userinfo, host, port,
    registry, path, opaque, query, fragment = self.split(uri)

  if scheme && @@schemes.include?(scheme.upcase)
    @@schemes[scheme.upcase].new(scheme, userinfo, host, port,
                                 registry, path, opaque, query,
                                 fragment)
  else
    Generic.new(scheme, userinfo, host, port,
                registry, path, opaque, query,
                fragment)
  end
end

No great clues there. But if you run through the code in the elsif clause of OpenURI’s open method, it turns out that parsing the offending “feed://”-based URI doesn’t give you a “URI::HTTP” object. You get a “URI::Generic” object, which doesn’t respond to open. Obviously, the library doesn’t support this kind of URL, and if it weren’t overriding a Kernel method, it’d probably say so. But it can’t make assumptions about what you’re trying to do with open, so it falls through to the original Kernel#open and you get the same error you’d get if you never used “require ‘open-uri’” in the first place.
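If you want to see the fall-through for yourself, here is a minimal sketch that uses only the standard library. The URL is a made-up example, and normalize_feed_url is a hypothetical helper of my own, not something open-uri provides. It only handles the “feed://host/…” form and assumes the feed is really served over plain HTTP.

```ruby
require 'uri'
require 'open-uri'

# Hypothetical helper (not part of open-uri): swap the feed:// pseudo
# protocol for http:// so that URI.parse returns a URI::HTTP object.
def normalize_feed_url(url)
  url.sub(%r{\Afeed://}i, 'http://')
end

# Made-up example URL, for illustration only.
feed_url = 'feed://example.com/photos/feed'

generic = URI.parse(feed_url)
puts generic.class                # URI::Generic
puts generic.respond_to?(:open)   # false, so open-uri has nothing to call

http = URI.parse(normalize_feed_url(feed_url))
puts http.class                   # URI::HTTP
puts http.respond_to?(:open)      # true once open-uri is loaded
```

Running the URL through a helper like this before handing it to open would have saved me the cursing, at the cost of silently assuming every “feed://” link is an HTTP one.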

Lesson learned, boys and girls — pseudo protocols aren’t supported by much at all other than self-important feed reading software.

Thanks to the Gimite Google Spreadsheet library for inspiration on the auth code


Make ruby-debug work better

If you’ve written Ruby, chances are you’ve had to use ruby-debug. You might’ve thought the experience sucked — especially the fact that the debugger defaults to a mode in which you have to use a keyword to get it to evaluate a statement. Lost? Here’s what I mean:

Say you start the debugger here:

result = resource[xml_obj.api_call_string].get
(rdb:1)

Then you want to take a look at the “xml_obj” variable. If this were (for instance) Python’s pdb, we’d just type “xml_obj” and hit return and be done with it. Not so in ruby-debug:

(rdb:1) xml_obj.api_call_string
*** Unknown command: "xml_obj.api_call_string".  Try "help".

This is because with default settings, the debugger needs a keyword (‘p’) to get it to actually evaluate your statement as Ruby and not as a command to the debugger itself:

(rdb:1) p xml_obj.api_call_string
"documentService/documentsByCommunity"

That gets really tedious, really fast. The debugger’s help function (‘help p’) will helpfully tell you that this is because the “autoeval” option is not enabled. If you’re thick like me, you won’t see this and you’ll just continue doing “p <whatever>” until you get so frustrated you drop what you’re doing one day and go hunt down a fix.

Here is that fix from inside your code:

require 'ruby-debug'
Debugger.settings[:autoeval] = true

You can also do this inside the debugger:

(rdb:1) set autoeval
autoeval is on.
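If it helps to picture what autoeval changes, here is a toy model of a debugger prompt. ToyPrompt and its one-entry command table are entirely hypothetical and are not how ruby-debug is actually implemented. The sketch only shows the idea: input that isn’t a known command is either rejected or evaluated as Ruby, depending on one setting.

```ruby
# Toy model of a debugger prompt. ToyPrompt is hypothetical and is NOT
# ruby-debug's real implementation; it only illustrates the autoeval idea.
class ToyPrompt
  # A single known command: "p <expr>" evaluates the expression.
  COMMANDS = {
    'p' => ->(ctx, arg) { ctx.eval(arg).inspect }
  }.freeze

  attr_accessor :autoeval

  def initialize(ctx, autoeval: false)
    @ctx = ctx            # a Binding in which expressions are evaluated
    @autoeval = autoeval
  end

  def handle(line)
    name, arg = line.split(' ', 2)
    if COMMANDS.key?(name)
      COMMANDS[name].call(@ctx, arg.to_s)
    elsif autoeval
      # With autoeval on, anything that isn't a command is treated as Ruby.
      @ctx.eval(line).inspect
    else
      %(*** Unknown command: "#{line}".  Try "help".)
    end
  end
end

answer = 42
prompt = ToyPrompt.new(binding)
puts prompt.handle('answer + 1')     # the "Unknown command" message
puts prompt.handle('p answer + 1')   # 43
prompt.autoeval = true
puts prompt.handle('answer + 1')     # 43
```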

Rails already does it via Rack middleware

You might be wondering why the debugging experience is different in Rails than in Ruby you’ve written elsewhere. I did too — remembering that this ‘p’ business isn’t necessary when I run the debugger as an option when I start up Mongrel in a Rails app. So I went digging for the code that Rails uses to set this stuff up. Those settings come from a piece of Rack middleware that lives in lib/rails/rack/debugger.rb. Here’s the class definition:

module Rails
  module Rack
    class Debugger
      def initialize(app)
        @app = app

        require_library_or_gem 'ruby-debug'
        ::Debugger.start
        ::Debugger.settings[:autoeval] = true if ::Debugger.respond_to?(:settings)
        puts "=> Debugger enabled"
      rescue Exception
        puts "You need to install ruby-debug to run the server in debugging mode. With gems, use 'gem install ruby-debug'"
        exit
      end

      def call(env)
        @app.call(env)
      end
    end
  end
end

For more info on how Rails uses Rack, this is a pretty handy page from the Rails guides.

Ruby XML Part 4 – Correction and Elucidation

Hola amigos. I know it’s been a long time since I rapped at you on my Ruby XML stuff, and some of these posts have been pretty dunce-tastic, so I know all my readers were looking forward with bated breath to the next installment, wherein I would no doubt prove that I had grown incrementally smarter. Wait no longer.

OK so first things first. This thing I’ve messed around with recently isn’t a service — it’s a client. We must make sure that we get the nomenclature correct so that we’ll know what we’re all talking about. The thing on the other end is the service. I was referring to it as a service in kind of a generic way, but it wasn’t too helpful because this is most definitely properly called a client.

So what kinds of changes have I made recently? Well first off, I decided that this snippet from my last post was just painful to the point of utter crappiness:

w = WebEx::Request.new
w.send_request(XMLObject.create_attendee(@attendee))

Not only is it ugly, it’s long-winded and hard to remember. As soon as I had another person working with this code at work, I felt the sting of shame and knew I had to re-factor. I decided that there was no reason not to take the opportunity to use one of Ruby’s vaunted metaprogramming features to try and make this a little shorter. Now it works like this:

w = WebEx::Request.new
w.create_attendee(@attendee)

Maybe #method_missing?

So how did I accomplish that? Simple, I used Ruby’s method_missing to enable arbitrary method calls to be made on the WebEx::Request object.

method_missing is a hook method that gets called when there’s no definition found for a method called on an object — i.e. the method is missing. When that happens, the Ruby interpreter automatically calls an instance method called method_missing. Usually, that’s not defined, so the interpreter will raise an exception, but if method_missing is defined, the interpreter does whatever it says to. Here, I’m defining it like this:

def method_missing(request_method, *args)
  self.send_request(WebEx::XMLObject.send(request_method.to_sym, *args))
end

In this case, self is of course an object of the WebEx::Request class. When the interpreter tries to call #create_attendee on an instance of that class, it can’t find the method and method_missing is called. My implementation of method_missing assumes that the “method” being called on the WebEx::Request object is a real class method of the XMLObject class (which is where all my actual API calls are), so it calls it there, passing along the arguments it received in the first place.

Of course, if no such class method exists, the interpreter throws an exception like it should. But for legitimate calls, I’ve shortened my code by quite a bit.
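For anyone who wants to play with the pattern, here is a self-contained sketch with stand-in classes. MissingSketch, the toy XMLObject and the echoing send_request are all made up for illustration and are not the real WebEx module. It also goes a little further than my three-liner: it only forwards names defined directly as class methods on the stand-in XMLObject, and it defines respond_to_missing? so that respond_to? tells the truth.

```ruby
module MissingSketch
  # Stand-in for WebEx::XMLObject; the real class builds request XML.
  class XMLObject
    def self.create_attendee(name)
      "<attendee>#{name}</attendee>"
    end
  end

  # Stand-in for WebEx::Request; send_request here just echoes its input.
  class Request
    def send_request(xml)
      "sent: #{xml}"
    end

    # Forward unknown calls to XMLObject's class methods, then send the result.
    def method_missing(name, *args, &block)
      return super unless forwardable?(name)
      send_request(XMLObject.public_send(name, *args, &block))
    end

    def respond_to_missing?(name, include_private = false)
      forwardable?(name) || super
    end

    private

    # Only forward names defined directly on XMLObject as class methods.
    def forwardable?(name)
      XMLObject.singleton_methods(false).include?(name)
    end
  end
end

w = MissingSketch::Request.new
puts w.create_attendee('Ann')          # sent: <attendee>Ann</attendee>
puts w.respond_to?(:create_attendee)   # true
```

Anything that isn’t a class method on the stand-in XMLObject still ends in a NoMethodError, raised from the Request object itself.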

<gasp> isn’t it bad to use method_missing?

Some might argue that this is a cavalier usage of method_missing. I don’t really think so. The entirety of my request class is only like three methods other than this. It exists only to make a request and process the result, but it does that dumbly because it actually uses XMLObjects for the real meat of both. WebEx::Request is just a dumb object acting on stuff at the direction of the WebEx::XMLObjects that it calls through XMLObject class methods. If the Request class had a lot more methods, using method_missing might have been an irresponsible design choice. But since the notion of the request and the thing being requested are necessarily intertwined, this approach made a lot of sense to me because it let me join the two classes up for usage (WebEx::Request and WebEx::XMLObject), improving readability and reducing verbosity but still letting me keep the two classes usefully separate in the code.

method_missing is one of those things that you’ll sometimes hear people talking about Ruby refer to as magic. Like other, more experienced folk, I hate the usage of the word “magic” to describe things that happen in Ruby. The concepts behind metaprogramming aren’t that difficult, and as Mr. Giles says in his blog post, metaprogramming is just programming. If not for method_missing, you wouldn’t have cool stuff like Builder and RSpec. Or maybe you would, but not as elegantly. method_missing underlies much of the powerful Domain-Specific Language stuff that is so popular and useful in Ruby, so as my dad used to say: “it’s not bad or scary, it’s just different.”

(Gist of the full module)


Designing a Rudimentary XML Service with Ruby — (Part 3)

I’ve come full circle since part 1 in this mini-saga, and am now describing my simple WebEx XML web service with three classes in a WebEx module.

Ruby modules offer the programmer the ability to group classes together into like structures and to control namespaces for methods. They also function as a sort of pseudo-class, in that you can define “module methods” inside the module that feel like class methods. If you have a module Foo and a method self.bar, you can call it with Foo.bar.

That’s what I ended up doing with all the helper methods I’ve got in this new class definition — WebEx.time_from_string, WebEx.filter_by_session_name, etc.
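As a rough illustration of a module method, here is a sketch of what a helper like time_from_string could look like. ModuleSketch and the timestamp format are assumptions of mine for the example, not the code from the actual WebEx module.

```ruby
require 'time'

module ModuleSketch
  # Assumed timestamp format: month/day/year plus a 24-hour time.
  TIME_FORMAT = '%m/%d/%Y %H:%M:%S'

  # Defined on the module itself, so it is called as ModuleSketch.time_from_string.
  def self.time_from_string(str)
    Time.strptime(str, TIME_FORMAT)
  end
end

t = ModuleSketch.time_from_string('04/02/2012 01:06:49')
puts t.year    # 2012
puts t.month   # 4
```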

I also abstracted the things I’d need into several classes. Now the XML client has:

  • An Attendee class to describe the most common type of API data we need to create and manipulate
  • An XMLObject class which describes a generic object containing XML and the recipe necessary to process it into Ruby data structures (implemented as a Proc object), as well as class methods for generating various types of such objects
  • A Request class which is responsible for sending an XMLObject to the WebEx API, sending the retrieved data through the object’s processor, and exposing the processed data as well as the processed result strings from the API request.
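The split between the XMLObject and the Request described in the list above can be sketched roughly like this. Everything here is hypothetical (ClientSketch, list_attendees, the regex-based processor and the fake transport lambda). The real module talks to the WebEx API over HTTP and parses real XML. The point is only that the object carries both the XML and the Proc that knows how to digest the response.

```ruby
module ClientSketch
  # An XML payload bundled with the Proc that turns a response into Ruby data.
  XMLObject = Struct.new(:xml, :processor) do
    # Hypothetical factory method; the request XML is a placeholder.
    def self.list_attendees
      new('<request>list attendees</request>',
          ->(body) { body.scan(%r{<name>(.*?)</name>}).flatten })
    end
  end

  class Request
    attr_reader :result

    # transport is any callable that takes request XML and returns response XML.
    def initialize(transport)
      @transport = transport
    end

    # Send the object's XML, then run the response through its processor.
    def send_request(xml_object)
      raw = @transport.call(xml_object.xml)
      @result = xml_object.processor.call(raw)
    end
  end
end

# A fake transport so the sketch runs without touching the network.
fake_api = ->(_xml) { '<response><name>Ann</name><name>Bob</name></response>' }
request = ClientSketch::Request.new(fake_api)
p request.send_request(ClientSketch::XMLObject.list_attendees)   # ["Ann", "Bob"]
```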

Here’s how you’d create an attendee (I’m stipulating the existence of an initialized Attendee object — @attendee)

w = WebEx::Request.new
w.send_request(XMLObject.create_attendee(@attendee))

Gist of the new WebEx module

This is necessarily kind of a spare time project. It’s something for work, but it does what it’s meant to just fine right now, so re-factoring it is something I’m doing in spare cycles. However, I’m gratified that continued refactoring has brought me back to the original idea of having a simple class with class methods that mirror the API’s methods. I had to trip through the things that brought me to make the other classes and gather everything together into a module, but it was a helpful (if stumbling) process to go through.


Designing a rudimentary XML Service with Ruby (Part 2)

A few days back, I posted on my journey of understanding into the world of Ruby-based XML clients. This post is a continuation of that account.

Re-Arranging the WebEx Class

I figured it wouldn’t be long before I was back at the drawing board on my main architecture, and I was right. The earlier one I described turned out to be over-abstracted and hard to test.

It all started with my feeling that this looks really elegant:

events = WebEx.request(Event.list)

But in practice it turns out to be a little strange. “Attendee” and “Event” were two classes with no attributes of their own, and none of my code ever instantiated objects of these classes. These are two pretty obvious signs of an over-abstracted implementation. I’d been thinking that I’d write logic later which would (for example) instantiate objects of the Event class inside Event.list’s processor, but as I got more and more into the implementation, it just didn’t seem like I was going to need to mess with WebEx’s return values as discretely defined objects. After all, I already had Events and Attendees represented as hashes in an array, which was working just fine for this first use case and the ones I could see on the horizon. Having separate classes for Event and Attendee would give me maximum extensibility, but at the cost of having pieces of overlong, over-organized code with no (current) purpose.

So I moved the Event class’s code into the WebEx class. Same with Attendee — now the WebEx class’s code looks like this:

(Gist of the WebEx class)

As you can see, everything is now an instance method of the WebEx object. This means that the syntax for getting a list of Events is now:

w = WebEx.new
events = w.request(w.event_list)

This still looks a little weird to me. I had been thinking that I should make WebEx#request into a class method, so as to have:

events = WebEx.request(w.event_list)

But that would mean having WebEx.request instantiate and return a new object of the WebEx class. There’s nothing wrong with that, but given that another WebEx object already needs to be created in order to call one of its instance methods (event_list), I’d end up with two objects of the same class that weren’t being used in the same way at all. Because of that awkwardness, I decided to live with the clunky-but-serviceable all-instance-method approach. After all, there’s a good chance that I’ll refactor it yet again as I go… :-p

Testing

I’m embarrassed to not have spotted this earlier: the class as it had been written before was very hard to test for a couple of major reasons:

  • There was no way to override the XML attribute of one of the WebExmlObjects being returned by class methods Event.list and Attendee.list_for_meeting
  • The HTTP request happened within the WebEx.request method, making it difficult to stub the HTTP request’s response, which had to happen in order to ensure that calling that method during testing didn’t involve net calls.

I solved each of these easily enough: I abstracted the HTTP request into its own method and I added a “payload” argument to each method that returned a WebExmlObject so that I could override its request XML.
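Here is a rough sketch of why those two changes make the class testable, written in plain Ruby rather than RSpec so it runs on its own. StubbableClient, RequestSpec and the event-counting processor are invented for the example. Only the shape (a separate request_post method and an overridable payload) mirrors what I did.

```ruby
class StubbableClient
  # Hypothetical request description: the XML to send plus a processor Proc.
  RequestSpec = Struct.new(:xml, :processor)

  # The payload argument lets a test supply its own request XML.
  def event_list(payload = '<default-request/>')
    RequestSpec.new(payload, ->(body) { body.scan(/<event\b/).length })
  end

  # The only method that would touch the network in a real client.
  def request_post(_xml)
    raise NotImplementedError, 'real HTTP call goes here'
  end

  def request(spec)
    spec.processor.call(request_post(spec.xml))
  end
end

client = StubbableClient.new
sent = []
# Replace the network call on this one object with a canned response.
client.define_singleton_method(:request_post) do |xml|
  sent << xml
  '<events><event/><event/></events>'
end

puts client.request(client.event_list('<custom-request/>'))   # 2
p sent                                                        # ["<custom-request/>"]
```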

After that, it was time to set up some fixtures. I created directories for “request” and “response” in my fixtures dir and added files containing the well-formed XML samples I got from the WebEx docs. Then I wrote methods for opening/reading each of them in my WebExSpecHelper module (this testing is all in RSpec). Below is a test that ensures that WebEx#request is calling WebEx#request_post:

it "should call request_post" do
  @w.should_receive(:request_post).and_return(lst_summary_event_response)
  @w.request(@w.event_list)
end

@w is the instance variable that is created before every spec, and lst_summary_event_response is the name of the spec helper method that returns the fixture of that XML response. There’s no particular reason I called this one as opposed to an attendee-related method – I just needed to assert that the call would happen and then stipulate the response it would give, so any of my helper methods would do.

Here’s that helper method doing what it’s meant to:

it "should return all events if passed a nil time limit" do
  @events = @w.event_list(payload=lst_summary_event_request, time_limit=nil).processor.call(@w.doc)
  @events.length.should be(3)
end

There are three events in the fixture, so the length should be three when nothing is passed to time limit. All the fixture data from WebEx was in the past, so I altered the dates in there to have one event in the past, one in the future, and one in the future at a more distant date. Here’s what happens when you pass a time limit past that first (earlier) future date:

it "should return only events happening after the time limit" do
  middle_future_date = "04/02/2012 01:06:49"
  limit = @w.time_from_string(middle_future_date)
  @events = @w.event_list(payload=lst_summary_event_request, time_limit=limit).processor.call(@w.doc)
  @events.length.should be(1)
end

Only one result gets returned, because the fixture only has one event listing which has a start date after the date given.
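The filtering those two specs exercise boils down to something like the sketch below. events_after and the three sample events are hypothetical stand-ins for the real processor and fixture. The only thing carried over is the behaviour: a nil limit returns everything, otherwise only events starting after the limit come back.

```ruby
# Hypothetical filter: keep events that start after the limit, or all if nil.
def events_after(events, time_limit = nil)
  return events if time_limit.nil?
  events.select { |event| event[:start] > time_limit }
end

# Made-up sample data: one past event and two future ones.
events = [
  { name: 'past',        start: Time.utc(2009, 1, 15) },
  { name: 'near future', start: Time.utc(2012, 3, 1) },
  { name: 'far future',  start: Time.utc(2012, 6, 1) }
]

puts events_after(events).length                         # 3
puts events_after(events, Time.utc(2012, 4, 2)).length   # 1
```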

Next Steps

So far, my unit tests have covered very little: basically just the processor portion of a WebExmlObject. For full coverage, I’ll need to test the :xml attribute, which means validating the generated XML against the XML Schema Definitions (XSDs) WebEx provides with their API docs. Ruby doesn’t provide any all-native tools for validating XML against a given XSD, but the libxml library (which is distributed as a gem and gets its power from C bindings it compiles at install time) will let you pass in a schema as a string and then validate against it.
