Archive for October, 2009

When open-uri can’t convert Hash into String — another time it happens

Short answer: OpenURI doesn’t support the “feed://” pseudo protocol and if you try it with a hash of header options it gives you the same error as if, like some dumb muppet, you hadn’t required the library in the first place. In other words, it falls through to Kernel#open and leaves you scratching your head.

Long answer: Read on, code fiends. Read on.

Tonight I decided to earn some HusbandPoints™ by helping my wife get a large number of tagged photos off Picasa for a project she’s working on. Downloading them by hand would’ve been a pain in the ass time-wise, and a big pain opportunity-cost-wise too, since she would’ve had to take time out from the main body of the project (a homemade cookbook for a friend’s wedding) to do a dumb photo-by-photo clickfest through the entire large Picasa album she’d assembled with her friends. Plus it gave me a reason to mess around w/ the Google APIs some, knowledge that would almost certainly come in handy later.

Now, the easiest way to go about scripting this w/ Ruby involves using open-uri to pass in the authorization token from Google into every request, per their ClientLogin authentication method. You do that with a piece of code like this:

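require 'open-uri'  # overrides Kernel#open so it accepts URLs and a Hash of HTTP headers
require 'hpricot'   # XML parser gem used here to read the Atom feed

# Assuming that @auth_token is set by a login method
def http_header
  {"Authorization" => "GoogleLogin auth=#{@auth_token}"}
end

# HTTP GET a Google content feed (Atom)
def get(url)
  response = open(url, http_header) { |f| f.read }
  Hpricot.XML(response)
end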

Here we’re getting the content from Google (which will come as an Atom feed, as all of their various pieces of content do) and then parsing the result with Hpricot. We pass the http_header Hash to OpenURI’s open method to specify a set of HTTP header variables. This is supposed to be easy, but tonight it wasn’t, and my wife was treated to the inelegant sounds of me cursing at my laptop screen for 10 or 15 minutes until I figured out what the problem was.
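
The snippet above takes @auth_token for granted. As a rough sketch of that missing login step, assuming the ClientLogin response body is newline-separated KEY=value lines with the token on the Auth= line, the token could be extracted and turned into the same header like this. The helpers parse_client_login and google_login_header and the sample body are hypothetical, and no network call is made.

```ruby
# Hypothetical helper: pull the Auth token out of a ClientLogin response body.
# Assumes the body is newline-separated KEY=value pairs (e.g. SID, LSID, Auth).
def parse_client_login(body)
  pairs = body.each_line
              .map { |line| line.chomp.split("=", 2) }  # split only on the first "="
              .select { |pair| pair.size == 2 }         # skip blank or malformed lines
  token = pairs.to_h["Auth"]
  raise ArgumentError, "no Auth= line in response" if token.nil?
  token
end

# Hypothetical helper: builds the same header Hash as http_header, from an explicit token.
def google_login_header(token)
  { "Authorization" => "GoogleLogin auth=#{token}" }
end

sample = "SID=abc123\nLSID=def456\nAuth=ghi789\n"  # made-up response body
header = google_login_header(parse_client_login(sample))
p header
```

Keeping the parsing separate from the HTTP request makes it easy to check against a canned response before pointing it at the real login endpoint.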

‘feed://’ don’t go ’round here

The problem turned out to be the “feed://” pseudo protocol. Safari likes it (because it fancies itself a feed reader), and it had turned the RSS link Google provides for the tag set my wife wanted to download into a “feed://” URL. Of course, there’s no such protocol, and “feed://” itself is pretty lame. People have been bitching about its lameness for a long, long time. It’s almost as lame as me not catching it.

But the lamest thing of all (which was causing the cursing) is how OpenURI handles this:

TypeError: can't convert Hash into String

method initialize   in open-uri.rb at line 32
method open_uri_original_open   in open-uri.rb at line 32
method open in open-uri.rb at line 32
method get  in picasa.rb at line 62

This is the same thing you get when you try to use open on a URL with a hash of header arguments and you’ve forgotten to require the OpenURI library in the first place.

The problem lies in this part of open-uri.rb, where OpenURI overrides Kernel#open:

def open(name, *rest, &block) # :doc:
  if name.respond_to?(:open)
    name.open(*rest, &block)
  elsif name.respond_to?(:to_str) &&
        %r{\A[A-Za-z][A-Za-z0-9+\-\.]*://} =~ name &&
        (uri = URI.parse(name)).respond_to?(:open)
    uri.open(*rest, &block)
  else
    open_uri_original_open(name, *rest, &block)
  end
end

It’s not taking the branch you might expect, the one that checks whether the name can be converted to a string and matches a loose URI regex pattern. Instead it calls the original version of open, the one the Kernel module provides so you can easily open files and pipes (but without all the tasty options OpenURI gives you). This error gets raised by Kernel when you try to use open outside the context of OpenURI (as this guy points out).

A URL that starts with “feed://” is a String and matches the regex pattern, so it passes the first two tests in the “elsif” clause. That means it must be failing the last one: whatever URI.parse returns doesn’t respond to open. Here’s what that URI.parse method looks like:

def self.parse(uri)
  scheme, userinfo, host, port,
    registry, path, opaque, query, fragment = self.split(uri)

  if scheme && @@schemes.include?(scheme.upcase)
    @@schemes[scheme.upcase].new(scheme, userinfo, host, port,
                                 registry, path, opaque, query,
                                 fragment)
  else
    Generic.new(scheme, userinfo, host, port,
                registry, path, opaque, query,
                fragment)
  end
end

No great clues there. But if you run through the elsif clause of OpenURI’s open by hand, it turns out that parsing the offending “feed://” URI doesn’t give you a URI::HTTP object. You get a URI::Generic object, because “feed” isn’t one of the registered schemes, and URI::Generic doesn’t respond to open. Obviously, the library doesn’t support this kind of URL, and if it weren’t overriding a Kernel method it would probably say so. But it can’t make assumptions about what you’re trying to do with open, so it falls through to the original Kernel#open (via open_uri_original_open), and you get the same error you’d get if you had never written “require ‘open-uri’” in the first place.
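
To see this without stepping through open-uri.rb by hand, here is a small sketch that reproduces the decision. The open_branch_for helper is hypothetical: it copies the three conditions from the override quoted above but returns a symbol naming the branch instead of opening anything. The Picasa-style URLs are made-up examples.

```ruby
require 'uri'
require 'open-uri'  # mixes an open method into URI::HTTP (and URI::HTTPS, URI::FTP)

# The same "looks like a URI" check used by the open-uri override.
URI_LIKE = %r{\A[A-Za-z][A-Za-z0-9+\-\.]*://}

# Hypothetical helper mirroring the override's branches, without opening anything.
def open_branch_for(name)
  if name.respond_to?(:open)
    :name_open
  elsif name.respond_to?(:to_str) && URI_LIKE =~ name &&
        URI.parse(name).respond_to?(:open)
    :uri_open
  else
    :kernel_open  # where open_uri_original_open (plain Kernel#open) gets called
  end
end

feed_url = "feed://picasaweb.google.com/data/feed/base/user/example"
http_url = "http://picasaweb.google.com/data/feed/base/user/example"

p URI.parse(feed_url).class  # URI::Generic
p URI.parse(http_url).class  # URI::HTTP
p open_branch_for(feed_url)  # :kernel_open
p open_branch_for(http_url)  # :uri_open
```

Both URLs pass the regex; only the object URI.parse hands back differs, which is exactly what sends the “feed://” one down to Kernel.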

Lesson learned, boys and girls — pseudo protocols aren’t supported by much at all other than self-important feed reading software.
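
The workaround is simply to hand open a real http:// URL. A minimal sketch, assuming “feed://host/path” should be read as plain HTTP and that some links instead use a “feed:http://host/path” form; the normalize_feed_url helper is hypothetical, not part of OpenURI.

```ruby
# Hypothetical helper: turn a feed: pseudo-protocol URL back into something
# OpenURI can open. Assumes feed://host/path means plain HTTP.
def normalize_feed_url(url)
  case url
  when %r{\Afeed:(https?://.*)\z}i then $1               # feed:http://host/path
  when %r{\Afeed://(.*)\z}i        then "http://#{$1}"   # feed://host/path
  else url                                               # leave anything else alone
  end
end

p normalize_feed_url("feed://picasaweb.google.com/data/feed/base/user/example")
p normalize_feed_url("http://picasaweb.google.com/data/feed/base/user/example")
```

Wrapping the argument in get, as in open(normalize_feed_url(url), http_header), is one way to keep Safari’s “feed://” links from ever reaching OpenURI.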

Thanks to the Gimite Google Spreadsheet library for inspiration on the auth code


Google Fast Flip — not sure about this yet

I’ve only played with Google’s new Fast Flip newsreading feature for a few minutes, but I’ve already got one major problem with it: it’s hard to scan headlines. When you think about it, that’s something that reading a paper newspaper still lets you do — you open up a double-sided broadsheet and you’re scanning over probably 5 or 6 stories on the inside, depending on the number of advertisements. If you scan down the front page or the main page of a section, you can see the headlines for 8-10 stories. With Fast Flip, the “scanning” view is a bunch of screen caps of the articles you’re about to look at, with the headline in small print underneath. Scanning this list of screen caps isn’t that informative because the shrunken headlines are hard to read.

Contrast this with the front page of Google News or something like Techmeme or memeorandum and you’ll see what I mean.

Now I get that Fast Flip is designed for you to click into one of the streams of articles and then use the left/right arrows to page through it, but this causes me to “zoom in” conceptually and doesn’t really let me stand back and see all the headlines from a distance. So I can have the experience of “flipping” from page to page and not knowing in advance anything about what I’m going to see next (other than some basics of subject matter), or I can scan small headlines all at once. Doesn’t feel like the greatest compromise in the world.

I’m still deciding whether I like this. I’m a big fan of graphic design, and I like that Fast Flip gives design a chance to shine through earlier in the reading process than it can on something like Google News or Techmeme, but I’m not sure that outweighs the benefit of being able to move fast through a large number of headlines.