Sunday, December 1, 2013

Upgrade to Storm 0.9.0-wip16 with JRuby 1.7.4

At Korrelate, we've been running Storm 0.8.3-wip3 on JRuby 1.6 for many months, and we recently decided to upgrade to JRuby 1.7 and Storm 0.9.0. Our few brief attempts in the past ran into problems that needed more time than we had for the upgrade, but this time we had enough downtime to work through them. Here is a rundown of what we did to get everything working.

We are using version 0.6.6 of Redstorm to handle deployment and testing.

The first hurdle was getting the topology to run locally using redstorm local.

In your project's ivy directory, create a file called storm_dependencies.xml. Notice the snakeyaml version is set to 1.11. This is a fix for https://github.com/nathanmarz/storm/pull/567

<?xml version="1.0"?>
<ivy-module version="2.0">
  <info organisation="redstorm" module="storm-deps"/>
  <dependencies>
    <dependency org="storm" name="storm" rev="0.9.0-wip16" conf="default" transitive="true" />
    <!-- https://github.com/nathanmarz/storm/pull/567 -->
    <override org="org.yaml" module="snakeyaml" rev="1.11"/>
  </dependencies>
</ivy-module>
Make sure you are using JRuby 1.7.4; we ran into additional problems with 1.7.8 and backed out for now. Next, create a file called topology_dependencies.xml in your ivy directory:

<?xml version="1.0"?>
<ivy-module version="2.0" xmlns:m="http://ant.apache.org/ivy/maven">
  <info organisation="redstorm" module="topology-deps"/>
  <dependencies>
    <dependency org="org.jruby" name="jruby-core" rev="1.7.4" conf="default" transitive="true"/>
    <!-- explicitly specify jffi to also fetch the native jar. make sure to update the jffi version to match the jruby-core version -->
    <!-- this is the only way I found using Ivy to fetch the native jar -->
    <dependency org="com.github.jnr" name="jffi" rev="1.2.5" conf="default" transitive="true">
      <artifact name="jffi" type="jar" />
      <artifact name="jffi" type="jar" m:classifier="native"/>
    </dependency>
  </dependencies>
</ivy-module>
This will deploy your topology with JRuby 1.7.4 and its dependencies.
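Since the JRuby version matters here, a small startup check can catch a mismatch early. This is only a sketch: the constant and method names are made up for illustration and are not part of Redstorm or Storm.

```ruby
# Hypothetical guard; the constant name and message are illustrative.
REQUIRED_JRUBY_VERSION = "1.7.4"

# Returns nil when the version matches, otherwise a human-readable warning.
def jruby_version_warning(actual, required = REQUIRED_JRUBY_VERSION)
  return nil if actual == required
  "expected JRuby #{required}, got #{actual.inspect}"
end

# JRUBY_VERSION is only defined when running under JRuby.
actual = defined?(JRUBY_VERSION) ? JRUBY_VERSION : nil
warning = jruby_version_warning(actual)
warn(warning) if warning
```

Dropping something like this near the top of a topology file would print a warning instead of failing later with a less obvious error.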

You should now be able to run your topology in local mode. The next step is to deploy it to your cluster. First, we'll look at the changes required in your topology code. There are several bugs in JRuby 1.7.4 related to loading files from a jar. Make sure you require only JRUBY-6970.rb in your topology (it loads JRUBY-6970-openssl itself once openssl has been required). Thanks to Jordan Sissel for doing most of the work for this fix.

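# encoding: utf-8
# @see JRUBY-6970.rb
# Apply this patch only after a 'require "openssl"' has occurred.
class OpenSSL::SSL::SSLContext
  alias_method :ca_path_JRUBY_6970=, :ca_path=
  alias_method :ca_file_JRUBY_6970=, :ca_file=

  # Strip the "jar:" scheme before calling the original setter.
  # The explicit self receiver is needed: without it Ruby treats
  # name=(value) as a local variable assignment and never calls the setter.
  def ca_file=(arg)
    if arg =~ /^jar:file:\//
      return self.ca_file_JRUBY_6970 = arg.gsub(/^jar:/, "")
    end
    return self.ca_file_JRUBY_6970 = arg
  end

  def ca_path=(arg)
    if arg =~ /^jar:file:\//
      return self.ca_path_JRUBY_6970 = arg.gsub(/^jar:/, "")
    end
    return self.ca_path_JRUBY_6970 = arg
  end
end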
# encoding: utf-8
# Monkeypatch for JRUBY-6970
#
# Should solve this error:
# Caused by: org.jruby.exceptions.RaiseException: (SSLError) jar:file:/mnt/storm/supervisor/stormdist/topology-name-1-1385830575/stormjar.jar!/gems/gems/aws-sdk-1.8.0/ca-bundle.crt
# at org.jruby.ext.openssl.SSLContext.setup(org/jruby/ext/openssl/SSLContext.java:230)
# at org.jruby.ext.openssl.SSLSocket.initialize(org/jruby/ext/openssl/SSLSocket.java:145)
module Kernel
  alias_method :require_JRUBY_6970_hack, :require

  def require(path)
    if path =~ /^jar:file:.+!.+/
      path = path.gsub(/^jar:/, "")
    end

    # JRUBY-7065
    path = File.expand_path(path) if path.include?("/../")
    rc = require_JRUBY_6970_hack(path)

    # Only monkeypatch openssl after it's been loaded.
    if path == "openssl"
      require_relative "JRUBY-6970-openssl"
    end
    return rc
  end
end
# Work around for a bug in File.expand_path that doesn't account for resources
# in jar paths.
#
# Should solve (Errno::ENOENT) errors with file: paths
class File
  class << self
    alias_method :expand_path_JRUBY_6970, :expand_path

    def expand_path(path, dir=nil)
      if path =~ /(jar:)?file:\/.*\.jar!/
        jar, resource = path.split("!", 2)
        if resource.nil? || resource == ""
          # Nothing after the "!", nothing special to handle.
          return expand_path_JRUBY_6970(path, dir)
        else
          resource = expand_path_JRUBY_6970(resource, dir)
          return fix_jar_path(jar, resource)
        end
      elsif dir =~ /(jar:)?file:\/.*\.jar!/
        jar, dir = dir.split("!", 2)
        if dir.empty?
          # sometimes the original dir is just 'file:/foo.jar!'
          return File.join("#{jar}!", path)
        end
        dir = expand_path_JRUBY_6970(path, dir)
        return fix_jar_path(jar, dir)
      else
        return expand_path_JRUBY_6970(path, dir)
      end
    end
  end

  protected
  def self.fix_jar_path(jar, resource)
    if RbConfig::CONFIG["host_os"] == "mswin32"
      # 'expand_path' on "/" will return "C:/" on windows.
      # So like.. we don't want that because technically this
      # is the root of the jar, not of a disk.
      return "#{jar}!#{resource.gsub(/^[A-Za-z]:/, "")}"
    else
      return "#{jar}!#{resource}"
    end
  end
end
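To make the path handling in the patch easier to follow, here is a standalone sketch of its two string operations: dropping the leading "jar:" scheme, and splitting a jar path at the "!" separator. The helper names and the /tmp/stormjar.jar path are hypothetical; only the regular expressions are taken from the patch.

```ruby
# Hypothetical helpers; the regexes mirror the ones used in the patch.
JAR_RESOURCE = /(jar:)?file:\/.*\.jar!/

# Drop the leading "jar:" scheme so the path starts with "file:".
def strip_jar_scheme(path)
  path =~ /^jar:file:\// ? path.sub(/^jar:/, "") : path
end

# Split "<jar>!<resource>" into its two parts.
# Returns [nil, path] when the path does not point inside a jar.
def split_jar_path(path)
  return [nil, path] unless path =~ JAR_RESOURCE
  path.split("!", 2)
end

p strip_jar_scheme("jar:file:/tmp/stormjar.jar!/gems/ca-bundle.crt")
p split_jar_path("file:/tmp/stormjar.jar!/lib/topology.rb")
```

Plain filesystem paths pass through both helpers unchanged, which is why the patch can be loaded safely when running outside a jar as well.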
Since we had to upgrade snakeyaml to run the topology locally, we also have to update it in the Storm cluster. The following change is required in project.clj in the root of your Storm distribution:

diff --git a/project.clj b/project.clj
index 7f59387..8799476 100644
--- a/project.clj
+++ b/project.clj
@@ -18,7 +18,7 @@
[org.clojure/tools.logging "0.2.3"]
[org.clojure/math.numeric-tower "0.0.1"]
[storm/carbonite "1.5.0"]
- [org.yaml/snakeyaml "1.9"]
+ [org.yaml/snakeyaml "1.11"]
[org.apache.httpcomponents/httpclient "4.1.1"]
[storm/tools.cli "0.2.2"]
[com.googlecode.disruptor/disruptor "2.10.1"]
The last thing you may have to do is make a small change to the Storm code for a bug that occurs if you're deploying on Ubuntu (and possibly other distributions). If you're using storm-deploy, you can fork the Storm GitHub repo to make the changes and point storm-deploy to your repo in the storm.clj file:

diff --git a/src/clj/backtype/storm/testing4j.clj b/src/clj/backtype/storm/testing4j.clj
index 0e517f6..dd52573 100644
--- a/src/clj/backtype/storm/testing4j.clj
+++ b/src/clj/backtype/storm/testing4j.clj
@@ -1,5 +1,6 @@
(ns backtype.storm.testing4j
(:import [java.util Map List Collection ArrayList])
+ (:require [backtype.storm LocalCluster])
(:import [backtype.storm Config ILocalCluster LocalCluster])
(:import [backtype.storm.generated StormTopology])
(:import [backtype.storm.daemon nimbus])
diff --git a/src/clj/storm/trident/testing.clj b/src/clj/storm/trident/testing.clj
index 0b7de3e..c42b9be 100644
--- a/src/clj/storm/trident/testing.clj
+++ b/src/clj/storm/trident/testing.clj
@@ -1,5 +1,6 @@
(ns storm.trident.testing
(:import [storm.trident.testing FeederBatchSpout FeederCommitterBatchSpout MemoryMapState MemoryMapState$Factory TuplifyArgs])
+ (:require [backtype.storm LocalDRPC])
(:import [backtype.storm LocalDRPC])
(:import [backtype.storm.tuple Fields])
(:import [backtype.storm.generated KillOptions])
A few other related notes:

For java.lang.OutOfMemoryError: PermGen space errors, you can put this in your .ruby-env file if you're using a compatible Ruby version manager like RVM:

JRUBY_OPTS=--1.9 -J-XX:+CMSClassUnloadingEnabled -J-XX:MaxPermSize=256m -J-Xmx1024m -J-Xms1024m
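To confirm that these flags actually reached the JVM, something like the following could be run under JRuby. It is a sketch only: the method names are hypothetical, and on a non-JRuby interpreter the first method simply returns an empty list.

```ruby
# Returns the JVM startup flags as Ruby strings, or [] when not on JRuby.
def jvm_input_arguments
  return [] unless RUBY_PLATFORM == "java"
  require "java"
  bean = java.lang.management.ManagementFactory.getRuntimeMXBean
  bean.getInputArguments.map(&:to_s)
end

# Hypothetical check: true when a -XX:MaxPermSize flag was passed to the JVM.
def max_perm_size_set?(args = jvm_input_arguments)
  args.any? { |a| a.start_with?("-XX:MaxPermSize=") }
end

puts "MaxPermSize set: #{max_perm_size_set?}"
```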
Use t.options = '--use-color' if you are missing color output in your tests.
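For context, this is roughly where that setting would live, assuming the tests run through Rake::TestTask with the test-unit gem (whose runner accepts --use-color). The task name and file pattern are illustrative, not taken from our project.

```ruby
require "rake"
require "rake/testtask"

# Assumption: tests are run via Rake::TestTask using the test-unit gem.
# The task name and file pattern below are illustrative only.
test_task = Rake::TestTask.new(:test) do |t|
  t.libs << "test"
  t.pattern = "test/**/*_test.rb"
  # Passed through to the test runner to force colored output.
  t.options = "--use-color"
end
```

If your tests use a different runner, the equivalent option will have a different name.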

I didn't track this as I was doing it so let me know if I missed anything. Good luck!
