Sunday, November 18, 2007

Why I love Ruby? Reason 1: because strings change and when manipulated should stay strings and be readable


Why strings are critical
Many modern languages ship with a string library or, better, treat strings as a first-class concept. Arguably, languages that have not done so have suffered for it. A case in point is C++: in the early 1990s, a variety of vendor string libraries fought for dominance, while the language itself became less relevant and was surpassed by better alternatives.

The issue may appear unimportant on the surface, but in practice excellent string support in a language is critical. Most tasks involve some kind of string manipulation, e.g., file input and output, strings as hash keys, and user interfaces, to name a few. A language that makes string manipulation easy therefore gives programmers a real productivity boost. In the case of Ruby, as I hope to convince you, the string support is so comprehensive and advanced that common string tasks become a breeze, with the nice side effect of keeping the code rather readable.

Basics
The Ruby designers chose to make strings a first-class type, and the language includes a comprehensive library supporting them. Unlike in some other languages, strings in Ruby are mutable. This means that a string can be changed in place when manipulated, rather than producing a new copy each time. For immutable, unique values, Ruby has a separate basic type, the symbol. This decision has interesting consequences, but first let's look at basic string usage in Ruby. The following Ruby code snippets illustrate the usage:

s = "this is a string" # creates a string
t = s + ', and this is another' # strings can be written with double or single quotes; s is unchanged
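
To make the mutability point concrete, here is a minimal sketch of what "strings do change" means in practice. The variable names are mine, chosen only for illustration, and the dup calls are there only to guarantee an unfrozen copy on Ruby setups where string literals are frozen.

```ruby
# Two variables can refer to the same String object.
a = "this is a string".dup    # dup: an unfrozen copy, in case literals are frozen
b = a

b << ", and this is another"  # << appends in place, so a sees the change too
puts a                        # this is a string, and this is another
puts a.equal?(b)              # true: still one object

# An explicit copy is independent of the original.
c = a.dup
c.upcase!
puts a                        # still lowercase

# Symbols are the immutable, unique counterpart.
puts :name.equal?(:name)                # true: the same object every time
puts "name".dup.equal?("name".dup)      # false: two separate String objects
```

If you need a value that must not change behind your back, either dup the string before handing it out or use a symbol.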

The built-in String class has tons of methods and also includes various modules (more on that in a future WILR Reason) that provide enumeration, comparison, and partitioning. Some basic methods:

s.capitalize # => "This is a string" as a new string
s.capitalize! # same result but modifies the receiver

s.upcase # => "THIS IS A STRING" (a new string)
s.downcase # => "this is a string" (a new string)

s.split # => ['This', 'is', 'a', 'string'] as an array of the elements
s.split('is a') # => ['This ', ' string']

s.gsub! 'is ', '' # => "Tha string"

NOTE: s now holds the new value. The Ruby idiom is to end a method name with ! when it modifies the receiver in place.
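
One thing worth knowing about the ! methods, shown here as a small sketch with illustrative variable names: many of them return nil when there was nothing to change, so they are best not chained.

```ruby
s = "This is a string".dup  # dup: an unfrozen copy, in case literals are frozen

r1 = s.capitalize!          # already capitalized, so nothing changes
p r1                        # nil

r2 = s.upcase!              # changes s and returns it
p r2                        # "THIS IS A STRING"
p r2.equal?(s)              # true: the receiver itself comes back

# When chaining, the non-bang forms are safer: they always return a String.
t = "This is a string".capitalize.upcase
p t                         # "THIS IS A STRING"
```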

Advanced manipulations

s = "This is a string" # reset s, since gsub! above changed it
s.include? 'is' # => true
s.include? 'Is' # => false

s.insert 9, 'nother' # => "This is another string"
s.replace 'string' # => "string" (modifies string s)
s.center 20, '_' # => "_______string_______" (a new string; s is unchanged)
s.center(20, '_').squeeze # => "_string_" (runs of repeated characters collapsed)
s.crypt 'password' # => "paHjoO.AYUKRQ" (one-way cryptography)

There are also various methods for matching with regular expressions, e.g., String#match, String#scan, and String#=~ (String#grep is also available in Ruby 1.8, where String mixes in Enumerable). I'll discuss them in a future WILR Reason since regex support is one of the reasons :-)

Features
Perhaps the most interesting and powerful aspects of Ruby strings are their support for inline expressions (interpolation with #{...}) and the ease with which multiline strings can be defined.

a_string = 'cool'
"this is a #{a_string} string".capitalize! # => "This is a cool string"

x, y = 22.0, 7.0
"#{x}/#{y} = #{x/y} is an overestimate approximation for Pi which is closer to #{Math::PI}" # => "22.0/7.0 = 3.142857142857143 is an overestimate approximation for Pi which is closer to 3.141592653589793" (Ruby 1.8 prints 15 significant digits: 3.14285714285714 and 3.14159265358979)

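Two details of interpolation are easy to miss, so here is a short sketch (the variable names are just for illustration): any expression can go inside #{...}, and single-quoted strings do not interpolate at all.

```ruby
lang = "Ruby"
versions = [1, 8, 6]

# Any expression works inside #{...}; its result is converted with to_s.
double = "#{lang} #{versions.join('.')} has #{lang.length} letters in its name"

# Single quotes leave the text exactly as written.
single = '#{lang} is not interpolated here'

puts double  # Ruby 1.8.6 has 4 letters in its name
puts single  # #{lang} is not interpolated here
```
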
Creating multiline strings is as simple as:

s1 = %{Ruby on Rails
A clean, agile, and powerful Web framework
Created by David Heinemeier Hansson at 37signals}

s2 = <<END_STRING
Ruby on Rails
A clean, agile, and powerful Web framework
Created by David Heinemeier Hansson at 37signals
END_STRING

s1 == s2 # => false since s1 does not include the last newline
s1 += "\n"
s1 == s2 # => true
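
Here documents also combine nicely with interpolation. The sketch below assumes Ruby 2.3 or later for the <<~ form, which did not exist when this post was written; the variable names are illustrative.

```ruby
framework = "Ruby on Rails"

# <<~ strips the common leading indentation, so the text can be indented with the code.
banner = <<~END_STRING
  #{framework}
  A clean, agile, and powerful Web framework
END_STRING

# Quoting the terminator with single quotes turns interpolation off.
raw = <<~'END_STRING'
  #{framework}
END_STRING

puts banner
puts raw      # prints the literal text #{framework}
```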

Final thoughts
As you can see, Ruby's String support is comprehensive and very easy to learn. The String class also comes with comparison built in (it mixes in Comparable) and with iteration methods; in Ruby 1.8 it mixes in Enumerable as well, while later versions provide each_char, each_line, and friends instead. These features let strings be iterated over much like other collections (e.g., arrays) and be compared for sorting and other comparison-based algorithms.
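
A quick sketch of the comparison and iteration features, assuming Ruby 1.9 or later for each_char and the block-less each_line (the sample data is made up):

```ruby
words = ["pear", "apple", "Fig"]

# Strings are Comparable, so sorting works out of the box (uppercase sorts first).
p words.sort                          # ["Fig", "apple", "pear"]
p words.sort_by { |w| w.downcase }    # ["apple", "Fig", "pear"]

p "apple" <=> "banana"                # -1
p "fig".between?("apple", "pear")     # true (from Comparable)

# Iterating over the pieces of a string.
lines = "one\ntwo\n".each_line.map { |line| line.chomp }
chars = "abc".each_char.to_a
p lines                               # ["one", "two"]
p chars                               # ["a", "b", "c"]
```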

Because strings are mutable, a Ruby program can modify a String in place (for example with << or the ! methods) instead of allocating a new object for every change. This is in sharp contrast to the Java programming language, for instance, which mandates immutability of strings: every modification produces a new String object, and the resulting copies can eat at the heap space.
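
The difference between changing in place and copying can be seen directly. This is only a sketch with made-up data, not a benchmark:

```ruby
parts = %w[alpha beta gamma]

# In place: << keeps appending to the one String object.
buffer = "".dup                  # dup: an unfrozen copy, in case literals are frozen
same_object = buffer
parts.each { |part| buffer << part << "," }
puts buffer                      # alpha,beta,gamma,
puts buffer.equal?(same_object)  # true

# By copying: += allocates a new String on every pass and rebinds the variable.
original = ""
copied = original
parts.each { |part| copied += part + "," }
puts copied                      # alpha,beta,gamma,
puts copied.equal?(original)     # false
puts original.empty?             # true: the first object was never touched
```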

Finally, the fact that strings can contain embedded expressions and that multiline strings are so easy to create has an interesting side effect for metaprogramming. In a nutshell, with some care, metaprograms can be as readable as the code they generate. More on this in a later WILR Reason.
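
As a small taste of that, here is a sketch that generates accessor methods from a here document. The Person class and its attribute names are hypothetical, and in real code attr_accessor would do this job; the point is only that the generated code reads like ordinary Ruby.

```ruby
# Hypothetical class, used only to illustrate string-based metaprogramming.
class Person
  %w[name email].each do |attr|
    # The heredoc is the method source, with the attribute name interpolated.
    class_eval <<-END_METHODS, __FILE__, __LINE__ + 1
      def #{attr}
        @#{attr}
      end

      def #{attr}=(value)
        @#{attr} = value
      end
    END_METHODS
  end
end

person = Person.new
person.name = "Matz"
puts person.name   # Matz
```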
