
Tuesday, June 9, 2009

Cloud computing - a programming perspective


This is a repost of a blog entry created for the OOPSLA 2009 official blog.

Cloud computing is the “new hot” topic. Simply put, various business pressures, a multitude of pain points, and the maturity of a series of Web technologies (networking, APIs, and standards) have made it possible and cost-effective for businesses, small and large, to completely host data and application centers virtually... in the cloud, if you will.

Cloud computing providers, e.g., Amazon, reuse their expertise in efficiently managing and hosting their own Web systems and applications, and expose that core expertise as a set of Web APIs. Using the Amazon Web Services Elastic Compute Cloud (EC2), anyone with a credit card and some programming skill can provision a server instance, install a Web application on it, and immediately have a presence on the Web. Using economies of scale for server hardware, combined with virtual machine technologies, data- and application-center automation expertise, and extensive instrumentation, Amazon is able to provide that service globally for pennies an hour. There are no binding contractual agreements, and Amazon bills you only for the hours you use.

In addition to compute instances, Amazon also provides various other compute resources on their cloud platform, e.g., storage (file and block), message queues, batch data processing, and others. Following Amazon’s lead, various companies, including Google, IBM, and Microsoft, are also exposing frameworks, services, platforms, and applications to a worldwide audience, from within a Web browser and with simple Web APIs. Cloud computing is no less than a democratization of compute resources. With cloud computing, vast compute resources no longer require huge and long-term investments but instead can be had and consumed, as Amazon chairman Jeff Bezos likes to say, “by the drink”.

Whether cloud computing will fulfill the high expectations that many are advocating remains to be determined. Various challenges remain and, in our opinion, we are reaching the peak of the typical hype curve that new technologies follow. However, regardless of whether cloud computing will be a bust or continue to be the hit that it has certainly been so far, there is one undeniable truth that some seem to ignore... The current success of cloud computing, and, we believe, its future successes, are heavily tied to how easy the cloud and cloud applications are to program, to maintain, and to scale. And this is precisely why OOPSLA matters to cloud computing advocates, users, and providers, and vice versa.

As we mentioned, with the cloud, computing resources are cheap and widely available. In a matter of minutes, one can provision hundreds of server instances on the Amazon EC2 cloud, along with terabytes of storage and more aggregate MIPS than most recent minicomputers provide. All of this for around $10 an hour. While almost anyone can afford such computing capacity at these price points, what is hard for most is taking advantage of that cheap capacity. The problem is no longer one of provisioning the resources, but rather one of using these resources, and of doing so efficiently.

We are at the beginning of a new evolution of programming, one taking place with this move to cloud computing. For lack of a better moniker, we call it cloud programming. It is about being able to scale programs to take advantage of these on-demand cloud compute resources. Programming distributed nodes of computation has always been one of the classic ongoing problems of computer science. The cloud, it seems, has thrust this problem and its corollary issues to the forefront...

While cloud programming has some resemblance to old-style distributed programming, supercomputing, or multicore programming, it is a different problem due to changes in the core assumptions and constraints. On the cloud, most compute resources are essentially server instances with virtual compute capacities, or virtualized services. The network is the Internet, and assumptions about co-location, latency, and errors cannot be made. The same concerns one has with real servers in one's own data center still persist: securing, upgrading, automating, and managing these virtual instances are still very much part of the programming one must do to reap the benefits of new cloud infrastructures. Scripting languages, e.g., Ruby, Python, and Groovy, are already taking center stage to solve some of these issues.

Additionally, now that storage can shrink and grow on demand and for very low costs, while keeping reasonably good qualities of service, the other issue is how to manipulate the vast amount of data that one can now store. Google had a similar concern years ago as it improved its search engine while managing expenses in growing its data centers to match the unprecedented growth of the Web. Google engineers and scientists cleverly figured out how to parallelize data computation over large clusters of cheap and replaceable compute nodes. The MapReduce programming model is specifically designed to help engineer algorithms that can scale and run on the resulting big data that one now accumulates...
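The core of the model is easy to sketch in plain Ruby (a toy, single-process illustration of the two phases, not Google's distributed implementation):

```ruby
# Toy MapReduce-style word count. The "map" phase emits (word, 1) pairs;
# the "reduce" phase sums the counts per word. A real MapReduce runtime
# distributes both phases over many machines; here everything runs locally.
def map_phase(documents)
  documents.flat_map { |doc| doc.downcase.scan(/\w+/).map { |word| [word, 1] } }
end

def reduce_phase(pairs)
  pairs.group_by { |word, _| word }
       .map { |word, group| [word, group.map { |_, count| count }.sum] }
       .to_h
end

docs = ["the cloud scales", "the cloud stores data"]
counts = reduce_phase(map_phase(docs))
# counts["the"] == 2, counts["cloud"] == 2, counts["scales"] == 1
```

The key property is that each phase is embarrassingly parallel: map calls are independent per document, and reduce calls are independent per word, which is what lets the model scale across cheap, replaceable nodes.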

Programming for massive scale is the key challenge. We firmly believe that new styles of programming, new programming frameworks, and new programming languages may be one of the key sources of innovation for the cloud. Imagine cloud frameworks and cloud programming environments that expose, in near real time, the cost, the energy impact, and the automation facilities that a cloud computing infrastructure enables. Now imagine being able to program these multiple cloud nodes, either in batch or in real time, while satisfying best practices of Web security and privacy. The combined result would be the nirvana of Web programming: automatically scaling your compute resources in a cost-efficient and environmentally friendly fashion while managing the resulting deluge of data and potential influx of users...

Surely there are many PhD theses to be written to help address some of the fundamental scientific and engineering issues involved in achieving such an idealized state of Web computing. In some ways we may be vastly simplifying the issues, and many of the challenges involved have been studied in various branches of computer science and software engineering for the past 30 years. However, the point here is not to claim that cloud computing is the assured next wave of computing (we don't know), but rather to serve as a reminder that the various issues in systems, data, and distributed computing that cloud computing brings to the forefront could be addressed by innovations in frameworks, programming styles, and programming languages... OOPSLA, it seems, with its long historical track record of ground-breaking innovations in this space, may be a natural choice for the genesis of some of these future eureka moments.


Updates
06/01/09 - fixed typos: accumulate => accumulates

Sunday, February 24, 2008

The prospect of Ruby and Rails

Andrew McAfee on Enterprise 2.0

I recently attended a talk by Andrew McAfee (Harvard Business School) at the famed Palo Alto Research Center (PARC). Andrew is one of the more eloquent speakers from business schools on matters of technology. His talk at PARC was on Enterprise 2.0, or how the recent evolution of the Web is infiltrating enterprises and what enterprises can do to take advantage of new Web innovations.


Andrew McAfee at PARC on 21 February 2008. Credit: self, with iPhone

While Andrew's talk was delivered passionately, most of what he discussed was not new to me, and I imagine not new to anyone monitoring the valley or Web 2.0. However, Andrew mentioned one thing that caught my attention and made me decide to post this entry...

Prospect Theory and TiVo

After discussing the various ways in which Web 2.0 is impacting enterprises, e.g., blogs, wikis, social networking, and virtual markets, Andrew highlighted some challenges that still remain. In particular, he explained that Web 2.0 technologies will only slowly become mainstream in enterprises. This is because, in most decision makers' minds, Web 2.0 innovations are not 10x (or more) better than current alternatives. The 10x figure is not magic; it comes from Prospect Theory.

Developed by Daniel Kahneman (Princeton University and Hebrew University) and Amos Tversky (1937 - 1996), Prospect Theory (PT), for which Kahneman won the Nobel prize in economics in 2002, tries to explain the psychology of decision-making by agents faced with choosing between alternatives in the presence of risk. In particular, the theory tries to explain an agent's decision-making when faced with the prospect of a "better" alternative (B).

One interesting result of the theory, confirmed many times empirically, is that when agents are faced with the decision to move (or change) to alternative B from a known, workable alternative (A), they tend to overestimate alternative A as being three times better than it is, and at the same time underestimate the prospect B as being three times less valuable than it is... This implies that for alternative B to become mainstream (thereby displacing A), assuming no other changes to the market and rounding the numbers, the new prospect B should be at least 10x more "valuable" than the alternative it is displacing, i.e., A.
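The back-of-the-envelope arithmetic is simple enough to state in code (a toy sketch of the rounding, using the 3x figures from the talk):

```ruby
# If incumbents overvalue the known alternative A by ~3x and undervalue
# the new prospect B by ~3x, B must be about 3 * 3 = 9x (rounded up to
# the "10x" rule of thumb) objectively better before it is *perceived*
# as merely better than A.
overvalue_incumbent = 3.0
undervalue_prospect = 3.0
required_advantage  = overvalue_incumbent * undervalue_prospect
# required_advantage == 9.0, i.e., roughly the 10x threshold
```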

Andrew explained that the 3x figures are supported by empirical data drawn from many experiments. He illustrated his point with a quick (unscientific) poll of the audience: he asked who had, loved, and evangelized TiVo. Interestingly, about 10 to 15 percent of the audience raised their hands and never put them down... The point is that while TiVo has a good, passionate following, it is not generally seen as being 10x better than the current TV experience, and thus is not causing a mainstream change among TV viewers.

I am not an economist, but I immediately saw the importance of Andrew's point and how he may be right in applying PT's results to the infiltration of Web 2.0 tools into the enterprise. I also quickly thought of applying the same to one of my research areas: Web computing.

The future of Web programming

Web programming has evolved from CGI scripts written mostly in Perl, to server-side applications running on complicated containers written in Java and .NET, to, increasingly, various agile and lightweight server-side frameworks built on dynamic scripting languages, such as JavaScript, Python and Django, Project Zero, and Ruby and Rails. This evolution is not accidental but rather, I believe, the result of the need for agile reaction to changing requirements, the new complexities of Web applications, and the increased performance of networks and of the devices we use to connect to the Web.

It is well documented that individuals and teams of programmers can build Web 2.0 sites in Ruby on Rails much faster than they can with Java EE, .NET, or Perl. Naturally, there will be cases where this productivity is not applicable: areas where lots of legacy code must be reused, where skilled programmers cannot be found, or where scalability (to many millions of concurrent connections) is required. However, on average, it is fair to assume that building Web applications using modern scripting frameworks, such as Rails, gives programmers a huge advantage.

The DSL hypothesis and the prospect of Ruby and Rails

The way to understand this increase in productivity, I believe, is what I call "the domain-specific language (DSL) hypothesis." Dynamic languages, especially Ruby, allow the creation of libraries and mini-languages (DSLs) that let one represent domain-level concepts in a way that can be programmed directly. For instance, in Rails programmers represent the data model of their application using the ActiveRecord DSL. Using ActiveRecord constructs, programmers directly express the relationships among the models and their "shape". Using metaprogramming, the constructs are translated into code that is executed by the framework and the underlying database. This meta-level expression capability saves programmers a huge burden and increases their productivity.
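The metaprogramming move can be illustrated in a few lines of plain Ruby (a hypothetical sketch in the spirit of ActiveRecord, not Rails' actual implementation): a declarative, class-level call is translated into ordinary methods at class-definition time.

```ruby
# A minimal sketch (not Rails code) of the metaprogramming idea behind
# ActiveRecord-style DSLs: the class-level `attribute` declaration is
# translated into real reader and writer methods via define_method.
class Model
  def self.attribute(name)
    define_method(name) { instance_variable_get("@#{name}") }
    define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
  end
end

class Post < Model
  attribute :title   # reads like a domain-level declaration...
  attribute :author  # ...but generates ordinary accessor methods
end

post = Post.new
post.title = "The prospect of Ruby and Rails"
post.title # returns the value just assigned
```

In Rails the same pattern goes much further: the declarations also generate finders, validations, and SQL, which is why the data model can be expressed at the domain level rather than as boilerplate.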

However, while these dynamic languages and DSLs are increasing the productivity of programmers everywhere, they are not making as many inroads into enterprises... Why is that? Maybe Prospect Theory is the answer; or maybe we are simply too early in this wave of change and need to give enterprises and enterprise programmers more time to catch up with the rest of the community. I don't have enough evidence to claim that languages and frameworks like Ruby and Rails provide a 10x increase in productivity over Java EE or .NET, but from my own experience I believe the figure is close...

To help make my point and to encourage naysayers to look seriously into these alternative Web computing paradigms, I have created a list of 10 reasons why I love Ruby. I posted reason #1 last year and promised to continue posting the subsequent ones over the next few months. I am not claiming that together these reasons prove that Ruby and Rails will become mainstream; rather, I hope they will encourage enterprise programmers to give strong consideration to trying these alternative paradigms. I believe that once they get a taste of the productivity boost, they will raise their hands like those passionate TiVo fans in Andrew's lecture.

Change log
24 February 2008 - first posted

Sunday, November 18, 2007

Why I love Ruby? Reason 1: because strings change and when manipulated should stay strings and be readable


Why strings are critical
Many modern languages come with built-in string libraries or, better, with strings as a first-class concept. Arguably, languages that have not done so have suffered significantly from that shortcoming; case in point: C++, which, in the early 1990s, saw a variety of vendor string libraries fighting for dominance while the language itself became less relevant and was surpassed by better alternatives.

The issue may appear unimportant on the surface; in practice, however, excellent string support in a language is hugely important. Most tasks involve some type of string manipulation or usage, e.g., file input and output, strings as keys to hashes, and user interfaces, to name a few. This implies that a language that facilitates string manipulation delivers a real productivity boost for programmers. In the case of Ruby, as I hope to convince you, the string support is so comprehensive and advanced that common string tasks become a breeze, with the nice side effect of keeping the code quite readable.

Basics
The Ruby designers chose to make strings a first-class type, and the language includes a comprehensive library supporting them. Unlike in some other languages, strings in Ruby are mutable. This means that strings, when manipulated, do change and do not result in unique copies---Ruby has the separate basic type of symbols for that purpose. This decision has interesting consequences, but first let's look at basic string usage in Ruby. The following code snippets illustrate:

s = "this is a string" # creates a string (single or double quotes both work)
s += ', and this is another' # += builds a new, concatenated string and rebinds s to it

The built-in String class contains tons of methods and also includes various modules (more on that in a future WILR Reason) that provide enumeration, comparison, and partitioning. Some basic methods:

s = "this is a string" # reset to the original value for the examples below
s.capitalize # => "This is a string" (a new string)
s.capitalize! # same result, but modifies the receiver in place

s.upcase # => "THIS IS A STRING" (a new string)
s.downcase # => "this is a string" (a new string)

s.split # => ["This", "is", "a", "string"], an array of the words
s.split('is a') # => ["This ", " string"]

s.gsub! 'is ', '' # => "Tha string"

NOTE: s is modified to the new value; the Ruby idiom is that methods ending in ! have side effects on the receiver

Advanced manipulations

s = "This is a string" # reset, since the gsub! above modified s
s.include? 'is' # => true
s.include? 'Is' # => false

s.insert 9, 'nother' # => "This is another string" (modifies s)
s.replace 'string' # => "string" (replaces the entire contents of s)
s.center 20, '_' # => "_______string_______" (a new, padded string)
s.center(20, '_').squeeze # => "_string_" (squeeze collapses runs of repeated characters)
s.crypt 'password' # => "paHjoO.AYUKRQ" (one-way cryptographic hash, salted with the first two characters)

There are also various methods for matching with regular expressions, e.g., String#match, String#grep, and String#scan. I'll discuss them in a future WILR Reason, since regex support is one of the reasons :-)
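As a quick preview, the basic matching methods behave as follows (String#match and String#scan on a standard Ruby):

```ruby
s = "Rails 2.0 shipped in December 2007"

s.match(/\d{4}/)[0]  # first run of four digits => "2007"
s.scan(/\d+/)        # all digit runs => ["2", "0", "2007"]
s =~ /Rails/         # index of the first match => 0
```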

Features
Perhaps the most interesting and powerful aspects of Ruby strings are the support for embedding expressions inline (interpolation) and the ease with which multiline strings can be defined and created.

a_string = 'cool'
"this is a #{a_string} string".capitalize! # => "This is a cool string"

x, y = 22.0, 7.0
"#{x}/#{y} = #{x/y} is an overestimate approximation for Pi which is closer to #{Math::PI}" # => "22.0/7.0 = 3.14285714285714 is an overestimate approximation for Pi which is closer to 3.14159265358979"

Creating multiline strings is as simple as:

s1 = %{Ruby on Rails
A clean, agile, and powerful Web framework
Created by David Heinemeier Hansson at 37signals}

s2 = <<END_STRING
Ruby on Rails
A clean, agile, and powerful Web framework
Created by David Heinemeier Hansson at 37signals
END_STRING

s1 == s2 # => false since s1 does not include the last newline
s1 += "\n"
s1 == s2 # => true

Final thoughts
As you can see, Ruby's String support is comprehensive and very easy to learn. The String class also comes built in with enumeration and comparison features. These allow strings to be used like other enumerable data types (e.g., arrays and lists) and to be compared, for sorting and other comparison-based algorithms.
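For instance, strings compare lexicographically out of the box (String mixes in Comparable), so sorting and character-level enumeration need no extra work; a small sketch, using each_char, which is available on modern Rubies:

```ruby
words = ["pearl", "ruby", "amber"]
words.sort            # lexicographic sort => ["amber", "pearl", "ruby"]
"ruby" > "pearl"      # comparison via <=> => true
"ruby".each_char.to_a # character enumeration => ["r", "u", "b", "y"]
```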

The fact that strings are mutable means that the Ruby virtual machine can allocate String instances efficiently, modifying them in place. This is in sharp contrast to, for instance, the Java programming language, which mandates immutability of strings and where multiple copies of strings with the same value can eat at the heap space.
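The difference is directly observable with object_id (a small sketch; actual allocation behavior is VM-specific): appending in place with << keeps the same object, while += allocates a new one.

```ruby
s = "grow"
id_before = s.object_id
s << " me"                    # in-place append: same object, mutated
s.object_id == id_before      # => true

t = "grow"
id_original = t.object_id
t += " me"                    # builds a brand-new String and rebinds t
t.object_id == id_original    # => false
```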

Finally, the fact that strings can embed expressions, and that creating multiline strings in Ruby is so easy, has an interesting side effect for metaprogramming. In a nutshell, with some careful usage, metaprograms can be as readable as the code they generate. More on this in a later WILR Reason.
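As a small illustrative sketch (not from any framework), a method can be generated from an interpolated heredoc, and the generated source stays readable:

```ruby
# Generating code as a readable multiline string, then evaluating it.
# The escaped \#{name} survives into the generated source and is only
# interpolated when the method runs. (Illustrative only; class_eval with
# a string should be used with care in real code.)
class Greeter; end

method_name = "greet"
greeting    = "Hello"

Greeter.class_eval <<-RUBY
  def #{method_name}(name)
    "#{greeting}, \#{name}!"
  end
RUBY

Greeter.new.greet("Ruby") # => "Hello, Ruby!"
```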

Tuesday, October 30, 2007

Why I love Ruby?

Since I am a vocal advocate of the Ruby programming language and of the various tools, frameworks, libraries, and domain-specific languages (DSLs) that are mushrooming around it, I am often asked to explain the differences or similarities between Ruby and other languages, e.g., Java, Python, PHP, and so on; or simply why I advocate Ruby so vocally.


To make it easy to answer such questions and shed some light on the matter, I decided to aggregate my answers and refine my thoughts into a series of blog entries that attempt to summarize the technical reasons, and give you my personal views, as to why I am such a fanboy of the Ruby programming language and of the grassroots movement that accompanies it.

(photo credit: http://flickr.com/photos/19414947@N00/818191499/)

I hope this will be a conversation over the next few months; while I have at least 10 prepared reasons (posts) to answer this question, I plan to unveil them every other week (or so) and refine them with real code examples and pointers.

I would love to hear your thoughts on the subject, so please feel free to add comments; views in agreement or violent dissent are equally welcome.

So without more introduction, here is why I love Ruby and why I think it is a programmer's and software researcher's best friend.

Minor updates on 11/18/2007