July 07, 2008

Engineering Building Blocks for a Startup Company

The following are important engineering building blocks for internet companies.

I have a friend who has worked on Win32 applications for about 10 years and is moving to an internet company.  I was telling him that these are the computer science advances that people at internet companies treat as core competencies.  They are a must-read for anyone who has not worked at an internet company.

Internet companies are quick to build because they start from these components, most of them open source, and then write "callbacks" or "plugins" that make the components solve their customers' problems.  The components are stable because they have had years, in some cases a decade, of improvements from large companies like Yahoo.

  • Rails: Rails is a web framework for the Ruby language, and I consider it the 4th generation of programming. (Generation #1 is assembly, Generation #2 is C/C++, Generation #3 is Java/C#.)  Creating an internet company is all about speed. That means driving the time spent building UI as close to zero as you can.  The real value is in the content driving the company (YouTube, Flickr, eBay), the algorithms adding value for the customer (TalentSpring, Zillow, Farecast, PayScale, PayPal), or customer service (Craigslist, Zappos).  Rails gives you the development efficiencies that you need to compete.
  • Hadoop: This is a common framework for submitting a job and having it completed across a farm of servers.  The job is split into tasks that are distributed across the servers, and each server runs its task scoped to one partition of the problem.  This pattern is used so often that Hadoop is built around MapReduce (see below).  People who want to write an internet crawler run Nutch as the Hadoop job.  Each job is packaged as a JAR file (a bundled archive of compiled Java classes).  This is similar to an OS thread scheduler where each server plays the role of a thread, except that Hadoop also coordinates how the problem is partitioned so that the pieces can be completed separately.  Also see here and here.
  • Nutch: This is the web crawler open source component.
  • HBase: When you need a database and it doesn't need to scale beyond one machine, MySQL/Postgres/MS SQL is fine.  When you need a database that scales across a series of machines, HBase is a good solution.  It is modeled on Google's BigTable, the distributed store behind many Google products, and is similar in spirit to Amazon's SimpleDB.  Also see here.
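  • PageRank: Google uses this at the core of its search ranking to drive the quality of its results.  It scores each page by the links pointing to it, weighted by the scores of the linking pages.  We here at TalentSpring use this in powerful ways.  Also see here.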
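  • MapReduce: MapReduce is a core competency for internet companies (Google, Amazon, Yahoo, Windows Live Search, etc.).  A map step processes each partition of the input independently, and a reduce step combines the intermediate results by key.  It is as fundamental to these companies as window messages are to Win32 programmers or the CLR is to Microsoft server programmers.  This is also useful.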
  • Carrot2: Google is a keyword search engine (with PageRank helping the weightings).  The generation of search beyond keyword search is semantic search.  Carrot2 is a step in that direction: it is a search results clustering engine that groups results into topics.  Also see the demo here.
  • Lucene: This is the open source search component.  It indexes and searches a local collection of content.  (You need to add Nutch to create an internet search engine.)
  • Solr: This goes along with Lucene.  It is a search server built on top of Lucene that exposes indexing and querying over HTTP.
  • Drupal: This is a framework for getting a website up immediately.  Building internet companies is about not reinventing the wheel.  Drupal gives the web site sign-in, user pictures, and all of the basic support.  A wiki takes a web page to the next level with instant editability and low cost of ownership; Drupal is to a web site what a wiki is to a single page.  Drupal gives you the toolbars, pages that you can create when in edit mode, and plug-in modules that provide features similar to Digg, Flickr, YouTube, etc.  If you create a web site, it is ridiculous not to seriously consider starting it with Drupal.  If you create an internet company, you should seriously consider starting it on top of Drupal.  Why reinvent a toolbar and navigation system, a help system, simple pages for marketing, sign-on support with user pictures and forgotten-password emails, and search that works across the entire site?
  • EC2: Do you want a server farm of 300 computers for 2 days?  And pay only for the days you actually use it?  EC2 is the solution from Amazon.  We built an internet crawler very quickly using Hadoop+Nutch+EC2.  Crawling a significant part of the internet is easy with a hundred servers, and it only takes a day or so.  EC2 makes it very inexpensive.  Once you start thinking with EC2 available, your creative horizons expand.
  • And of course everyone knows about WordPress, MySQL, and Postgres.
  • These are also interesting but lower priority: Google Analytics, OpenID, Mechanical Turk, oDesk, S3, OpenSocial, JOONE, GRETL, and Ning.
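
To make the MapReduce bullet concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example.  This is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework would run each phase on different servers, one partition per task.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, then shuffle, then reduce per key."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one partition of the input (hypothetical data).
    docs = ["hadoop runs map tasks", "map tasks feed reduce tasks"]
    print(word_count(docs))
```

Because map_phase only ever looks at one document and reduce_phase only ever looks at one key, both steps can be spread across a farm of servers without the pieces needing to talk to each other.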
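
The PageRank bullet can be sketched the same way.  The code below is the textbook power-iteration form of the algorithm, not Google's or TalentSpring's implementation; the function name, the damping value of 0.85, the iteration count, and the tiny link graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a and b both link to c, c links back to a.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In this toy graph the page with the most inbound links ends up with the highest score, and the page nobody links to ends up with the lowest, which is the intuition behind using link structure to judge quality.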
