Engineering Building Blocks for a Startup Company

The following are important engineering building blocks for internet companies.

I have a friend who has worked on Win32 applications for about 10 years and is moving to an internet company.  I told him that these are the technologies that people at internet companies treat as core competencies.  They are a must-read for anyone who has not worked at an internet company.

Internet companies are quick to build because they start from these components, most of them open source, and then write "callbacks" or "plugins" on top of them to solve their customers' problems.  The components are stable because they have had years, in some cases a decade, of improvements from large companies like Yahoo.

  • Rails: Rails is a web framework for the Ruby language, and I consider it the 4th generation of programming languages. (Generation #1 is assembly, Generation #2 is C/C++, Generation #3 is Java/C#.)  Creating an internet company is all about speed. That means dropping the time spent building UI to as close to zero as you can.  The real value is in building the content that drives the company (YouTube, Flickr, eBay), the algorithms that add value for the customer (TalentSpring, Zillow, Farecast, PayScale, PayPal), or the customer service (Craigslist, Zappos).  Rails gives you the development efficiencies that you need to compete.
  • Hadoop: This is a common framework for submitting a task and having it completed across a farm of servers.  The task is distributed across a series of servers, and each server runs it scoped to one partition of the problem.  This pattern is used so often that Hadoop includes an implementation of MapReduce (see below).  People who want to write an internet crawler run Nutch as the task.  Each task is packaged as a JAR file (a bundle of compiled Java classes). This is similar to an OS thread scheduler where each server is a thread, except that Hadoop also coordinates how the problem is partitioned so the pieces can be completed separately.  Also see here and here.
  • Nutch: This is the open source web crawler component.
  • HBase: When you need a database and it doesn't need to scale beyond one machine, MySQL/Postgres/MS SQL is fine.  When you need a database that scales across a series of machines, HBase is a good solution. It is modeled on Google's BigTable, the one database that drives many of Google's products, and is comparable to Amazon's SimpleDB.  Also see here.
  • PageRank: Google uses this as the core that drives the quality of its search results.  We here at TalentSpring use it in powerful ways.  Also see here.
  • MapReduce: MapReduce is a core competency for internet companies (Google, Amazon, Yahoo, Windows Live Search, etc.).  It is as fundamental to them as window messages are to Win32 programmers or the CLR framework is to MSFT server programmers.  This is also useful.
  • Carrot2: Google is a keyword search engine (with PageRank helping the weightings).  The generation of search beyond keyword search is semantic search.  Carrot2 is a search results clustering engine: it groups results into topics instead of returning one flat list.  Also see the demo here.
  • Lucene: This is the open source search component.  It is a search engine for a local database of content.  (You need to add Nutch to create an internet search engine.)
  • Solr: This goes along with Lucene; it is a search server built on top of it.
  • Drupal: This is a framework for building a website immediately.  Building internet companies is about not re-inventing the wheel.  Drupal gives the web site sign-in, user pictures, and all of the basic support.  A wiki takes a web page to the next level with instant editability and low cost of ownership. The parallel is that Drupal is to a web site what a wiki is to a single page. Drupal gives you the toolbars, pages that you can create when in edit mode, and plug-in modules that give you features similar to Digg, Flickr, YouTube, etc.  If you create a web site or an internet company, you should seriously consider starting it on top of Drupal.  Why re-invent a toolbar and navigation system, a help system, simple pages for marketing, sign-on support with user pictures and forgot-password emails, and search that works across the entire site?
  • EC2: Do you want a server farm of 300 computers for 2 days?  And pay only for the days you actually use it?  EC2 is the solution from Amazon.  We built an internet crawler very quickly using Hadoop+Nutch+EC2.  Crawling a significant part of the internet is easy with a hundred servers, and it only takes a day or so.  EC2 makes it very inexpensive.  Once you start thinking with EC2 available, your horizons of creativity expand.
  • And of course everyone knows about WordPress, MySQL, and Postgres.
  • These are also interesting but lower priority: Google Analytics, OpenID, Mechanical Turk, oDesk, S3, OpenSocial, JOONE, GRETL, and Ning.
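
For readers coming from Win32, the MapReduce idea behind the Hadoop and MapReduce items above can be sketched in a few lines. This is only an illustration of the pattern in plain Python running on one machine, not Hadoop's API (real Hadoop jobs are packaged as JAR files, as noted above). The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document (partition) would be mapped on a
    # different server; here the partitions are processed one after another.
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

The point of the split is that the map step never looks at more than one partition, so it can run on as many servers as you have partitions, and only the grouped results need to be brought together for the reduce step.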

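PageRank, mentioned in the list above, can also be illustrated briefly. The following is a minimal sketch of the textbook power-iteration form of the algorithm, not Google's or TalentSpring's implementation. The damping factor of 0.85, the iteration count, and the tiny four-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing each page's score along its links.

    links maps a page name to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical link graph: "c" is linked to by every other page.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

A page ranks highly when many pages, or a few highly ranked pages, link to it. The same idea applies to any set of items that "vote" for one another, not just web pages.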
Comments

Yep! Internet websites are a dime a dozen but only a few are worth checking out...

These are very valuable and vital tools for an architect.

well I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Well i must say the things you mentioned above was really informative and we should take care of these things when opening a company .

I started a course on computer programming a while back and just couldn't handle all the craziness that came along with it...LOL but I do understand some of these things that you have mentioned (cause I still tinker with it here and there) and think you are definitely right on.

Wow great information about engineering building blocks or infrastructure for internet companies. Will learn this thing while am looking for a job.

Interesting blog, feel free to check out mine.

-Scott

hey how you doing! Nice posting. I enjoyed reading it. I too run a blog on internet and traffic building, and checking out what others may have written.

About TalentSpring

  • TalentSpring is a marketplace of resumes. Resumes are ranked within their Industry so employers can immediately go to great resumes.