Tuesday, March 23, 2010

Software to learn

Here's the software that I'm going to try in the near future.

Distributed processing

Hadoop
Hadoop is a distributed computing environment, best known for its implementation of MapReduce. I've already played around with Hadoop. It's easy to get started, because you can begin with MapReduce alone; there's no need to run all the modules on the first day. But now it's time to learn some more.
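
For anyone who hasn't met MapReduce yet, here's a minimal sketch of the idea in plain Python. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in a single process. What Hadoop adds is running the same three steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key (the step a framework does between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Collapse all values collected for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["to be or not to be", "to learn Hadoop"]))
```

The mapper and reducer only ever see one document or one key at a time, which is why the work can be split across machines without the functions knowing about each other.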

Mahout
Apache Mahout is a suite of machine learning libraries. It works on top of Hadoop, which means it should be easily scalable to thousands of machines. Machine learning is not my cup of tea, but I'm going to have a quick look.

CouchDB
You can't work in IT and not hear about the NoSQL movement. I've never used a non-relational DB in my projects before, so it's time to change that. CouchDB goes first, since it's widely deployed in both small-scale (Ubuntu's DesktopCouch) and large-scale projects.


Web

Skipfish
A fully automated, active web application security reconnaissance tool. Not much to learn really: I'm just running it against my servers right now, and I'll see the results in a few hours.

nginx
An extremely high-performance HTTP server supporting static files, caching and proxying, and even dynamic content with FastCGI. It's often orders of magnitude faster than Apache.

Memcached
Memcached stores objects in memory. Technically, it's a database, but the name suggests the most common use: to cache objects (e.g. the results of database queries) in a web application. See the Wikipedia article for a good introduction. It's another way to drastically increase the performance of a web app. I've used memcached with MediaWiki (where it's just a matter of a few lines in a config file), but I've never used it in my own projects.
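
The usual way to use such a cache looks roughly like this. It's only a sketch: TinyCache is a hypothetical in-process stand-in, invented so the example runs without a memcached server, and cached_query is likewise an illustrative name. A real client library would talk to memcached over the network, but the pattern would typically be the same: look in the cache first, and fall back to the database on a miss.

```python
import time


class TinyCache:
    """Hypothetical in-process stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        """Store a value; ttl is a lifetime in seconds, None means no expiry."""
        expires_at = time.monotonic() + ttl if ttl else None
        self._store[key] = (value, expires_at)


def cached_query(cache, key, run_query, ttl=60):
    """Return the cached result for key, running the expensive query only on a miss.

    Note: a query that returns None is never cached in this simple version.
    """
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache = TinyCache()
    # The lambda stands in for a slow database query.
    print(cached_query(cache, "user:1", lambda: {"name": "alice"}))
    print(cached_query(cache, "user:1", lambda: {"name": "never called"}))
```

The second call returns the first result without running its query, which is the whole point: the expensive work happens once per key until the entry expires.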


You probably noticed that half of these are Apache projects. I didn't intentionally select them that way, but now I'm going to have a closer look at other Apache software - besides those listed and those I already know. It's no coincidence that one organization supports innovative distributed programming frameworks, the most successful web server, a widely used J2EE app server, a popular version control tool and build automation system, Java libraries and a load of other software. The Apache Software Foundation clearly has both high standards and a knack for choosing the right projects.

1 comment:

  1. Hi,

    If you have a Java project, you may use EhCache (optionally backed by Terracotta) instead of Memcached, which gives you much better performance.
    Btw. your blog is fantastic.
    Tamas
