Introduction to Cluster Computing

 

What is a Cluster?


A computing cluster is a group of closely linked computers that work together as a single system. This configuration delivers better performance than any of the individual computers, called nodes, could achieve working alone. In the NIC cluster, nodes are divided into three types. The vast majority are compute nodes, which actually carry out the computations. Login nodes handle login requests and give you access to your home directory. The head node controls the cluster.

The head node uses a resource manager to track the cluster's resources, such as node availability, RAM, and CPU time. PBS/TORQUE is the current resource manager on the cluster. The head node passes information from the resource manager to a scheduler; Maui is the current cluster scheduler. The scheduler decides which jobs to run, and when and where to run them, based on their job files and on several criteria such as job size, run time, and the size of the group's queue. The queue size is directly related to the number of processors that the group has purchased.
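Under PBS/TORQUE, a job file is typically a shell script whose "#PBS" comment lines tell the resource manager what the job needs. The Python sketch below only assembles the text of such a file, to show which pieces of information the scheduler reads. The helper render_job_file, the job name example_job, and the program ./my_program are hypothetical; the directives shown (-N, -l nodes=...:ppn=..., -l walltime=..., -q) are standard TORQUE options, but the values that the NIC cluster accepts are not given here and are assumed.

```python
def render_job_file(name, nodes, procs_per_node, walltime, command, queue=None):
    """Return the text of a PBS/TORQUE-style job file (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                              # job name
        f"#PBS -l nodes={nodes}:ppn={procs_per_node}",  # job size: nodes and processors per node
        f"#PBS -l walltime={walltime}",                 # requested run time as HH:MM:SS
    ]
    if queue is not None:
        lines.append(f"#PBS -q {queue}")                # queue to submit to (name is an assumption)
    lines.append("cd $PBS_O_WORKDIR")                   # start in the directory the job was submitted from
    lines.append(command)                               # the program to run
    return "\n".join(lines) + "\n"


# A 4-processor job on one node, requesting 15 minutes of run time.
job_text = render_job_file("example_job", 1, 4, "00:15:00", "./my_program")
print(job_text)
```

The requested processors and run time correspond to the job size and run time criteria that the scheduler weighs when ordering jobs.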

Some Common Terms

  • CPU time: the amount of time a CPU is in use, measured per CPU (so a 4-processor job that runs for 15 minutes takes 1 hour of CPU time)
  • job: any user-submitted program executed on the cluster
  • job file: a file containing information about a job, which the resource manager and the scheduler use to schedule that job
  • queue: an ordered group of jobs waiting to run
  • wall-clock time: the amount of time a job runs on the cluster (a 4-processor job that runs for 15 minutes takes 15 minutes of wall-clock time)
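
The difference between the two time measures in the list above can be checked with a few lines of Python. This is a minimal sketch that assumes every processor is in use for the whole run, as in the 4-processor example; the function name cpu_time is chosen here for illustration.

```python
from datetime import timedelta


def cpu_time(wall_clock, processors):
    """CPU time consumed when all processors are busy for the whole run."""
    return wall_clock * processors


wall = timedelta(minutes=15)   # wall-clock time of the job
total = cpu_time(wall, 4)      # the same job measured in CPU time

print("wall-clock time:", wall)   # 0:15:00
print("CPU time:", total)         # 1:00:00
```

Adding processors to a job does not lengthen its wall-clock time, but it multiplies the CPU time the job consumes.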