Install Sun (Oracle) Grid Engine

In this tutorial I will show you how to deploy a Sun Grid Engine infrastructure to transform your cluster into an HPC environment. The Sun Grid Engine version used is 6.2 update 5.

First we will take a short look at the Grid Engine architecture and the terms that must be understood before proceeding with the installation.

Terms list

  • slot – the smallest unit of computational resources that can be reserved. Usually this is one core of a multicore processor or an entire processor.
  • queue – the mechanism for holding jobs that were submitted for execution. Every job has a priority that can change dynamically.
  • parallel environment – an addition to the queue mechanism that lets a user request more than one slot for a parallel or distributed job.
  • master host/daemon – the main host in a cluster. It is responsible for job management and resource allocation. The daemon process that runs on this machine does all the scheduling work. The host can be a common PC because no job will run here.
  • execution host/daemon – these are the cluster worker nodes. They must be dedicated computers with as much CPU power and memory as needed.

Cluster architecture

A generic cluster has the following elements:

  • a storage system
  • a control unit
  • execution hosts
  • front end

The storage system is the place where all the data produced or processed by the cluster is kept. This includes the users' home directories, global configuration files, repositories, etc. This system must be able to export its storage using SAN or NAS protocols.

The control unit is usually the host that runs the master daemon. This host is the gateway to the rest of the machines in the cluster.

Execution hosts are the actual computation workers. Here the numbers are crunched.

The cluster must have a front end where users can submit jobs and view their results. The front end may be a terminal computer or a graphical environment (qmon).

Software (Grid Engine) architecture

The master host and the execution daemon were discussed earlier.

ARCo – Accounting and Reporting Console – an interface for gathering statistical data from the cluster.

DRMAA – Distributed Resource Management Application API – an API for submitting and controlling jobs from your own programs, as an alternative to writing scripts that run Sun Grid Engine commands and parse the results.

Shadow Master Host – a way to reduce cluster downtime. If the master host fails, the shadow master takes its place.

More information about how the system works is available at http://wikis.sun.com/display/gridengine62u5/How+the+System+Operates.

Install process

Prereq

Before we start the install process the following services must be properly configured:

1. name resolution service – SGE relies on a good name resolution system: all hosts must have a name and a static IP (static DHCP allocation is OK), and each IP must translate back into its name, so you must provide both forward and reverse name resolution.

2. users – SGE processes are advised to run under an unprivileged user. The default user is sgeadmin. It must have the same uid:gid on all hosts that run SGE components. Cluster users must exist (with the same uid:gid) on all hosts that run SGE components. The best way to do this is to configure an LDAP service for central user authentication. This part (LDAP) will not be covered in this tutorial.

3. shared home directories – the users' home directories must be the same on each host that runs an SGE component. Usually users will submit jobs from their home directory (-cwd), and the execution hosts must be able to run the program from that place.

4. Sun Grid Engine 6.2u5 binaries: http://www.sun.com/software/sge/get_it.jsp.
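
The name resolution prerequisite can be checked before anything is installed. The following is a minimal sketch, assuming the hosts are resolvable through getent hosts (which consults /etc/hosts and DNS as configured in nsswitch.conf); the function name check_resolution is only illustrative and is not part of SGE.

```shell
#!/bin/sh
# check_resolution NAME
# Succeeds only if NAME resolves to an address and that address
# resolves back to NAME (forward and reverse lookups agree).
check_resolution() {
    name=$1
    # Forward lookup: the first field of the first line is the address.
    addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "$name: no forward resolution" >&2
        return 1
    fi
    # Reverse lookup: the second field is the canonical name.
    back=$(getent hosts "$addr" | awk 'NR == 1 { print $2 }')
    if [ "$back" != "$name" ]; then
        echo "$name: $addr resolves back to '$back'" >&2
        return 1
    fi
    echo "$name <-> $addr ok"
}
```

Calling check_resolution for every cluster host name, on every host, quickly shows which lookup is missing.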

After the prerequisites are satisfied it is time to begin the installation process. The commands were run on an Oracle Enterprise Linux 5.4 server, but with little effort they can be applied to other distributions. I will try not to use distribution-specific commands. Be careful with the firewall on the servers; I suggest deactivating it (or at least allowing the SGE ports chosen during installation) to avoid possible connection errors.

The cluster in this example will look like this:

Our domain will be mygrid.net and the servers from the diagram have the corresponding IPs. The FQDN of the NFS server will be nfs.mygrid.net.

Once the infrastructure is up and running (the servers have an operating system and the networking is done), we need some form of automation to work on all hosts from one point. For this, the root ssh public key from the control.mygrid.net server has to be copied to all other servers to permit password-less authentication.

control# ssh-keygen -t rsa -b 4096
control# ssh-copy-id -i .ssh/id_rsa.pub root@node1.mygrid.net
control# ssh-copy-id -i .ssh/id_rsa.pub root@node2.mygrid.net
control# ssh-copy-id -i .ssh/id_rsa.pub root@nfs.mygrid.net

Because we don’t have a directory service to store our user account data, this job is assigned to the control server. To replicate the data across the cluster we simply copy the /etc/passwd file over ssh to the rest of the servers.
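
The copy step can be scripted from the control host. This is a minimal sketch that assumes the root ssh keys distributed above; the function name push_accounts is made up for illustration. It copies /etc/passwd together with /etc/group, and leaves /etc/shadow out on purpose, so passwords stay local unless you add that file yourself.

```shell
#!/bin/sh
# push_accounts HOST...
# Copies the local account files to each HOST as root over ssh.
# Returns non-zero if any copy failed.
push_accounts() {
    rc=0
    for host in "$@"; do
        if scp -q /etc/passwd /etc/group "root@${host}:/etc/"; then
            echo "accounts copied to $host"
        else
            echo "copy to $host failed" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

On the control host this would be called as push_accounts node1.mygrid.net node2.mygrid.net nfs.mygrid.net. Note that it overwrites the remote files, so it only suits hosts whose accounts are managed entirely from the control server.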

The NFS server will store the user home directories. It will export the /mygrid/home/ directory to the 10.0.0.0/24 network:

nfs# vi /etc/exports
/mygrid/home/ 10.0.0.0/24(rw,sync,no_root_squash)

The rest of the servers will mount the NFS share at startup. For this the /etc/fstab file must be edited:

{control,node1,node2}# vi /etc/fstab
10.0.0.4:/mygrid/home /mygrid/home/ nfs defaults 0 0

When a user is added the home directory will look like /mygrid/home/username.
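
Editing /etc/fstab by hand on every host is easy to get wrong. As an alternative, here is a small sketch of an idempotent append; the function name add_line_once is illustrative, and the file path is passed as an argument so the function can be tried on a scratch file first.

```shell
#!/bin/sh
# add_line_once FILE LINE
# Appends LINE to FILE unless an identical line is already present,
# so the function can be run repeatedly without duplicating entries.
add_line_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

For example, add_line_once /etc/fstab '10.0.0.4:/mygrid/home /mygrid/home/ nfs defaults 0 0' followed by mount -a on control, node1 and node2 mounts the share without a reboot. The mount point has to exist first (mkdir -p /mygrid/home).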

Create the SGE administrative user

After the infrastructure is ready we have to create the group and user for the Sun Grid Engine system. The uid:gid will be 500:500 and the name of both user and group will be sgeadmin.

groupadd -g 500 sgeadmin
useradd -m -d /opt/sge-6.2u5/ -s /bin/bash -u 500  -g sgeadmin -c "SGE Admin User" sgeadmin

This command will also create the /opt/sge-6.2u5 directory with the appropriate rights.
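
Because sgeadmin must have the same uid:gid everywhere, it is worth comparing the values once the account exists on each host. A minimal sketch, assuming the password-less root ssh access set up earlier; check_ids is a hypothetical helper name.

```shell
#!/bin/sh
# check_ids USER EXPECTED HOST...
# Compares the "uid:gid" of USER on every HOST with EXPECTED (e.g. 500:500).
# Returns non-zero if any host differs or does not know the user.
check_ids() {
    user=$1
    expected=$2
    shift 2
    rc=0
    for host in "$@"; do
        # An unknown user or unreachable host leaves the value empty.
        uid=$(ssh -n "root@${host}" id -u "$user" 2>/dev/null) || uid=
        gid=$(ssh -n "root@${host}" id -g "$user" 2>/dev/null) || gid=
        if [ "${uid}:${gid}" = "$expected" ]; then
            echo "$host: $user is $expected"
        else
            echo "$host: $user is ${uid}:${gid}, expected $expected" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

With the values used in this tutorial the call would be check_ids sgeadmin 500:500 node1.mygrid.net node2.mygrid.net.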

Install the master (control) host

Now it is time to install the Sun Grid Engine master daemon on our control host. First unpack the binaries downloaded from the Sun website into the /opt/sge-6.2u5 directory. Go to that directory and execute ./inst_sge -m. This will guide you through the install process of the master daemon. Please answer the questions with the following information:

  • sge_qmaster port 49100
  • sge_execd port 49101
  • cell_name mygrid
  • spool type classic
  • create startup script yes
  • add execution hosts yes and enter “node1.mygrid.net node2.mygrid.net” without the quotes

After the installation is complete the sge_qmaster daemon will be started. You can check that with ps ax | grep -i sge_qmaster .

Now it is time to add our execution hosts to the system. Please make sure their hostnames resolve properly. An execution host must first be added as an administrative host, then installed.

To prepare our environment variables we first have to source the cell configuration file:

source /opt/sge-6.2u5/mygrid/common/settings.sh

and then tell the master daemon which are our execution hosts:

qconf -ah node1.mygrid.net
qconf -ah node2.mygrid.net

qconf is the command line configuration tool. If you prefer a graphical one, use qmon (an X server must be available on the control host). To be able to submit jobs from this host, it must be declared as a submit host:

qconf -as control.mygrid.net
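
To confirm that the hosts were registered, the matching qconf "show" options can be used: -sh lists administrative hosts, -ss lists submit hosts and -sel lists execution hosts, one name per line. The wrapper below is a small sketch around them; host_in_list is an illustrative name, and it assumes settings.sh has already been sourced so that qconf is on the PATH.

```shell
#!/bin/sh
# host_in_list FLAG HOST
# FLAG is a qconf "show" option: -sh (administrative hosts),
# -ss (submit hosts) or -sel (execution hosts).
# Succeeds if HOST appears as a whole line in that list.
host_in_list() {
    qconf "$1" | grep -qxF -- "$2"
}
```

For instance, host_in_list -sh node1.mygrid.net && echo "node1 is an administrative host" prints the message only after the qconf -ah step has succeeded.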

Install the execution hosts

The execution hosts need the cell configuration files. If these files are shared, any change made on the master is instantly visible on the execution hosts. This is useful if the master host fails and the shadow master announces itself as the new master daemon. To accomplish this we need to export the /opt/sge-6.2u5/mygrid directory via NFS.

So on the control host add the following line to the /etc/exports file:

control# vi /etc/exports
/opt/sge-6.2u5/mygrid 10.0.0.0/24(rw,sync,no_root_squash)

On the execution hosts the NFS export must be mounted in the same place. So we append a line to the /etc/fstab file to automatically mount the directory at boot time:

{node1,node2}# vi /etc/fstab
10.0.0.1:/opt/sge-6.2u5/mygrid /opt/sge-6.2u5/mygrid nfs defaults 0 0

The target directory must exist on the nodes.

To install the execution hosts, unpack the two archives in the /opt/sge-6.2u5/ directory. Source the configuration file (this tells the install program where the master is and on what port it is running):

source /opt/sge-6.2u5/mygrid/common/settings.sh

then run the ./inst_sge -x command from the /opt/sge-6.2u5/ directory. The installation process will guide you (most answers are filled in automatically), and at the end you are prompted to choose whether or not to add the node (node1 and node2) to the all.q queue as an execution host.

Testing the system

Now the install process is almost complete. It is time to run some tests. The tests assume that the shared home directories are mounted on all hosts except nfs.mygrid.net, as described in this tutorial.

Step1:

From the control host add a new user. Let the user name be sgeuser:

groupadd sgeusers
useradd -m -d /mygrid/home/sgeuser -g sgeusers sgeuser
passwd sgeuser

Step2:

Copy the /etc/passwd and /etc/group files from the control host to the rest of the hosts. Use the ssh key authentication set up earlier.

On any host you should be able to run

 id sgeuser

and get full information about the user. You can set sgeuser’s password on each host if you like (or copy the /etc/shadow file from the control host).

Step3:

Log in as our new user, sgeuser, on the control host. Source the cell configuration file /opt/sge-6.2u5/mygrid/common/settings.sh. The command

qstat -f

should show you that node1 and node2 are running well:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node1.mygrid.net         BIP   0/0/8          0.03     lx24-amd64
---------------------------------------------------------------------------------
all.q@node2.mygrid.net         BIP   0/0/8          0.10     lx24-amd64

If it looks like this it is time to run some jobs. We will submit a dummy job from the provided examples.

qsub -q all.q -cwd /opt/sge-6.2u5/examples/jobs/sleeper.sh

To watch the queue progress:

watch "qstat -f"
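
If you would rather script the wait than watch the screen, the job number can be taken from qsub's confirmation message and polled for. This is only a sketch: it assumes the usual message form Your job N ("name") has been submitted and that the first column of the qstat listing is the job ID; both helper names are made up for illustration.

```shell
#!/bin/sh
# job_id_from_qsub: reads qsub's confirmation message on stdin, e.g.
#   Your job 42 ("name") has been submitted
# and prints the job number.
job_id_from_qsub() {
    awk '$1 == "Your" && $2 == "job" { print $3; exit }'
}

# wait_for_job ID [INTERVAL]
# Polls qstat every INTERVAL seconds (default 5) until job ID
# no longer appears in the job list of any user.
wait_for_job() {
    id=$1
    interval=${2:-5}
    # awk exits 0 while a line whose first column equals ID is present.
    while qstat -u '*' | awk -v id="$id" '$1 == id { found = 1 } END { exit !found }'; do
        sleep "$interval"
    done
    echo "job $id left the queue"
}
```

A possible use is id=$(qsub -q all.q -cwd "$SGE_ROOT/examples/jobs/sleeper.sh" | job_id_from_qsub) followed by wait_for_job "$id"; the job's output files then appear in the directory the job was submitted from.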

Good luck!

Installing Sun Grid Engine

Installing Sun Grid Engine is not a difficult process in itself, but it does require some care.

To begin with, the ground must be prepared, and here I come back to the architecture of the job scheduling system.

From the figure one can deduce that we need one machine (physical or virtual) to play the master role (sge_qmaster will run on it) and one or more machines (physical or virtual) that will act as execution nodes (sge_execd will run on them).

SGE can run under a dedicated user. This can be considered a security bonus: if someone manages to compromise one of its services listening on the network, they will not be able to do much harm to the system, for lack of privileges. Running under an unprivileged user is recommended (Sun recommends this too). It is important to remember that the user must have the same uid:gid on all machines that will be integrated into SGE.

Before starting the installation, the following services must be working:

- forward and reverse name resolution for all hosts in the cluster

- centralized authentication, preferably LDAP, for user management

- user home directories exported over NFS

The installation proper consists of installing the master process (sge_qmaster) on a separate machine. This is done by invoking the ./inst_sge -m command. Worker nodes are installed with ./inst_sge -x.

For more details, see the tutorials section or the following link: http://doraz.ro/tutorials/installing-sun-grid-engine/.

Sun Grid Engine

Sun Grid Engine is a grid scheduling solution offered by Sun Microsystems.

Why grid scheduling?

First of all, resources need to be reserved: we want the program (job) we run to get maximum performance, and for that two users must not use the same machine for processing.

Security is also a concern in this case: the data of one job must not get mixed with the data of another user's job.

Architecture

Sun Grid Engine (SGE) uses the master-worker model: a master (qmaster) manages worker nodes, on which the jobs actually run. This also allows centralized management of the nodes using utilities included in the package (qlogin, qrsh).

Slot

A slot is a unit of execution. It can be a core, a processor or a whole machine. Depending on this choice, using parallel execution environments may or may not be necessary.

Queue

A queue is a dynamic priority queue in which jobs wait to be run. Any number of queues can be created as needed. A common case is creating one queue for each type of processor in the system and adding the matching nodes as worker nodes for that queue.

Queues also make it possible to run custom scripts before and after a job (prolog and epilog scripts). These are useful for running Intel MPI jobs: Intel MPI does not integrate with SGE, so the communication layer over the network has to be booted manually (loose integration). Open MPI does not have this problem.

Parallel environment

MPI jobs were mentioned above. Depending on how a slot is defined, more than one slot has to be reserved to run with the desired number of processes. So when submitting the job, besides the queue, the PE (parallel environment) must also be specified, together with the number of slots to reserve.
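
When a job runs inside a parallel environment, SGE sets $NSLOTS and points $PE_HOSTFILE at a file with one line per host: host name, number of slots, queue and processor range. The sketch below sums the granted slots from such a file. The function name total_slots and the PE name mpi used in the note after it are assumptions, since PE names are defined by the local administrator.

```shell
#!/bin/sh
# total_slots FILE
# Sums the second column (slots granted on each host) of a file in
# $PE_HOSTFILE format: "host slots queue processor-range".
# Prints 0 for an empty file.
total_slots() {
    awk '{ sum += $2 } END { print sum + 0 }' "$1"
}
```

Inside a job script submitted with something like qsub -q all.q -pe mpi 12 job.sh, the value of total_slots "$PE_HOSTFILE" should agree with $NSLOTS, and the per-host lines show how the slots were spread over the nodes.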

Installation

Installing SGE requires at least one machine. It will act as master host and submit host, but it can also act as execution host (when serial execution of batch processes is wanted). For performance, it is recommended that the worker nodes be separate from the master.

Below I attach a presentation I made for a university course.

sprc-prezentare