 16.6.1 Configuring MySQL Cluster to use SCI Sockets
 In this section we show how to take a cluster configured for normal
 TCP/IP communication and use SCI Sockets instead. A prerequisite is
 that the machines that are to communicate are equipped with SCI cards.
 This documentation is based on SCI Socket version 2.3.0 as of
 1 October 2004.
 Any version of MySQL Cluster can be used with SCI Sockets; the tests
 reported here were performed on an early 4.1.6 version. No special
 builds are needed, since SCI Sockets use normal socket calls, which is
 the normal configuration set-up for MySQL Cluster. SCI Sockets are
 currently supported only on Linux 2.4 and 2.6 kernels. SCI
 Transporters work on more operating systems, although only Linux 2.4
 has been verified.
 Essentially four things are needed to enable SCI Sockets. First, the
 SCI Socket libraries must be built. Second, the SCI Socket kernel
 libraries must be installed. Third, one or two configuration files
 must be installed. Finally, the SCI Socket kernel library must be
 enabled, either for the entire machine or for the shell from which the
 MySQL Cluster processes are started. This process must be repeated for
 each machine in the cluster that will use SCI Sockets to communicate.
 Two packages are needed to get SCI Sockets working. The first package
 builds the libraries that SCI Sockets are built upon, and the second
 contains the actual SCI Socket libraries. Currently the distribution
 is available in source code format only. The latest versions of these
 packages can be found on the Dolphin web site; check there for the
 latest versions.
 The next step is to unpack the packages; the SCI Socket source is
 unpacked below the DIS code. Then the code base is compiled. The
 example below shows the commands used on Linux/x86 to perform this.
      shell> tar xzf DIS_GPL_2_5_0_SEP_10_2004.tar.gz
      shell> cd DIS_GPL_2_5_0_SEP_10_2004/src/
      shell> tar xzf ../../SCI_SOCKET_2_3_0_OKT_01_2004.tar.gz
      shell> cd ../adm/bin/Linux_pkgs
      shell> ./make_PSB_66_release
 If the build is made on an Opteron box and should use the 64-bit
 extensions, then use make_PSB_66_X86_64_release instead; if the build
 is made on an Itanium box, use make_PSB_66_IA64_release instead. The
 X86-64 variant should also work for Intel EM64T architectures, but no
 known tests of this exist yet.
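 The choice of build script can be automated. A minimal sketch follows;
 the variable name is ours, and mapping all other architectures to the
 generic 32-bit script is our assumption:

```shell
# Pick the build script matching the current architecture, using the
# script names listed above. Anything other than x86_64 or ia64 falls
# back to the generic 32-bit build (an assumption, not from the docs).
case $(uname -m) in
  x86_64) script=./make_PSB_66_X86_64_release ;;
  ia64)   script=./make_PSB_66_IA64_release ;;
  *)      script=./make_PSB_66_release ;;
esac
echo "selected build script: $script"
```

 Run the selected script from the `adm/bin/Linux_pkgs' directory as
 shown above.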
 After building, the code base has been put into a gzipped tar file
 whose name combines DIS, the OS, and the date. It is now time to
 install the package in the proper place. In this example we place the
 installation in /opt/DIS. These actions will most likely require you
 to log in as the root user.
      shell> cp DIS_Linux_2.4.20-8_181004.tar.gz /opt/
      shell> cd /opt
      shell> tar xzf DIS_Linux_2.4.20-8_181004.tar.gz
      shell> mv DIS_Linux_2.4.20-8_181004 DIS
 Once all the libraries and binaries are in their proper place, we need
 to ensure that the SCI cards get proper node identities within the SCI
 address space. Since SCI is networking equipment, it is necessary to
 decide on the network structure first.
 There are three types of network structure: the first is a simple
 one-dimensional ring, the second uses one or more SCI switches with
 one ring per switch port, and the third is a 2D/3D torus. Each has its
 own standard for assigning node ids.
 A simple ring simply uses node ids spaced 4 apart:
      4, 8, 12, ....
 The next possibility uses one or more switches. An SCI switch has 8
 ports, and on each port it is possible to place a ring. Here it is
 necessary to ensure that the rings on the switch use different node id
 spaces: the first port uses node ids below 64, the next 64 node ids
 are allocated to the second port, and so forth.
      4, 8, 12, ..., 60    Ring on first port
      68, 72, ..., 124     Ring on second port
      132, 136, ..., 188   Ring on third port
      452, 456, ..., 508   Ring on the eighth port
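 The per-port allocation above can be computed mechanically. The
 following sketch (the function name is ours) prints the first and last
 node id of the ring on a given switch port:

```shell
# sci_port_ids: print the first and last node id of the ring on a given
# switch port. Ports are numbered from 1; each port owns a block of 64
# ids, of which 4, 8, ..., 60 (relative to the block) are usable.
sci_port_ids() {
  port=$1
  first=$(( (port - 1) * 64 + 4 ))
  last=$(( (port - 1) * 64 + 60 ))
  echo "$first $last"
}

sci_port_ids 8   # prints "452 508", matching the eighth-port row above
```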
 2D/3D torus network structures take into account where each node is in
 each dimension: increment by 4 for each node in the first dimension,
 by 64 in the second dimension, and by 1024 in the third dimension.
 See the Dolphin documentation for more thorough coverage of this.
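 The torus increments above can be sketched as a formula. The base id
 of 4 is our assumption, carried over from the ring example; consult
 the Dolphin documentation for the authoritative scheme:

```shell
# sci_torus_id: node id for a node at coordinates (x, y, z) in a 2D/3D
# torus, counting from 0 in each dimension. Increments per the text:
# 4 in the first dimension, 64 in the second, 1024 in the third.
# The base offset of 4 is assumed, not taken from the Dolphin docs.
sci_torus_id() {
  echo $(( 4 + 4 * $1 + 64 * $2 + 1024 * $3 ))
}

sci_torus_id 1 2 3   # prints 3208 (= 4 + 4 + 128 + 3072)
```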
 In our testing we have used switches. Most of the really big cluster
 installations use a 2D/3D torus. The extra feature that switches
 provide is that, with dual SCI cards and dual switches, we can easily
 build a redundant network where failover times on the SCI network are
 around 100 microseconds. This feature is supported by the SCI
 transporter and is currently also being developed for the SCI Socket
 implementation.
 Failover for a 2D/3D torus is also possible, but requires sending out
 new routing indexes to all nodes. Even this completes in around 100
 milliseconds, which should be acceptable for most high-availability
 cases.
 By placing the NDB nodes in the proper places in the switched
 architecture, it is possible to use 2 switches to build a structure
 where 16 computers can be interconnected and no single failure can
 hamper more than one computer. With 32 computers and 2 switches, it is
 possible to configure the cluster such that no single failure can
 hamper more than two nodes; in this case it is also known which pair
 will be hit. Thus, by placing those two in separate NDB node groups,
 it is possible to build a safe MySQL Cluster installation. We won't go
 into details of how this is done, since it is likely to be of interest
 only to users wanting to go really deep into this.
 To set the node id of an SCI card, use the following command while
 still in the `/opt/DIS/sbin' directory. -c 1 refers to the number of
 the SCI card, where 1 is this number if only one card is in the
 machine. In this case, always use adapter 0 (set by -a 0). 68 is the
 node id set in this example.
      shell> ./sciconfig -c 1 -a 0 -n 68
 In case you have several SCI cards in your machine, the only safe way
 to discover which card sits in which slot is by issuing the following
 command:
      shell> ./sciconfig -c 1 -gsn
 This prints the serial number, which can be found on the back of the
 SCI card and on the card itself. Repeat this for -c 2 and onwards, for
 as many cards as there are in the machine. This identifies which card
 uses which id. Then set the node ids for all cards.
 We have installed the necessary libraries and binaries. We have also
 set the SCI node ids. The next step is to set the mapping from
 hostnames (or IP addresses) to SCI node ids.
 The configuration file for SCI Sockets is placed in
 `/etc/sci/scisock.conf'. This file contains a mapping from hostnames
 (or IP addresses) to SCI node ids, so that traffic to a given hostname
 goes through the proper SCI card. Below is a very simple such
 configuration file.
      #host           #nodeId
      alpha           8
      beta            12
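 A mapping file of this shape can be sanity-checked mechanically. A
 minimal sketch follows; the helper name is ours, and it assumes each
 non-comment line carries a host followed by one or more numeric node
 ids:

```shell
# check_scisock_conf: rough sanity check of a scisock.conf-style file
# (the helper name is ours, not part of the SCI distribution). Every
# non-comment, non-empty line must hold a host followed by one or more
# numeric node ids; the function's exit status reports the result.
check_scisock_conf() {
  awk '/^#/ || NF == 0 { next }
       NF < 2 { bad = 1 }
       { for (i = 2; i <= NF; i++) if ($i !~ /^[0-9]+$/) bad = 1 }
       END { exit bad }' "$1"
}
```

 For example, `check_scisock_conf /etc/sci/scisock.conf && echo ok'
 succeeds on the file shown above.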
 It is also possible to limit this configuration to apply only to a
 subset of the ports of these hostnames. To do this, another
 configuration file is used, placed in `/etc/sci/scisock_opt.conf'.
      #-key                        -type        -values
      EnablePortsByDefault                      yes
      EnablePort                   tcp          2200
      DisablePort                  tcp          2201
      EnablePortRange              tcp          2202 2219
      DisablePortRange             tcp          2220 2231
 We are ready to install the drivers. We need to first install the
 low-level drivers and then the SCI Socket driver.
      shell> cd DIS/sbin/
      shell> ./drv-install add PSB66
      shell> ./scisocket-install add
 If desired, one can check the installation by invoking a script that
 verifies that all nodes in the SCI Socket configuration files are
 accessible.
      shell> cd /opt/DIS/sbin/
      shell> ./
 If you discover an error and need to change the SCI Socket
 configuration files, you must use the program ksocketconfig to change
 the configuration:
      shell> cd /opt/DIS/util
      shell> ./ksocketconfig -f
 To check that SCI Sockets are actually used, you can use the test
 program `latency_bench', which has a server component to which clients
 can connect to test the latency; whether SCI is enabled is very clear
 from the latency you get. Before you use these programs you also need
 to set the LD_PRELOAD variable, in the same manner as shown below for
 the MySQL Cluster processes.
 To set up a server, use the command:
      shell> cd /opt/DIS/bin/socket
      shell> ./latency_bench -server
 To run a client use the following command
      shell> cd /opt/DIS/bin/socket
      shell> ./latency_bench -client hostname_of_server
 The SCI Socket configuration is now complete. MySQL Cluster is ready
 to use both SCI Sockets and the SCI transporter documented in
 MySQL Cluster SCI Definition.
 The next step is to start-up MySQL Cluster. To enable usage of SCI
 Sockets it is necessary to set the environment variable LD_PRELOAD
 before starting the ndbd, mysqld and ndb_mgmd processes to use SCI
 Sockets. The LD_PRELOAD variable should point to the kernel library for
 SCI Sockets.
 As an example, to start up ndbd in a bash shell, use the following:
      bash-shell> export LD_PRELOAD=/opt/DIS/lib/
      bash-shell> ndbd
 From a tcsh environment the same thing would be accomplished with the
 following commands.
      tcsh-shell> setenv LD_PRELOAD /opt/DIS/lib/
      tcsh-shell> ndbd
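 The export-then-start pattern above can be wrapped in a small helper
 so the variable is set only for the launched process. This is a sketch
 of our own, not part of the SCI distribution; pass it the path of the
 SCI Socket kernel library under /opt/DIS/lib on your installation:

```shell
# with_sci_preload: run the given command (ndbd, mysqld or ndb_mgmd)
# with LD_PRELOAD pointing at the SCI Socket kernel library, without
# changing the environment of the calling shell.
with_sci_preload() {
  lib=$1; shift
  LD_PRELOAD=$lib "$@"
}

# Example (library file name depends on your installation):
# with_sci_preload /opt/DIS/lib/<kernel library> ndbd
```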
 Noteworthy here is that MySQL Cluster can only use the kernel variant of
 SCI Sockets.