16.6.1 Configuring MySQL Cluster to use SCI Sockets
---------------------------------------------------
In this section we show how to take a cluster configured for normal
TCP/IP communication and make it use SCI Sockets instead. The
prerequisite is that the machines that are to communicate are equipped
with SCI cards. This documentation is based on SCI Socket version
2.3.0 as of 1 October 2004.
Any version of MySQL Cluster can be used with SCI Sockets; the tests
described here were performed on an early 4.1.6 version. No special
builds are needed, because SCI Sockets use normal socket calls, which
is the standard configuration for MySQL Cluster. At the moment SCI
Sockets are supported only on Linux 2.4 and 2.6 kernels. SCI
Transporters work on more operating systems, although only Linux 2.4
has been verified.
Essentially four things are needed to enable SCI Sockets. First, the
SCI Socket libraries must be built. Second, the SCI Socket kernel
libraries must be installed. Third, one or two configuration files
must be installed. Finally, the SCI Socket kernel library must be
enabled, either for the entire machine or for the shell from which the
MySQL Cluster processes are started. This process must be repeated for
each machine in the cluster that will use SCI Sockets to communicate.
Two packages are needed to get SCI Sockets working. The first package
builds the libraries on which SCI Sockets depend; the second contains
the actual SCI Socket libraries. Currently the distribution is
available only in source code format.
The latest versions of these packages can be found at
http://www.dolphinics.no/support/downloads.html. At the time of
writing they are:
http://www.dolphinics.no/ftp/source/DIS_GPL_2_5_0_SEP_10_2004.tar.gz
http://www.dolphinics.no/ftp/source/SCI_SOCKET_2_3_0_OKT_01_2004.tar.gz
The next step is to unpack these archives; the SCI Sockets code is
unpacked below the DIS code tree. The code base is then compiled. The
example below shows the commands used on Linux/x86 to perform this.
shell> tar xzf DIS_GPL_2_5_0_SEP_10_2004.tar.gz
shell> cd DIS_GPL_2_5_0_SEP_10_2004/src/
shell> tar xzf ../../SCI_SOCKET_2_3_0_OKT_01_2004.tar.gz
shell> cd ../adm/bin/Linux_pkgs
shell> ./make_PSB_66_release
If the build is performed on an Opteron box and is to use the 64-bit
extensions, use make_PSB_66_X86_64_release instead; if the build is
performed on an Itanium box, use make_PSB_66_IA64_release. The X86-64
variant should also work on Intel EM64T architectures, but no tests of
this are known yet.
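For example, on an Opteron box the final build step shown above would
instead be:
shell> cd ../adm/bin/Linux_pkgs
shell> ./make_PSB_66_X86_64_release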
After building, the code base is packed into a gzipped tar file whose
name is composed of DIS, the OS, and the date. It is now time to
install the package in the proper place. In this example we place the
installation in /opt/DIS. These actions will most likely require you
to log in as the root user.
shell> cp DIS_Linux_2.4.20-8_181004.tar.gz /opt/
shell> cd /opt
shell> tar xzf DIS_Linux_2.4.20-8_181004.tar.gz
shell> mv DIS_Linux_2.4.20-8_181004 DIS
Once all the libraries and binaries are in their proper place, we need
to ensure that the SCI cards get proper node identities within the SCI
address space. Since SCI is networking gear, the network structure
must be decided on first.
There are three types of network structures: the first is a simple
one-dimensional ring, the second uses one or more SCI switches with
one ring per switch port, and the third is a 2D/3D torus. Each has its
own standard for assigning node IDs.
A simple ring uses node IDs in steps of 4:
4, 8, 12, ...
The next possibility uses one or more switches. An SCI switch has 8
ports, and a ring can be placed on each port. Here it is necessary to
ensure that the rings on the switch use different node ID spaces: the
first port uses node IDs below 64, the next 64 node IDs are allocated
to the second port, and so forth.
4, 8, 12, ..., 60      Ring on first port
68, 72, ..., 124       Ring on second port
132, 136, ..., 188     Ring on third port
...
452, 456, ..., 508     Ring on the eighth port
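In general, port N of the switch (counting from 1) is thus given the
node IDs 64*(N-1)+4 through 64*(N-1)+60, in steps of 4.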
2D/3D torus network structures take into account where each node is
located in each dimension: the node ID is incremented by 4 for each
node in the first dimension, by 64 in the second dimension, and by
1024 in the third dimension. Refer to the Dolphin documentation for a
more thorough treatment of this.
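As a rough worked example of this scheme, ignoring any base offset a
real installation might use: a node at torus coordinates (1, 2, 3),
counting from zero, would get node ID 4*1 + 64*2 + 1024*3 = 3204.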
In our testing we used switches. Most really big cluster installations
use a 2D/3D torus. The extra feature that switches provide is that,
with dual SCI cards and dual switches, we can easily build a redundant
network where failover times on the SCI network are around 100
microseconds. This feature is supported by the SCI transporter and is
currently being developed for the SCI Socket implementation as well.
Failover for a 2D/3D torus is also possible, but requires sending new
routing indexes to all nodes. Even this completes in around 100
milliseconds, which should be acceptable for most high-availability
cases.
By placing the NDB nodes appropriately in the switched architecture,
it is possible to use 2 switches to build a structure in which 16
computers can be interconnected and no single failure can affect more
than one computer. With 32 computers and 2 switches it is possible to
configure the cluster such that no single failure can affect more than
two nodes; in this case it is also known which pair will be hit. Thus,
by placing those two in separate NDB node groups, it is possible to
build a safe MySQL Cluster installation. We do not go into the details
of how this is done, since they are likely to interest only users who
want to dig deeply into the subject.
To set the node ID of an SCI card, use the following command while
still in the `/opt/DIS/sbin' directory. Here -c 1 refers to the number
of the SCI card, 1 being the card's number when there is only one card
in the machine; -a 0 selects adapter 0, which should always be used in
this case; and 68 is the node ID set in this example.
shell> ./sciconfig -c 1 -a 0 -n 68
If there are several SCI cards in the machine, the only safe way to
discover which card sits in which slot is to issue the following
command:
shell> ./sciconfig -c 1 -gsn
This prints the card's serial number, which can also be found on the
back of the SCI card and on the card itself. Repeat this for -c 2 and
onwards, for as many cards as there are in the machine. This
identifies which card uses which number. Then set the node IDs of all
the cards.
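As an illustration, a machine with two SCI cards could be configured
as follows (the node IDs 68 and 72 are chosen here purely as
examples):
shell> ./sciconfig -c 1 -a 0 -n 68
shell> ./sciconfig -c 2 -a 0 -n 72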
We have now installed the necessary libraries and binaries and set the
SCI node IDs. The next step is to set up the mapping from hostnames
(or IP addresses) to SCI node IDs.
The configuration file for SCI Sockets is placed in the file
`/etc/sci/scisock.conf'. This file contains a mapping from hostnames
(or IP addresses) to SCI node IDs; the node ID directs traffic for the
hostname through the proper SCI card. Below is a very simple example
of such a configuration file:
#host           #nodeId
alpha           8
beta            12
192.168.10.20   16
It is also possible to limit this configuration so that it applies
only to a subset of the ports of these hostnames. To do this, another
configuration file is used, placed in `/etc/sci/scisock_opt.conf':
#-key -type -values
EnablePortsByDefault yes
EnablePort tcp 2200
DisablePort tcp 2201
EnablePortRange tcp 2202 2219
DisablePortRange tcp 2220 2231
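Read literally, this example enables SCI Sockets on all ports by
default, explicitly enables TCP port 2200 and the range 2202 to 2219,
and excludes port 2201 and the range 2220 to 2231 from SCI Sockets;
the port numbers shown are illustrative only.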
Now we are ready to install the drivers. The low-level drivers must be
installed first, then the SCI Socket driver:
shell> cd /opt/DIS/sbin/
shell> ./drv-install add PSB66
shell> ./scisocket-install add
If desired, the installation can be checked by invoking a script that
verifies that all nodes in the SCI Socket configuration files are
accessible:
shell> cd /opt/DIS/sbin/
shell> ./status.sh
If you discover an error and need to change the SCI Socket
configuration files, the program ksocketconfig must be used to apply
the new configuration:
shell> cd /opt/DIS/util
shell> ./ksocketconfig -f
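After changing the configuration with ksocketconfig, it is a good idea
to re-run the status check shown above to verify that all nodes are
accessible again:
shell> cd /opt/DIS/sbin/
shell> ./status.sh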
To check that SCI Sockets are actually being used, you can use the
test program `latency_bench', which has a server component to which
clients connect to measure the latency; whether SCI is enabled is very
clear from the latency you get. Before using these programs you also
need to set the LD_PRELOAD variable, in the same manner as shown
below.
To set up a server, use the following command:
shell> cd /opt/DIS/bin/socket
shell> ./latency_bench -server
To run a client, use the following command:
shell> cd /opt/DIS/bin/socket
shell> ./latency_bench -client hostname_of_server
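As noted above, LD_PRELOAD must be set before these test programs are
run. A complete client invocation from a bash shell might therefore
look as follows; the actual latency figures will depend on your
hardware:
bash-shell> export LD_PRELOAD=/opt/DIS/lib/libkscisock.so
bash-shell> cd /opt/DIS/bin/socket
bash-shell> ./latency_bench -client hostname_of_server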
The SCI Socket configuration is now complete. MySQL Cluster is ready
to use both SCI Sockets and the SCI transporter documented in the
section MySQL Cluster SCI Definition.
The next step is to start MySQL Cluster. To enable the use of SCI
Sockets it is necessary to set the environment variable LD_PRELOAD
before starting the ndbd, mysqld, and ndb_mgmd processes. The
LD_PRELOAD variable should point to the kernel library for SCI
Sockets.
As an example, to start ndbd from a bash shell, use the following
commands:
bash-shell> export LD_PRELOAD=/opt/DIS/lib/libkscisock.so
bash-shell> ndbd
From a tcsh environment, the same thing is accomplished with the
following commands:
tcsh-shell> setenv LD_PRELOAD /opt/DIS/lib/libkscisock.so
tcsh-shell> ndbd
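The same LD_PRELOAD setting applies to the mysqld and ndb_mgmd
processes. As a minimal sketch from a bash shell (startup options
omitted; a real installation would pass the usual arguments to each
process):
bash-shell> export LD_PRELOAD=/opt/DIS/lib/libkscisock.so
bash-shell> ndb_mgmd
bash-shell> mysqld &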
Note that MySQL Cluster can use only the kernel variant of SCI
Sockets.