(mysql.info.gz) MySQL Cluster DB Definition

 
 16.3.4.5 Defining MySQL Cluster Storage Nodes
 .............................................
 
 The `[DB]' section (or its alias `[NDBD]') is used to configure the
 behavior of the storage nodes.  Many parameters control buffer sizes,
 pool sizes, timeouts, and so forth.  The only mandatory parameters are
 either `ExecuteOnComputer' or `HostName', plus `NoOfReplicas', which
 must be defined in the `[DB DEFAULT]' section. Most parameters should
 be set in the `[DB DEFAULT]' section. Only parameters explicitly
 stated as allowing local values may be changed in the `[DB]' section.
 `HostName', `Id' and `ExecuteOnComputer' must be defined in the local
 `[DB]' section.
 
 The `Id' value (that is, the identification of the storage node) can be
 allocated when the node is started. It is also possible to assign a
 node ID in the configuration file.
 
 For each parameter it is possible to use k, M, or G as a suffix to
 indicate units of 1024, 1024*1024, or 1024*1024*1024. For example, 100k
 means 102400. Parameters and values are case sensitive.
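
 As a minimal sketch of the suffix rule above, a value with a k, M, or G
 suffix could be expanded as follows. The helper name `parse_size' is
 hypothetical and not part of MySQL Cluster, and treating the suffix as
 case sensitive is an assumption drawn from the statement that values
 are case sensitive.

```python
# Multipliers for the k, M, and G suffixes described in the text.
SUFFIXES = {"k": 1024, "M": 1024 * 1024, "G": 1024 * 1024 * 1024}


def parse_size(value: str) -> int:
    """Expand a configuration value such as '100k' into a plain integer.

    Hypothetical helper for illustration; not part of MySQL Cluster.
    """
    value = value.strip()
    # Assumption: suffixes are matched exactly (case sensitive).
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


print(parse_size("100k"))  # 102400
```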
 
 `[DB]Id'
      This identity is the node ID used as the address of the node in all
      cluster internal messages. This is an integer between 1 and 63.
      Each node in the cluster has a unique identity.
 
 `[DB]ExecuteOnComputer'
      This refers to one of the computers defined in the computer
      section.
 
 `[DB]HostName'
      This parameter is similar to specifying a computer to execute on.
      It defines the host name of the computer the storage node is to
      reside on. Either this parameter or `ExecuteOnComputer' is
      required.
 
 `[DB]ServerPort'
      Each node in the cluster uses one port to which other nodes
      connect when setting up the transporters between them. This port
      is also used for non-TCP transporters in the connection setup
      phase. The default port is calculated to ensure that no two nodes
      on the same computer receive the same port number.
 
 `[DB]NoOfReplicas'
      This parameter can be set only in the `[DB DEFAULT]' section
      because it is a global parameter. It defines the number of
      replicas for each table stored in the cluster. This parameter also
      specifies the size of node groups. A node group is a set of nodes
      that all store the same information.
 
      Node groups are formed implicitly. The first node group is formed
      by the storage nodes with the lowest node identities, the next
      by the next lowest node identities, and so on. As an example,
      assume we have 4 storage nodes and `NoOfReplicas' is set to 2.
      The four storage nodes have node IDs 2, 3, 4 and 5. Then the
      first node group will be formed by node 2 and node 3. The second
      node group will be formed by node 4 and node 5. It is important
      to configure the cluster so that nodes in the same node group are
      not placed on the same computer. Otherwise, a single hardware
      failure could cause a cluster crash.
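
      The grouping rule in this example can be sketched as follows.
      The function name `form_node_groups' is hypothetical, and the
      sketch assumes that every node has an explicit node ID and that
      the number of storage nodes is a multiple of `NoOfReplicas'.

```python
def form_node_groups(node_ids, no_of_replicas):
    """Group storage node IDs, lowest first, into groups of no_of_replicas.

    Hypothetical helper for illustration; not part of MySQL Cluster.
    """
    ordered = sorted(node_ids)
    if len(ordered) % no_of_replicas != 0:
        # Assumption for this sketch: the node count divides evenly.
        raise ValueError("node count must be a multiple of NoOfReplicas")
    return [ordered[i:i + no_of_replicas]
            for i in range(0, len(ordered), no_of_replicas)]


print(form_node_groups([2, 3, 4, 5], 2))  # [[2, 3], [4, 5]]
```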
 
      If no node identities are provided then the order of the storage
      nodes will be the determining factor for the node group. The
      actual node group assigned will be printed by the `SHOW' command
      in the management client.
 
      There is no default value and the maximum number is 4.
 
 `[DB]DataDir'
      This parameter specifies the directory where trace files, log
      files, pid files and error logs are placed.
 
 `[DB]FileSystemPath'
      This parameter specifies the directory where all files created for
      metadata, REDO logs, UNDO logs and data files are placed. The
      default value is to use the same directory as the `DataDir'. The
      directory must be created before starting the `ndbd' process.
 
      If you use the recommended directory hierarchy, you will use a
      directory `/var/lib/mysql-cluster'. Under this directory a
      directory `ndb_2_fs' will be created (if node ID was 2) which will
      be the file system for that node.
 
 `[DB]BackupDataDir'
      It is also possible to specify the directory where backups will
      be placed.  By default, the directory `FileSystemPath/BACKUP'
      will be chosen.
 
 `DataMemory' and `IndexMemory' are the parameters that specify the size
 of memory segments used to store the actual records and their indexes.
 To set these parameters well, it is important to understand how
 `DataMemory' and `IndexMemory' are used.  For most uses, they need to
 be updated to reflect the usage of the cluster.
 
 `[DB]DataMemory'
      This parameter is one of the most important parameters because it
      defines the space available to store the actual records in the
      database. The entire `DataMemory' will be allocated in memory so
      it is important that the machine contains enough memory to handle
      the `DataMemory' size.
 
      The `DataMemory' is used to store two things. It stores the actual
      records. Each record is currently of fixed size, so `VARCHAR'
      columns are stored as fixed-size columns. There is normally an
      overhead of 16 bytes on each record. Additionally, each record is
      stored in a 32KB page with 128 bytes of page overhead.  There
      will also be a small amount of waste for each page because each
      record is stored in only one page.  The maximum record size for
      the columns currently is 8052 bytes.
 
      The `DataMemory' is also used to store ordered indexes. Ordered
      indexes use about 10 bytes per record. Each record in the table
      is always represented in the ordered index.
 
      The `DataMemory' consists of 32KB pages. These pages are allocated
      to partitions of the tables. Each table is normally partitioned
      into as many partitions as there are storage nodes in the
      cluster.  Thus for each node there are as many partitions
      (=fragments) as the value of `NoOfReplicas'.  Once a page has
      been allocated to a partition, it is currently not possible to
      return it to the pool of free pages. The way to restore pages to
      the pool is to delete the table. Performing a node recovery also
      compresses the partition because all records are inserted into an
      empty partition from another live node.
 
      Another important aspect is that the `DataMemory' also contains
      UNDO information for records. For each update of a record, a copy
      record is allocated in the `DataMemory'. Each copy record also
      has an instance in the ordered indexes of the table.  Unique hash
      indexes are updated only when the unique index columns are
      updated; in that case a new entry is inserted in the index table
      and the old entry is deleted at commit. Thus it is also necessary
      to allocate enough memory to handle the largest transactions
      performed in the cluster.
 
      Performing large transactions has no advantage in MySQL Cluster
      other than the consistency that transactions provide. Large
      transactions are not faster and consume large amounts of memory.
 
      The default `DataMemory' size is 80MB. The minimum size is 1MB.
      There is no maximum size, but in reality the maximum size has to
      be adapted so that the process doesn't start swapping when using
      the maximum size of the memory.
 
 `[DB]IndexMemory'
      The `IndexMemory' is the parameter that controls the amount of
      storage used for hash indexes in MySQL Cluster. Hash indexes are
      always used for primary key indexes, unique indexes, and unique
      constraints.  When a primary key or a unique index is defined,
      two indexes are actually created in MySQL Cluster.  One index is
      a hash index, which is used for all tuple accesses and also for
      lock handling. It is also used to ensure unique constraints. The
      other is an ordered index.
 
      The size of the hash index is 25 bytes plus the size of the
      primary key per record.  For primary keys larger than 32 bytes
      another 8 bytes is added for some internal references.
 
      Thus for a table defined as
 
           CREATE TABLE example
           (
               a INT NOT NULL,
               b INT NOT NULL,
               c INT NOT NULL,
               PRIMARY KEY(a),
               UNIQUE(b)
           ) ENGINE=NDBCLUSTER;
 
      we will have 12 bytes of overhead (having no nullable columns
      saves 4 bytes of overhead) plus 12 bytes of data per record. In
      addition we will have two ordered indexes on a and b consuming
      about 10 bytes each per record. We will also have a primary key
      hash index in the base table with roughly 29 bytes per record
      (25 bytes plus the 4-byte key). The unique constraint is
      implemented by a separate table with b as primary key and a as a
      column. This table will consume another 29 bytes of index memory
      per record in the table and also 12 bytes of overhead plus 8
      bytes of data in the record part.
 
      Thus for one million records, we will need 58MB of index memory to
      handle the hash indexes for the primary key and the unique
      constraint. For the `DataMemory' part we will need 64MB of memory
      to handle the records of the base table and the unique index table
      plus the two ordered index tables.
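
      The arithmetic behind these two totals can be reproduced with a
      short sketch. It uses only the per-record figures quoted above,
      takes 1MB as one million bytes, and ignores page overhead and
      per-page waste, so it is an estimate rather than an exact sizing
      rule. All variable names are illustrative.

```python
RECORDS = 1_000_000

# Per-record figures quoted in the text for the example table.
HASH_INDEX_BYTES = 25 + 4      # 25 bytes plus a 4-byte INT key
BASE_TABLE_BYTES = 12 + 12     # record overhead plus three INT columns
UNIQUE_TABLE_BYTES = 12 + 8    # record overhead plus columns b and a
ORDERED_INDEX_BYTES = 10       # per ordered index, approximate

# Two hash indexes: the primary key and the unique constraint table.
index_memory = RECORDS * 2 * HASH_INDEX_BYTES

# Base table, unique index table, and two ordered indexes.
data_memory = RECORDS * (BASE_TABLE_BYTES + UNIQUE_TABLE_BYTES
                         + 2 * ORDERED_INDEX_BYTES)

print(index_memory // 1_000_000, "MB of IndexMemory")  # 58
print(data_memory // 1_000_000, "MB of DataMemory")    # 64
```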
 
      The conclusion is that hash indexes take up a fair amount of
      memory space, but in return they provide very fast access to the
      data. They are also used in MySQL Cluster to handle uniqueness
      constraints.
 
      Currently the only partitioning algorithm is hashing and the
      ordered indexes are local to each node and can thus not be used to
      handle uniqueness constraints in the general case.
 
      An important point for both `IndexMemory' and `DataMemory' is that
      the total database size is the sum of all `DataMemory' and
      `IndexMemory' in each node group. Each node group is used to store
      replicated information, so if there are four nodes with 2 replicas
      there will be two node groups and thus the total `DataMemory'
      available is 2*`DataMemory' in each of the nodes.
 
      Another important point is about changes of `DataMemory' and
      `IndexMemory'.  First of all, it is highly recommended to have the
      same amount of `DataMemory' and `IndexMemory' in all nodes. Since
      data is distributed evenly over all nodes in the cluster the size
      available is no better than the smallest sized node in the cluster
      times the number of node groups.
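
      The two points above (capacity scales with the number of node
      groups and is bounded by the smallest node) can be combined in a
      small sketch. The function name is hypothetical, and the sketch
      assumes the node count is a multiple of `NoOfReplicas'.

```python
def usable_data_memory(per_node_memory, no_of_replicas):
    """Estimate cluster-wide capacity from per-node DataMemory values.

    Hypothetical helper for illustration; not part of MySQL Cluster.
    """
    node_groups = len(per_node_memory) // no_of_replicas
    # Capacity is bounded by the smallest node in the cluster.
    return min(per_node_memory) * node_groups


MB = 1024 * 1024
# Four nodes with 80MB each and two replicas give two node groups.
print(usable_data_memory([80 * MB] * 4, 2) // MB)  # 160
```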
 
      `DataMemory' and `IndexMemory' can be changed, but it is dangerous
      to decrease them. Doing so can easily leave a node, or even the
      whole cluster, unable to restart because there is not enough
      memory space for the tables that must be restored into the
      starting node. Increasing them should be quite safe, but it is
      recommended that such upgrades be performed in the same manner as
      a software upgrade: first update the configuration file, then
      restart the management server, and then restart one storage node
      at a time by command.
 
      Updates do not use additional `IndexMemory'. Inserts, however,
      take effect immediately, whereas deleted entries are not removed
      until the transaction is committed.
 
      The default `IndexMemory' size is 18MB. The minimum size is 1MB.
 
 The next three parameters are important because they affect the number
 of parallel transactions and the sizes of transactions that can be
 handled by the system. `MaxNoOfConcurrentTransactions' sets the number
 of parallel transactions possible in a node and
 `MaxNoOfConcurrentOperations' sets the number of records that can be in
 update phase or locked simultaneously.
 
 Both of these parameters, and particularly
 `MaxNoOfConcurrentOperations', are likely candidates for users to set
 explicitly rather than relying on the default value. The default is
 chosen for systems using small transactions, so that the default
 configuration does not use too much memory.
 
 `[DB]MaxNoOfConcurrentTransactions'
      For each active transaction in the cluster there must also be a
      transaction record in one of the nodes in the cluster. The role of
      transaction coordination is spread among the nodes, and thus the
      total number of transaction records in the cluster is the number
      in one node times the number of nodes in the cluster.
 
      Transaction records are actually allocated to MySQL servers.
      Normally there is at least one transaction record allocated in
      the cluster per connection that uses or has used a table in the
      cluster. Thus one should ensure that there are more transaction
      records in the cluster than there are concurrent connections to
      all MySQL servers in the cluster.
 
      This parameter has to be the same in all nodes in the cluster.
 
      Changing this parameter is never safe and can cause a cluster
      crash. When a node crashes, one of the nodes (actually the oldest
      surviving node) will build up the transaction state of all
      transactions ongoing in the crashed node at the time of the crash.
      It is thus important that this node has as many transaction
      records as the failed node.
 
      The default value for this parameter is 4096.
 
 `[DB]MaxNoOfConcurrentOperations'
      This parameter is likely to be changed by users. Users
      performing only short, small transactions don't need to set this
      parameter very high. Applications that need to perform rather
      large transactions involving many records need to set this
      parameter higher.
 
      For each transaction that updates data in the cluster it is
      required to have operation records. There are operation records
      both in the transaction coordinator and in the nodes where the
      actual updates are performed.
 
      The operation records contain state information needed to be able
      to find UNDO records for rollback, lock queues, and other state
      information.
 
      To dimension the cluster to handle transactions where one million
      records are updated simultaneously one should set this parameter
      to one million divided by the number of nodes. Thus for a cluster
      with four storage nodes one should set this parameter to 250000.
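
      The dimensioning rule above can be written as a one-line
      calculation. The function name is hypothetical, and rounding up
      when the division is not exact is an assumption made for this
      sketch.

```python
import math


def concurrent_operations_per_node(records_in_flight, storage_nodes):
    """Divide the records updated at once evenly over the storage nodes.

    Hypothetical helper; rounds up so that no node is under-provisioned.
    """
    return math.ceil(records_in_flight / storage_nodes)


print(concurrent_operations_per_node(1_000_000, 4))  # 250000
```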
 
      Read queries that set locks also use up operation records. Some
      extra space is allocated in the local nodes to cater for cases
      where the distribution is not perfect over the nodes.
 
      When queries translate into using the unique hash index there will
      actually be two operation records used per record in the
      transaction. The first one represents the read in the index table
      and the second handles the operation on the base table.
 
      The default value for this parameter is 32768.
 
      This parameter actually handles two parts that can be configured
      separately.  The first part specifies how many operation records
      are to be placed in the transaction coordinator part. The second
      part specifies how many operation records are to be used in the
      local database part.
 
      If a very big transaction is performed on an 8-node cluster, it
      will need as many operation records in the transaction
      coordinator as there are reads, updates, and deletes involved in
      the transaction. The transaction will, however, spread the
      operation records of the actual reads, updates, and inserts over
      all eight nodes. Thus if it is necessary to configure the system
      for one very big transaction, it is a good idea to configure
      those two parts separately. `MaxNoOfConcurrentOperations' will
      always be used to calculate the number of operation records in
      the transaction coordinator part of the node.
 
      It is also important to have an idea of the memory requirements
      for those operation records. In MySQL 4.1.5, operation records
      consume about 1KB per record. This figure will shrink in future
      5.x versions.
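
      As a rough illustration of that memory requirement, the sketch
      below multiplies a pool size by the quoted figure of about 1KB
      per operation record. The function name is hypothetical, and the
      result covers a single pool only.

```python
KB = 1024


def operation_record_memory(max_concurrent_operations, bytes_per_record=KB):
    """Approximate memory used by one operation record pool, in bytes.

    Hypothetical helper; 1KB per record is the approximate figure quoted
    in the text for MySQL 4.1.5.
    """
    return max_concurrent_operations * bytes_per_record


# The default pool of 32768 records needs roughly 32MB.
print(operation_record_memory(32768) // (KB * KB), "MB")
```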
 
 `[DB]MaxNoOfLocalOperations'
      By default this parameter is calculated as 1.1 *
      `MaxNoOfConcurrentOperations' which fits systems with many
      simultaneous, not very large transactions. If the configuration
      needs to handle one very large transaction at a time and there are
      many nodes then it is a good idea to configure this separately.
 
 The next set of parameters is used for temporary storage in the midst
 of executing a part of a query in the cluster. All of these records
 are released once the query part has completed and is waiting for the
 commit or rollback.
 
 The defaults for most of these parameters will be adequate for most
 users.  Some high-end users might want to increase them to enable more
 parallelism in the system, and some low-end users might want to
 decrease them to save memory.
 
 `[DB]MaxNoOfConcurrentIndexOperations'
      For queries using a unique hash index, another set of operation
      records is temporarily used in the execution phase of the query.
      This parameter sets the size of this pool. Such a record is
      allocated only while a part of a query is executing; as soon as
      that part has been executed, the record is released. The state
      needed to handle aborts and commits is handled by the normal
      operation records, where the pool size is set by the parameter
      `MaxNoOfConcurrentOperations'.

      The default value of this parameter is 8192. Only in rare cases of
      extremely high parallelism using unique hash indexes should it be
      necessary to increase this parameter. It can be decreased to save
      memory if the DBA is certain that such high parallelism does not
      occur in the cluster.
 
 `[DB]MaxNoOfFiredTriggers'
      The default value of `MaxNoOfFiredTriggers' is 4000. Normally this
      value should be sufficient for most systems. In some cases it
      could be decreased if the DBA feels certain the parallelism in the
      cluster is not so high.
 
      This record is used when an operation is performed that affects a
      unique hash index. Updating a column that is part of a unique hash
      index or inserting/deleting a record in a table with unique hash
      indexes will fire an insert or delete in the index table. This
      record is used to represent this index table operation while it
      is waiting for the original operation that fired it to complete.
      Thus it is short lived, but its pool can still need a fair number
      of records for temporary situations with many parallel write
      operations on a base table containing a set of unique hash
      indexes.
 
 `[DB]TransactionBufferMemory'
      This parameter is also used for keeping fired operations to update
      index tables. This part keeps the key and column information for
      the fired operations. It should be very rare that this parameter
      needs to be updated.
 
      Normal read and write operations also use a similar buffer. This
      buffer is even more short term in its usage, so it is a
      compile-time parameter set to 4000*128 bytes (500KB). The
      parameter is `ZATTRBUF_FILESIZE' in DBTC.HPP. A similar buffer
      for key info exists which contains 4000*16 bytes (62.5KB) of
      buffer space. The parameter in this case is `ZDATABUF_FILESIZE'
      in DBTC.HPP. `Dbtc' is the module for handling the transaction
      coordination.

      Similar parameters exist in the `Dblqh' module, which takes care
      of the reads and updates where the data is located. In
      `Dblqh.hpp', `ZATTRINBUF_FILESIZE' is set to 10000*128 bytes
      (1250KB) and `ZDATABUF_FILE_SIZE' is set to 10000*16 bytes
      (roughly 156KB) of buffer space. No case in which any of these
      compile-time limits was too small has been reported so far or
      discovered by any of our extensive test suites.
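
      The buffer sizes quoted for these compile-time constants follow
      directly from the entry counts and entry sizes. The sketch below
      only redoes that arithmetic (with 1KB taken as 1024 bytes); the
      dictionary and function names are illustrative and do not come
      from the source code.

```python
KB = 1024

# (entries, bytes per entry) as quoted in the text for each constant.
BUFFERS = {
    "ZATTRBUF_FILESIZE (Dbtc)": (4000, 128),
    "ZDATABUF_FILESIZE (Dbtc)": (4000, 16),
    "ZATTRINBUF_FILESIZE (Dblqh)": (10000, 128),
    "ZDATABUF_FILE_SIZE (Dblqh)": (10000, 16),
}


def buffer_kb(entries, entry_bytes):
    """Return the buffer size in KB (1KB = 1024 bytes)."""
    return entries * entry_bytes / KB


for name, (entries, entry_bytes) in BUFFERS.items():
    print(f"{name}: {buffer_kb(entries, entry_bytes)}KB")
```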
 
      The default size of the `TransactionBufferMemory' is 1MB.
 
 `[DB]MaxNoOfConcurrentScans'
      This parameter is used to control the number of parallel scans
      that can be performed in the cluster.  Each transaction
      coordinator can handle the number of parallel scans defined by
      this parameter. Each scan query is performed by scanning all
      partitions in parallel. Each partition scan will use a scan record
      in the node where the partition is located. The number of those
      records is the value of this parameter times the number of nodes,
      so that the cluster can sustain the maximum number of scans in
      parallel from all nodes in the cluster.
 
      Scans are performed in two cases. The first case is when no hash
      or ordered index exists to handle the query. In this case the
      query is executed by performing a full table scan. The second case
      is when there is no hash index to support the query but there is
      an ordered index. Using the ordered index means executing a
      parallel range scan. Since the order is only kept on the local
      partitions it is necessary to perform the index scan on all
      partitions.
 
      The default value of `MaxNoOfConcurrentScans' is 256. The maximum
      value is 500.
 
      This parameter will always specify the number of scans possible in
      the transaction coordinator. If the number of local scan records
      is not provided it is calculated as the product of
      `MaxNoOfConcurrentScans' and the number of storage nodes in the
      system.
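
      A minimal sketch of that default, with a hypothetical function
      name:

```python
def default_local_scans(max_concurrent_scans, storage_nodes):
    """Local scan records used when MaxNoOfLocalScans is not set.

    Hypothetical helper for illustration; not part of MySQL Cluster.
    """
    return max_concurrent_scans * storage_nodes


# Default MaxNoOfConcurrentScans of 256 in a four-node cluster.
print(default_local_scans(256, 4))  # 1024
```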
 
 `[DB]MaxNoOfLocalScans'
      This parameter makes it possible to specify the number of local
      scan records if many scans are not fully parallelized.
 
 `[DB]BatchSizePerLocalScan'
      This parameter is used to calculate the number of lock records
      needed to handle many concurrent scan operations.

      The default value is 64. This value is closely related to the
      `ScanBatchSize' defined in the API nodes.
 
 `[DB]LongMessageBuffer'
      This is an internal buffer used for passing messages within the
      node and between nodes in the system. It is highly unlikely that
      anybody would need to change this parameter, but it is
      configurable.  By default it is set to 1MB.
 
 `[DB]NoOfFragmentLogFiles'
      This is an important parameter that states the size of the REDO
      log files in the node. REDO log files are organized in a ring, so
      it is important that the tail and the head do not meet.  When the
      tail and head come too close to each other, the node will start
      aborting all updating transactions because there is no room for
      the log records.
 
      REDO log records aren't removed until three local checkpoints have
      completed since the log record was inserted. The speed of
      checkpointing is controlled by a set of other parameters, so
      these parameters are all closely interrelated.
 
      The default parameter value is 8, which means 8 sets of 4 16MB
      files, for a total of 512MB. In other words, the unit is 64MB of
      REDO log space. In high-update scenarios this parameter needs to
      be set very high.  Test cases have been run in which it was
      necessary to set it to over 300.
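
      The relationship between the parameter value and the total REDO
      log space can be sketched as follows. The constant and function
      names are illustrative only.

```python
FILES_PER_SET = 4    # each set consists of 4 files
FILE_SIZE_MB = 16    # each file is 16MB


def redo_log_space_mb(no_of_fragment_log_files):
    """Total REDO log space in MB for a given NoOfFragmentLogFiles."""
    return no_of_fragment_log_files * FILES_PER_SET * FILE_SIZE_MB


print(redo_log_space_mb(8))    # 512
print(redo_log_space_mb(300))  # 19200
```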
 
      If checkpointing is slow and there are so many writes to the
      database that the log files are full and the log tail cannot be
      cut for recovery reasons, then all updating transactions will be
      aborted with internal error code 410, which is translated to
      `Out of log file space temporarily'.  This condition will prevail
      until a checkpoint has completed and the log tail can be moved
      forward.
 
 `[DB]MaxNoOfSavedMessages'
      This parameter sets the maximum number of trace files that will be
      kept before overwriting old trace files. Trace files are generated
      when the node crashes for some reason.
 
      The default is 25 trace files.
 
 The next set of parameters defines the pool sizes for metadata objects.
 It is necessary to define the maximum number of attributes, tables,
 indexes, and trigger objects used by indexes, events and replication
 between clusters.
 
 `[DB]MaxNoOfAttributes'
      This parameter defines the number of attributes that can be
      defined in the cluster.
 
      The default value of this parameter is 1000. The minimum value is
      32 and there is no maximum. Each attribute consumes around 200
      bytes of storage in each node because metadata is fully replicated
      in the servers.
 
 `[DB]MaxNoOfTables'
      A table object is allocated for each table, for each unique hash
      index, and for each ordered index.  This parameter sets the
      maximum number of table objects in the cluster.
 
      For each attribute that has a `BLOB' data type an extra table is
      used to store most of the `BLOB' data. These tables also must be
      taken into account when defining the number of tables.
 
      The default value of this parameter is 128. The minimum is 8 and
      the maximum is 1600. Each table object consumes around 20KB in
      each node.
 
 `[DB]MaxNoOfOrderedIndexes'
      For each ordered index in the cluster, objects are allocated to
      describe what it is indexing and its storage parts. By default
      each index defined will have an ordered index also defined. Unique
      indexes and primary key indexes have both an ordered index and a
      hash index.
 
      The default value of this parameter is 128. Each object consumes
      around 10KB of data per node.
 
 `[DB]MaxNoOfUniqueHashIndexes'
      For each unique index (not for primary keys) a special table is
      allocated that maps the unique key to the primary key of the
      indexed table. By default there will be an ordered index also
      defined for each unique index. To avoid this, you must use the
      `USING HASH' option in the unique index definition.
 
      The default value is 64. Each index will consume around 15KB per
      node.
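
      To get a feel for how much memory these metadata pools can take
      per node, the approximate per-object figures quoted above can be
      summed. This is only a sketch: the names `COST', `DEFAULTS', and
      `metadata_bytes_per_node' are hypothetical, and it assumes that
      every slot in each pool is in use.

```python
KB = 1024

# Approximate per-object cost in bytes per node, as quoted in the text.
COST = {
    "MaxNoOfAttributes": 200,
    "MaxNoOfTables": 20 * KB,
    "MaxNoOfOrderedIndexes": 10 * KB,
    "MaxNoOfUniqueHashIndexes": 15 * KB,
}

# Default pool sizes from the parameter descriptions.
DEFAULTS = {
    "MaxNoOfAttributes": 1000,
    "MaxNoOfTables": 128,
    "MaxNoOfOrderedIndexes": 128,
    "MaxNoOfUniqueHashIndexes": 64,
}


def metadata_bytes_per_node(pools):
    """Sum the approximate metadata memory for the given pool sizes."""
    return sum(COST[name] * size for name, size in pools.items())


print(metadata_bytes_per_node(DEFAULTS), "bytes per node")
```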
 
 `[DB]MaxNoOfTriggers'
      For each unique hash index an internal update, insert and delete
      trigger is allocated, that is, three triggers per unique hash
      index.  Ordered indexes use only one trigger object. Backups also
      use three trigger objects for each normal table in the cluster.
      When replication between clusters is supported it will also use
      internal triggers.
 
      This parameter sets the maximum number of trigger objects in the
      cluster.
 
      The default value of this parameter is 768.
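
      The trigger counts described above can be added up to estimate a
      suitable value. The function name is hypothetical, and the sketch
      does not account for triggers used by replication between
      clusters.

```python
def triggers_needed(unique_hash_indexes, ordered_indexes, normal_tables):
    """Count trigger objects using the per-object figures in the text.

    Hypothetical helper for illustration; not part of MySQL Cluster.
    """
    return (3 * unique_hash_indexes   # update, insert, and delete triggers
            + 1 * ordered_indexes     # one trigger per ordered index
            + 3 * normal_tables)      # triggers used by backups


print(triggers_needed(unique_hash_indexes=64, ordered_indexes=128,
                      normal_tables=128))  # 704
```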
 
 `[DB]MaxNoOfIndexes'
      This parameter was deprecated in MySQL 4.1.5. You should use
      `MaxNoOfOrderedIndexes' and `MaxNoOfUniqueHashIndexes' instead.
 
      This parameter is only used by unique hash indexes. There needs to
      be one record in this pool for each unique hash index defined in
      the cluster.
 
      The default value of this parameter is 128.
 
 There is a set of boolean parameters affecting the behavior of storage
 nodes. A boolean parameter can be set to true by setting it to Y or 1
 and to false by setting it to N or 0.
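
 A minimal sketch of how such a setting could be interpreted is shown
 below. The helper name `parse_bool' is hypothetical, and rejecting any
 other spelling (including lowercase) is an assumption of this sketch.

```python
def parse_bool(value: str) -> bool:
    """Map Y/1 to True and N/0 to False; reject anything else.

    Hypothetical helper for illustration; not part of MySQL Cluster.
    """
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    raise ValueError(f"not a boolean setting: {value!r}")


print(parse_bool("Y"), parse_bool("0"))  # True False
```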
 
 `[DB]LockPagesInMainMemory'
      For a number of operating systems such as Solaris and Linux it is
      possible to lock a process into memory and avoid all swapping
      problems. This is an important feature to provide real-time
      characteristics of the cluster.
 
      The default is that this feature is not enabled.
 
 `[DB]StopOnError'
      This parameter states whether the process is to exit on an error
      condition or whether it is to perform an automatic restart.
 
      The default is that this feature is enabled.
 
 `[DB]Diskless'
      In the internal interfaces it is possible to set tables as
      diskless tables, meaning that the tables are not checkpointed to
      disk and no logging occurs.  They only exist in main memory. The
      tables will still exist after a crash but not the records in the
      table.

      This feature makes the entire cluster `Diskless'. In this case
      even the tables do not exist anymore after a crash. This feature
      is enabled by setting it to either Y or 1.
 
      When this feature is enabled, backups will be performed but will
      not be stored because there is no "disk". In future releases it is
      likely that diskless backup will become a separate configurable
      parameter.
 
      The default is that this feature is not enabled.
 
 `[DB]RestartOnErrorInsert'
      This feature is only accessible when building the debug version
      where it is possible to insert errors in the execution of various
      code parts to test failure cases.
 
      The default is that this feature is not enabled.
 
 There are quite a few parameters specifying timeouts and time intervals
 between various actions in the storage nodes. Most of the timeouts are
 specified in milliseconds with a few exceptions which will be mentioned
 below.
 
 `[DB]TimeBetweenWatchDogCheck'
      To ensure that the main thread doesn't get stuck in an endless
      loop somewhere, there is a watchdog thread which checks the main
      thread. This parameter states the number of milliseconds between
      each check. If the main thread is still in the same state after
      three checks, the process is stopped by the watchdog thread.
 
      This parameter can easily be changed and can be different in the
      nodes although there seems to be little reason for such a
      difference.
 
      The default timeout is 4000 milliseconds (4 seconds).
 
 `[DB]StartPartialTimeout'
      This parameter specifies the time that the cluster will wait for
      all storage nodes to come up before the algorithm to start the
      cluster is invoked. This timeout is used to avoid starting only a
      partial cluster if possible.

      The default value is 30000 milliseconds (30 seconds). 0 means an
      infinite timeout, that is, the cluster starts only if all nodes
      are available.
 
 `[DB]StartPartitionedTimeout'
      If the cluster is ready to start after waiting
      `StartPartialTimeout' but is still in a possibly partitioned
      state, it also waits until this timeout has passed.
 
      The default timeout is 60000 milliseconds (60 seconds).
 
 `[DB]StartFailureTimeout'
      If the start is not completed within the time specified by this
      parameter the node start will fail. Setting this parameter to 0
      means no timeout is applied on the time to start the cluster.
 
      The default value is 60000 milliseconds (60 seconds). For storage
      nodes containing large data sets this parameter needs to be
      increased because it could very well take 10-15 minutes to perform
      a node restart of a storage node with a few gigabytes of data.
 
 `[DB]HeartbeatIntervalDbDb'
      One of the main methods of discovering failed nodes is by
      heartbeats. This parameter states how often heartbeat signals are
      sent and how often to expect to receive them. After missing three
      heartbeat intervals in a row, the node is declared dead. Thus the
      maximum time of discovering a failure through the heartbeat
      mechanism is four times the heartbeat interval.
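
      For instance, the worst-case detection time stated above can be
      computed as follows (the function name is hypothetical):

```python
def max_detection_time_ms(heartbeat_interval_ms):
    """Worst-case failure detection time: four heartbeat intervals."""
    return 4 * heartbeat_interval_ms


# With the default HeartbeatIntervalDbDb of 1500 milliseconds.
print(max_detection_time_ms(1500))  # 6000
```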
 
      The default heartbeat interval is 1500 milliseconds (1.5 seconds).
      This parameter must not be changed drastically. If one node uses
      5000 milliseconds and the node watching it uses 1000 milliseconds
      then obviously the node will be declared dead very quickly. So
      this parameter can be changed in small steps during an online
      software upgrade but not in large steps.
 
 `[DB]HeartbeatIntervalDbApi'
      In a similar manner each storage node sends heartbeats to each of
      the connected MySQL servers to ensure that they behave properly.
      If a MySQL server doesn't send a heartbeat in time (using the
      same algorithm as for storage nodes, where three missed
      heartbeats cause failure), it is declared down. All ongoing
      transactions are then completed and all resources are released.
      The MySQL server cannot reconnect until all activities started by
      the previous MySQL instance have been completed.

      The default interval is 1500 milliseconds. This interval can
      differ between storage nodes because each storage node watches
      the MySQL servers connected to it independently of all other
      storage nodes.
 
 `[DB]TimeBetweenLocalCheckpoints'
      This parameter is an exception in that it does not state a time to
      wait before starting a new local checkpoint. Instead, it ensures
      that local checkpoints are not performed in a cluster where few
      updates are taking place. In most clusters with high update rates
      it is likely that a new local checkpoint is started immediately
      after the previous one has completed.
 
      The sizes of all write operations executed since the start of the
      previous local checkpoint are summed. This parameter is specified
      as the base-2 logarithm of the number of 4-byte words. So the
      default value 20 means 4MB of write operations, 21 means 8MB, and
      so forth up to the maximum value 31, which means 8GB of write
      operations.
 
      All the write operations in the cluster are added together.
      Setting it to 6 or lower means that local checkpoints will execute
      continuously without any wait between them independent of the
      workload in the cluster.
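 
      The logarithmic encoding can be checked with a short Python
      sketch. It assumes 4-byte words, which is what the figures above
      imply (a value of 20 corresponding to 4MB). The function name is
      hypothetical.

```python
WORD_SIZE_BYTES = 4  # assumption: 4-byte words, consistent with 20 -> 4MB
MAX_VALUE = 31       # maximum value stated in the description


def lcp_write_threshold_bytes(value: int) -> int:
    """Bytes of write operations represented by a TimeBetweenLocalCheckpoints value."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value must be between 0 and 31")
    return WORD_SIZE_BYTES * 2 ** value


MB = 1024 * 1024
print(lcp_write_threshold_bytes(20) // MB)  # 4
print(lcp_write_threshold_bytes(21) // MB)  # 8
print(lcp_write_threshold_bytes(31) // MB)  # 8192
```

      Each increment of the parameter doubles the volume of writes that
      must accumulate before a new local checkpoint is started.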
 
 `[DB]TimeBetweenGlobalCheckpoints'
      When a transaction is committed, it is committed in main memory in
      all nodes where mirrors of the data exist. The log records of the
      transaction are not forced to disk as part of the commit, however.
      The reasoning here is that having the transaction safely committed
      in at least two independent computers should meet reasonable
      standards of durability.
 
      At the same time, it is also important to ensure that even the
      worst case, a complete crash of the cluster, is handled properly.
      To ensure this, all transactions in a certain interval are put
      into a global checkpoint. A global checkpoint is very similar to a
      grouped commit of transactions: an entire group of transactions is
      sent to disk. Thus, as part of the commit, the transaction is put
      into a global checkpoint group. Later, this group's log records
      are forced to disk, and the entire group of transactions is then
      safely committed on disk storage on all computers as well.
 
      This parameter states the interval between global checkpoints. The
      default time is 2000 milliseconds.
 
 `[DB]TimeBetweenInactiveTransactionAbortCheck'
      Time-out handling is performed by checking each timer on each
      transaction every period of time in accordance with this
      parameter. Thus if this parameter is set to 1000 milliseconds,
      then every transaction will be checked for timeout once every
      second.
 
      The default for this parameter is 1000 milliseconds (1 second).
 
 `[DB]TransactionInactiveTimeout'
      If the transaction is currently not performing any queries but is
      waiting for further user input, this parameter states the maximum
      time that the user can wait before the transaction is aborted.
 
      The default for this parameter is no timeout. For a real-time
      database that must ensure that no transaction keeps locks for too
      long, this parameter should be set to a much smaller value. The
      unit is milliseconds.
 
 `[DB]TransactionDeadlockDetectionTimeout'
      When a transaction is involved in executing a query, it waits for
      other nodes. If the other nodes do not respond, there are three
      possible causes. First, the node could be dead. Second, the
      operation could have entered a lock queue. Finally, the node
      requested to perform the action could be heavily overloaded. This
      timeout parameter states how long the transaction coordinator
      waits for another node to execute a query before it aborts the
      transaction.
 
      Thus this parameter is important both for node failure handling
      and for deadlock detection. Setting it too high would cause
      undesirable behavior at deadlocks and node failures.
 
      The default time out is 1200 milliseconds (1.2 seconds).
 
 `[DB]NoOfDiskPagesToDiskAfterRestartTUP'
      When executing a local checkpoint, the algorithm sends all data
      pages to disk. Simply sending them there as quickly as possible
      would cause unnecessary load on processors, networks, and disks.
      To control the write speed, this parameter specifies how many
      pages are to be written per 100 milliseconds. A page is here
      defined as 8KB, so the unit of this parameter is 80KB per second.
      Setting it to 20 thus means writing 1.6MB of data pages to disk
      per second during a local checkpoint. The writing of UNDO log
      records for data pages is also part of this sum. The writing of
      index pages (see `IndexMemory' to understand what index pages are
      used for) and their UNDO log records is handled by the parameter
      `NoOfDiskPagesToDiskAfterRestartACC'. This parameter limits
      writes from `DataMemory'.
 
      So this parameter specifies how quickly local checkpoints will be
      executed. It is important in connection with
      `NoOfFragmentLogFiles', `DataMemory', and `IndexMemory'.
 
      The default value is 40 (3.2MB of data pages per second).
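 
      The unit conversion described above can be sketched in Python as
      follows. The function names and the 320000KB data size in the
      example are hypothetical, chosen only to show the arithmetic. The
      estimate ignores UNDO log records and any other disk activity.

```python
PAGE_SIZE_KB = 8           # page size given in the description
INTERVALS_PER_SECOND = 10  # the parameter counts pages per 100 milliseconds


def write_rate_kb_per_second(pages_per_100ms: int) -> int:
    """Checkpoint write rate implied by a NoOfDiskPagesToDisk* setting."""
    if pages_per_100ms <= 0:
        raise ValueError("pages per 100 milliseconds must be positive")
    return pages_per_100ms * PAGE_SIZE_KB * INTERVALS_PER_SECOND


def seconds_to_write(data_kb: int, pages_per_100ms: int) -> float:
    """Rough time to write data_kb of pages at the configured rate."""
    return data_kb / write_rate_kb_per_second(pages_per_100ms)


print(write_rate_kb_per_second(20))   # 1600
print(write_rate_kb_per_second(40))   # 3200
print(seconds_to_write(320_000, 40))  # 100.0
```

      At the default setting of 40, writing a hypothetical 320000KB of
      data pages would take on the order of 100 seconds.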
 
 `[DB]NoOfDiskPagesToDiskAfterRestartACC'
      This parameter has the same unit as
      `NoOfDiskPagesToDiskAfterRestartTUP' but limits the speed of
      writing index pages from `IndexMemory'.
 
      The default value of this parameter is 20 (1.6MB per second).
 
 `[DB]NoOfDiskPagesToDiskDuringRestartTUP'
      This parameter specifies the same things as
      `NoOfDiskPagesToDiskAfterRestartTUP' and
      `NoOfDiskPagesToDiskAfterRestartACC', but does so for the local
      checkpoint executed in the node while it is restarting. A local
      checkpoint is always performed as part of every node restart.
      During a node restart it is possible to write to disk at a higher
      speed because fewer activities are performed in the node.
 
      This parameter handles the `DataMemory' part.
 
      The default value is 40 (3.2MB per second).
 
 `[DB]NoOfDiskPagesToDiskDuringRestartACC'
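      This parameter is the counterpart of
      `NoOfDiskPagesToDiskDuringRestartTUP' for the `IndexMemory' part
      of the local checkpoint performed during a node restart.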
 
      The default value is 20 (1.6MB per second).
 
 `[DB]ArbitrationTimeout'
      This parameter specifies the time that the storage node will wait
      for a response from the arbitrator when sending an arbitration
      message in the case of a split network.
 
      The default value is 1000 milliseconds (1 second).
 
 A number of new configuration parameters were introduced in MySQL 4.1.5.
 These correspond to values that previously were compile-time
 parameters. The main reason for this is to give advanced users more
 control over the size of the process and to let them adjust various
 buffer sizes according to their needs.
 
 All of these buffers are used as front ends to the file system when
 writing log records of various kinds to disk. If the node runs with
 Diskless, these parameters can safely be set to their minimum values
 because all disk writes are faked as okay by the file system
 abstraction layer in the `NDB' storage engine.
 
 `[DB]UndoIndexBuffer'
      This buffer is used during local checkpoints. The `NDB' storage
      engine uses a recovery scheme based on a consistent checkpoint
      together with an operational REDO log. In order to produce a
      consistent checkpoint without blocking the entire system for
      writes, UNDO logging is done while performing the local
      checkpoint. The UNDO logging is only activated on one fragment of
      one table at a time. This optimization is possible because tables
      are entirely stored in main memory.
 
      This buffer is used for the updates on the primary key hash index.
      Inserts and deletes rearrange the hash index and the `NDB' storage
      engine writes UNDO log records that map all physical changes to an
      index page such that they can be undone at a system restart. It
      also logs all active insert operations at the start of a local
      checkpoint for the fragment.
 
      Reads and updates only set lock bits and update a header in the
      hash index entry. These changes are handled by the page write
      algorithm to ensure that these operations need no UNDO logging.
 
      This buffer is 2MB by default. The minimum value is 1MB. For most
      applications this is good enough. Applications doing extremely
      heavy inserts and deletes together with large transactions using
      large primary keys might need to extend this buffer.
 
      If this buffer is too small, the `NDB' storage engine issues the
      internal error code 677 which will be translated into "Index UNDO
      buffers overloaded".
 
 `[DB]UndoDataBuffer'
      This buffer has exactly the same role as the `UndoIndexBuffer' but
      is used for the data part. This buffer is used during local
      checkpoint of a fragment and inserts, deletes, and updates use the
      buffer.
 
      Since these UNDO log entries tend to be bigger and more things are
      logged, the buffer is also bigger by default. It is set to 16MB by
      default. For some applications this might be too conservative and
      they might want to decrease this size; the minimum size is 1MB. It
      should be rare that applications need to increase this buffer
      size. If there is a need for this it is a good idea to check if
      the disks can actually handle the load that the update activity in
      the database causes. If they cannot then no size of this buffer
      will be big enough.
 
      If this buffer is too small and gets congested, the `NDB' storage
      engine issues the internal error code 891 which will be translated
      to "Data UNDO buffers overloaded".
 
 `[DB]RedoBuffer'
      All update activities also need to be logged. This enables a
      replay of these updates at system restart. The recovery algorithm
      uses a consistent checkpoint produced by a "fuzzy" checkpoint of
      the data together with UNDO logging of the pages. Then it applies
      the REDO log to play back all changes up until the time that will
      be restored in the system restart.
 
      This buffer is 8MB by default. The minimum value is 1MB.
 
      If this buffer is too small, the `NDB' storage engine issues the
      internal error code 1221 which will be translated into "REDO log
      buffers overloaded".
 
 For cluster management, it is important to be able to control the
 number of log messages sent to stdout for various event types. The
 possible events will be listed in this manual soon.  There are 16
 levels possible from level 0 to level 15. Setting event reporting to
 level 15 means receiving all event reports of that category and setting
 it to 0 means getting no event reports in that category.
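 
 One way to picture these levels is as a threshold. The Python sketch
 below assumes that each event carries its own level from 1 to 15 and is
 reported when that level does not exceed the configured level. This
 threshold rule and the function name are assumptions made for
 illustration; the text above only states the behavior at levels 0 and
 15.

```python
MAX_LEVEL = 15


def is_reported(event_level: int, configured_level: int) -> bool:
    """Assumed threshold rule: report an event if its level <= the configured level."""
    if not 1 <= event_level <= MAX_LEVEL:
        raise ValueError("event level must be between 1 and 15")
    if not 0 <= configured_level <= MAX_LEVEL:
        raise ValueError("configured level must be between 0 and 15")
    return event_level <= configured_level


print(is_reported(1, 0))    # False
print(is_reported(15, 15))  # True
print(is_reported(8, 1))    # False
```

 Under this assumption, a configured level of 0 suppresses every event in
 the category and a level of 15 lets every event through.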
 
 The reason why most defaults are set to 0 and thus not causing any
 output to stdout is that the same message is sent to the cluster log in
 the management server. Only the startup message is by default generated
 to stdout.
 
 A similar set of levels can be set in the management client to define
 which levels to record in the cluster log.
 
 `[DB]LogLevelStartup'
      Events generated during startup of the process.
 
      The default level is 1.
 
 `[DB]LogLevelShutdown'
      Events generated as part of graceful shutdown of a node.
 
      The default level is 0.
 
 `[DB]LogLevelStatistic'
      Statistical events such as the number of primary key reads,
      updates, and inserts, as well as other statistical information
      such as buffer usage.
 
      The default level is 0.
 
 `[DB]LogLevelCheckpoint'
      Events generated by local and global checkpoints.
 
      The default level is 0.
 
 `[DB]LogLevelNodeRestart'
      Events generated during node restart.
 
      The default level is 0.
 
 `[DB]LogLevelConnection'
      Events generated by connections between nodes in the cluster.
 
      The default level is 0.
 
 `[DB]LogLevelError'
      Events generated by errors and warnings in the cluster. These are
      errors not causing a node failure but still considered worth
      reporting.
 
      The default level is 0.
 
 `[DB]LogLevelInfo'
      Events generated for information about the state of the cluster
      and so forth.
 
      The default level is 0.
 
 There is a set of parameters defining memory buffers that are set aside
 for online backup execution.
 
 `[DB]BackupDataBufferSize'
      When executing a backup, two buffers are used for sending data to
      the disk. This buffer is filled with data recorded by scanning the
      tables in the node. When it has been filled to a certain level,
      the pages are sent to disk. This level is specified by the
      `BackupWriteSize' parameter. While data is being sent to the disk,
      the backup can continue filling this buffer until it runs out of
      buffer space. When that happens, it simply stops the scan and
      waits until some disk writes return and thus free up memory
      buffers to use for further scanning.
 
      The default value is 2MB.
 
 `[DB]BackupLogBufferSize'
      This parameter has a similar role but is instead used for writing
      a log of all writes to the tables during execution of the backup.
      The same principles apply for writing those pages as for
      `BackupDataBufferSize' except that when this part runs out of
      buffer space, it causes the backup to fail due to lack of backup
      buffers. Thus the size of this buffer must be big enough to handle
      the load caused by write activities during the backup execution.
 
      The default value should be big enough. A backup failure is more
      likely to be caused by a disk that cannot write as quickly as it
      should. If the disk subsystem is not dimensioned for the write
      load caused by the applications, the cluster will have great
      difficulty performing the desired actions.
 
      It is important to dimension the nodes in such a manner that the
      processors become the bottleneck rather than the disks or the
      network connections.
 
      The default value is 2MB.
 
 `[DB]BackupMemory'
      This parameter is simply the sum of the two previous, the
      `BackupDataBufferSize' and `BackupLogBufferSize'.
 
      The default value is 4MB.
 
 `[DB]BackupWriteSize'
      This parameter specifies the size of the write messages to disk
      for the log and data buffer used for backups.
 
      The default value is 32KB.
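 
      The relationship between the default backup parameters can be
      checked with a small Python sketch. The function names are
      hypothetical; the sizes are the defaults quoted above.

```python
KB = 1024
MB = 1024 * KB

# Default values quoted in the parameter descriptions.
backup_data_buffer_size = 2 * MB
backup_log_buffer_size = 2 * MB
backup_write_size = 32 * KB


def backup_memory(data_buffer: int, log_buffer: int) -> int:
    """BackupMemory is the sum of the data and log buffer sizes."""
    return data_buffer + log_buffer


def writes_to_drain(buffer_size: int, write_size: int) -> int:
    """Number of write messages needed to flush a full buffer (ceiling division)."""
    return -(-buffer_size // write_size)


print(backup_memory(backup_data_buffer_size, backup_log_buffer_size) // MB)  # 4
print(writes_to_drain(backup_data_buffer_size, backup_write_size))           # 64
```

      With the defaults, a completely full data buffer corresponds to 64
      write messages of 32KB each.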