(mysql.info.gz) ORDER BY optimization

(mysql.info.gz) LEFT JOIN optimization
(mysql.info.gz) GROUP BY optimization
 
 7.2.10 How MySQL Optimizes `ORDER BY'
 -------------------------------------
 
 In some cases, MySQL can use an index to satisfy an `ORDER BY' clause
 without doing any extra sorting.
 
 The index can also be used even if the `ORDER BY' doesn't match the
 index exactly, as long as all the unused index parts and all the extra
 are `ORDER BY' columns are constants in the `WHERE' clause. The
 following queries will use the index to resolve the `ORDER BY' part:
 
      SELECT * FROM t1 ORDER BY KEY_PART1,KEY_PART2,... ;
      SELECT * FROM t1 WHERE KEY_PART1=CONSTANT ORDER BY KEY_PART2;
      SELECT * FROM t1 ORDER BY KEY_PART1 DESC, KEY_PART2 DESC;
      SELECT * FROM t1
          WHERE KEY_PART1=1 ORDER BY KEY_PART1 DESC, KEY_PART2 DESC;
 
 In some cases, MySQL _cannot_ use indexes to resolve the `ORDER BY',
 although it still will use indexes to find the rows that match the
 `WHERE' clause. These cases include the following:
 
    * You use `ORDER BY' on different keys:
 
           SELECT * FROM t1 ORDER BY KEY1, KEY2;
 
    * You use `ORDER BY' on non-consecutive key parts:
 
           SELECT * FROM t1 WHERE KEY2=CONSTANT ORDER BY KEY_PART2;
 
    * You mix `ASC' and `DESC':
 
           SELECT * FROM t1 ORDER BY KEY_PART1 DESC, KEY_PART2 ASC;
 
    * The key used to fetch the rows is not the same as the one used in
      the `ORDER BY':
 
           SELECT * FROM t1 WHERE KEY2=CONSTANT ORDER BY KEY1;
 
    * You are joining many tables, and the columns in the `ORDER BY' are
      not all from the first non-constant table that is used to retrieve
      rows. (This is the first table in the `EXPLAIN' output that
      doesn't have a `const' join type.)
 
    * You have different `ORDER BY' and `GROUP BY' expressions.
 
    * The type of table index used doesn't store rows in order.  For
      example, this is true for a `HASH' index in a `HEAP' table.
 
 
 With `EXPLAIN SELECT ... ORDER BY', you can check whether MySQL can use
 indexes to resolve the query.  It cannot if you see `Using filesort' in
 the `Extra' column.   `EXPLAIN' EXPLAIN.
 
 In those cases where MySQL must sort the result, it uses the following
 `filesort' algorithm before MySQL 4.1:
 
   1. Read all rows according to key or by table scanning.  Rows that
      don't match the `WHERE' clause are skipped.
 
   2. For each row, store a pair of values in a buffer (the sort key and
      the row pointer).  The size of the buffer is the value of the
      `sort_buffer_size' system variable.
 
   3. When the buffer gets full, run a qsort (quicksort) on it and store
      the result in a temporary file.  Save a pointer to the sorted
      block.  (If all pairs fit into the sort buffer, no temporary file
      is created.)
 
   4. Repeat the preceding steps until all rows have been read.
 
   5. Do a multi-merge of up to `MERGEBUFF' (7) regions to one block in
      another temporary file.  Repeat until all blocks from the first
      file are in the second file.
 
   6. Repeat the following until there are fewer than `MERGEBUFF2' (15)
      blocks left.
 
   7. On the last multi-merge, only the pointer to the row (the last
      part of the sort key) is written to a result file.
 
   8. Read the rows in sorted order by using the row pointers in the
      result file.  To optimize this, we read in a big block of row
      pointers, sort them, and use them to read the rows in sorted order
      into a row buffer. The size of the buffer is the value of the
      `read_rnd_buffer_size' system variable.  The code for this step is
      in the `sql/records.cc' source file.
 
 
 One problem with this approach is that it reads rows twice: One time
 when evaluating the `WHERE' clause, and again after sorting the pair
 values.  And even if the rows were accessed successively the first time
 (for example, if a table scan is done), the second time they are
 accessed randomly. (The sort keys are ordered, but the row positions
 are not.)
 
 In MySQL 4.1 and up, a `filesort' optimization is used that records not
 only the sort key value and row position, but also the columns required
 for the query.  This avoids reading the rows twice. The modified
 `filesort' algorithm works like this:
 
   1. Read the rows that match the `WHERE' clause, as before.
 
   2. For each row, record a tuple of values consisting of the sort key
      value and row position, and also the columns required for the
      query.
 
   3. Sort the tuples by sort key value
 
   4. Retrieve the rows in sorted order, but read the required columns
      directly from the sorted tuples rather than by accessing the table
      a second time.
 
 
 Using the modified `filesort' algorithm, the tuples are longer than the
 pairs used in the original method, and fewer of them fit in the sort
 buffer (the size of which is given by `sort_buffer_size'). As a result,
 it is possible for the extra I/O to make the modified approach slower,
 not faster.  To avoid a slowdown, the optimization is used only if the
 total size of the extra columns in the sort tuple does not exceed the
 value of the `max_length_for_sort_data' system variable. (A symptom of
 setting the value of this variable too high is that you will see high
 disk activity and low CPU activity.)
 
 If you want to increase `ORDER BY' speed, first see whether you can get
 MySQL to use indexes rather than an extra sorting phase. If this is not
 possible, you can try the following strategies:
 
    * Increase the size of the `sort_buffer_size' variable.
 
    * Increase the size of the `read_rnd_buffer_size' variable.
 
    * Change `tmpdir' to point to a dedicated filesystem with lots of
      empty space.  If you use MySQL 4.1 or later, this option accepts
      several paths that are used in round-robin fashion. Paths should
      be separated by colon characters (`:') on Unix and semicolon
      characters (`;') on Windows, NetWare, and OS/2.  You can use this
      feature to spread the load across several directories.  _Note:_
      The paths should be for directories in filesystems that are
      located on different _physical_ disks, not different partitions of
      the same disk.
 
 
 By default, MySQL sorts all `GROUP BY COL1, COL2, ...' queries as if
 you specified `ORDER BY COL1, COL2, ...' in the query as well. If you
 include an `ORDER BY' clause explicitly that contains the same column
 list,  MySQL optimizes it away without any speed penalty, although the
 sorting still occurs.  If a query includes `GROUP BY' but you want to
 avoid the overhead of sorting the result, you can suppress sorting by
 specifying `ORDER BY NULL'. For example:
 
      INSERT INTO foo
      SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
Info Catalog
(mysql.info.gz) LEFT JOIN optimization
(mysql.info.gz) Query Speed
(mysql.info.gz) GROUP BY optimization
automatically generated byinfo2html