(mysql.info.gz) ORDER BY optimization
Info Catalog
(mysql.info.gz) LEFT JOIN optimization
(mysql.info.gz) Query Speed
(mysql.info.gz) GROUP BY optimization
7.2.10 How MySQL Optimizes `ORDER BY'
-------------------------------------
In some cases, MySQL can use an index to satisfy an `ORDER BY' clause
without doing any extra sorting.
The index can also be used even if the `ORDER BY' doesn't match the
index exactly, as long as all the unused index parts and all the extra
are `ORDER BY' columns are constants in the `WHERE' clause. The
following queries will use the index to resolve the `ORDER BY' part:
SELECT * FROM t1 ORDER BY KEY_PART1,KEY_PART2,... ;
SELECT * FROM t1 WHERE KEY_PART1=CONSTANT ORDER BY KEY_PART2;
SELECT * FROM t1 ORDER BY KEY_PART1 DESC, KEY_PART2 DESC;
SELECT * FROM t1
WHERE KEY_PART1=1 ORDER BY KEY_PART1 DESC, KEY_PART2 DESC;
In some cases, MySQL _cannot_ use indexes to resolve the `ORDER BY',
although it still will use indexes to find the rows that match the
`WHERE' clause. These cases include the following:
* You use `ORDER BY' on different keys:
SELECT * FROM t1 ORDER BY KEY1, KEY2;
* You use `ORDER BY' on non-consecutive key parts:
SELECT * FROM t1 WHERE KEY2=CONSTANT ORDER BY KEY_PART2;
* You mix `ASC' and `DESC':
SELECT * FROM t1 ORDER BY KEY_PART1 DESC, KEY_PART2 ASC;
* The key used to fetch the rows is not the same as the one used in
the `ORDER BY':
SELECT * FROM t1 WHERE KEY2=CONSTANT ORDER BY KEY1;
* You are joining many tables, and the columns in the `ORDER BY' are
not all from the first non-constant table that is used to retrieve
rows. (This is the first table in the `EXPLAIN' output that
doesn't have a `const' join type.)
* You have different `ORDER BY' and `GROUP BY' expressions.
* The type of table index used doesn't store rows in order. For
example, this is true for a `HASH' index in a `HEAP' table.
With `EXPLAIN SELECT ... ORDER BY', you can check whether MySQL can use
indexes to resolve the query. It cannot if you see `Using filesort' in
the `Extra' column. `EXPLAIN' EXPLAIN.
In those cases where MySQL must sort the result, it uses the following
`filesort' algorithm before MySQL 4.1:
1. Read all rows according to key or by table scanning. Rows that
don't match the `WHERE' clause are skipped.
2. For each row, store a pair of values in a buffer (the sort key and
the row pointer). The size of the buffer is the value of the
`sort_buffer_size' system variable.
3. When the buffer gets full, run a qsort (quicksort) on it and store
the result in a temporary file. Save a pointer to the sorted
block. (If all pairs fit into the sort buffer, no temporary file
is created.)
4. Repeat the preceding steps until all rows have been read.
5. Do a multi-merge of up to `MERGEBUFF' (7) regions to one block in
another temporary file. Repeat until all blocks from the first
file are in the second file.
6. Repeat the following until there are fewer than `MERGEBUFF2' (15)
blocks left.
7. On the last multi-merge, only the pointer to the row (the last
part of the sort key) is written to a result file.
8. Read the rows in sorted order by using the row pointers in the
result file. To optimize this, we read in a big block of row
pointers, sort them, and use them to read the rows in sorted order
into a row buffer. The size of the buffer is the value of the
`read_rnd_buffer_size' system variable. The code for this step is
in the `sql/records.cc' source file.
One problem with this approach is that it reads rows twice: One time
when evaluating the `WHERE' clause, and again after sorting the pair
values. And even if the rows were accessed successively the first time
(for example, if a table scan is done), the second time they are
accessed randomly. (The sort keys are ordered, but the row positions
are not.)
In MySQL 4.1 and up, a `filesort' optimization is used that records not
only the sort key value and row position, but also the columns required
for the query. This avoids reading the rows twice. The modified
`filesort' algorithm works like this:
1. Read the rows that match the `WHERE' clause, as before.
2. For each row, record a tuple of values consisting of the sort key
value and row position, and also the columns required for the
query.
3. Sort the tuples by sort key value
4. Retrieve the rows in sorted order, but read the required columns
directly from the sorted tuples rather than by accessing the table
a second time.
Using the modified `filesort' algorithm, the tuples are longer than the
pairs used in the original method, and fewer of them fit in the sort
buffer (the size of which is given by `sort_buffer_size'). As a result,
it is possible for the extra I/O to make the modified approach slower,
not faster. To avoid a slowdown, the optimization is used only if the
total size of the extra columns in the sort tuple does not exceed the
value of the `max_length_for_sort_data' system variable. (A symptom of
setting the value of this variable too high is that you will see high
disk activity and low CPU activity.)
If you want to increase `ORDER BY' speed, first see whether you can get
MySQL to use indexes rather than an extra sorting phase. If this is not
possible, you can try the following strategies:
* Increase the size of the `sort_buffer_size' variable.
* Increase the size of the `read_rnd_buffer_size' variable.
* Change `tmpdir' to point to a dedicated filesystem with lots of
empty space. If you use MySQL 4.1 or later, this option accepts
several paths that are used in round-robin fashion. Paths should
be separated by colon characters (`:') on Unix and semicolon
characters (`;') on Windows, NetWare, and OS/2. You can use this
feature to spread the load across several directories. _Note:_
The paths should be for directories in filesystems that are
located on different _physical_ disks, not different partitions of
the same disk.
By default, MySQL sorts all `GROUP BY COL1, COL2, ...' queries as if
you specified `ORDER BY COL1, COL2, ...' in the query as well. If you
include an `ORDER BY' clause explicitly that contains the same column
list, MySQL optimizes it away without any speed penalty, although the
sorting still occurs. If a query includes `GROUP BY' but you want to
avoid the overhead of sorting the result, you can suppress sorting by
specifying `ORDER BY NULL'. For example:
INSERT INTO foo
SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
Info Catalog
(mysql.info.gz) LEFT JOIN optimization
(mysql.info.gz) Query Speed
(mysql.info.gz) GROUP BY optimization
automatically generated byinfo2html