Chapter 18. Partitioning

Table of Contents

18.1. Overview of Partitioning in MySQL
18.2. Partition Types
18.2.1. RANGE Partitioning
18.2.2. LIST Partitioning
18.2.3. HASH Partitioning
18.2.4. KEY Partitioning
18.2.5. Subpartitioning
18.2.6. How MySQL Partitioning Handles NULL Values
18.3. Partition Management
18.3.1. Management of RANGE and LIST Partitions
18.3.2. Management of HASH and KEY Partitions
18.3.3. Maintenance of Partitions
18.3.4. Obtaining Information About Partitions

This chapter discusses partitioning as implemented in MySQL 5.1. An introduction to partitioning and partitioning concepts may be found in Section 18.1, “Overview of Partitioning in MySQL”. MySQL 5.1 supports several types of partitioning, which are discussed in Section 18.2, “Partition Types”, as well as subpartitioning (also known sometimes as composite partitioning), which is described in Section 18.2.5, “Subpartitioning”. Methods of adding, removing, and altering partitions in existing partitioned tables are covered in Section 18.3, “Partition Management”. Table maintenance commands for use with partitioned tables are discussed in Section 18.3.3, “Maintenance of Partitions”.

Please Note: The partitioning implementation in MySQL 5.1 is still very new (pre-alpha quality) and is not production-ready at this time. Much the same is true of this chapter: Some of the features described herein are not yet actually implemented (partitioning maintenance and repartitioning commands), and others might not yet function exactly as described (for example, the DATA DIRECTORY and INDEX DIRECTORY options for partitions are adversely affected by Bug #13520). We have attempted to note these discrepancies in this chapter. Before filing bug reports, we encourage you to check the following resources:

MySQL 5.1 alpha binaries are now available from http://dev.mysql.com/downloads/mysql/5.1.html. However, for the latest partitioning bugfixes and feature additions, you can obtain the source from our BitKeeper repository. To enable partitioning, you need to compile the server using the --with-partition option. For more information about building MySQL, see Section 2.8, “MySQL Installation Using a Source Distribution”. If you have problems compiling a partitioning-enabled MySQL 5.1 build, check the MySQL Partitioning Forum and ask for assistance there if you don't find a solution to your problem already posted.

18.1. Overview of Partitioning in MySQL

This section provides a conceptual overview of partitioning in MySQL 5.1.

The SQL standard does not provide much in the way of guidance regarding the physical aspects of data storage. The SQL language itself is intended to work independently of any data structures or media underlying the schemas, tables, rows, or columns with which it works. Nonetheless, most advanced database management systems have evolved some means of determining the physical location to be used for storing specific pieces of data in terms of the filesystem, hardware or even both. In MySQL, the InnoDB storage engine has long supported the notion of a tablespace, and the MySQL Server, even prior to the introduction of partitioning, could be configured to employ different physical directories for storing different databases (see Section 7.6.1, “Using Symbolic Links”, for an explanation of how this is done).

Partitioning takes this notion a step further, by allowing you to distribute portions of individual tables across a filesystem according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MySQL can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function. The function is selected according to the partitioning type specified by the user, and takes as its parameter the value of a user-supplied expression. This expression can be either an integer column value, or a function acting on one or more column values and returning an integer. The value of this expression is passed to the partitioning function, which returns an integer value representing the number of the partition in which that particular record should be stored. This function must be non-constant and non-random. It may not contain any queries, but may use virtually any SQL expression that is valid in MySQL, so long as that expression returns a positive integer less than MAXVALUE (the greatest possible positive integer). Examples of partitioning functions can be found in the discussions of partitioning types later in this chapter (see Section 18.2, “Partition Types”), as well as in the partitioning syntax descriptions given in Section 13.1.5, “CREATE TABLE Syntax”.

This is known as horizontal partitioning; that is, different rows of a table may be assigned to different physical partitions. MySQL 5.1 does not support vertical partitioning, in which different columns of a table are assigned to different physical partitions. There are currently no plans to introduce vertical partitioning into MySQL 5.1.

Partitioning support is included in the -max releases of MySQL 5.1 (that is, the 5.1 -max binaries will be built with --with-partition). If the MySQL binary is built with partitioning support, nothing further needs to be done in order to enable it (for example, no special entries are required in your my.cnf file). You can determine whether or not your MySQL server supports partitioning by means of a SHOW VARIABLES command such as this one:

mysql> SHOW VARIABLES LIKE '%partition%';

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| have_partition_engine | YES   |
+-----------------------+-------+
1 row in set (0.00 sec)

If you do not see the have_partition_engine variable with the value YES listed as shown above in the output of an appropriate SHOW VARIABLES, then your version of MySQL does not support partitioning. (Note that there is nothing in the output of SHOW ENGINES that displays any information relating to partitioning support; you must use SHOW VARIABLES to make this determination.)

For creating partitioned tables, you can use any storage engine that is supported by your MySQL server; the MySQL partitioning engine runs in a separate layer and can interact with any of these. In MySQL 5.1, all partitions of the same partitioned table must use the same storage engine; for example, you cannot use MyISAM for one partition and InnoDB for another. However, there is nothing preventing you from using different storage engines for different partitioned tables on the same MySQL server or even in the same database.

To employ a particular storage engine for a partitioned table, it is necessary only to use the [STORAGE] ENGINE option just as you would for a non-partitioned table. However, you should keep in mind that [STORAGE] ENGINE (and other table options) need to be listed before any partitioning options are used in a CREATE TABLE statement. This example shows how to create a table that is partitioned by hash into 6 partitions and which uses the InnoDB storage engine:

CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE)
    ENGINE=INNODB
    PARTITION BY HASH(MONTH(tr_date))
    PARTITIONS 6;

(Note that each PARTITION clause can include a [STORAGE] ENGINE option, but in MySQL 5.1 this has no effect.)

It is possible to create partitioned temporary tables; however, the lifetime of such tables is only as long as the current MySQL session. This is the same as for non-partitioned temporary tables.

Note: Partitioning applies to all data and indexes of a table; you cannot partition only the data and not the indexes, or vice versa, nor can you partition only a portion of the table.

Data and indexes for each partition can be assigned to a specific directory using the DATA DIRECTORY and INDEX DIRECTORY options for the PARTITION clause of the CREATE TABLE statement used to create the partitioned table. In addition, MAX_ROWS and MIN_ROWS can be used to determine the maximum and minimum numbers of rows, respectively, that can be stored in each partition. See Section 18.3, “Partition Management”, for more information on these options. Note: This particular feature is currently nonfunctional due to Bug #13250; we should have this fixed by the time the first 5.1 binaries are made available.

Some of the advantages of partitioning include:

  • Being able to store more data in one table than can be held on a single disk or filesystem partition.

  • Data that loses its usefulness can often be easily removed from the table by dropping the partition containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding a new partition specifically for that data.

Other benefits usually associated with partitioning include those in the following list. These features are not currently implemented in MySQL Partitioning, but are high on our list of priorities; we hope to include them in the 5.1 production release.

  • Some queries can be greatly optimized by virtue of the fact that data satisfying a given WHERE clause can be stored only on one or more partitions, thereby excluding any remaining partitions from the search. Because partitions can be altered after a partitioned table has been created, you can reorganize your data to speed up frequent queries that were not anticipated when the partitioning scheme was first set up.

  • Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized. A simple example of such a query might be SELECT salesperson_id, COUNT(orders) as order_total FROM sales GROUP BY salesperson_id;. By “parallelized,” we mean that the query can be run simultaneously on each partition, and the final result obtained merely by summing the results obtained for all partitions.

  • Achieving greater query throughput by virtue of spreading data seeks over multiple disks.
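
The idea of a parallelized aggregate described in the list above can be sketched in a few lines of Python. This is only a conceptual illustration, not a description of how the server works: the sample rows, the choice of three partitions, and the rule used to distribute rows are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample data: (salesperson_id, order_id) rows of a "sales" table.
rows = [(1, 101), (2, 102), (1, 103), (3, 104), (2, 105), (1, 106)]

NUM_PARTITIONS = 3  # assumed for this illustration

# Distribute the rows over the partitions. Taking the order id modulo the
# number of partitions is an arbitrary rule chosen for the sketch.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for salesperson_id, order_id in rows:
    partitions[order_id % NUM_PARTITIONS].append((salesperson_id, order_id))


def count_orders(partition):
    """Evaluate the GROUP BY salesperson_id / COUNT() on one partition only."""
    return Counter(salesperson_id for salesperson_id, _ in partition)


# Each partial result depends on a single partition, so the partial results
# could be computed simultaneously. The final result is their sum.
partial_counts = [count_orders(p) for p in partitions]
order_total = sum(partial_counts, Counter())

print(dict(order_total))
```

The per-partition counts never need to see rows from another partition, which is what makes the final summation step sufficient.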

Be sure to check this page and chapter frequently for updates as Partitioning development for MySQL 5.1 continues.

18.2. Partition Types

This section discusses the types of partitioning which are available in MySQL 5.1. These include:

  • RANGE partitioning: Assigns rows to partitions based on column values falling within a given range. See Section 18.2.1, “RANGE Partitioning”.

  • LIST partitioning: Similar to partitioning by range, except that the partition is selected based on columns matching one of a set of discrete values. See Section 18.2.2, “LIST Partitioning”.

  • HASH partitioning: A partition is selected based on the value returned by a user-defined expression that operates on column values in rows to be inserted into the table. The function may consist of any expression valid in MySQL that yields a non-negative integer value. See Section 18.2.3, “HASH Partitioning”.

  • KEY partitioning: Similar to partitioning by hash, except that only one or more columns to be evaluated are supplied, and the MySQL server provides its own hashing function. The column or columns must contain only integer values. See Section 18.2.4, “KEY Partitioning”.

It is important to remember — regardless of the type of partitioning that you use — that partitions are always numbered automatically and in sequence when created, starting with 0. When a new row is inserted into a partitioned table, it is these partition numbers that are used in identifying the correct partition. For example, if your table uses 4 partitions, these partitions are numbered 0, 1, 2, and 3. For the RANGE and LIST partitioning types, it is necessary to ensure that there is a partition defined for each partition number. For HASH partitioning, the user function employed must return an integer value greater than 0. For KEY partitioning, this issue is taken care of automatically by the hashing function which the MySQL server employs internally.

Names of partitions generally follow the rules governing other MySQL identifiers, such as those for tables and databases. However, you should note that partition names are not case-sensitive. For example, the following CREATE TABLE statement fails as shown:

mysql> CREATE TABLE t2 (val INT)
    -> PARTITION BY LIST(val)(
    ->     PARTITION mypart VALUES IN (1,3,5),
    ->     PARTITION MyPart VALUES IN (2,4,6)
    -> );
ERROR 1488 (HY000): All partitions must have unique names in the table

Failure occurs because MySQL sees no difference between the partition names mypart and MyPart.

In the sections that follow, we do not necessarily provide all possible forms for the syntax that can be used for creating each partition type; this information may be found in Section 13.1.5, “CREATE TABLE Syntax”.

18.2.1. RANGE Partitioning

A table that is partitioned by range is partitioned in such a way that each partition contains rows for which the partitioning expression value lies within a given range. Ranges should be contiguous but not overlapping, and are defined using the VALUES LESS THAN operator. For the next few examples, suppose that you are creating a table such as the following to hold personnel records for a chain of 20 video stores, numbered 1 through 20:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
);

This table can be partitioned by range in a number of ways, depending on your needs. One way would be to use the store_id column. For instance, you might decide to partition the table 4 ways by adding a PARTITION BY RANGE clause as shown here:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN (21)
);

In this partitioning scheme, all rows corresponding to employees working at stores 1 through 5 are stored in partition p0, those for employees at stores 6 through 10 are stored in partition p1, and so on. Note that each partition is defined in order, from lowest to highest. This is a requirement of the PARTITION BY RANGE syntax; you can think of it as being analogous to a switch ... case in C or Java in this regard.

It is easy to determine that a new row whose store_id value is 13, such as one containing the data (72, 'Michael', 'Widenius', '1998-06-25', NULL, 13), is inserted into partition p2, but what happens when your chain adds a 21st store? Under this scheme, there is no rule that covers a row whose store_id is greater than 20, so an error results because the server does not know where to place it. You can keep this from occurring by using a “catchall” VALUES LESS THAN clause in the CREATE TABLE statement that provides for all values greater than the highest value explicitly named:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

MAXVALUE represents the greatest possible integer value. Now, any rows whose store_id column value is greater than or equal to 16 (the highest value defined) are stored in partition p3. At some point in the future — when the number of stores has increased to 25, 30, or more — you can use an ALTER TABLE statement to add new partitions for stores 21-25, 26-30, and so on (see Section 18.3, “Partition Management”, for details of how to do this).
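
The way a row's partition follows from the VALUES LESS THAN bounds can be modelled with a short Python sketch. It is an illustration of the rule only, under the assumption that MAXVALUE can be represented by positive infinity; the function and variable names are hypothetical and do not correspond to anything in the server.

```python
import bisect
import math

# Upper bounds taken from the VALUES LESS THAN clauses, in declaration order.
# math.inf stands in for MAXVALUE (an assumption of this sketch).
BOUNDS_WITHOUT_CATCHALL = [6, 11, 16, 21]
BOUNDS_WITH_CATCHALL = [6, 11, 16, math.inf]


def range_partition(value, bounds):
    """Return the number of the first partition whose bound exceeds value."""
    index = bisect.bisect_right(bounds, value)
    if index == len(bounds):
        # No VALUES LESS THAN clause covers this value.
        raise ValueError(f"no partition defined for value {value}")
    return index


print(range_partition(13, BOUNDS_WITHOUT_CATCHALL))  # store 13
print(range_partition(21, BOUNDS_WITH_CATCHALL))     # store 21, catchall scheme
```

With the first set of bounds, calling range_partition(21, BOUNDS_WITHOUT_CATCHALL) raises an error, which mirrors the failure described above for a 21st store.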

In much the same fashion, you could partition the table based on employee job codes — that is, based on ranges of job_code column values. For example — assuming that two-digit job codes are used for regular (in-store) workers, three-digit codes are used for office and support personnel, and four-digit codes are used for management positions — you could create the partitioned table using the following:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
)
PARTITION BY RANGE (job_code) (
    PARTITION p0 VALUES LESS THAN (100),
    PARTITION p1 VALUES LESS THAN (1000),
    PARTITION p2 VALUES LESS THAN (10000)
);

In this instance, all rows relating to in-store workers would be stored in partition p0, those relating to office and support staff in p1, and those relating to managers in partition p2.

It is also possible to use an expression in VALUES LESS THAN clauses. The most noteworthy restriction here is that MySQL must be able to evaluate the expression's return value as part of a LESS THAN (<) comparison; thus, the expression's value cannot be NULL. It is for this reason that the hired, separated, job_code, and store_id columns of the employees table have been defined as NOT NULL.

Rather than splitting up the table data according to store number, you can use an expression based on one of the two DATE columns instead. For example, let us suppose that you wish to partition based on the year that each employee left the company, that is, the value of YEAR(separated). An example of a CREATE TABLE statement that implements such a partitioning scheme is shown here:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY RANGE ( YEAR(separated) ) (
    PARTITION p0 VALUES LESS THAN (1991),
    PARTITION p1 VALUES LESS THAN (1996),
    PARTITION p2 VALUES LESS THAN (2001),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

In this scheme, for all employees who left before 1991, the rows are stored in partition p0; for those who left in the years 1991 through 1995, in p1; for those who left in the years 1996 through 2000, in p2; and for any workers who left after the year 2000, in p3.

Range partitioning is particularly useful when:

  • You want or need to delete “old” data. If you are using the partitioning scheme shown immediately above, you can simply use ALTER TABLE employees DROP PARTITION p0; to delete all rows relating to employees who stopped working for the firm prior to 1991. (See Section 13.1.2, “ALTER TABLE Syntax”, and Section 18.3, “Partition Management”, for more information.) For a table with a great many rows, this can be much more efficient than running a DELETE query such as DELETE FROM employees WHERE YEAR(separated) <= 1990;.

  • You want to use a column containing date or time values, or containing values arising from some other series.

  • You frequently run queries that depend directly on the column used for partitioning the table. For example, when executing a query such as SELECT COUNT(*) FROM employees WHERE YEAR(separated) = 2000 GROUP BY store_id;, MySQL can quickly determine that only partition p2 needs to be scanned because the remaining partitions cannot contain any records satisfying the WHERE clause. Note: This optimization has not yet been enabled in the MySQL 5.1 sources; however, work is in progress.
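
To make the last point in the list above concrete, the following Python sketch works out which partitions of the YEAR(separated) scheme could contain rows for a given range of years. It illustrates the reasoning only; it says nothing about how (or whether) the server performs this optimization, and the helper name is hypothetical.

```python
import bisect
import math

# Bounds of partitions p0..p3 from the YEAR(separated) example;
# math.inf stands in for MAXVALUE (an assumption of this sketch).
BOUNDS = [1991, 1996, 2001, math.inf]


def partitions_for_years(first_year, last_year):
    """Return the numbers of the partitions that can hold the given years."""
    lowest = bisect.bisect_right(BOUNDS, first_year)
    highest = bisect.bisect_right(BOUNDS, last_year)
    return list(range(lowest, highest + 1))


print(partitions_for_years(2000, 2000))  # WHERE YEAR(separated) = 2000
print(partitions_for_years(1993, 1998))  # a range spanning two partitions
```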

18.2.2. LIST Partitioning

List partitioning in MySQL is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined. The chief difference is that, in list partitioning, each partition is defined and selected based on the membership of a column value in one of a set of value lists, rather than in one of a set of contiguous ranges of values. This is done by using PARTITION BY LIST(expr) where expr is a column value or an expression based on a column value and returning an integer value, and then defining each partition by means of a VALUES IN (value_list), where value_list is a comma-separated list of integers.

Note: In MySQL 5.1, it is possible to match against only a list of integers when partitioning by LIST.

Unlike the case with partitions defined by range, list partitions do not need to be declared in any particular order. For more detailed syntactical information, see Section 13.1.5, “CREATE TABLE Syntax”.

For the examples that follow, we assume that the basic definition of the table to be partitioned is provided by the CREATE TABLE statement shown here:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
);

(This is the same table used as a basis for the examples in Section 18.2.1, “RANGE Partitioning”.)

Suppose that there are 20 video stores distributed among 4 franchises as shown in the following table:

Region    Store ID Numbers
North     3, 5, 6, 9, 17
East      1, 2, 10, 11, 19, 20
West      4, 12, 13, 14, 18
Central   7, 8, 15, 16

To partition this table in such a way that rows for stores belonging to the same region are stored in the same partition, you could use the CREATE TABLE statement shown here:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY LIST(store_id) (
    PARTITION pNorth VALUES IN (3,5,6,9,17),
    PARTITION pEast VALUES IN (1,2,10,11,19,20),
    PARTITION pWest VALUES IN (4,12,13,14,18),
    PARTITION pCentral VALUES IN (7,8,15,16)
);

This makes it easy to add or drop employee records relating to specific regions to or from the table. For instance, suppose that all stores in the West region are sold to another company. All rows relating to employees working at stores in that region can be deleted with the query ALTER TABLE employees DROP PARTITION pWest;, which can be executed much more efficiently than the equivalent DELETE query DELETE FROM employees WHERE store_id IN (4,12,13,14,18);.

Important: If you try to insert a row such that the column value (or the partitioning expression's return value) is not found in any of the partitioning value lists, the INSERT query will fail with an error. For example, given the LIST partitioning scheme just outlined, this query will fail:

INSERT INTO employees VALUES 
    (224, 'Linus', 'Torvalds', '2002-05-01', '2004-10-12', 42, 21);

Failure occurs because the store_id column value 21 is not found in any of the value lists used to define partitions pNorth, pEast, pWest, or pCentral. It is important to note that there is no “catch-all” definition for list partitions analogous to VALUES LESS THAN MAXVALUE which accommodates values not found in any of the value lists. In other words, any value which is to be matched must be found in one of the value lists.
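
The matching rule for list partitions can be sketched in Python as a simple lookup. This is a model of the behavior described above, not server code; the dictionary and function names are hypothetical.

```python
# Value lists copied from the VALUES IN clauses of the example table.
PARTITION_VALUES = {
    "pNorth": (3, 5, 6, 9, 17),
    "pEast": (1, 2, 10, 11, 19, 20),
    "pWest": (4, 12, 13, 14, 18),
    "pCentral": (7, 8, 15, 16),
}

# Invert the definition into a lookup from value to partition name.
VALUE_TO_PARTITION = {
    value: name
    for name, values in PARTITION_VALUES.items()
    for value in values
}


def list_partition(store_id):
    """Return the partition whose value list contains store_id."""
    if store_id not in VALUE_TO_PARTITION:
        # There is no catch-all: an unlisted value cannot be stored.
        raise ValueError(f"value {store_id} is not in any value list")
    return VALUE_TO_PARTITION[store_id]


print(list_partition(12))
```

Calling list_partition(21) raises an error, which corresponds to the failed INSERT shown above.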

As with RANGE partitioning, it is possible to combine LIST partitioning with partitioning by hash or key to produce a composite partitioning (subpartitioning). See Section 18.2.5, “Subpartitioning”.

18.2.3. HASH Partitioning

Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions. With range or list partitioning, you must specify explicitly into which partition a given column value or set of column values is to be stored; with hash partitioning, MySQL takes care of this for you, and you need only specify a column value or expression based on a column value to be hashed and the number of partitions into which the partitioned table is to be divided.

To partition a table using HASH partitioning, it is necessary to append to the CREATE TABLE statement a PARTITION BY HASH (expr) clause, where expr is an expression that returns an integer. This can simply be the name of a column whose type is one of MySQL's integer types. In addition, you will most likely want to follow this with a PARTITIONS num clause, where num is a positive integer representing the number of partitions into which the table is to be divided.

For example, the following statement creates a table that uses hashing on the store_id column and is divided into 4 partitions:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;

If you do not include a PARTITIONS clause, the number of partitions defaults to 1. Exception: For NDB Cluster tables, the default number of partitions is the same as the number of cluster data nodes, possibly modified to take into account any MAX_ROWS setting in order to ensure that all rows can fit into the partitions. (See Chapter 17, MySQL Cluster.)

Using the PARTITIONS keyword without a number following it results in a syntax error.

You can also use an SQL expression that returns an integer for expr. For instance, you might want to partition based on the year in which an employee was hired. This can be done as shown here:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

You may use any function or other expression for expr that is valid in MySQL, so long as it returns a non-constant, non-random integer value. (In other words, it should be varying but deterministic.) However, you should keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.

The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for “pruning” on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.

For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect to date_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col). Even so, YEAR(date_col) is a good candidate for a hashing function, because it varies directly with a portion of date_col and there is no possible change in date_col that produces a disproportionate change in YEAR(date_col).

By way of contrast, suppose you have a column named int_col whose type is INT. Now consider the expression POW(5-int_col,3) + 6. This would be a poor choice for a hashing function because a change in the value of int_col is not guaranteed to produce a proportional change in the value of the expression. Changing the value of int_col by a given amount can produce widely different changes in the value of the expression. For example, changing int_col from 5 to 6 produces a change of -1 in the value of the expression, but changing the value of int_col from 6 to 7 produces a change of -7 in the expression value.

In other words, the more closely the graph of the column value versus the value of the expression follows a straight line as traced by the equation y=nx where n is some nonzero constant, the better the expression is suited to hashing. This has to do with the fact that the more nonlinear an expression is, the more uneven the distribution of data among the partitions it tends to produce.
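
The arithmetic behind the POW(5-int_col,3) + 6 example can be checked with a few lines of Python. The linear expression used for comparison (with n = 3) is a hypothetical choice made only for this illustration.

```python
def nonlinear(int_col):
    """The expression POW(5 - int_col, 3) + 6 discussed in the text."""
    return (5 - int_col) ** 3 + 6


def linear(int_col):
    """A hypothetical linear expression y = n * x, with n = 3."""
    return 3 * int_col


values = [5, 6, 7, 8]
pairs = list(zip(values, values[1:]))

# Change in each expression for every unit step in int_col.
nonlinear_steps = [nonlinear(b) - nonlinear(a) for a, b in pairs]
linear_steps = [linear(b) - linear(a) for a, b in pairs]

print(nonlinear_steps)
print(linear_steps)
```

The first list shows steps that grow in magnitude with each unit change in int_col, whereas the steps of the linear expression stay constant.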

In theory, pruning is also possible for expressions involving more than one column value, but determining which of these are suitable can be quite difficult and time-consuming. For this reason, the use of hashing expressions involving multiple columns is not particularly recommended.

When PARTITION BY HASH is used, MySQL determines which partition of num partitions to use based on the modulus of the result of the user function. In other words, for an expression expr, the partition in which the record is stored is partition number N, where N = MOD(expr, num). For example, suppose table t1 is defined as follows, so that it has 4 partitions:

CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
    PARTITION BY HASH( YEAR(col3) )
    PARTITIONS 4;

If you insert a record into t1 whose col3 value is '2005-09-15', then the partition in which it is stored is determined as follows:

MOD(YEAR('2005-09-15'),4)
=  MOD(2005,4)
=  1
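
The same calculation can be reproduced in Python as a quick check. The function name is hypothetical, and the sketch assumes a non-negative expression value, since Python's % operator and MySQL's MOD() treat negative operands differently.

```python
from datetime import date


def hash_partition(expr_value, num_partitions):
    """Partition number N = MOD(expr, num) for a non-negative expr value."""
    return expr_value % num_partitions


col3 = date(2005, 9, 15)
partition = hash_partition(col3.year, 4)  # YEAR(col3) hashed over 4 partitions
print(partition)
```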

MySQL 5.1 also supports a variant of HASH partitioning known as linear hashing which employs a more complex algorithm for determining the placement of new rows inserted into the partitioned table. See Section 18.2.3.1, “LINEAR HASH Partitioning”, for a description of this algorithm.

The user function is evaluated each time a record is inserted or updated. It may also — depending on the circumstances — be evaluated when records are deleted.

Note: If the table to be partitioned has a UNIQUE key, then any columns supplied as arguments to the HASH user function or to the KEY's column_list must be part of that key.

18.2.3.1. LINEAR HASH Partitioning

MySQL also supports linear hashing, which differs from regular hashing in that linear hashing utilizes a linear powers-of-two algorithm whereas regular hashing employs the modulus of the hashing function's value.

Syntactically, the only difference between linear-hash partitioning and regular hashing is the addition of the LINEAR keyword in the PARTITION BY clause, as shown here:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY LINEAR HASH(YEAR(hired))
PARTITIONS 4;

Given an expression expr, the partition in which the record is stored when linear hashing is used is partition number N from among num partitions, where N is derived according to the following algorithm:

  1. Find the smallest power of 2 that is greater than or equal to num. We call this value V; it can be calculated as:

    V = POWER(2, CEILING(LOG(2, num)))
    

    (For example, suppose that num is 13. Then LOG(2,13) is 3.7004397181411. CEILING(3.7004397181411) is 4, and V = POWER(2,4), which is 16.)

  2. Set N = F(column_list) & (V - 1), where F(column_list) is the value of the partitioning expression expr.

  3. While N >= num:

    • Set V = CEIL(V / 2)

    • Set N = N & (V - 1)

For example, suppose that the table t1, using linear hash partitioning and having 6 partitions, is created using this statement:

CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
    PARTITION BY LINEAR HASH( YEAR(col3) )
    PARTITIONS 6;

Now assume that you want to insert two records into t1 having the col3 column values '2003-04-14' and '1998-10-19'. The partition number for the first of these is determined as follows:

V = POWER(2, CEILING(LOG(2,6))) = 8
N = YEAR('2003-04-14') & (8 - 1)
   = 2003 & 7
   = 3

(3 >= 6 is FALSE: record stored in partition #3)

The number of the partition where the second record is stored is calculated as shown here:

V = 8
N = YEAR('1998-10-19') & (8-1)
  = 1998 & 7
  = 6

(6 >= 6 is TRUE: additional step required)

V = CEILING(8 / 2) = 4
N = 6 & (4 - 1)
  = 6 & 3
  = 2

(2 >= 6 is FALSE: record stored in partition #2)
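The steps of this algorithm can be sketched in Python as shown here. This is an illustrative model only, not the server's own code: the function name linear_hash_partition is hypothetical, and value stands for the integer result of the partitioning expression (for example, YEAR(col3)).

```python
def linear_hash_partition(value: int, num: int) -> int:
    """Return the partition number for an integer expression value.

    Illustrative model of the steps described above; not MySQL source code.
    """
    # Step 1: V is the smallest power of 2 that is >= num, which is what
    # POWER(2, CEILING(LOG(2, num))) yields.
    v = 1
    while v < num:
        v *= 2
    # Step 2: keep only the low-order bits of the value.
    n = value & (v - 1)
    # Step 3: while N is not a valid partition number, halve V and mask again.
    # V is a power of 2 here, so integer halving equals CEIL(V / 2).
    while n >= num:
        v //= 2
        n &= v - 1
    return n


# The two rows from the t1 example (6 partitions):
print(linear_hash_partition(2003, 6))  # 3
print(linear_hash_partition(1998, 6))  # 2
```

Using integer doubling rather than a floating-point logarithm avoids rounding problems for large values of num.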

The advantage in partitioning by linear hash is that the adding, dropping, merging, and splitting of partitions is made much faster, which can be beneficial when dealing with tables containing extremely large amounts (terabytes) of data. The disadvantage is that data is less likely to be evenly distributed between partitions as compared with the distribution obtained using regular hash partitioning.
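The difference in distribution can be illustrated with a small Python experiment. The sketch below assumes that regular hashing places a row in partition value MOD num, as described at the start of this section; both function names are hypothetical, and the 24 consecutive year values are sample data chosen only for illustration.

```python
from collections import Counter


def regular_hash_partition(value: int, num: int) -> int:
    # Regular hashing: modulus of the expression value (assumed model).
    return value % num


def linear_hash_partition(value: int, num: int) -> int:
    # Linear hashing: powers-of-two masking, as described in this section.
    v = 1
    while v < num:
        v *= 2
    n = value & (v - 1)
    while n >= num:
        v //= 2
        n &= v - 1
    return n


years = range(1992, 2016)  # 24 consecutive sample values
regular = Counter(regular_hash_partition(y, 6) for y in years)
linear = Counter(linear_hash_partition(y, 6) for y in years)

print(sorted(regular.items()))  # [(0, 4), (1, 4), (2, 4), (3, 4), (4, 4), (5, 4)]
print(sorted(linear.items()))   # [(0, 3), (1, 3), (2, 6), (3, 6), (4, 3), (5, 3)]
```

With 6 partitions, values whose low-order three bits are 6 or 7 are folded into partitions 2 and 3, so those two partitions receive twice as many of these sample values as the others.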

18.2.4. KEY Partitioning

Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. MySQL Cluster uses MD5() for this purpose; for tables using other storage engines, the server employs its own internal hashing function which is based on the same algorithm as PASSWORD().

The syntax rules for CREATE TABLE ... PARTITION BY KEY are similar to those for creating a table that is partitioned by hash. The only differences are that you use KEY rather than HASH, and that KEY takes only a list of one or more column names.

It is also possible to partition a table by linear key. Here is a simple example:

CREATE TABLE tk (
    col1 INT NOT NULL,
    col2 CHAR(5),
    col3 DATE
) 
PARTITION BY LINEAR KEY (col1)
PARTITIONS 3;

Using LINEAR has the same effect on KEY partitioning as it does on HASH partitioning, with the partition number being derived using a powers-of-two algorithm rather than modulo arithmetic. See Section 18.2.3.1, “LINEAR HASH Partitioning”, for a description of this algorithm and its implications.

18.2.5. Subpartitioning

Subpartitioning — also known as composite partitioning — is the further division of each partition in a partitioned table. For example, consider the following CREATE TABLE statement:

CREATE TABLE ts (id INT, purchased DATE)
    PARTITION BY RANGE(YEAR(purchased))
    SUBPARTITION BY HASH(TO_DAYS(purchased))
    SUBPARTITIONS 2 (
        PARTITION p0 VALUES LESS THAN (1990),
        PARTITION p1 VALUES LESS THAN (2000),
        PARTITION p2 VALUES LESS THAN MAXVALUE
    );

Table ts has 3 RANGE partitions. Each of these partitions — p0, p1, and p2 — is further divided into 2 subpartitions. In effect, the entire table is divided into 3 * 2 = 6 partitions. However, due to the action of the PARTITION BY RANGE clause, the first 2 of these store only those records with a value less than 1990 in the purchased column.
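To make the two-level division concrete, the following Python sketch models where a row of ts would be placed. It is a simplified model under stated assumptions: the names RANGE_BOUNDS, to_days, and locate are hypothetical, the subpartition is assumed to be chosen as TO_DAYS(purchased) MOD 2, and to_days() imitates the MySQL TO_DAYS() function by adding 365 to the Python date ordinal.

```python
from datetime import date

# (partition name, exclusive upper bound on YEAR(purchased)); None = MAXVALUE
RANGE_BOUNDS = [("p0", 1990), ("p1", 2000), ("p2", None)]
SUBPARTITIONS = 2


def to_days(d: date) -> int:
    # MySQL's TO_DAYS() counts days from year 0, while Python ordinals
    # start at 0001-01-01, hence the offset of 365.
    return d.toordinal() + 365


def locate(purchased: date):
    """Return (partition name, subpartition number) for a purchase date."""
    for name, bound in RANGE_BOUNDS:
        if bound is None or purchased.year < bound:
            # Assumed model: HASH subpartitioning uses the modulus.
            return name, to_days(purchased) % SUBPARTITIONS
    raise ValueError("no partition accepts this value")


print(locate(date(1970, 1, 1)))  # ('p0', 0)
print(locate(date(1970, 1, 2)))  # ('p0', 1)
```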

In MySQL 5.1, it is possible to subpartition tables that are partitioned by RANGE or LIST. Subpartitions may use either HASH or KEY partitioning.

It is also possible to define subpartitions explicitly using SUBPARTITION clauses in order to specify options for individual subpartitions. For example, a more verbose fashion of creating the same table ts as shown in the previous example would be:

CREATE TABLE ts (id INT, purchased DATE)
    PARTITION BY RANGE( YEAR(purchased) )
    SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
        PARTITION p0 VALUES LESS THAN (1990) (
            SUBPARTITION s0,
            SUBPARTITION s1
        ),
        PARTITION p1 VALUES LESS THAN (2000) (
            SUBPARTITION s2,
            SUBPARTITION s3
        ),
        PARTITION p2 VALUES LESS THAN MAXVALUE (
            SUBPARTITION s4,
            SUBPARTITION s5
        )
    );

Some syntactical items of note:

  • Each partition must have the same number of subpartitions.

  • If you explicitly define any subpartitions using SUBPARTITION on any partition of a partitioned table, you must define them all. In other words, the following statement will fail:

    CREATE TABLE ts (id INT, purchased DATE)
        PARTITION BY RANGE( YEAR(purchased) )
        SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
            PARTITION p0 VALUES LESS THAN (1990) (
                SUBPARTITION s0,
                SUBPARTITION s1
            ),
            PARTITION p1 VALUES LESS THAN (2000),
            PARTITION p2 VALUES LESS THAN MAXVALUE (
                SUBPARTITION s2,
                SUBPARTITION s3
            )
        );
    

    This statement would still fail even if it included a SUBPARTITIONS 2 clause.

  • Each SUBPARTITION clause must include (at a minimum) a name for the subpartition. Otherwise, you may set any desired option for the subpartition or allow it to assume its default setting for that option.

  • Names of subpartitions must be unique within each partition, but do not have to be unique within the table as a whole. For example, the following CREATE TABLE statement is valid:

    CREATE TABLE ts (id INT, purchased DATE)
        PARTITION BY RANGE( YEAR(purchased) )
        SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
            PARTITION p0 VALUES LESS THAN (1990) (
                SUBPARTITION s0,
                SUBPARTITION s1
            ),
            PARTITION p1 VALUES LESS THAN (2000) (
                SUBPARTITION s0,
                SUBPARTITION s1
            ),
            PARTITION p2 VALUES LESS THAN MAXVALUE (
                SUBPARTITION s0,
                SUBPARTITION s1
            )
        );
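The rules in the preceding list can be summarized as a small Python check. This is only a sketch of the rules as stated here, not of the server's parser; the function name check_subpartitions and the dictionary layout (partition name mapped to its list of explicitly named subpartitions, with an empty list meaning none were defined) are assumptions made for illustration.

```python
def check_subpartitions(partitions: dict) -> list:
    """Return a list of rule violations for explicitly defined subpartitions."""
    problems = []
    counts = {len(subs) for subs in partitions.values()}
    if 0 in counts and len(counts) > 1:
        # If any subpartitions are defined explicitly, all must be.
        problems.append("subpartitions defined for some partitions but not all")
    elif len(counts) > 1:
        # Each partition must have the same number of subpartitions.
        problems.append("partitions have differing numbers of subpartitions")
    for name, subs in partitions.items():
        # Names must be unique within each partition (not across the table).
        if len(set(subs)) != len(subs):
            problems.append(f"duplicate subpartition name in {name}")
    return problems


# The failing example: p1 has no SUBPARTITION clauses.
print(check_subpartitions({"p0": ["s0", "s1"], "p1": [], "p2": ["s2", "s3"]}))
# The valid example: names repeat across partitions, which is allowed.
print(check_subpartitions({"p0": ["s0", "s1"], "p1": ["s0", "s1"], "p2": ["s0", "s1"]}))
```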
    

Subpartitions can be used with especially large tables to distribute data and indexes across many disks. Suppose that you have 6 disks mounted as /disk0, /disk1, /disk2, and so on. Now consider the following example:

CREATE TABLE ts (id INT, purchased DATE)
    PARTITION BY RANGE( YEAR(purchased) )
    SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
        PARTITION p0 VALUES LESS THAN (1990) (
            SUBPARTITION s0 
                DATA DIRECTORY = '/disk0/data' 
                INDEX DIRECTORY = '/disk0/idx',
            SUBPARTITION s1 
                DATA DIRECTORY = '/disk1/data' 
                INDEX DIRECTORY = '/disk1/idx'
        ),
        PARTITION p1 VALUES LESS THAN (2000) (
            SUBPARTITION s0 
                DATA DIRECTORY = '/disk2/data' 
                INDEX DIRECTORY = '/disk2/idx',
            SUBPARTITION s1 
                DATA DIRECTORY = '/disk3/data' 
                INDEX DIRECTORY = '/disk3/idx'
        ),
        PARTITION p2 VALUES LESS THAN MAXVALUE (
            SUBPARTITION s0 
                DATA DIRECTORY = '/disk4/data' 
                INDEX DIRECTORY = '/disk4/idx',
            SUBPARTITION s1 
                DATA DIRECTORY = '/disk5/data' 
                INDEX DIRECTORY = '/disk5/idx'
        )
    );

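In this case, each subpartition is assigned its own disk, which holds that subpartition's data and indexes in separate directories, so that each RANGE partition is spread across two disks. Many other variations are possible; another example might be: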

CREATE TABLE ts (id INT, purchased DATE)
    PARTITION BY RANGE(YEAR(purchased))
    SUBPARTITION BY HASH(TO_DAYS(purchased)) (
        PARTITION p0 VALUES LESS THAN (1990) (
            SUBPARTITION s0a 
                DATA DIRECTORY = '/disk0' 
                INDEX DIRECTORY = '/disk1',
            SUBPARTITION s0b 
                DATA DIRECTORY = '/disk2' 
                INDEX DIRECTORY = '/disk3'
        ),
        PARTITION p1 VALUES LESS THAN (2000) (
            SUBPARTITION s1a 
                DATA DIRECTORY = '/disk4/data' 
                INDEX DIRECTORY = '/disk4/idx',
            SUBPARTITION s1b 
                DATA DIRECTORY = '/disk5/data' 
                INDEX DIRECTORY = '/disk5/idx'
        ),
        PARTITION p2 VALUES LESS THAN MAXVALUE (
            SUBPARTITION s2a,
            SUBPARTITION s2b
        )
    );

Here, the storage is as follows:

  • Rows with purchased dates from before 1990 take up a vast amount of space, so are split up 4 ways, with a separate disk dedicated to the data and to the indexes for each of the two subpartitions (s0a and s0b) making up partition p0. In other words:

    • The data for subpartition s0a is stored on /disk0.

    • The indexes for subpartition s0a are stored on /disk1.

    • The data for subpartition s0b is stored on /disk2.

    • The indexes for subpartition s0b are stored on /disk3.

  • Rows containing dates ranging from 1990 to 1999 (partition p1) do not require as much room as those from before 1990. These are split between 2 disks (/disk4 and /disk5) rather than 4 disks as with the legacy records stored in p0:

    • Data and indexes belonging to p1's first subpartition (s1a) are stored on /disk4 — the data in /disk4/data, and the indexes in /disk4/idx.

    • Data and indexes belonging to p1's second subpartition (s1b) are stored on /disk5 — the data in /disk5/data, and the indexes in /disk5/idx.

  • Rows reflecting dates from the year 2000 to the present (partition p2) do not take up as much space as required by either of the two previous ranges. Currently, it is sufficient to store all of these in the default location.

    In future, when the number of purchases for the decade beginning with the year 2000 grows to a point where the default location no longer provides sufficient space, the corresponding rows can be moved using an ALTER TABLE ... REORGANIZE PARTITION statement. See Section 18.3, “Partition Management”, for an explanation of how this can be done.

18.2.6. How MySQL Partitioning Handles NULL Values

Partitioning in MySQL does nothing to disallow NULL as the value of a partitioning expression, whether it is a column value or the value of a user-supplied expression. In general, MySQL will treat a NULL as a zero in such cases. If you wish to circumvent this behavior, you should design tables so as not to allow nulls; most likely you can do so by declaring columns NOT NULL.

In this section, we provide some examples illustrating how MySQL handles NULL values when determining the partition in which a row should be stored.

If you insert a row into a table partitioned by RANGE or LIST such that the column value used to determine the partition is NULL, it is treated as 0. For example, consider these two tables, created and populated as follows:

mysql> CREATE TABLE tnlist (
    ->     id INT,
    ->     name VARCHAR(5)
    -> )
    -> PARTITION BY LIST(id) (
    ->     PARTITION p1 VALUES IN (0),
    ->     PARTITION p2 VALUES IN (1)
    -> );
Query OK, 0 rows affected (0.09 sec)

mysql> CREATE TABLE tnrange (
    ->     id INT,
    ->     name VARCHAR(5)
    -> )
    -> PARTITION BY RANGE(id) (
    ->     PARTITION p1 VALUES LESS THAN (1),
    ->     PARTITION p2 VALUES LESS THAN MAXVALUE
    -> );
Query OK, 0 rows affected (0.09 sec)

mysql> INSERT INTO tnlist VALUES (NULL, 'bob');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO tnrange VALUES (NULL, 'jim');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM tnlist;
+------+------+
| id   | name |
+------+------+
| NULL | bob  |
+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tnrange;
+------+------+
| id   | name |
+------+------+
| NULL | jim  |
+------+------+
1 row in set (0.00 sec)

In both tables, the id column was not declared as NOT NULL, which means that it admits NULL values. You can verify that the rows were stored in partition p1 of each table by dropping these partitions, and then re-running the SELECT statements:

mysql> ALTER TABLE tnlist DROP PARTITION p1;
Query OK, 0 rows affected (0.16 sec)

mysql> ALTER TABLE tnrange DROP PARTITION p1;
Query OK, 0 rows affected (0.16 sec)

mysql> SELECT * FROM tnlist;
Empty set (0.00 sec)

mysql> SELECT * FROM tnrange;
Empty set (0.00 sec)
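The behavior just demonstrated can be modeled in a few lines of Python. This sketch simply encodes the rule stated above (NULL is treated as 0 when choosing a RANGE or LIST partition); the function names and data structures are hypothetical, and Python's None stands in for NULL.

```python
def range_partition(value, bounds):
    """bounds: list of (name, exclusive upper bound); None means MAXVALUE."""
    key = 0 if value is None else value  # NULL is treated as 0
    for name, upper in bounds:
        if upper is None or key < upper:
            return name
    raise ValueError("no partition accepts this value")


def list_partition(value, lists):
    """lists: dict mapping partition name to its set of allowed values."""
    key = 0 if value is None else value  # NULL is treated as 0
    for name, values in lists.items():
        if key in values:
            return name
    raise ValueError("no partition accepts this value")


tnrange = [("p1", 1), ("p2", None)]
tnlist = {"p1": {0}, "p2": {1}}

print(range_partition(None, tnrange))  # p1
print(list_partition(None, tnlist))    # p1
```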

In the case of partitioning by HASH or KEY, any partition expression that yields a NULL value is treated as though its return value were zero. We can verify this behavior by examining the effects on the filesystem of creating a table partitioned by HASH and populating it with a record containing appropriate values. Suppose you have a table tnhash, created in the test database, using this statement:

CREATE TABLE tnhash (
    id INT,
    name VARCHAR(5)
)
PARTITION BY HASH(id)
PARTITIONS 2;

Assuming an RPM installation of MySQL on Linux, this statement creates two .MYD files in /var/lib/mysql/test, which can be viewed in the bash shell as follows:

/var/lib/mysql/test> ls *.MYD -l
-rw-rw----  1 mysql mysql 0 2005-11-04 18:41 tnhash_p0.MYD
-rw-rw----  1 mysql mysql 0 2005-11-04 18:41 tnhash_p1.MYD

Note that the size of each file is 0 bytes. Now insert a row into tnhash whose id column value is NULL and verify that this row was inserted:

mysql> INSERT INTO tnhash VALUES (NULL, 'sam');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM tnhash;
+------+------+
| id   | name |
+------+------+
| NULL | sam  |
+------+------+
1 row in set (0.01 sec)

Recall that for any integer N, the value of NULL MOD N is always NULL. For the purpose of determining the correct partition, this result is treated as 0. Returning to the system shell (still assuming bash for this purpose), we can see that the value was inserted into the first partition (named p0 by default) by listing the data files once again:

/var/lib/mysql/test> ls *.MYD -l
-rw-rw----  1 mysql mysql 20 2005-11-04 18:44 tnhash_p0.MYD
-rw-rw----  1 mysql mysql  0 2005-11-04 18:41 tnhash_p1.MYD

You can see that the INSERT statement modified only the file tnhash_p0.MYD, increasing its size on disk, without affecting the other data file.
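The same rule can be expressed as a one-line Python model. The function name hash_partition is hypothetical, None stands in for NULL, and the file name is built only to mirror the listing shown above.

```python
def hash_partition(value, num_partitions: int) -> int:
    # NULL MOD N is NULL, which is treated as 0 when choosing the partition.
    key = 0 if value is None else value
    return key % num_partitions


n = hash_partition(None, 2)
print(f"tnhash_p{n}.MYD")  # tnhash_p0.MYD
```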

Suppose we have a table such as this one:

CREATE TABLE tndate (
    id INT,
    dt DATE
)
PARTITION BY RANGE( YEAR(dt) ) (
    PARTITION p0 VALUES LESS THAN (1990),
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

As with other MySQL functions, YEAR(NULL) returns NULL. A row with a dt column value of NULL is treated as though the partitioning expression evaluated to 0, and is inserted into partition p0.

18.3. Partition Management

MySQL 5.1 provides a number of ways to modify partitioned tables. It is possible to add, drop, redefine, merge, or split existing partitions. All of these actions can be carried out using the partitioning extensions to the ALTER TABLE command (see Section 13.1.2, “ALTER TABLE Syntax”, for syntax definitions). There are also ways to obtain information about partitioned tables and partitions. We discuss these topics in the sections that follow.

Note: In MySQL 5.1, all partitions of a partitioned table must have the same number of subpartitions, and it is not possible to change the subpartitioning once the table has been created.

Important: Currently, ALTER TABLE ... PARTITION BY ... is accepted as valid syntax by a MySQL server built from the 5.1 tree, but the statement actually does nothing. We expect to implement the behavior of this statement as described below by the time that MySQL 5.1 reaches production status.

To change a table's partitioning scheme, it is necessary only to use the ALTER TABLE command with a partition_options clause. This clause has the same syntax as that used with CREATE TABLE for creating a partitioned table, and always begins with the keywords PARTITION BY. For example, suppose you have a table partitioned by range using the following CREATE TABLE statement:

CREATE TABLE trb3 (id INT, name VARCHAR(50), purchased DATE)
    PARTITION BY RANGE( YEAR(purchased) ) (
        PARTITION p0 VALUES LESS THAN (1990),
        PARTITION p1 VALUES LESS THAN (1995),
        PARTITION p2 VALUES LESS THAN (2000),
        PARTITION p3 VALUES LESS THAN (2005)
    );

To repartition this table so that it is partitioned by key into two partitions using the id column value as the basis for the key, you can use this statement:

ALTER TABLE trb3 PARTITION BY KEY(id) PARTITIONS 2;

This has the same effect on the structure of the table as dropping the table and re-creating it using CREATE TABLE trb3 PARTITION BY KEY(id) PARTITIONS 2;.

18.3.1. Management of RANGE and LIST Partitions

Range and list partitions are very similar with regard to how the adding and dropping of partitions are handled. For this reason we discuss the management of both sorts of partitioning in this section. For information about working with tables that are partitioned by hash or key, see Section 18.3.2, “Management of HASH and KEY Partitions”. Dropping a RANGE or LIST partition is more straightforward than adding one, so we discuss this first.

Dropping a partition from a table that is partitioned by either RANGE or by LIST can be accomplished using the ALTER TABLE command with a DROP PARTITION clause. Here is a very basic example, which supposes that you have already created a table which is partitioned by range and then populated with 10 records using the following CREATE TABLE and INSERT statements:

mysql> CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE)
    ->     PARTITION BY RANGE( YEAR(purchased) ) (
    ->         PARTITION p0 VALUES LESS THAN (1990),
    ->         PARTITION p1 VALUES LESS THAN (1995),
    ->         PARTITION p2 VALUES LESS THAN (2000),
    ->         PARTITION p3 VALUES LESS THAN (2005)
    ->     );
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO tr VALUES
    ->     (1, 'desk organiser', '2003-10-15'),
    ->     (2, 'CD player', '1993-11-05'),
    ->     (3, 'TV set', '1996-03-10'),
    ->     (4, 'bookcase', '1982-01-10'),
    ->     (5, 'exercise bike', '2004-05-09'),
    ->     (6, 'sofa', '1987-06-05'),
    ->     (7, 'popcorn maker', '2001-11-22'),
    ->     (8, 'aquarium', '1992-08-04'),
    ->     (9, 'study desk', '1984-09-16'),
    ->     (10, 'lava lamp', '1998-12-25');
Query OK, 10 rows affected (0.01 sec)

You can see which items should have been inserted into partition p2 as shown here:

mysql> SELECT * FROM tr
    -> WHERE purchased BETWEEN '1995-01-01' AND '1999-12-31';
+------+-----------+------------+
| id   | name      | purchased  |
+------+-----------+------------+
|    3 | TV set    | 1996-03-10 |
|   10 | lava lamp | 1998-12-25 |
+------+-----------+------------+
2 rows in set (0.00 sec)

To drop the partition named p2, execute the following command:

mysql> ALTER TABLE tr DROP PARTITION p2;
Query OK, 0 rows affected (0.03 sec)

Note: In MySQL 5.1, the NDB Cluster storage engine does not support ALTER TABLE ... DROP PARTITION. It does, however, support the other partitioning-related extensions to ALTER TABLE that are described in this chapter.

It is very important to remember that, when you drop a partition, you also delete all the data that was stored in that partition. You can see that this is the case by re-running the previous SELECT query:

mysql> SELECT * FROM tr WHERE purchased 
    -> BETWEEN '1995-01-01' AND '1999-12-31';
Empty set (0.00 sec)

If you wish to drop all data from all partitions while preserving the table definition and its partitioning scheme, use the TRUNCATE TABLE command. (See Section 13.2.9, “TRUNCATE Syntax”.)

If you intend to change the partitioning of a table without losing data, use ALTER TABLE ... REORGANIZE PARTITION instead. See below or in Section 13.1.2, “ALTER TABLE Syntax”, for information about REORGANIZE PARTITION.

If you now execute a SHOW CREATE TABLE command, you can see how the partitioning makeup of the table has been changed:

mysql> SHOW CREATE TABLE tr\G
*************************** 1. row ***************************
       Table: tr
Create Table: CREATE TABLE `tr` (
  `id` int(11) default NULL,
  `name` varchar(50) default NULL,
  `purchased` date default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 
PARTITION BY RANGE ( YEAR(purchased) ) (
  PARTITION p0 VALUES LESS THAN (1990) ENGINE = MyISAM, 
  PARTITION p1 VALUES LESS THAN (1995) ENGINE = MyISAM, 
  PARTITION p3 VALUES LESS THAN (2005) ENGINE = MyISAM
)
1 row in set (0.01 sec)

When you insert new rows into the changed table with purchased column values between '1995-01-01' and '2004-12-31' inclusive, those rows will be stored in partition p3. You can verify this as follows:

mysql> INSERT INTO tr VALUES (11, 'pencil holder', '1995-07-12');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM tr WHERE purchased 
    -> BETWEEN '1995-01-01' AND '2004-12-31';
+------+----------------+------------+
| id   | name           | purchased  |
+------+----------------+------------+
|   11 | pencil holder  | 1995-07-12 |
|    1 | desk organiser | 2003-10-15 |
|    5 | exercise bike  | 2004-05-09 |
|    7 | popcorn maker  | 2001-11-22 |
+------+----------------+------------+
4 rows in set (0.00 sec)

mysql> ALTER TABLE tr DROP PARTITION p3;
Query OK, 0 rows affected (0.03 sec)

mysql> SELECT * FROM tr WHERE purchased 
    -> BETWEEN '1995-01-01' AND '2004-12-31';
Empty set (0.00 sec)

Note that the number of rows dropped from the table as a result of ALTER TABLE ... DROP PARTITION is not reported by the server as it would be by the equivalent DELETE query.
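The effect of DROP PARTITION on the tr table can be modeled in Python as follows. This is a simplified in-memory sketch, not a description of how the server stores rows: the names partition_for and drop_partition are hypothetical, and each row is reduced to its id and the year of its purchased value.

```python
BOUNDS = [("p0", 1990), ("p1", 1995), ("p2", 2000), ("p3", 2005)]


def partition_for(year, bounds):
    """Return the first partition whose upper bound exceeds the year."""
    for name, upper in bounds:
        if year < upper:
            return name
    raise ValueError(f"no partition for year {year}")


# (id, YEAR(purchased)) for the ten rows inserted into tr
rows = [(1, 2003), (2, 1993), (3, 1996), (4, 1982), (5, 2004),
        (6, 1987), (7, 2001), (8, 1992), (9, 1984), (10, 1998)]

table = {name: [] for name, _ in BOUNDS}
for row_id, year in rows:
    table[partition_for(year, BOUNDS)].append(row_id)


def drop_partition(name, bounds, table):
    """Remove the partition definition together with every row stored in it."""
    dropped = table.pop(name)
    remaining = [(n, u) for n, u in bounds if n != name]
    return remaining, dropped


bounds_after, lost = drop_partition("p2", BOUNDS, table)
print(lost)                               # [3, 10]
print(partition_for(1995, bounds_after))  # p3
```

Once p2 is gone, the next partition in the list (p3) is the first whose upper bound exceeds 1995, which is why new rows for the years 1995 through 1999 are stored there.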

Dropping LIST partitions uses exactly the same ALTER TABLE ... DROP PARTITION syntax as used for dropping RANGE partitions. However, there is one important difference in the effect this has on your use of the table afterwards: You can no longer insert into the table any rows having any of the values that were included in the value list defining the deleted partition. (See Section 18.2.2, “LIST Partitioning”, for an example.)

To add a new range or list partition to a previously partitioned table, use the ALTER TABLE ... ADD PARTITION statement. For tables which are partitioned by RANGE, this can be used to add a new range to the end of the list of existing partitions. For example, suppose that you have a partitioned table containing membership data for your organisation, which is defined as follows:

CREATE TABLE members (
    id INT, 
    fname VARCHAR(25),
    lname VARCHAR(25), 
    dob DATE
)
PARTITION BY RANGE( YEAR(dob) ) (
    PARTITION p0 VALUES LESS THAN (1970),
    PARTITION p1 VALUES LESS THAN (1980),
    PARTITION p2 VALUES LESS THAN (1990)
);

Suppose further that the minimum age for members is 16. As the calendar approaches the end of 2005, you realize that you will soon be admitting members who were born in 1990 (and later in years to come). You can modify the members table to accommodate new members born in the years 1990-1999 as shown here:

ALTER TABLE members ADD PARTITION (PARTITION p3 VALUES LESS THAN (2000));

Important: With tables that are partitioned by range, you can use ADD PARTITION to add new partitions to the high end of the partitions list only. Trying to add a new partition in this manner between or before existing partitions will result in an error as shown here:

mysql> ALTER TABLE members
    ->     ADD PARTITION (
    ->     PARTITION p3 VALUES LESS THAN (1960));
ERROR 1463 (HY000): VALUES LESS THAN value must be strictly increasing for each partition
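The rule behind this error can be sketched in Python. The function add_range_partition is a hypothetical model of the check, not the server's implementation; it accepts a new partition only if its upper bound is greater than that of the last existing partition.

```python
def add_range_partition(bounds, name, upper):
    """Append a RANGE partition; bounds is a list of (name, upper bound)."""
    last_upper = bounds[-1][1]
    if upper <= last_upper:
        # Mirrors the wording of the error message shown above.
        raise ValueError(
            "VALUES LESS THAN value must be strictly increasing for each partition")
    return bounds + [(name, upper)]


members = [("p0", 1970), ("p1", 1980), ("p2", 1990)]
members = add_range_partition(members, "p3", 2000)  # accepted: 2000 > 1990

try:
    add_range_partition(members, "p4", 1960)        # rejected: 1960 <= 2000
except ValueError as err:
    print(err)
```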

In a similar fashion, you can add new partitions to a table that is partitioned by LIST. For example, given a table defined like so:

CREATE TABLE tt (
    id INT, 
    data INT
)
PARTITION BY LIST(data) (
    PARTITION p0 VALUES IN (5, 10, 15),
    PARTITION p1 VALUES IN (6, 12, 18)
);

You can add a new partition in which to store rows having the data column values 7, 14, and 21 as shown:

ALTER TABLE tt ADD PARTITION (PARTITION p2 VALUES IN (7, 14, 21));

Note that you cannot add a new LIST partition encompassing any values that are already included in the value list of an existing partition. If you attempt to do so, an error will result:

mysql> ALTER TABLE tt ADD PARTITION 
    ->     (PARTITION np VALUES IN (4, 8, 12));
ERROR 1465 (HY000): Multiple definition of same constant in list partitioning
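The check that produces this error can be modeled in Python with sets. The function add_list_partition is hypothetical and only illustrates the rule that no value may belong to more than one LIST partition.

```python
def add_list_partition(lists, name, values):
    """Add a LIST partition; lists maps partition name to a set of values."""
    new_values = set(values)
    for existing, existing_values in lists.items():
        clash = existing_values & new_values
        if clash:
            raise ValueError(
                f"values {sorted(clash)} already belong to partition {existing}")
    updated = dict(lists)
    updated[name] = new_values
    return updated


tt = {"p0": {5, 10, 15}, "p1": {6, 12, 18}}
tt = add_list_partition(tt, "p2", (7, 14, 21))  # accepted: no overlap

try:
    add_list_partition(tt, "np", (4, 8, 12))    # rejected: 12 is in p1
except ValueError as err:
    print(err)
```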

Because any rows with the data column value 12 have already been assigned to partition p1, you cannot create a new partition on table tt that includes 12 in its value list. In order to accomplish this, you could drop p1, and add np and then a new p1 with a modified definition. However, as discussed earlier, this would result in the loss of all data stored in p1, and it is often the case that this is not what you really want to do. Another solution might appear to be to make a copy of the table with the new partitioning and to copy the data into it using CREATE TABLE ... SELECT ..., then drop the old table and rename the new one, but this could be very time-consuming when dealing with large amounts of data. This also might not be feasible in situations where high availability is a requirement.

Fortunately, MySQL's partitioning implementation provides ways to redefine partitions without losing data. Let us look first at a couple of simple examples involving RANGE partitioning. Recall the members table which is now defined as shown here:

mysql> SHOW CREATE TABLE members\G
*************************** 1. row ***************************
       Table: members
Create Table: CREATE TABLE `members` (
  `id` int(11) default NULL,
  `fname` varchar(25) default NULL,
  `lname` varchar(25) default NULL,
  `dob` date default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 
PARTITION BY RANGE ( YEAR(dob) ) (
  PARTITION p0 VALUES LESS THAN (1970) ENGINE = MyISAM, 
  PARTITION p1 VALUES LESS THAN (1980) ENGINE = MyISAM, 
  PARTITION p2 VALUES LESS THAN (1990) ENGINE = MyISAM, 
  PARTITION p3 VALUES LESS THAN (2000) ENGINE = MyISAM
)

Suppose that you would like to move all rows representing members born before 1960 into a separate partition. As we have already seen, this cannot be done using ALTER TABLE ... ADD PARTITION. However, you can use another partition-related extension to ALTER TABLE in order to accomplish this:

ALTER TABLE members REORGANIZE PARTITION p0 INTO (
    PARTITION s0 VALUES LESS THAN (1960),
    PARTITION s1 VALUES LESS THAN (1970)
);

In effect, this command splits partition p0 into two new partitions s0 and s1. It also moves the data that was stored in p0 into the new partitions according to the rules embodied in the two PARTITION ... VALUES ... clauses, so that s0 contains only those records for which YEAR(dob) is less than 1960 and s1 contains those rows in which YEAR(dob) is greater than or equal to 1960 but less than 1970.

A REORGANIZE PARTITION clause may also be used for merging adjacent partitions. You can return the members table to its previous partitioning as shown here:

ALTER TABLE members REORGANIZE PARTITION s0,s1 INTO (
    PARTITION p0 VALUES LESS THAN (1970)
);

No data is lost in splitting or merging partitions using REORGANIZE PARTITION. In executing the above statement, MySQL moves all of the records that were stored in partitions s0 and s1 into partition p0.
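Splitting and merging can be modeled in Python as a redistribution of rows among new partitions. The sketch below is an in-memory illustration only: the function reorganize is hypothetical, each row is represented just by its YEAR(dob) value, and the sample years are invented for the example.

```python
def reorganize(table, old_names, new_bounds):
    """Move rows from the old partitions into newly defined RANGE partitions.

    table: dict of partition name -> list of YEAR(dob) values
    new_bounds: list of (name, exclusive upper bound), in increasing order
    """
    moved = [year for name in old_names for year in table.pop(name)]
    for name, _ in new_bounds:
        table[name] = []
    for year in moved:
        for name, upper in new_bounds:
            if year < upper:
                table[name].append(year)
                break
        else:
            raise ValueError(f"new partitions do not cover year {year}")
    return table


members = {"p0": [1955, 1962, 1968], "p1": [1975], "p2": [1983], "p3": [1991]}

reorganize(members, ["p0"], [("s0", 1960), ("s1", 1970)])  # split p0
print(members["s0"], members["s1"])  # [1955] [1962, 1968]

reorganize(members, ["s0", "s1"], [("p0", 1970)])          # merge back
print(members["p0"])                 # [1955, 1962, 1968]
```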

The general syntax for REORGANIZE PARTITION is:

ALTER TABLE tbl_name 
    REORGANIZE PARTITION partition_list 
    INTO (partition_definitions);

Here, tbl_name is the name of the partitioned table, and partition_list is a comma-separated list of names of one or more existing partitions to be changed. partition_definitions is a comma-separated list of new partition definitions, which follow the same rules as for the partition_definitions list used in CREATE TABLE (see Section 13.1.5, “CREATE TABLE Syntax”). It should be noted that you are not limited to merging several partitions into one, or to splitting one partition into many, when using REORGANIZE PARTITION. For example, you can reorganize all four partitions of the members table into two, as follows:

ALTER TABLE members REORGANIZE PARTITION p0,p1,p2,p3 INTO (
    PARTITION m0 VALUES LESS THAN (1980),
    PARTITION m1 VALUES LESS THAN (2000)
);

You can also use REORGANIZE PARTITION with tables that are partitioned by LIST. Let us return to the problem of adding a new partition to the list-partitioned tt table and failing because the new partition had a value that was already present in the value-list of one of the existing partitions. We can handle this by adding a partition that contains only non-conflicting values, and then reorganizing the new partition and the existing one so that the value which was stored in the existing one is now moved to the new one:

ALTER TABLE tt ADD PARTITION (PARTITION np VALUES IN (4, 8));
ALTER TABLE tt REORGANIZE PARTITION p1,np INTO (
    PARTITION p1 VALUES IN (6, 18),
    PARTITION np VALUES IN (4, 8, 12)
);

Here are some key points to keep in mind when using ALTER TABLE ... REORGANIZE PARTITION to repartition tables that are partitioned by RANGE or LIST:

  • The PARTITION clauses used to determine the new partitioning scheme are subject to the same rules as those used with a CREATE TABLE statement.

    Most importantly, you should remember that the new partitioning scheme cannot have any overlapping ranges (applies to tables partitioned by RANGE) or sets of values (when reorganizing tables partitioned by LIST).

    Note: Prior to MySQL 5.1.4, you could not reuse the names of existing partitions in the INTO clause, even when those partitions were being dropped or redefined. See Section D.1.3, “Changes in release 5.1.4 (21 December 2005)”, for more information.

  • The combination of partitions in the partition_definitions list should account for the same range or set of values overall as the combined partitions named in the partition_list.

    For instance, in the members table used as an example in this section, partitions p1 and p2 together cover the years 1970 through 1989. Therefore, any reorganization of these two partitions should cover the same range of years overall.

  • For tables partitioned by RANGE, you can reorganize only adjacent partitions; you cannot skip over range partitions.

    For instance, you could not reorganize the members table used as an example in this section using a statement beginning with ALTER TABLE members REORGANIZE PARTITION p0,p2 INTO ... because p0 covers the years prior to 1970 and p2 the years from 1980 through 1989 inclusive, and thus the two are not adjacent partitions.

  • You cannot use REORGANIZE PARTITION to change the table's partitioning type; that is, you cannot (for example) change RANGE partitions to HASH partitions or vice versa. You also cannot use this command to change the partitioning expression or column. To accomplish either of these tasks without dropping and re-creating the table, you can use ALTER TABLE ... PARTITION BY .... For example:

    ALTER TABLE members 
        PARTITION BY HASH(YEAR(dob))
        PARTITIONS 8;
    

    Note: In MySQL 5.1.5-alpha, ALTER TABLE ... PARTITION BY ... is not yet implemented. Instead, you must either drop and re-create the table using the desired partitioning or, if you need to retain data already stored in the table, use CREATE TABLE ... SELECT ... to create the new table and copy the data from the old one, and then drop the old table, renaming the new one as a final step if desired.
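The key points for RANGE tables can be collected into a small Python check. This is a sketch of the rules as stated in the preceding list, not of the server's actual validation; the function check_reorganize is hypothetical, and MEMBERS reflects the four-partition definition of the members table shown earlier in this section.

```python
MEMBERS = [("p0", 1970), ("p1", 1980), ("p2", 1990), ("p3", 2000)]


def check_reorganize(bounds, old_names, new_bounds):
    """Return a list of problems with a proposed RANGE reorganization."""
    names = [name for name, _ in bounds]
    positions = sorted(names.index(name) for name in old_names)
    problems = []
    # Only adjacent partitions may be reorganized together.
    if positions != list(range(positions[0], positions[-1] + 1)):
        problems.append("partitions are not adjacent")
    uppers = [upper for _, upper in new_bounds]
    # New ranges may not overlap: upper bounds must strictly increase.
    if any(a >= b for a, b in zip(uppers, uppers[1:])):
        problems.append("new ranges overlap")
    # The new partitions should cover the same overall range as the old ones.
    if uppers[-1] != bounds[positions[-1]][1]:
        problems.append("overall range changed")
    return problems


print(check_reorganize(MEMBERS, ["p0", "p2"], [("x0", 1990)]))
print(check_reorganize(MEMBERS, ["p1", "p2"], [("m0", 1975), ("m1", 1990)]))
```

The first call reports that p0 and p2 are not adjacent; the second returns an empty list because the new partitions m0 and m1 end at the same upper bound (1990) as the partitions they replace.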

18.3.2. Management of HASH and KEY Partitions

Tables which are partitioned by hash or by key are very similar to one another with regard to making changes in a partitioning setup, and both differ in a number of ways from tables which have been partitioned by range or list. For that reason, this section addresses the modification of tables partitioned by hash or by key only. For a discussion of adding and dropping of partitions of tables that are partitioned by range or list, see Section 18.3.1, “Management of RANGE and LIST Partitions”.

You cannot drop partitions from tables that are partitioned by HASH or KEY in the same way that you can from tables that are partitioned by RANGE or LIST. However, you can merge HASH or KEY partitions using the ALTER TABLE ... COALESCE PARTITION command. For example, suppose you have a table containing data about clients, which is divided into twelve partitions. The clients table is defined as shown here:

CREATE TABLE clients (
    id INT,
    fname VARCHAR(30),
    lname VARCHAR(30),
    signed DATE
)
PARTITION BY HASH( MONTH(signed) )
PARTITIONS 12;

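To reduce the number of partitions from twelve to six, execute the following ALTER TABLE command. The number following COALESCE PARTITION is the number of partitions to remove, so removing 6 of the 12 partitions leaves 6: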

mysql> ALTER TABLE clients COALESCE PARTITION 6;
Query OK, 0 rows affected (0.02 sec)

COALESCE works equally well with tables that are partitioned by HASH, KEY, LINEAR HASH, or LINEAR KEY. Here is an example similar to the previous one, differing only in that the table is partitioned by LINEAR KEY:

mysql> CREATE TABLE clients_lk (
    ->     id INT,
    ->     fname VARCHAR(30),
    ->     lname VARCHAR(30),
    ->     signed DATE
    -> )
    -> PARTITION BY LINEAR KEY(signed)
    -> PARTITIONS 12;
Query OK, 0 rows affected (0.03 sec)

mysql> ALTER TABLE clients_lk COALESCE PARTITION 6;
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

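The number following COALESCE PARTITION is the number of partitions to merge into the remainder, that is, the number of partitions to remove from the table. COALESCE therefore cannot be used to increase the number of partitions, and if you specify a number equal to or greater than the current number of partitions, the result is an error like the one shown: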

mysql> ALTER TABLE clients COALESCE PARTITION 18;
ERROR 1478 (HY000): Cannot remove all partitions, use DROP TABLE instead

To increase the number of partitions for the clients table from 12 to 18, use ALTER TABLE ... ADD PARTITION as shown here. The number following PARTITIONS is the number of partitions to add, not the new total:

ALTER TABLE clients ADD PARTITION PARTITIONS 6;

Note: ALTER TABLE ... REORGANIZE PARTITION cannot be used with tables that are partitioned by HASH or KEY.

18.3.3. Maintenance of Partitions

Note: The commands discussed in this section are not yet actually implemented in MySQL 5.1 and are presented for purposes of eliciting feedback from users testing the software during the 5.1 pre-production development cycle. (In other words, “Please do not file bugs saying that these commands do not work.”) This information is highly subject to change as MySQL 5.1 partitioning development continues. We will update this section as our partitioning features are implemented and improved.

A number of partitioning maintenance tasks can be carried out in MySQL 5.1. MySQL does not support the commands CHECK TABLE, OPTIMIZE TABLE, ANALYZE TABLE, or REPAIR TABLE for partitioned tables. Instead, a number of extensions to ALTER TABLE can be used to perform these operations on one or more partitions directly, as described in the following list:

  • Rebuilding partitions: Rebuilds the partition; this has the same effect as dropping all records stored in the partition, then reinserting them. This can be useful for purposes of defragmentation.

    Example:

    ALTER TABLE t1 REBUILD PARTITION (p0, p1);
    
  • Optimizing partitions: If you have deleted a large number of rows from a partition or if you have made many changes to a partitioned table with variable-length rows (that is, having VARCHAR, BLOB, or TEXT columns), you can use ALTER TABLE ... OPTIMIZE PARTITION to reclaim any unused space and to defragment the partition data file.

    Example:

    ALTER TABLE t1 OPTIMIZE PARTITION (p0, p1);
    

    Using OPTIMIZE PARTITION on a given partition is equivalent to running CHECK PARTITION, ANALYZE PARTITION, and REPAIR PARTITION on that partition.

  • Analyzing partitions: This reads and stores the key distributions for partitions.

    Example:

    ALTER TABLE t1 ANALYZE PARTITION (p3);

  • Repairing partitions: This repairs corrupted partitions.

    Example:

    ALTER TABLE t1 REPAIR PARTITION (p0,p1);
    
  • Checking partitions: You can check partitions for errors in much the same way that you can use CHECK TABLE with non-partitioned tables.

    Example:

    ALTER TABLE t1 CHECK PARTITION (p1);
    

    This command will tell you if the data or indexes in partition p1 of table t1 are corrupted. If this is the case, use ALTER TABLE ... REPAIR PARTITION to repair the partition.

You can also use the mysqlcheck or myisamchk utility to accomplish these tasks, operating on the separate .MYI files generated by partitioning a table. See Section 8.8, “mysqlcheck — A Table Maintenance and Repair Program”. (This is already available in the pre-alpha code.)

18.3.4. Obtaining Information About Partitions

This section discusses obtaining information about existing partitions. This functionality is still in the planning stages, so what is described here is, for the time being, a survey of what we intend to implement in MySQL 5.1.

As discussed elsewhere in this chapter, SHOW CREATE TABLE includes in its output the PARTITION BY clause used to create a partitioned table. For example:

mysql> SHOW CREATE TABLE trb3\G
*************************** 1. row ***************************
       Table: trb3
Create Table: CREATE TABLE `trb3` (
  `id` int(11) default NULL,
  `name` varchar(50) default NULL,
  `purchased` date default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 
PARTITION BY RANGE (YEAR(purchased)) (
  PARTITION p0 VALUES LESS THAN (1990) ENGINE = MyISAM, 
  PARTITION p1 VALUES LESS THAN (1995) ENGINE = MyISAM, 
  PARTITION p2 VALUES LESS THAN (2000) ENGINE = MyISAM, 
  PARTITION p3 VALUES LESS THAN (2005) ENGINE = MyISAM
)
1 row in set (0.00 sec)

Note: Currently the PARTITIONS clause is not shown for tables partitioned by HASH or KEY. (Bug #14327)

SHOW TABLE STATUS works with partitioned tables, and its output is the same as that for non-partitioned tables, except that the Engine column always contains the value 'PARTITION'. (See Section 13.5.4.18, “SHOW TABLE STATUS Syntax”, for more information about this command.) To obtain status information for individual partitions, we plan to implement a SHOW PARTITION STATUS command (see below).
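
As a brief illustration, the statement below requests status information for the trb3 table shown earlier. It assumes that trb3 resides in the current default database; no output is reproduced here, since the exact values depend on the server build and the table's contents.

```sql
-- Status for the partitioned table trb3 in the default database.
-- For a partitioned table, check the Engine column of the result.
SHOW TABLE STATUS LIKE 'trb3';
```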

Two additional SHOW commands are planned for use with partitioned tables:

  • SHOW PARTITIONS

    This command is expected to work similarly to SHOW TABLES and SHOW DATABASES, except that it will list partitions rather than tables or databases. The output from this command will likely consist of a single column named Partitions_in_tbl_name, where tbl_name is the name of the partitioned table. It is not possible to “select” a table in the sense that one selects a database and it thereafter serves as the default database for SHOW TABLES, so it is likely that SHOW PARTITIONS will require the use of a FROM clause so that MySQL knows which table is intended.

  • SHOW PARTITION STATUS

    This command will provide detailed status information about one or more partitions. Its output will likely contain columns that are the same as, or similar to, those found in the output of SHOW TABLE STATUS, with the addition of columns showing the data and index directories used for the partition. This command is likely to support LIKE and FROM clauses that will make it possible to obtain information about a given partition by name, or about partitions belonging to a specific table or database.

Planning is also underway to augment the INFORMATION_SCHEMA database to provide information about partitioned tables and partitions. This planning is currently in a very early phase; we will update the relevant portions of the Manual as additional information becomes available and any new partitioning-related extensions to INFORMATION_SCHEMA are implemented.