devloop icon
Metastores

database performance: test setup: Test Code


Beyond JDBC

JDBC is a great abstaction layer for Java which allows us to write standard SQL compliant code and only rely on the JDBC URL settings to attach to one database or another. Anyway, that's the theory.

Even if some of its limitations are being addressed (most notably in Java 1.6 with the upgrade to JDBC 4.0), some issues inherent to the diversity of the backends remain, like the large variations in the interpretation of the standard datatypes, the syntax for creating tables and constraints, etc.

This is where the mts framework comes to the rescue by providing an abstraction layer on top of JDBC which allows us to write Java code that gets translated into the appropriate SQL code for each backend. ie:

DBTable table = new SimpleTable(this.db.getDataSource(), "MyTableName");
PrimaryKey pkc = new PrimaryKey("PKConstrName", "ID");
table.addConstraint(pkc);
this.db.getBuilder().doCreateTable(table);

Generally, specifying primary keys isn't too much of a problem but nullability and uniqueness constraints, foreigns keys and indexes often are.
Mts also provides datatype abstraction, allowing us to retrieve the suitable database datatype to use based on what needs to be stored in it rather than a name:
DBProtoType type = db.getDetector().getDBProtoType(Integer.class);
This will retrieve the prototype suitable for storing integers, the actual type instance can be obtained by calling:
DBTypeInstance typeInstance = type.getDBType();
This particularly useful parameterized types like VARCHAR and DECIMAL, where the prototype is just the generic SQL name, but depending on what needs to be stored the actual type generated may range from a VARCHAR(1) to a VARCHAR(NNNN).

Generic Code Concepts

All the test classes sub-class the abstract class PerformanceTestTemplate, which provides utility functions common to all performance tests.
To make it easier to change the test settings, the specifications of the tables are stored in text files which can be changed without recompiling the test code. ie:
columns=i1:integer,i2:integer,s1:varchar,s2:varchar
In order to measure the elapsed time accurately, we use Java's System.nanoTime();, giving us precision down to the nanosecond (supposedly).
If a test is taking too long, it will timeout and stop.

Before starting each test, we ensure that the load average per cpu is not above a certain threshold - giving the system a few minutes to recover if it is still high from running the previous test. This should prevent a previous test from having an impact on the next one, altough some database may decide to trigger background activites at any time.

Connections and Pooling

One of the most expensive aspects of interacting with a database is making the initial connection.
To reduce the impact of making new connections, we use some well known techniques:

General Operation

The obvious 4 main operations tested here are: INSERT, UPDATE, DELETE an SELECT, other tests do exist but are not as extensive.
The tests simply create a certain amount of initial data in a table, then measure the time it takes to perform a number of operations of a certain type.
Creating the schema can take a while on some databases and is not part of the measurements since this is something that does not need to be done often in the real world.

Non-Threaded Tests

These tests perform each operation sequencially in a single thread, typically repeating the same operation many times.
We vary the amount of initial data, the distribution of the data ("group size"), toggle on or off the usage of prepared statements and indexes.
We record the number of statements executed divided by the elapsed time.

Here is an example of the statements used for the Insert tests (MySQL backend syntax):

CREATE TABLE insert_0 (s2 VARCHAR(255), s1 VARCHAR(255), i2 INTEGER, i1 INTEGER);
CREATE INDEX idx_s2_1993 ON insert_0 (s2);
(...)
INSERT INTO insert_0(s2, s1, i2, i1) VALUES ('str997', 'str998', 999, 1000);
(...)
SELECT s2,s1,i2,i1 FROM insert_0;

Multi-Threaded Tests

The tests start by setting up a number of threads, each thread's actual work is reduced to a minimum by preparing as much work as possible before the thread actually starts. Then all the threads are started at once.
We measure both the elapsed time (from the moment the first thread is started until the last thread has finished) and the cumulated thread time (the cumulated time spent in each thread).
Again, we vary the amount of initial data, the distribution of the data ("group size") as well as the number of threads started.
The schema and statements are almost identical to the non-threaded tests.

Links

Jump to page:

Index