Updated 2021-05-17

MySQL

Introduction

This guide is intended to help users who need to work with their own MySQL databases. Currently, it applies to RHEL6-based PACE systems.

Attribution

This guide was initially copied in full from the SRCC Sherlock documentation (licensed under GPL v3.0) and re-edited for GT PACE computing; most notably, GT PACE uses Moab for scheduling instead of Slurm. In addition, this guide uses a sample database taken in its entirety from the GitHub project datacharmer/test_db.

MySQL is a relational database management system. MariaDB is fully compatible with MySQL and can be used as a drop-in replacement in the vast majority of cases.

MySQL on PACE

There is no centralized database service at PACE, but PACE can provide centralized installation of MySQL, and each user is welcome to start their own instance of the database server to fit their jobs' needs.

Running a MySQL database entails the following:

  1. Configure and initialize your environment so you can start a database instance under your user account.
  2. Start the database server.
  3. Run SQL queries from the same node (via a local socket), or from other nodes and/or jobs (via the network).

Single-node access

In this example, the database server and client will run within the same job, on the same compute node.

Preparation

You first need to let MySQL know where to store its database, where to log things, and how to allow connections from clients. The commands below only need to be executed once.

For this, you'll need to create a .my.cnf file in your home directory. Assuming you'll want to store your database files in a db/ directory in your $SCRATCH folder, you can run the following commands:

$ export SCRATCH=$HOME/scratch
$ export DB_DIR=$SCRATCH/db
$ mkdir $DB_DIR

$ cat << EOF > ~/.my.cnf
[mysqld]
datadir=$DB_DIR
socket=$DB_DIR/mysqldb.sock
user=$USER
symbolic-links=0
skip-networking

[mysqld_safe]
log-error=$DB_DIR/mysqldb.log
pid-file=$DB_DIR/mysqldb.pid

[mysql]
socket=$DB_DIR/mysqldb.sock
EOF

.my.cnf doesn't support environment variables

Please note that if you edit your ~/.my.cnf file directly in a file editor, without using the HEREDOC syntax above, environment variables such as $DB_DIR, $HOME or $USER won't be expanded: you will need to spell out absolute paths explicitly, such as /nv/hp17/poweruser/scratch/db/mysqldb.log.

If you use the HEREDOC syntax, you can verify that the resulting .my.cnf file contains full paths rather than environment variables.
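As a quick sanity check, you could scan the generated file for any unexpanded variables. This is only a sketch; the function name and warning text are not part of the official tooling:

```shell
# check_mycnf FILE: warn if FILE still contains literal $VARIABLE names
# (i.e., the HEREDOC expansion did not happen as expected).
check_mycnf() {
  local cnf="$1"
  if grep -q '\$' "$cnf"; then
    echo "WARNING: $cnf still contains unexpanded \$variables" >&2
    return 1
  fi
  echo "OK: all paths in $cnf are expanded"
}
```

Run it as `check_mycnf ~/.my.cnf` after creating the file.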

Once you have the .my.cnf file in place, you need to initialize your database with some internal data that MySQL needs. In the same terminal, run the following command:

$ mysql_install_db --datadir=$DB_DIR
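If you want to confirm that the initialization succeeded, a hypothetical helper like the following could check for the internal "mysql" system database that mysql_install_db creates inside the datadir:

```shell
# datadir_ready DIR: succeed (and print a message) only if DIR contains
# the "mysql" system database created by mysql_install_db.
datadir_ready() {
  [ -d "$1/mysql" ] && echo "datadir $1 looks initialized"
}
```

For example, `datadir_ready "$DB_DIR"` should print a confirmation after a successful run.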

Start the server

You can now start the MySQL server. For this, first get an allocation on a compute node, note the hostname of the compute node your job has been allocated, then run the mysqld_safe process:

$ qsub -I -q inferno -l nodes=1:ppn=1,walltime=01:00:00
$ cat $PBS_NODEFILE
iw-c39-29-r.pace.gatech.edu
$ mysqld_safe
190418 14:57:29 mysqld_safe Logging to '/nv/hp17/poweruser/scratch/db/mysqldb.log'.
190418 14:57:29 mysqld_safe Starting mysqld daemon with databases from /nv/hp17/poweruser/scratch/db

The mysqld_safe process is blocking, meaning it will not give the prompt back for as long as the MySQL server runs.

If it does return on its own, something probably went wrong, and you'll find more information about the issue in the $DB_DIR/mysqldb.log file you defined in ~/.my.cnf.

Run queries

You're now ready to run queries against that MySQL instance, from the same node your job is running on.

From another terminal on PACE, connect to your job's compute node (here, it's iw-c39-29-r, as shown above), and then run the mysql command: it will open the MySQL shell, ready to run your SQL queries:

$ ssh iw-c39-29-r
$ mysql
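As a first session, you might create a small database and read a row back. The database and table names below (sandbox, samples) are made up for illustration; the statements are written to a file so they can be piped into the client:

```shell
# Write a minimal first session to a file; the statements create a database
# and a table, insert one row, and select it back.
cat << 'EOF' > first_session.sql
CREATE DATABASE IF NOT EXISTS sandbox;
USE sandbox;
CREATE TABLE IF NOT EXISTS samples (id INT PRIMARY KEY, label VARCHAR(32));
INSERT INTO samples VALUES (1, 'hello');
SELECT * FROM samples;
EOF
# mysql < first_session.sql   # run this on the node where mysqld_safe runs
```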

Secure access

We will now set the MySQL root user's password to a random string, just to prevent unauthorized access; we won't need that account for anything.

We will then create a MySQL user with all privileges on the databases, who will be able to connect to this instance from any node. This user will need a real password, though, so please make sure to replace the my-secure-password string below with an actual password of your choice.

Choose a proper password

This password will only be used to access this specific instance of MySQL. Note that anybody knowing that password will be able to connect to your MySQL instance and modify data in the tables.

  • do NOT use my-secure-password
  • do NOT use your GTID password

Once you've chosen your password, you can start the mysqld process on a compute node, like before:

$ qsub -I -q inferno -l nodes=1:ppn=1,walltime=01:00:00
$ cat $PBS_NODEFILE
iw-c39-29-r.pace.gatech.edu
$ mysqld_safe

And then, from another terminal, run the following commands to secure access to your MySQL database.

$ ssh iw-c39-29-r
$ mysql -u root << EOF
UPDATE mysql.user SET Password=PASSWORD(RAND()) WHERE User='root';
DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.db WHERE Db='test' OR Db='test_%';
GRANT ALL PRIVILEGES ON *.* TO '$USER'@'%' IDENTIFIED BY 'my-secure-password' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF

Once you've done that, you're ready to terminate that interactive job, and start a dedicated MySQL server job.

Work with a sample database

It may be instructive to walk through installing a sample database and querying it. Working from the node that is serving the database:

$ ssh iw-c39-29-r # or your existing allocated node
$ cd $HOME/scratch
$ module load git
$ git clone https://github.com/datacharmer/test_db.git
$ cd test_db
$ mysql < employees.sql # import it
Enter password:
INFO
CREATING DATABASE STRUCTURE
INFO
LOADING departments
INFO
LOADING employees
INFO
LOADING dept_emp
INFO
LOADING dept_manager
INFO
LOADING titles
INFO
LOADING salaries
data_load_time_diff
00:00:21
$ mysql -t < test_employees_md5.sql # test it
+----------------------+
| INFO                 |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+----------------------------------+
| table_name   | expected_records | expected_crc                     |
+--------------+------------------+----------------------------------+
| employees    |           300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments  |                9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager |               24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp     |           331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles       |           443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries     |          2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
+--------------+------------------+----------------------------------+
| table_name   | found_records    | found_crc                        |
+--------------+------------------+----------------------------------+
| employees    |           300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments  |                9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager |               24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp     |           331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles       |           443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries     |          2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
+--------------+---------------+-----------+
| table_name   | records_match | crc_match |
+--------------+---------------+-----------+
| employees    | OK            | ok        |
| departments  | OK            | ok        |
| dept_manager | OK            | ok        |
| dept_emp     | OK            | ok        |
| titles       | OK            | ok        |
| salaries     | OK            | ok        |
+--------------+---------------+-----------+
+------------------+
| computation_time |
+------------------+
| 00:00:06         |
+------------------+
+---------+--------+
| summary | result |
+---------+--------+
| CRC     | OK     |
| count   | OK     |
+---------+--------+

Once you're done with your MySQL instance, you can just terminate your job, and all the processes will be terminated automatically.

Multi-node access

In case you need to run a more persistent instance of MySQL, you can submit a dedicated job to run the server, make it accessible over the network, and run queries from other jobs and/or nodes.

Enable network access

The preparation steps are pretty similar to the single-node case, except the MySQL server instance will be accessed over the network rather than through a local socket. Also, the same rules for secure access apply.

Network access must be secured

When running a networked instance of MySQL, please keep in mind that any user on PACE will be able to connect to the TCP port that mysqld listens on, so proper configuration must be done to prevent unauthorized access.

Like in the single-node case, you need to create a ~/.my.cnf file, but without the skip-networking directive.

$ export DB_DIR=$SCRATCH/db
$ mkdir $DB_DIR

$ cat << EOF > ~/.my.cnf
[mysqld]
datadir=$DB_DIR
socket=$DB_DIR/mysqldb.sock
user=$USER
symbolic-links=0

[mysqld_safe]
log-error=$DB_DIR/mysqldb.log
pid-file=$DB_DIR/mysqldb.pid

[mysql]
socket=$DB_DIR/mysqldb.sock
EOF

And then initialize the database:

$ mysql_install_db --datadir=$DB_DIR

Start MySQL in a job

You can use the following mysqldb.pbs job script as a template:

#!/bin/bash
#PBS -N mysqldb
#PBS -q inferno
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00

mysqld_safe

and submit it with:

$ qsub mysqldb.pbs
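Since the job may sit in the queue before the server comes up, a script that depends on the database could poll the node until the MySQL port answers. This sketch assumes bash (for /dev/tcp) and the default port 3306; the function name and retry parameters are made up:

```shell
# wait_for_mysql HOST [PORT] [TRIES] [DELAY]: poll until a TCP connection
# to HOST:PORT succeeds, retrying TRIES times with DELAY seconds between.
wait_for_mysql() {
  local host="$1" port="${2:-3306}" tries="${3:-30}" delay="${4:-2}"
  local i
  for i in $(seq "$tries"); do
    if (echo > "/dev/tcp/$host/$port") 2>/dev/null; then
      echo "MySQL is reachable on $host:$port"
      return 0
    fi
    sleep "$delay"
  done
  echo "Timed out waiting for $host:$port" >&2
  return 1
}
```

For example, `wait_for_mysql iw-c39-29-l && mysql -h iw-c39-29-l -p` only starts the client once the server is up.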

Concurrent instances will lead to data corruption

An important thing to keep in mind is that running multiple instances of a MySQL server at the same time, against the same database files, will almost certainly corrupt those files.
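A defensive habit, sketched below, is to check the pid-file declared in ~/.my.cnf before starting another server; the function name is hypothetical and the pid-file path is whatever you configured:

```shell
# start_guarded PIDFILE: refuse to proceed if PIDFILE names a live process,
# i.e., a mysqld is (probably) already running against this datadir.
start_guarded() {
  local pidfile="$1"
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "A mysqld already runs with PID $(cat "$pidfile"); not starting another." >&2
    return 1
  fi
  echo "No live server found for $pidfile; safe to run mysqld_safe."
}
```

For example, `start_guarded "$DB_DIR/mysqldb.pid" && mysqld_safe` in the job script would bail out instead of double-starting.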

Connect to the running instance

Now, from any node on PACE, whether a login node, an interactive job, or a batch job, using the mysql CLI or any application binding in any language, you should be able to connect to your running MySQL instance.

First, identify the node your job is running on with qstat:

$ qstat -u $USER -n1 25434450

shared-sched.pace.gatech.edu:
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
25434450.shared-sched.  poweruser   iw-share mysqldb           29037     1      1       --   01:00:00 R  00:11:41   iw-c39-29-l

and then, point your MySQL/MariaDB client to that node:

$ mysql -h iw-c39-29-l -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 15
Server version: 10.2.11-MariaDB Source distribution

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

That's it! You can now run SQL queries from anywhere on PACE to your own MySQL instance.

Persistent DB instances

SQL data is persistent

All the data you import into your SQL databases will persist across jobs: you can run a MySQL server job for the day, import data into its database, stop the job, and resubmit the same MySQL server job the next day, and all your data will still be there, as long as the location you've chosen for your database (the $DB_DIR defined in the Preparation steps) is on persistent storage.
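Even so, it's prudent to snapshot important databases before ending a long-running job. A minimal sketch, using the employees sample database from earlier (the dated filename scheme is just a suggestion):

```shell
# Build a dated dump filename, then (on a node that can reach the server)
# export the database with mysqldump.
BACKUP="employees-$(date +%Y-%m-%d).sql"
echo "Would back up to $BACKUP"
# mysqldump employees > "$BACKUP"        # export; run where the server is reachable
# mysql new_employees_db < "$BACKUP"     # example restore into another database
```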