Updated 2021-05-17
MySQL
Introduction¶
This guide is intended to help users that need to work with their own MySQL databases. Currently, this guide works with RHEL6-based PACE systems.
Attribution¶
This documentation guide was started fully (with copy and paste) using the following the template of SRCC Sherlock, their license is GPL v3.0, with re-edits to cater to GT PACE computing. More specifically, GT PACE uses Moab for scheduling instead of Slurm. In addition, this guide adds the use of a specific database that comes in entirety from the Github project datacharmer/test_db.
MySQL is a relational database management system. MariaDB is also completely compatible with MySQL and could be use as a drop-in replacement in the vast majority of cases.
MySQL on PACE¶
There is no centralized database service at PACE, but PACE can provide centralized installation of MySQL, and each user is welcome to start their own instance of the database server to fit their jobs' needs.
Running a MySQL database entails the following:
- Configure and initialize your environment so you can start a database instance under your user account.
- Start the database server.
- Run SQL queries from the same node (via a local socket), or from other nodes and/or jobs (via the network).
Single-node access¶
In this example, the database server and client will run within the same job, on the same compute node.
Preparation¶
You first need to let MySQL know where to store its database, where to log things, and how to allow connections from clients. The commands below only need to be executed once.
For this, you'll need to create a .my.cnf
file in your home directory.
Assuming you'll want to store your database files in a db/
directory in your
$SCRATCH
folder, you can run the following commands:
$ export SCRATCH=$HOME/scratch
$ export DB_DIR=$SCRATCH/db
$ mkdir $DB_DIR
$ cat << EOF > ~/.my.cnf
[mysqld]
datadir=$DB_DIR
socket=$DB_DIR/mysqldb.sock
user=$USER
symbolic-links=0
skip-networking
[mysqld_safe]
log-error=$DB_DIR/mysqldb.log
pid-file=$DB_DIR/mysqldb.pid
[mysql]
socket=$DB_DIR/mysqldb.sock
EOF
.my.cnf
doesn't support environment variables
Please note that if you edit your ~/.my.cnf
file directly in a file
editor, without using the HEREDOC syntax above,
environment variables such as $DB_DIR
, $HOME
or $USER
won't work: you
will need to specify absolute paths explicitely, such as
$HOME/scratch/db/mysqld.log
.
If you use the HEREDOC syntax, you can verify that the resulting .my.cnf
file does actually contain full paths, and not environment variables
anymore.
Once you have the .my.cnf
file in place, you need to initialize your database
with some internal data that MySQL needs. In the same terminal, run the
following commands:
$ mysql_install_db --datadir=$DB_DIR
Start the server¶
You can now start the MySQL server. For this, first get an allocation on a
compute node, note the hostname of the compute node your job has been
allocated, then run the mysqld_safe
process:
$ qsub -I -q inferno -l nodes=1:ppn=1,walltime=01:00:00
$ cat $PBS_NODEFILE
iw-c39-29-r.pace.gatech.edu
$ mysqld_safe
190418 14:57:29 mysqld_safe Logging to '/nv/hp17/poweruser/scratch/db/mysqldb.log'.
190418 14:57:29 mysqld_safe Starting mysqld daemon with databases from /nv/hp17/poweruser/scratch/db
The mysqld_safe
will be blocking, meaning it will not give the prompt back
for as long as the MySQL server runs.
If it does return on its own, it probably means that something went wrong, and
you'll find more information about the issue in the $DB_DIR/mysqld.log
file
you defined in ~/.my.cnf
.
Run queries¶
You're now ready to run queries against that MySQL instance, from the same node your job is running on.
From another terminal on PACE, connect to your job's compute node (here, it's
iw-c39-33-l
, as shown above), and then run the mysql
command: it will open
the MySQL shell, ready to run your SQL queries:
$ ssh iw-c39-33-l
$ mysql
Secure access¶
We will now set a password for the MySQL root
user to a random string,
just for the purpose of preventing unauthorized access, since we won't need it for
anything.
We will actually create a MySQL user with all privileges on the databases,
that will be able to connect to this instance from any node. This user will
need a real password, though. So please make sure to replace the
my-secure-password
string below by the actual password of your choice.
Choose a proper password
This password will only be used to access this specific instance of MySQL. Note that anybody knowing that password will be allowed to connect to your MySQL instances and modify data in the tables.
- do NOT use
my-secure-password
- do NOT use your GTID password
Once you've chosen your password, you can start the mysqld
process on a
compute node, like before:
$ qsub -I -q inferno -l nodes=1:ppn=1,walltime=01:00:00
$ cat $PBS_NODEFILE
iw-c39-29-r.pace.gatech.edu
$ mysqld_safe
And then, from another terminal, run the following commands to secure access to your MySQL database.
$ ssh iw-c39-29-r
$ mysql -u root << EOF
UPDATE mysql.user SET Password=PASSWORD(RAND()) WHERE User='root';
DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.db WHERE Db='test' OR Db='test_%';
GRANT ALL PRIVILEGES ON *.* TO '$USER'@'%' IDENTIFIED BY 'my-secure-password' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF
Once you've done that, you're ready to terminate that interactive job, and start a dedicated MySQL server job.
Work with a sample database¶
It may be instructive to provide an example of installing a database, and querying it. Working from the node that is serving the database:
$ ssh iw-c39-33-l # or your existing allocated node
$ cd $HOME/scratch
$ module load git
$ git clone https://github.com/datacharmer/test_db.git
$ cd test_db
$ mysql < employees.sql # import it
Enter password:
INFO
CREATING DATABASE STRUCTURE
INFO
LOADING departments
INFO
LOADING employees
INFO
LOADING dept_emp
INFO
LOADING dept_manager
INFO
LOADING titles
INFO
LOADING salaries
data_load_time_diff
00:00:21
$ mysql -t < test_employees_md5.sql # test it
+----------------------+
| INFO |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+----------------------------------+
| table_name | expected_records | expected_crc |
+--------------+------------------+----------------------------------+
| employees | 300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments | 9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager | 24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp | 331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles | 443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries | 2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
+--------------+------------------+----------------------------------+
| table_name | found_records | found_crc |
+--------------+------------------+----------------------------------+
| employees | 300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments | 9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager | 24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp | 331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles | 443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries | 2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
+--------------+---------------+-----------+
| table_name | records_match | crc_match |
+--------------+---------------+-----------+
| employees | OK | ok |
| departments | OK | ok |
| dept_manager | OK | ok |
| dept_emp | OK | ok |
| titles | OK | ok |
| salaries | OK | ok |
+--------------+---------------+-----------+
+------------------+
| computation_time |
+------------------+
| 00:00:06 |
+------------------+
+---------+--------+
| summary | result |
+---------+--------+
| CRC | OK |
| count | OK |
+---------+--------+
Once you're done with your MySQL instance, you can just terminate your job, and all the processes will be terminated automatically.
Multi-node access¶
In case you need to run a more persistent instance of MySQL, you can submit a dedicated job to run the server, make it accessible over the network, and run queries from other jobs and/or nodes.
Enable network access¶
The preparation steps are pretty similar to the single-node case, except the MySQL server instance will be accessed over the network rather than through a local socket. Also, the same rules for secure access apply.
Network access must be secured
When running an networked instance of MySQL, please keep in mind that any
user on PACE will be able to connect to the TCP ports that mysqld
runs on, and that proper configuration must be done to prevent unauthorized
access.
Like in the single-node case, you need to create a ~/.my.cnf
file, but
without the skip-networking
directive.
$ export DB_DIR=$SCRATCH/db
$ mkdir $DB_DIR
$ cat << EOF > ~/.my.cnf
[mysqld]
datadir=$DB_DIR
socket=$DB_DIR/mysqldb.sock
user=$USER
symbolic-links=0
[mysqld_safe]
log-error=$DB_DIR/mysqldb.log
pid-file=$DB_DIR/mysqldb.pid
[mysql]
socket=$DB_DIR/mariadb.sock
EOF
And then initiate the database:
$ mysql_install_db --datadir=$DB_DIR
Start MySQL in a job¶
You can use the following mysql.pbs
job as a template:
#!/bin/bash
#PBS -J mysqldb
#PBS -Q inferno
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
mysqld_safe
and submit it with:
$ qsub mysqldb.pbs
Concurrent instances will lead to data corruption
An important thing to keep in mind is that having multiple instances of a MySQL server running at the same time, using the same database files, will certainly lead to catastrophic situations and the corruption of those files.
Connect to the running instance¶
Now, from any node on PACE, whether from a login node, an interactive job, or a
batch job, using the mysql
CLI or any application binding in any language,
you should be able to connect to your running MySQL instance,
First, identify the node your job is running on with qstat
:
$ qstat -u $USER -n1 25434450
shared-sched.pace.gatech.edu:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
25434450.shared-sched. poweruser iw-share mysqldb 29037 1 1 -- 01:00:00 R 00:11:41 iw-c39-29-l
and then, point your MySQL/MariaDB client to that node:
$ mysql -h iw-c39-29-l -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 15
Server version: 10.2.11-MariaDB Source distribution
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
"site_files/docs/software/mysql.md" 396L, 13133C written
That's it! You can now run SQL queries from anywhere on PACE to your own MySQL instance.
Persistent DB instances¶
SQL data is persistent
All the data you import in your SQL databases will be persistent across
jobs. Meaning that you can run a MySQL server job for the day, import
data in its database, stop the job, and resubmit the same MySQL server
job the next day: all your data will still be there as long as the location
you've chosen for your database (the $DB_DIR
defined in the
Preparation steps) is on a persistent storage location.