Tuesday, January 16, 2018

Fun with Bugs #60 - On Some Memory Leaks, Replication and Other Bugs Fixed in MySQL 5.7.21

Oracle had formally released MySQL 5.7.21 yesterday. I do not bother any more to study MySQL release notes carefully and completely, but during a quick review today I've noted several interesting items I'd like you to pay attention to.

I am historically interested in InnoDB implementation details, so I could not miss Bug #87619 - "InnoDB partition table will lock into the near record as a condition in the use ". This was a regression bug in 5.7+, probably caused by new implementation of partitioning in InnoDB.

Another interesting bug is Bug #86927 - "Renaming a partitioned table does not update mysql.innodb_table_stats.", by Jean-François Gagné. It was yet another bug in InnoDB's persistent statistics (that I truly hate). What makes it especially interesting to me, though, is that it's the first public bug report I noted that mentioned MySQL 9.0.0 release as a target for the fix:
"Fixed as of the upcoming 5.7.21, 8.0.4, 9.0.0 release"
So, it's clear that back in October 2017 Oracle had already got a separate branch for upcoming MySQL 9.0.x! It also probably means that MySQL 8.0.x GA is coming really soon.

There are bug reports that are worth reading for technical reasons, others - only if you want to get some fun. Bug #86607 - "InnoDB crashed when master thread evict dict_table_t object" is agood example that covers both cases. Good to know the crash is fixed, but, please, make sure to read all comments there.

In this release I've noted fixes to several public bugs reported by Shane Bester. The first one of them is Bug #86573 - "foreign key cascades use excessive memory". Check how he used memory instrumentation in Performance Schema to demonstrate the problem! In Bug #86482 - "innodb leaks memory, performance_schema file_instances #sql-ib3129987-252773.ibd", he used similar approach to show potential memory leak in the Performance Schema itself ! Yet another bug that mentions 9.0.0 as a target version for the fix, among others... 

Bug #78048 - "INNODB Full text Case sensitive not working", is here both because I recently started to notice problems related to InnoDB FULLTEXT indexing, again (first time was soon after it was introduced), and because it has an MTR  test case contributed by Sveta Smirnova.


XA transactions support had always been problematic in MySQL  (still "Verified" Bug #87526 by Sveta Smirnova is one of recent examples how incomplete or useless it can be, see also MDEV-14593). Check the following bugs fixed in MySQL 5.7.21 if you use XA transactions:
  • Bug #87393 - "xa rollback with wrong xid will be recorded into the binlog". It was reported by HongXiang Jiang, who had also contributed a patch.
  • Bug #83295 - "replication error occurs, use xa transaction(one phase)". Yet another XA transactions problem reported by Hiroyuki Itoh and then confirmed by many affected users. Nice to see it fixed.
There are many fixes in MySQL 5.7.21 related to memory leaks. Two bug reports of this kind were from Przemyslaw Malkowski:
  • Bug #85371 - "Memory leak in multi-source replication when binlog_rows_query_log_events=1". Again, memory instrumentation of Performance Schema was used to demonstrate the problem. Vlad Lesin, also from Percona, contributed the patch for this bug.
  • Bug #85251 - "Memory leak in master-master GTID replication with sync_relay_log_info". Here Vlad Lesin, who had contributed the patch, also used Massif for the detailed analysis.
To summarize, I start to miss memory instrumentation in Performance Schema in MariaDB 10.x... This is a really useful feature.

I usually care about optimizer bugs, and these two attracted my attention:
  • Bug #87207 - "select distinct with secondary key for 'Using index for group-by' bad results". This nice optimizer regression bug was found by Shane Bester. As a workaround, while you do not use 5.7.21, you can try to set optimizer_switch='use_index_extensions=off'. I'd keep it that way by default...
  • Bug #72854 - "Extremely slow performance with outer joins and join buffer". I am happy to see this old optimizer bug reported by Sergey Petrunya from MariaDB finally fixed.
You can find a lot more details, including usual references to MySQL bug reports that are still private, in the Release Notes. Keep reading and consider upgrade :)

Tuesday, January 2, 2018

Fun with Bugs #59 - On MySQL Bug Reports I am Subscribed to, Part II

New Year (that starts on Monday!) gives a good opportunity to change something in our lives, start doing something new, better or different. Let's assume I failed with all these so far, as I am again posting about MySQL bugs here.

Since my previous post on this topic I've subscribed to 15 more MySQL bugs, and being on a combination of public holidays and vacation now gives me a good opportunity to review these bug reports.

Here they are, starting from the most recent:
  • Bug #89065 - "sync_binlog=1 on a busy server and slow binary log filesystem stalls slaves". I do not remember seeing multiple threads in "Finished reading one binlog; switching to next binlog" state, but it would be interesting to see this bug report processed properly.
  • Bug #89051 - "EXPLAIN output is different for same content depending when index was added". The way optimizer decides on "range" vs "ref" access is always interesting. Here, based on a recent comment by Øystein Grøvlen, the bug is actually that "Cost values are not correct when optimizer switch from ref-access to range-access in order to use more key parts".
  • Bug #88914 - "Potential null pointer dereference at pointer node->undo_recs (row0purge.cc)". It's funny to see many bugs becoming private as "security" ones and, at the same time, this bug, where reporter suspects it is exploitable, being "Open" and ignored for more than two weeks...
  • Bug #88891 - "Filtered replication leaves GTID holes with create database if not exists". I can not even explain how much I "like" all kinds of GTIDs I have to deal with, especially such a long lists of GTIDs that may be created in cases described in this report.
  • Bug #88863 - "COUNT(*) can sometimes return bogus value". Now, this is a really funny bug! It must be some race condition, and I'd really prefer to see this bug fixed soon.
  • Bug #88844 - "MySQL crash with InnoDB assertion failure in file pars0pars.cc". Nice crash (that I've never seen before) quickly reproduced by Shane Bester.
  • Bug #88834 - "Uneven slowdown on systems with many users". What can be better to speed up connection than checking the list of users one by one, especially when there are thousands of users?
  • Bug #88827 - "innodb uses too much space in the PK for concurrent inserts into the same table". As Mark Callaghan put it:
    "I am not sure my reproduction details will ever satisfy Sinisa but I don't mind if you don't fix this because I care more about MyRocks today and this bug makes MyRocks look better."
    We (Facebook's MySQL, MariaDB and Percona server users) do have MyRocks, but why poor Oracle MySQL users should suffer? Let's hope Sinisa Milivojevic will process the bug fast, with all the details clarified there :)
  • Bug #88791 - "Binary log for generated column contains new value as WHERE clause, not old value". Generated columns and binary logging, what could went wrong?
  • Bug #88764 - ""ALTER TABLE MODIFY..." takes time even if leaving table as is". Any simple test cases they come to my mind do NOT let to reproduce this problem, but I feel some potential as soon as more exotic cases like partitioning or data directory settings are considered. Let's wait for bug reporter to clarify.
  • Bug #88720 - "Inconsistent and unsafe FLUSH behavior in terms of replication". Nice summary of problems from Przemyslaw Malkowski. One more reason for me to hate GTIDs, honestly.
  • Bug #88694 - "MySQL accepts wildcard for database name for table level grant but won't use it". One more problem with privileges reported by Daniël van Eeden.
  • Bug #88674  - "Regression CREATE TBL from 5.7.17 to 20 (part #2: innodb_file_per_table = OFF)." and Bug #88673 - "Regression CREATE TBL from 5.7.17 to 20 (part #1: innodb_file_per_table = ON)." - these two bugs were reported by Jean-François Gagné and clearly show some things that are done wrong by Oracle when fixing (private, "security") bugs...
  • Bug #88671 - "ALL + BNL chosen over REF in query plan with a simple SELECT + JOIN". In this case optimizer (probably) does not take costs into account properly when choosing block nested loop join vs usual "ref" index access. Maybe just a matter of missing/wrong statistics also. It would be interesting to find out eventually.
  • Bug #88666 - "I_S FILES : all rows are displayed whatever the user privileges". Yet another bug report from Jocelyn Fournier. I am actually surprised with a number of bugs related to privileges that I notice recently.
  • Bug #88633 - "Auto_increment value on a table is less than max(id) can happen". It seems only MySQL 5.7.x is affected, but not 5.6.x.
  • Bug #88623 - "Modifying virtually generated column is not online". May be by design, but still surprising.
  • Bug #88534 - "XA may lost prepared transaction and cause different between master and slave". XA transactions with MySQL is still a sure way to disaster, I think. See Bug #87526 also that should appear in the next part of this series...
Stay tuned to more posts about MySQL bugs from me in the New Year of 2018!

Saturday, December 9, 2017

Fun with Bugs #58 - Bug of the Day From @mysqlbugs

In 2013 I had a habit of writing about MySQL bugs on Facebook almost every day. Typical post looked like this one, link to the bug and few words of wondering with a bit of sarcasm.
By the way, check last comments in Bug #68892 mentioned there - the problem of LOST_EVENTS in master's binary log and a way to workaround it still valid as of MySQL 5.7.17.
At that time I often got private messages from colleagues that Facebook is a wrong media for this kind of posts, these posts make MySQL look "buggy" etc, and eventually I was shut up (for a year or so) in a more or less official way by combined efforts of my employer at that time and Oracle MySQL officials.

A year later I started to write about MySQL bugs again on Facebook, but these posts were not regular any more, and maybe became a bit less sarcastic. Years later, I agree that Facebook should better be used for sharing photos of cats, or nice places, or family members, or even for hot political discussions, as these things annoy people much less than MySQL bugs. So, the need for different media for short annoying messages about MySQL bugs is clear to me. I do think the right media is Twitter (it is annoying by itself), and I am present there as @mysqlbugs since December 2012 actually. I am not a big fan of it historically and used to open it mostly to share information about yet another my blog post, but recently (after they allowed to write longer messages there) I decided to pay more attention to it (until its gone entirely and replaced by something else). So, I post a link to one MySQL bug there every day for a week or so, with the #bugoftheday tag. I quickly noticed that the tag was used by others to share photos of nice insects/bugs, but I don't mind for my posts to end up among those, and I am ready to share some of my related photos as well.

To summarize, I am going to write short messages about MySQL bugs regularly on Twitter. I try to write about some bug I recently notices just because it is new, improperly handled or was involved in some customer issue that I worked on. Let me share 5 last bugs mentioned there, so you can decide if it makes sense for you to follow me or #bugoftheday:
  • Bug #88827 - "innodb uses too much space in the PK for concurrent inserts into the same table". Interesting finding by Mark Callaghan during his recent benchmarking efforts. Still "Open".
  • Bug #88788 - "log_bin is not considered correctly an makes binary logging unusable!". This report by Oli Sennhauser (quickly declared a duplicate of my older Bug #75507) got a really nice number, and it's also interesting how efforts to name hosts in a nicely structured manner may play against poor DBA...
  • Bug #88776 - "Issues found by PVS-Studio static analyzer". Related post with the detailed analysis of problems found by the tool was mentioned few days ago by somebody from MariaDB on Slack, so I immediately noted that the bug comes from the same author, Sergey Vasiliev.
  • Bug #88765 - "This bug reporting form has a ridiculously short character limit for the bug syn". The bug report about MySQL bugs database itself. I also hit the limit there way more often than I'd like to. Discussion continues, but I feel that this is not going to be fixed...
  • Bug #87526 - "The output of 'XA recover convert xid' is not useful". One of our customers hit this problem recently. I can only agree with Sveta Smirnova here, "Output of XA RECOVER should show ID which can be used in XA COMMIT statement."
So, if the list above seems useful or interesting, please, follow me on Twitter. Let's continue to have regular fun with MySQL bugs, now using (hopefully) a more appropriate media!

As a side note, if you are interested in older bugs opened this day years ago and still "Verified", please, check this great page by Hartmut! You may find really great reports like Bug #79581- "Error 1064 on selects from Information Schema if routine name has '\0'", by Sveta Smirnova. This bug is 2 years old today...

Saturday, December 2, 2017

Using strace for MySQL Troubleshooting

I'd say that strace utility is even more useful for MySQL DBAs than lsof. Basically, it is a useful general purpose diagnostic and debugging tool for tracing system calls some process makes and signals it receives. The name of each system call, its arguments and its return value are printed to stderr or to the file specified with the -o option.

In context of MySQL strace is usually used to find what files mysqld process accesses, and to check the details about any I/O errors. For example, if I'd really want to try to verify Bug #88479 - "Unable to start mysqld using a single config file (and avoiding reading defaults)" by Simon Mudd, I'd just run mysqld from some 5.7.x version as a command argument for strace. On my Ubuntu 14.04 netbook I have the following files:
openxs@ao756:~/dbs/5.7$ ls -l /usr/my.cnf
-rw-r--r-- 1 root root 943 лип 19  2013 /usr/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/my.cnf
-rw-r--r-- 1 root root 260 чер 24 20:40 /etc/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/mysql/my.cnf
-rw-r--r-- 1 root root 116 лют 26  2016 /etc/mysql/my.cnf
openxs@ao756:~/dbs/5.7$ bin/mysqld --version
bin/mysqld  Ver 5.7.18 for Linux on x86_64 (MySQL Community Server (GPL))
So, what happens if I try to run mysqld --defaults-file=/etc/mysql/my.cnf, like this:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
openxs@ao756:~/dbs/5.7$
It seems we proved that only the file passed as --defaults-file is read (if it exists). By default other locations are also checked in a specific order (note that return codes are mapped to symbolic errors when possible):
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --print-defaults --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/my.cnf", {st_mode=S_IFREG|0644, st_size=260, ...}) = 0
open("/etc/my.cnf", O_RDONLY)           = 3
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
stat("/home/openxs/dbs/5.7/etc/my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
stat("/home/openxs/.my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
openxs@ao756:~/dbs/5.7$
If we think that --print-defaults may matter, we can try without it:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf 2>&1 | grep 'my.cnf'stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
^C
openxs@ao756:~/dbs/5.7$
Last example also shows how one can terminate tracing with Ctrl-C.

Now, let me illustrate typical use cases with (surprise!) some public MySQL bug reports where strace was important to find or verify the bug:
  • Bug #20748 - "Configuration files should not be read more than once". In this old bug report Domas Mituzas proved the point by running mysqld as a command via strace. He filtered out lines related to opening my.cnf file with egrep and made it obvious that one file may be read more than once. The bug is closed long time ago, but it is not obvious that all possible cases are covered, based on further comments...
  • Bug #49336 - "mysqlbinlog does not accept input from stdin when stdin is a pipe". Here bug reporter shown how run mysqlbinlog as a command under strace and redirect output to the file with -o option.
  • Bug #62224 - "Setting open-files-limit above the hard limit won't be logged in errorlog". This minor bug in all versions before old 5.6.x (that still remains "Verified", probably forgotten) was reported by Daniël van Eeden. strace allowed him to show what limits are really set by setrlimit() calls. The fact that arguments of system calls are also shown matters sometimes.
  • Bug #62578 - "mysql client aborts connection on terminal resize". This bug report by Jervin Real from Percona is one of my all time favorites! I clearly remember how desperate customer tried to to dump data and load them on a remote server with a nice command line, and after waiting for many days (mysqldump was run for a data set of 5TB+, don't ask, not my idea) complained that they got "Lost connection to MySQL server during query" error message and loading failed. Surely, the command was run from the terminal window on his Mac and in the process of moving it here and there he just resized the window  by chance... You can see nice example of strace usage with MySQL Sandbox in the bug report, as well as some outputs for SIGWINCH signal. Note that the bug is NOT fixed by Oracle in MySQL 5.5.x (and never happened in 5.6+), while Percona fixed it since 5.5.31.
  • Bug #65956 - "client and libmysqlclient VIO drops connection if signal received during read()". In this yet another 5.5 only regression bug that was NOT fixed in 5.5.x by Oracle (until in frames of Bug #82019 - "Is client library supposed to retry EINTR indefinitely or not" the patch was contributed for a special case of it by Laurynas Biveinis from Percona, that in somewhat changed form allowed to, hopefully, get some fix in 5.5.52+), the problem was noted while trying to attach strace to running client program (with -p option).
  • Bug #72340 - "my_print_defaults fails to parse include directive if there is no new line". Here bug reporter used strace (with -f option to trace child/forked processes and -v option get verbose output) to show that file name is NOT properly recognized by my_print_defaults. The bug is still "Verified".
  • Bug #72701 - "mysqlbinlog uses localtime() to print events, causes kernel mutex contention". Yet another nice bug report by Domas Mituzas, who had used -e stat options of strace to trace only stat() calls and show how many times they are applied to /etc/localtime.The bug is fixed since 5.5.41, 5.6.22 and 5.7.6 removed related kernel mutex contention.
  • Bug #76627 - "MySQL/InnoDB mix buffered and direct IO". It was demonstrated with strace that InnoDB was opening each .ibd file twice (this is expected as explained by Marko Mäkelä) in different modes (and this was not any good). The bug is fixed in recent renough MySQL server versions.
  • Bug #80020 - "mysqlfrm doesn't work with 5.7". In this bug report Aleksandr Kuzminsky had to use strace to find out why mysqlfrm utility failed mostly silently even with --verbose option.
  • Bug #80319 - ""flush tables" semantics deviate from manual". Here Jörg Brühe proved with strace that close() system call is not used for individual InnoDB table t when FLUSH TABLES t is executed. manual had to be clarified.
  • Bug #80889 - "PURGE BINARY LOGS TO is reading the whole binlog file and causing MySql to Stall". By attaching strace to running mysqld process and running the command it was shown that when GTID_MODE=OFF the entire file we purge to was read. No reason to do this, really. Note how return value of open(), file descriptor, was further used to track reads from this specific file.
  • Bug #81443 - "mysqld --initialize-insecure silently fails with --user flag". As MySQL DBA, you should be ready to use strace when requested by support or developers. Here bug reporter was asked to use it, and this revealed a permission problem (that was a result of the bug). No need to guess what may go wrong with permissions - just use strace to find out!
  • Bug #84708 - "mysqld fills up error logs with [ERROR] Error in accept: Bad file descriptor". Here strace was used to find out that "When mysqld is secured with tcp_wrappers it will close a socket from an unauthorized ip address and then immediately start polling that socket"... The bug is fixed in MySQL 5.7.19+ and 8.0.2+, so take care!
  • Bug #86117 - "reconnect is very slow". I do not care about MySQL Shell, but in this bug report Daniël van Eeden used -r option of strace to print a relative timestamp for every system call, and this was it was clear how much time was spent during reconnect, and where it was spent. Very useful!
  • Bug #86462 - "mysql_ugprade: improve handling of upgrade errors". In this case strace allowed Simon Mudd to find out what exact SQL statement generated by mysql_upgrade failed.
The last but not the least, note that you may need root or sudo privileges to use strace. On my Ubuntu 14.04 this kind of messages may appear even if I am the user that owns mysqld process (same with gdb):
openxs@ao756:~/dbs/maria10.2$ strace -p 2083
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
To summarize, strace may help MySQL DBA to find out:

  • what files are accessed by the mysqld process or related utilities, and in what order
  • why some MySQL-related command (silently) fails or hangs
  • why some commands end up with permission denied or other errors
  • what signals MySQL server and tools get
  • what system calls could took a lot of time when something works slow
  • when files are opened and closed, and how much data are read from the files
  • where the error log and other logs are really located (we can look for system calls related to writing to stderr, for example)
  • how MySQL really works with files, ports and sockets
It also helps to find and verify MySQL bugs, and clarify missing details in MySQL manual.

There other similar tools for tracing system calls (maybe among other things) on Linux that I am going to review in this blog some day. Performance impact of running MySQL server under this kind of tracing is also a topic to study.

Saturday, November 18, 2017

How lsof Utility May Help MySQL DBAs

While working in Support, I noticed that probably at least once a week I have to use or mention lsof utility in some context. This week, for example, we had a customer trying to find out if his mysqld process running is linked with tcmalloc library. He started it different ways, using LD_PRELOAD directly and --malloc-lib option of mysqld_safe script etc, but wanted to verify that his attempts really worked as expected. My immediate comment in the internal chat was: "Just let them run lsof -p `pidof mysqld` | grep mall and check!" My MariaDB 10.2 instance uses jemalloc and this can be checked exactly the same way:
openxs@ao756:~/dbs/maria10.2$ ps aux | grep mysqld...
openxs    4619  0.0  0.0   4452   804 pts/2    S    17:02   0:00 /bin/sh bin/mysqld_safe --no-defaults --port=3308 --malloc-lib=/usr/lib/x86_64-linux-gnu/libjemalloc.so
openxs    4734  0.5  2.9 876368 115156 pts/2   Sl   17:02   0:00 /home/openxs/dbs/maria10.2/bin/mysqld --no-defaults --basedir=/home/openxs/dbs/maria10.2 --datadir=/home/openxs/dbs/maria10.2/data --plugin-dir=/home/openxs/dbs/maria10.2/lib/plugin --log-error=/home/openxs/dbs/maria10.2/data/ao756.err --pid-file=ao756.pid --port=3308
openxs    5391  0.0  0.0  14652   964 pts/2    S+   17:05   0:00 grep --color=auto mysqld
openxs@ao756:~/dbs/maria10.2$ lsof -p 4734 | grep mall
mysqld  4734 openxs  mem    REG              252,2    219776 12058822 /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
openxs@ao756:~/dbs/maria10.2$
I think it's time to summarize most important use cases for lsof utility for MySQL DBAs. I am going to show different cases when it can be useful based on public MySQL bug reports. 

As one can read in the manual, lsof "lists on its standard output file information about files opened by processes". In one of the simplest possible calls presented above, we just pass PID of the process after -p option and get list of open files for this process. This includes shared libraries the process uses. By default the following format of the output is used:
openxs@ao756:~/dbs/maria10.2$ lsof -p 4734 | more
COMMAND  PID   USER   FD   TYPE             DEVICE  SIZE/OFF     NODE NAMEmysqld  4734 openxs  cwd    DIR              252,2      4096 29638597 /home/openxs/dbs/maria10.2/data
mysqld  4734 openxs  rtd    DIR              252,2      4096        2 /
mysqld  4734 openxs  txt    REG              252,2 147257671 29514843 /home/openxs/dbs/maria10.2/bin/mysqld
mysqld  4734 openxs  mem    REG              252,2     31792  1311130 /lib/x86_64-linux-gnu/librt-2.19.so
mysqld  4734 openxs  mem    REG              252,2 101270905 29241175 /home/openxs/dbs/maria10.2/lib/plugin/ha_rocksdb.so

...
mysqld  4734 openxs  DEL    REG               0,11             443265 /[aio]
mysqld  4734 openxs    0r   CHR                1,3       0t0     1050 /dev/null
mysqld  4734 openxs    1w   REG              252,2     52623  5255961 /home/openxs/dbs/maria10.2/data/.rocksdb/LOG
mysqld  4734 openxs    2w   REG              252,2    458880 29647192 /home/openxs/dbs/maria10.2/data/ao756.err
mysqld  4734 openxs    3r   DIR              252,2      4096  5255249 /home/openxs/dbs/maria10.2/data/.rocksdb
...
mysqld  4734 openxs  451u  IPv6             443558       0t0      TCP *:3308 (LISTEN)
mysqld  4734 openxs  452u  unix 0x0000000000000000       0t0   443559 /tmp/mysql.sock
...
mysqld  4734 openxs  470u   REG              252,2         0 29756970 /home/openxs/dbs/maria10.2/data/mysql/event.MYD
mysqld  4734 openxs  471u   REG              252,2         0 29647195 /home/openxs/dbs/maria10.2/data/multi-master.info
Most columns have obvious meaning, so let me concentrate on the few. FD should be numeric file descriptor, and it is for normal files. In this case it is also followed by a letter describing the mode under which the file is open (r for read, w for write and u for update). There may be one more letter describing a type of lock applied to the file. But we can see values without any single digit in the output above, so obviously some special values can be present there, like cwd for current working directory, rtd for root directory, txt for program text (code and data) or mem for memory mapped file, etc.

TYPE column is also interesting and may have plenty of values (as there are many types of files in Linux). REG means regular file, DIR is, obviously, a directory. Note also unix for a socket and IPv6 for the TCP port mysqld process listens to.

In SIZE/OFF column for normal files we usually see their size in bytes. Values for offset in file are usually prefixed with 0t if the value is decimal, or 0x if it's hex. NAME is obviously a fully specified file name (with symbolic links resolved). Some more details about the output format are discussed in the following examples.

Another usual way to use lsof is to pass a file name and get details about processes that have it opened, like this:
openxs@ao756:~/dbs/maria10.2$ lsof /tmp/mysql.sock
COMMAND  PID   USER   FD   TYPE             DEVICE SIZE/OFF   NODE NAME
mysqld  4734 openxs  452u  unix 0x0000000000000000      0t0 443559 /tmp/mysql.sock
openxs@ao756:~/dbs/maria10.2$ lsof /home/openxs/dbs/maria10.2
COMMAND     PID   USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
mysqld_sa  4619 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
lsof      14354 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
lsof      14355 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
bash      29244 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
In this case we see that /home/openxs/dbs/maria10.2 is used as a current working directory by 4 processes. Usually this kind of check is used when we can not unmount some directory, but it may be also useful in context of MySQL when you get error messages that some file is already used by other process. In the first example above I was checking what process could use /tmp/mysql.sock file.

Now, with the above details on basic usage in mind, let's check several recent enough MySQL bug reports that demonstrate typical and more advanced usage of lsof:
  • Bug #66237 - "Temporary files created by binary log cache are not purged after transaction commit". My former colleague and mentor from Percona, Miguel Angel Nieto (who recently joined a dark side of MongoDB employees) used lsof to show numerous files with names ML* created and left (until connection is closed) by mysqld process in /tmp directory (tmpdir to be precize) of a server with binary logging enabled, when transaction size was larger that binlog cache size. The bug is fixed in 5.6.17+ and 5.7.2+. It shows us a usual way of creating temporary files by MySQL server:
    # lsof -p 6112|grep ML
    mysqld  6112 root   38u   REG                7,0  106594304      18 /tmp/MLjw4ecJ (deleted)
    mysqld  6112 root   39u   REG                7,0  237314310      17 /tmp/MLwdWDGW (deleted)
    Notice (deleted) above. This is a result of immediate call to unlink() when temporary files are created. Check this in the source code, as well as my_delete() implementation.
  • Bug #82870 - "mysqld opens too many descriptors for slow query log". This bug (that is still "Verified") was opened by my former colleague Sveta Smirnova (now in Percona). Basically, mysqld opens too many descriptors for slow query log (and general query log) if it is turned ON and OFF while concurrent sessions are running. lsof allowed to see multiple descriptors created for the same file, until eventually open_files_limit is hit.
  • Bug #83434 - "Select statement with partition selection against MyISAM table opens all partitions". This bug (later declared a duplicate of older one and, eventually, a documented, even if unexpected, behavior by design) was opened by my colleague from MariaDB Geoff Montee. lsof utility helped to show that all partitions are actually opened by the mysqld process in this case.
  • Bug #74145 - "FLUSH LOGS improperly disables the logging if the log file cannot be accessed". This bug (still "Verified") was reported by Jean Weisbuch. Here we can see how lsof was used to find out if slow log is open after FLUSH. The logging has to be disabled, but MySQL continue to lie that it is enabled. I remember many cases when lsof also helped to find out where the error log (file with descriptor 2w) is really located/redirected to.
  • Bug #77752 - "bind-address wrongly prefers IPv4 over IPv6". This was not a bug (more like a configuration issue), but see how lsof -i is used by Daniël van Eeden to find out what process listens to a specific port, and does it listen to IPv4 or IPv6 address.
  • Bug #87589 - "Documentation incorrectly states that LOAD DATA LOCAL INFILE does not use tmpdir". In this "Verified" bug report Geoff Montee used lsof to show that temporary files are really created in tmpdir, not in /tmp (OS temporary directory). This is how you can find out when MySQL manual lies...
  • Bug #77519 - "Reported location of Innodb Merge Temp File is wrong". One more bug from Daniël van Eeden, this time "Verified". By calling lsof +L1 during an online alter table, he demonstrated that two temp files are created in tmpdir instead of in the datadir (as described by the manual), while events_waits_history_long table in performance_schema seems to claim it waited ion temporary file in the datadir. Note that in other his bug report, Bug #76225, fixed since 5.7.9 and 5.8.0, he had also shown ML* binlog cache files created that were not instrumented by performance_schema.
  • Bug #75706 - "alter table import tablespace creates a temporary table". This bug report by BJ Quinn is formally still "Verified", but according to my former colleague Przemyslaw Malkowski from Percona in recent 5.6.x and 5.7.x versions lsof does NOT show temporary table created. Time to re-verify this bug maybe, if the decision is made on how to implement this?
  • Bug #83717 - "Manual does not explain when ddl_log.log file is deleted and how large it can be". My own bug report where lsof was used to show that the ddl_log.log file remains open even after online ALTER completes. Manual is clear about this now.
To summarize, lsof may help MySQL DBA to find out:
  • what dynamic libraries are really used by the mysqld process
  • where the error log and other logs are really located
  • what other process may have some file, port or socket opened that is needed for current MySQL instance
  • why you may hit open_files_limit or use all free space in some filesystem unexpectedly
  • where all kinds of temporary files are created during specific operations
  • how MySQL really works with files, ports and sockets
It also allows to find MySQL bugs and clarify missing details in MySQL manual.

Saturday, November 11, 2017

Fun with Bugs #57 - On MySQL Bug Reports I am Subscribed to, Part I

I've decided to stop reviewing MySQL Release Notes in this series, but it does not mean that I am not interested in MySQL bugs any more. At the moment I am subscribed to 91 active MySQL bugs reported by other MySQL users, and in this blog post I am going to present 15 of them, the most recently reported ones. I'd really want to see them fixed or at least properly processed as soon as possible.

In some cases I am going to add my speculations on how the bug had better be handled, or maybe highlight some important details about it. It is not my job any more to process/"verify" any community bug reports for any kind of MySQL, but I did that for many years and I've spent more than 5 years "on the other side", being a member of Community, so in some cases I let myself to share some strong opinion on what may be done differently from the Oracle side.

As a side note, I started to subscribe to MySQL bugs mostly after I left Oracle, as before that I got email notification about each and every change in every MySQL bug report ever created...

Here is the list, starting from the most recent ones:
  • Bug #88422 - "MySQL 5.7 innodb purge thread get oldest readview could block other transaction". It is one of that bug reports without a test case from reporter. It is tempting to set it to "Verified" just "based on code review", as the code in 5.7 is quite obviously shows both holding the trx_sys->mutex and linear complexity of search depending on number of read views in the worst case (when most of them are closed):
    /**
    Get the oldest (active) view in the system.
    @return oldest view if found or NULL */

    ReadView*
    MVCC::get_oldest_view() const
    {
            ReadView*       view;

            ut_ad(mutex_own(&trx_sys->mutex));

            for (view = UT_LIST_GET_LAST(m_views);
                 view != NULL;
                 view = UT_LIST_GET_PREV(m_view_list, view)) {

                    if (!view->is_closed()) {
                            break;
                    }
            }

            return(view);
    }
    But probably current Oracle bugs verification rules do not let to just mark it as verified. After all, somebody will have to create a test case... So, my dear old friend Sinisa Milivojevic decided to try to force bug reporter to provide a test case instead of spending some time trying to create one himself. I am not going to blame him for that, why to try the easy way :) But I consider this his statement in the comment dated [10 Nov 16:21]:
    "... 5.7 methods holds no mutex what so ever..."
    a bit wrong, as we can see the mutex is acquired when get_oldest_view() method is called:
    void
    MVCC::clone_oldest_view(ReadView* view)
    {
            mutex_enter(&trx_sys->mutex);

            ReadView*       oldest_view = get_oldest_view();

            if (oldest_view == NULL) {
    ...
  • Bug #88381 - "Predicate cannot be pushed down "past" window function". Here bug reporter had provided enough hints for a test case. One can probably just check 'Handler%' status variables before and after query execution to come to the conclusion. Moreover, it seems Oracle developer,  Dag Wanvik, accepted this as a known limitation, but the bug still remains "Open" and nobody knows if it was copied to the internal bugs database, got prioritized and if any work on this is planned any time soon. We shell see. You may also want to monitor MDEV-10855.
  • Bug #88373 - "Renaming a column breaks replication from 5.7 to 8.0 because of impl. collation". This bug was quickly verified by Umesh Shastry. I expect a lot of "fun" for users upgrading to MySQL 8.0 when it becomes GA, especially in replication setups.
  • Bug #88328 - "Performance degradation with the slave_parallel_workers increase". There is no test case, just some general description and ideas about the root case when semi-sync replication is used. I expect this bug to stay "Open" for a long time, as it is a topic for a good research and blog posts like this one, that is, a work for real expert!
  • Bug #88223 - "Replication with no tmpdir space and InnoDB as tmp_storage_engine can break". Here we have clear and simple test case from Sveta Smirnova (no wonder, she also worked at bugs verification team in MySQL, Sun and Oracle). I hope Umesh will verify it soon. As a side note, it is explained (in the comments) elsewhere that InnoDB as internal_tmp_disk_storage_engine may not be the best possible option. We do not have this variable and do not plan to support InnoDB for internal temporary tables in MariaDB 10.2+.
  • Bug #88220 - "compressing and uncompressing InnoDB tables seems to be inconsistent". See also other, older bug reports mentioned there that are duplicates/closely related, but were not getting proper attention.
  • Bug #88150 - "'Undo log record is too big.' error occurring in very narrow range of str length". It was reported by my colleague Geoff Montee and is already fixed in recent versions of MariaDB (see MDEV-14051 for the details and some nice examples of gdb usage by a developer)!
  • Bug #88127 - "Index not used for 'order by' query with utf8mb4 character set". Here I am just curious when bugs like that would be caught up by Oracle QA before any public releases.
  • Bug #88071 - "An arresting Performance degradation when set sort_buffer_size=32M". here the test case is clear - just run sysbench oltp test at high concurrency with different values of sort_buffer_size. Still, Sinisa Milivojevic decided to explain when RAM limit may play a role instead of just showing how it works great (if it does) on any server with enough RAM... Let's see how this attempt to force bug reporter to work/explain more may end up...
  • Bug #87947 - "Optimizer chooses ref over range when access when range access is faster". Nice example of a case when optimizer trace may be really useful. Øystein Grøvlen kindly explained that "range access and ref access are not comparable costs". I wish we get better cost model for such cases in MySQL one day.
  • Bug #87837 - "MySQL 8 does not start after upgrade to 8.03". It is expected actually, and even somewhat documented in the release notes that MySQL 8.0.3 is not compatible to any older version. So, it is more like MySQL Installer (that I do not care much about) bug, but I still subscribed to it as yet another source of potential fun during further upgrade attempts.
  • Bug #87716 - "SELECT FOR UPDATE with BETWEEN AND gets row lock excessively". I think I already studied once why with IN() rows are locked differently by InnoDB comparing to BETWEEN that selects the same rows. But I'd like to know what's the Oracle's take on this, and I'd like to study this specific test case in details one day as well.
  • Bug #87670 - "Force index for group by is not always honored". Clear and simple test case, so no wonder it was immediately verified.
  • Bug #87621 - "Huge InnoDB slowdown when selecting strings without indexes ". I'd like to check with perf one day where the time is spent mostly during this test. For now I think this is a result of the way "long" data are stored on separate pages in InnoDB. What;'s interesting here is also a test case where R is used to generate data set.
  • Bug #87589 - "Documentation incorrectly states that LOAD DATA LOCAL INFILE does not use tmpdir". This was yet another report from my colleague Geoff Montee. lsof is your friend, maybe I have to talk about it one day at FOSDEM (call for papers is still open :) I like to find and follow bugs and missing details in MySQL manual, maybe because I would never be able to contribute to it as a writer directly...

So, this list shows my typical recent interests related to MySQL bugs - mostly InnoDB, optimizer, replication problems, fine manual and just some fun details like the way some Oracle engineers try to avoid working extra hard while processing bugs... I am also happy to know that in some cases MariaDB is able to deliver fixes faster.

Thursday, October 19, 2017

Fun with Bugs #56 - On Some Public Bugs Fixed in MySQL 5.7.20

While MySQL 8.0.x hardly has much impact on my regular work, recent MySQL 5.7.20 release is something to check carefully. MySQL 5.7 is widely used in production, as a base for Percona Server 5.7, some features may be merged into MariaDB 10.x etc. So, here is my review of some community reported bugs that were fixed in recently released MySQL 5.7.20, based on the release notes.

Usually I start with InnoDB bug fixes, but in 5.7.20 several related fixes were made only to bugs reported internally. So, this time I have to start with partitioning:
  • Bug #86255 - First one to write about, and the bug report is private... Second one (Bug #76418) is also private. All we have is this:
    "Partitioning: In certain cases when fetching heap records a partition ID could be set to zero. (Bug #86255, Bug #26034430)"

    "Partitioning: Queries involving NULL were not always handled correctly on tables that were partitioned by LIST. (Bug #76418, Bug #20748521)"
That's all, folks. I fail to understand why bugs with such a description can stay private after they are fixed. I have to admit: I do not get it. Moreover, I am not going even to try any longer. Lists with one item look stupid, but hiding such bugs is not much better, IMHO.

Lucky I am, there were several bug fixes related to replication:
  • Bug #85352 - "Replication regression with RBR and partitioned tables". This regression bug (comparing to 5.5.x) was reported by Juan Arruti and immediately verified by Umesh Shastry. I do not know why it is listed as InnoDB in the release notes, as it hardly can be about native InnoDB partitioning, based on versions affected.
  • Bug #86224 - "mysqlrplsync errors out because slave is using --super-read-only option". It was reported by Richard Morris and verified by Bogdan Kecman.
  • Bug #84731 - "Group Replication: mysql client connections hang during group replication start", was reported by Kenny Gryp and verified by Umesh Shastry. Another bug reported by Kenny was also fixed in 5.7.20, Bug #84798 - "Group Replication can use some verbosity in the error log".
  • Bug #86250 - "thd->ha_data[ht_arg->slot].ha_ptr_backup == __null || (thd->is_binlog_applier()". This debug assertion was reported by Roel Van de Paar and verified by Umesh Shastry
  • Bug #85639 - "XA transactions are 'unsafe' for RPL using SBR". It was reported by João Gramacho.
  • Bug #86288 - "mysqlbinlog read-from-remote-server not honoring rewrite_db filtering", was reported by Julie Hergert.
Other bug fixes not to miss are:
  • Bug #85622 - "5.6: wrong results with materialization, views". It was reported by Shane Bester. Even though it was initially stated that only 5.6 is affected, release notes say there was a fix for 5.7.20 and even 8.0.3.
  • Bug #82992 - "Some warnings appear in dump from mysqldump". This funny bug was found by Nong LO and verified by Sinisa Milivojevic.
  • Bug #81714 - "mysqldump get_view_structure does not free MYSQL_RES in one error path". Thisbug was reported by Laurynas Biveinis, but it was noticed and patched by Yura Sorokin, also from Percona.
  • Bug #83950 - "LOAD DATA INFILE fails with an escape character followed by a multi-byte one", was reported by yours truly and verified by Umesh Shastry. Unfortunately the bug report itself does NOT say explicitly what versions had got the fix.
  • Bug #79596 - "If client killed after ROLLBACK TO SAVEPOINT previous stmts committed". This regression and potential data corruption bug was reported by Sveta Smirnova, verified by Umesh Shastry and studied at the source code level by Zhai Weixiang. Nice to see it fixed!
That's all, few build and packaging related bugs aside.

This was my very last attempt to do a detailed review of bug reports from MySQL Community based on official Release Notes. With private bugs and very few fixes for things coming from the public bugs database in general, it seems to make zero sense now to continue these. Authors of patches contributed are properly mentioned by Oracle, and we all know who verify community bug reports... One Twitter message would be enough to fit everything I have to say, and any real details should better be checked in git commits.

It's time to get back to the roots of this series and maybe write about bugs just opened, bugs not properly handled or just funny ones. I think it helped a lot back in 2013 to make MySQL 5.6 a release that was commonly accepted as a "good one". It's not fun any more and not much useful to report Oracle achievements in public bugs fixing, so I'd better switch to their problems.