Saturday, September 28, 2013

Fun with Bugs #24 - PERFORMANCE_SCHEMA

It seems that one of my session proposals is accepted for Percona Live London 2013, so I have to prepare myself to speak about PERFORMANCE_SCHEMA new features and problems in MySQL 5.6. Bugs are going to be discussed, among other things. Let's check current active bugs (and some "Not a bug"s) related to PERFORMANCE_SCHEMA in this issue.

I'd like to start with Bug #68514 that got some attention this week again, in despite of its "Not a bug" formal status. Detailed instrumentation comes with a cost, and to reduce high CPU cost (reported as Bug #67736 by Domas at 5.6 RC stage) it was decided to allocate memory in bigger batches. It seems notable (I'd say unexpectedly high) amount of memory may be allocated for performance counters if you have max_connections > 1500 (or table_open_cache > 10000, or table_definition_cache > 10000, or open_files_limit > 30000). So, this is something to take into account if you plan to use MySQL 5.6 in a hope to "scale up" your instance (as performance_schema=1 there by default and thus extra memory is allocated by default).

Speaking about "Verified" bugs, there are only 7 of them now for MySQL 5.6. So it's easy to just list them all:
  • Bug #68413 - "performance_schema overhead is at least 10%". It can be more, but it took a lot of time and efforts from Mark Callaghan and me to force Oracle engineers to accept this claim as valid. Sometimes I feel that public bug reports for PERFORMANCE_SCHEMA from users are considered more as an insult than a useful contribution from community (as it happens with even minor InnoDB problem reports, for example). Anyway, we managed to prove the overhead can really be that big, and later Oracle's own performance guru Dimitri Kravtchuk confirmed in his Bug #70018 that for some use cases just having PERFORMANCE_SCHEMA enabled may cost you 30% decrease of QPS, even if you do not try to use it in any way.
  • Comparing to the above, Bug #69727 reported by Todd Farmer is a minor issue. It seems setting instruments for P_S as server startup options is not so obvious in some cases.
  • Bug #69782 - "Old files not being removed from performance_schema.file_instances ". This may eventually become a problem for MySQL instances that create binary logs or relay logs often.
  • Bug #69915 - "statement/com/Query counter doesn't increment". Yet another bug report from Todd Farmer, who uses P_S and blogs about it a lot. No wonder he finds bugs in the process. This one sounds simple, so I don't really understand why it is still "Verified".
  • Bug #70025 - "Update on P_S setup_consumers and threads through JOIN only updates first row". Minor enough problem was found by yet another my former colleague in Oracle, Jesper Krogh. Jesper also often writes about P_S in his blog and does us a favor by reporting bugs in public bugs database. The most recent (at the moment) still "Verified" P_S bug is also from him: Bug #70028 - "P_S threads.INSTRUMENTED not set according to setup_actors".
 As you can conclude from the above, PERFORMANCE_SCHEMA is created by Oracle engineers for their own use and they use it a lot. So, it is doomed to be useful, especially now, when MySQL 5.6 is becoming widely used by Oracle customers and MySQL community users in production. We can rely on Facebook in this area - as they already use P_S, they'll do their best to force Oracle MySQL engineers to make it as efficient as possible eventually.

It's also clear from the above that one of the main problems of PERFORMANCE_SCHEMA is its name. Users try to name it as P_S, PS, PFS - whatever abbreviation is used, it's better than the original name (even if it comes from some SQL standard that nobody really cares about...). Along with INFORMATION_SCHEMA it makes me wanting to see Oracle's good old public synonyms implemented in MySQL some day... Before that it seems that the only sane way to use P_S on a regular basis in interactive command line client is via ps_helper, created by the godfather of P_S, Mark Leith.

Saturday, September 21, 2013

It's all about bugs fixed: MySQL 5.6.14

Most of MySQL gurus and famous users are probably in San Francisco now, getting ready for fun at MySQL Connect. Part of that fun should come from the announcement of great new MySQL 5.6.14 release (that somewhat silently happened yesterday).

I am sitting at home though and I've seen at best 3 sunny days in September. The rest of the time it rains, so hardly I can do anything more funny and useful than review of MySQL bug reports even during my weekend. Let me try to tell you what MySQL 5.6.14 is really about and what you should expect from it based on the list of bugs fixed. Please, do not blame me if my summary would be different from the upcoming keynotes at MySQL Connect. It rains here...


I'll use good old approach of checking my older posts about bugs in MySQL 5.6.13 and comparing it to the release notes of MySQL 5.6.14. You have to read them carefully anyway to check if any of the bugs you cared about are fixed. Chances are real, with 67 or so bugs mentioned, but previous releases of MySQL 5.6 GA contained notably more fixes. Either release before MySQL Connect was a top priority or MySQL 5.6 is getting mature now, with less problems remaining to solve in new releases.

Let's check, starting from InnoDB bugs I had written about back in June. From the bugs I mentioned there I see no new fixes, even Bug #69179 (that I've mentioned many times and that should be already fixed in recent Percona Server 5.5.x) and Bug #69236 (single thread performance regression affecting Facebook) are still just "Verified". Bug #69325 is also still "Verified", even though at least documenting potential memory use by ALTER against a partitioned InnoDB table (as it was suggested by Shane back in May) would help a lot...

MySQL 5.6.13 was released less than 2 months ago, so surely one should not expect that all new bugs are fixed so fast. But I am really disappointed to see Bug #69892 (that questions the possibility of forced recovery if InnoDB persistent statistics was ever used and that I had written about here) still "Verified".

It can not be that bad, so I checked later post, and found one problem I've cared about solved in MySQL 5.6.14. The problem of improper use of InnoDB tables to store replication information (see Bug #69898) is fixed. Related Bug #69907 is still "Verified", even though recent comment claim it is a duplicate of the previous one and must be fixed in MySQL 5.6.14. Let's assume it's a matter of documenting or just a mistake. In any case, if you plan to use crash safe replication in MySQL 5.6, you should upgrade to 5.6.14 ASAP!

Of the InnoDB bugs I had not paid attention to recently that I'd like to mention the following ones fixed (quote from the Release Notes):
  • InnoDB; Partitioning: Following any query on the INFORMATION_SCHEMA.PARTITIONS table, InnoDB index statistics as shown in the output of statements such as SELECT * FROM INFORMATION_SCHEMA.STATISTICS were read from the last partition, instead of from the partition containing the greatest number of rows. (Bug #11766851, Bug #60071)
  • InnoDB: When logging the delete-marking of a record during online ALTER TABLE...ADD PRIMARY KEY, InnoDB writes the transaction ID to the log as it was before the deletion or delete-marking of the record. When doing this, InnoDB would overwrite the DB_TRX_ID field in the original table, which could result in locking issues. (Bug #17316731)
  • InnoDB: The row_sel_sec_rec_is_for_clust_rec function would incorrectly prepare to compare a NULL column prefix in a secondary index with a non-NULL column in a clustered index. (Bug #17312846)
  • InnoDB: An incorrect purge would occur when rolling back an update to a delete-marked record. (Bug #17302896)
  • InnoDB: An assertion would be raised in fil_node_open_file due to a missing .ibd file. Instead of asserting, InnoDB should return false and the caller of fil_node_open_file should handle the return message. (Bug #17305626, Bug #70007)
  • InnoDB: The assertion ut_ad(oldest_lsn <= cur_lsn) in file buf0flu.cc would fail because the current max LSN would be retrieved from the buffer pool before the oldest LSN. (Bug #17252421)
Bug #60071 was waiting since 5.5 GA times and it's great to see it fixed. Bug #70007, on the other hand, was reported just a month ago against 5.6.13. Both fixes should make my colleague Justin happy.

Other bugs sounds serious and show good work of MySQL QA in Oracle probably, but it's hard to say anything more than we see in the Release Notes as all the details are hidden in internal Oracle's bugs database.

Let's move on to replication bugs I mentioned in June. Bug #69444 is still "Verified". Bug #69135 from Giuseppe is till "Open". Even Bug #69097 is still "Verified", since April 30... Read Bug #69095 to find out why you should not switch replication format from STATEMENT to ROW if you use GTIDs and why failures in this case do not indicate any bug... I hope Giuseppe is satisfied (I am just not brave enough to recommend to switch to GTID-based replication in production, yet). Bug #68892 is closed, but it says the fix will go to MySQL 5.6.15...

To summarize, MySQL 5.6.14 had NOT solved any more of my replication-related concerns expressed back in June, comparing to 5.6.13. The only really great improvement is the fix for Bug #69898, as explained above. Your millage may vary though, as I see many fixes to bugs in internal bugs database related to Replication, like these (quote from the Release Notes):
  • Replication: When a master with semisynchronous replication enabled was shut down, the master failed to wait for either a semisyncnronous ACK or timeout before completing the shutdown. This prevented semisynchronous replication from reverting to asynchronous replication and allowed open transactions to complete on the master, which resulted in missing events on the slave.
    To fix this problem, dump threads are now stopped last during shutdown, after the client is told to stop, so that, if the dump thread has pending events from active clients, they can be sent to the slave. (Bug #16775543)
  • Replication: A session attachment error during group commit causes the rollback of the transaction (as intended), but the transaction in which this happened was still written to the binary log and replicated to the slave. Thus, such an error could lead to a mismatched master and slave.
    Now when this error occurs, an incident event is written in the binary log which causes replication to stop, and notifies the user that redundant events may exist in the binary log. An additional error is also now reported to the client, indicating that the ongoing transaction has been rolled back. (Bug #16579083)
  • Replication: START SLAVE failed when the server was started with the options --master-info-repository=TABLE relay-log-info-repository=TABLE and with autocommit set to 0, together with --skip-slave-start.
    A workaround for previous versions of MySQL is to restart the slave mysqld without the --skip-slave-start option. (Bug #16533802)
Probably I have to stop at this stage and try to get some real-life experience with MySQL 5.6.14...

To summarize: if you use MySQL 5.6 in production setup where replication is a key component, you should definitely upgrade to MySQL 5.6.14. Do not expect it to solve all the problems you cared about though, for me it was not the case.

Sunday, September 8, 2013

Fun with Bugs #23 - more on Optimizer bugs in MySQL 5.6

When I've sent CV to MySQL AB back in 2004 (or early 2005) I had actually wanted to become a developer there. As a person who just started to use MySQL on a regular basis and, at the same time, had to explain dozens of engineers per month how optimizers works in Oracle and Informix RDBMSes, and optimize queries for them from time to time, I was naturally interested in adding missing (but well known to me) features to MySQL optimizer (from hash joins to stored outlines, histograms and tracing, all things I've noted as extremely useful for many real life cases)... So, I wanted to work on MySQL optimizer specifically, if something related to MySQL at all.

It happened so that MySQL Support had somehow noted my CV before anybody else, so in few months and after some serious tests and long screening talks (the longest of them was with new, at the time, optimizer developer there) I've ended up in Support and (lucky I am) I've stuck in Bugs Verification Team there, and I had never regret about this... But interest to MySQL optimizer remained. So, soon enough, I've ended up as a person who worked on more optimizer bug reports than anybody else in Support and, later, I've become an "Optimizer Support Coordinator", whatever that meant. Among other things it meant very hot discussions with Optimizer team in development (now all of them whom I worked with are working on MariaDB...), that started with statements like "there are no bugs in Optimizer other than crashes" but ended as a useful cooperation, and mutual understanding (and maybe even some understanding from my side why optimizer really works the way it works, not much...)

Time flies, but I am still interested in MySQL optimizer improvements (it's still so far from what I took as granted in other databases even before 2005). That's why I was surprised and even felt deeply insulted to see just a dozen of visitors on a great session on optimizer development by MariaDB engineers at PLMCE 2013... That's also why I devote this issue of "Fun with Bugs" to recent optimizer bugs, again (just 2 months after previous issue on this topic).

So, make sure you do not miss these recently verified optimizer bugs affecting MySQL 5.6:
  • Bug #70247 - "Using many WHERE conditions makes range scan disabled". We have a hardcoded limit in this case, nothing related to number of rows in the table or even less cost. Take care if you use some software that creates long list of values to compare against with IN - you can easily end up with full table scan.
  • Bug #70236 - some note on debug builds, why should we even care? Well, when the most famous and productive bug reporter of all times reports something, we should pay attention. Impact is not clear, but this kind of bugs surely affects at least any reasonable and regular QA efforts. Also note version affected, 5.6.15 - it means that MySQL 5.6.14 is probably already cloned off, and the problem we see here will affect this release. So, whatever the problem is, it will be with us for 2-3 months for sure.
  • Bug #70220 - "Grouping with a view may report 'Invalid use of group function'". Yet another example of public bug report by Oracle engineer. This is great and what I've asked for in this blog more than once. Everybody should report any MySQL bug (with some exceptions for security bugs and bugs having customer-sensitive data) in public, at http://bugs.mysql.com, no matter what company he works for. I am happy to see this happening!
  • Bug #70038 - "Wrong select count distinct with a field included in two-column unique key". Take care while adding UNIQUE keys to InnoDB tables in MySQL 5.5 and 5.6 - this may lead to wrong results for some queries.
  • Bug #70021 - "Poor execution of a plan with unnecessary "range checked for each record". Looks like all versions may have problems with queries like this:

    select * from t2 left join t1 on (t1.key1 < 3 or t1.key1 > 1020) and t1.key2 < t2.a;
    
    That is, you've just added new index on key2 column, or added additional condition on indexed column, and get notably worse performance. Unexpected and weird, but this happens. Note also yet another example of a great report from MariaDB engineer here (hardly anybody in the world knows more about the way MySQL optimizer works, by the way). You may think whatever you want about Monty and his interviews about MariaDB, Oracle or MySQL future, but engineers working on MariaDB contribute a lot, especially in area of Optimizer. Do respect this, please.
  • Bug #70014 - "MySQL crashes on explain with JSON formatting (debug builds)". Again, any regular QA efforts are affected by this kind of bugs, so they should not be silently ignored because of low impact.
  • Bug #69841 - "SELECT COUNT(DISTINCT a,b) incorrectly counts rows containing NULL". Any good list of bugs should contain regression ones. Here we have a regression comparing to MySQL 5.1, that returns correct results. This is NOT the first optimizer regression comparing to 5.1, and surely not the first bug with DISTINCT. So, take extra care if you upgrade from 5.1 to latest and greatest MySQL 5.6 (or well tested MySQL 5.5, for that matter...) I am still going to proudly wear my MySQL 5.1 T-shirt on any public conference - speaking about quality of GA release, MySQL 5.1 was not better than MySQL 5.6, but we still see cases where it works better than any newer GA releases (unfortunately).
Let's stop at this point. Now I am back from vacation and celebrations of one year outside Oracle, so I plan to make regular posts about MySQL bugs again. Stay tuned!