Blog of (former?) MySQL Entomologist: June 2013

Friday, June 28, 2013

Fun with Bugs #14 - InnoDB in MySQL 5.6

InnoDB improvements in MySQL 5.6 are well known. For most users, one of the key reasons to upgrade to MySQL 5.6 is to get the benefits of improved performance, scalability, new monitoring features and full-text index support in InnoDB.

Is there anything to double check before assuming that InnoDB in MySQL 5.6 is just better than any older version for any practical purposes? Let's review known public InnoDB-specific bug reports. Here is my "Top 10" list, as of MySQL 5.6.12, starting with most recent reports:

Bug #69424 - maybe I am missing something (I am not the only one, though), but I see no way to continue using raw devices (on Linux at least) to store InnoDB data. You had a working raw device in 5.5.32, then you upgrade to 5.6.12 and just cannot start MySQL any more. Check this bug for the details and maybe you'll find out what we are all missing in this case.
Bug #69356 - insert (or change, if you prefer) buffer is a known source of problems for InnoDB, historically. This is one more case when server can not start up successfully because "ibuf_restore_pos failed when trying to restore cursor to the first record of a leaf page". There is work in progress on similar/related internal bug report it seems, so we may expect some fix soon.
Bug #69325 - with default settings MySQL may consume unexpected amounts (too much) of memory while altering InnoDB table with many partitions. Take care!
Bug #69316 - Drop/Alter table takes much longer in 5.6 than in 5.5 (with a big InnoDB buffer pool that is mostly filled). We all remember the long story of fixing similar cases in MySQL 5.5, so here we have really bad news: it seems some of these fixes have not made their way to MySQL 5.6, or there is a regression for some other reason. Either way, things you have long expected to work fast enough may be slow again in MySQL 5.6.
Bug #69236 - skip the beginning of the bug report (impact of utf8 or client problems is a topic for other reports and posts) and move on to perf results at the end. This is what was verified, I assume. MySQL 5.6 is spending a lot more time in the rec_get_offsets_func, trx_undo_report_row_operation and btr_cur_optimistic_insert functions compared to older versions. This is one of the reasons for the performance regressions in single-threaded workloads reported by Facebook. Note that on a slave it is usually a single SQL thread that does a lot of work, so this regression matters.
Bug #69179 - persistent statistics for InnoDB tables in MySQL 5.6 are not really persistent. Even with a read-only workload, statistics change when you query information_schema.partitions, and this may cause query plans to change.
Bug #69174 - it seems the page cleaner thread may sleep for too long (that is, it is not working hard enough) in some cases. Unlikely to be fixed any time soon. Bug #69170 seems to have a somewhat similar future in terms of getting a fix from Oracle. Some day, maybe.
Bug #69168 - not sure if this is in 5.6 only, but SELECT COUNT(*)...GROUP BY sometimes returns wrong results on partitioned InnoDB tables. Check this to make sure your queries are not affected.
Bug #69141 - SELECT from InnoDB table hangs being run in parallel with replicating LOAD DATA. It seems you may get a surprise after upgrading slave to MySQL 5.6...
Bug #69002 - interesting finding by Domas: InnoDB reads transaction logs on writes. Let me quote: "With large transaction logs that means that either InnoDB will hit this a lot, or amount of memory equal to innodb transaction log will be wasted". If you are a big fan of Domas and/or Sinisa and/or a good discussion, it's worth reading, no matter how big your logs are or what MySQL version you use.

I had to stop now because of "Top 10" format (too many bugs to list before my Bug #68079 that questions scalability improvements in a simple enough real life use case anyway, so why bother...). I've skipped numerous bug reports about "wasted work", or "redundant code" - while important for further performance improvements, they are hardly important for those who consider upgrade to MySQL 5.6. Something that looks like a regression or change in behavior, or no real improvement in performance, or lack of documentation is more important in this case, I think. I've also skipped Windows-specific (legacy) problems like Bug #69326 and bugs that are surely repeatable with older MySQL server versions as well. So, it's as much about MySQL 5.6-specific surprises as possible.

To summarize: you may get problems at the InnoDB level as well when upgrading from older versions to MySQL 5.6, from inability to use raw devices to slow DROP TABLE (again), wrong query results and huge memory use/swapping. Or you may just enjoy improved scalability and great new features... It depends, as usual. Just make sure you have reviewed known bug reports before making the final decision on the upgrade, and set your expectations accordingly.

Monday, June 24, 2013

Fun with Bugs #13 - MySQL replication and two-way communication

I hope you have noticed this already, but in case you missed it, please read this post by Matt Lord and check any bug at http://bugs.mysql.com. As soon as you log in to your Oracle account, you can vote for bugs and feature requests! I hope that eventually somebody will publish lists of "Top N Most Wanted" fixes based on the number of users who clicked on this great "Affects Me" button.

If you plan to use this new feature to express your needs while given a chance, why not start with replication-related bugs in the latest and greatest MySQL 5.6.12? Here is my "Top 10" list (starting with recently reported):

Bug #69444 - just do not assume that replication in MySQL 5.6 is magically crash safe in all cases. DDL and MTS (multi-threaded slave, in this context) may still make starting replication from proper position a problem sometimes.
Bug #69369 - when GTIDs and MTS are used slave's SQL thread just stops when binlog rotation happens on slave (related to binlog group commit it seems).
Bug #69341 - semi-sync replication is slow when changes are done by many clients on master. There is a preliminary patch in the bug report, so you may want to check it (no public feedback from original bug reporter so far).
Bug #69135 - probably just a documentation issue formally, but still: don't forget to add sync_master_info=1 when master_info_repository = TABLE and maybe more if you want replication to be really crash safe in 5.6. Check comments in the bug report carefully, please.
Bug #69097 - mysqld scans all binary logs on crash recovery. This is really serious and may be considered a performance regression of a kind. I am surprised that there are no public comments since April 30 and I really hope 5.6.13 is going to fix this bug.
Bug #69096 - GTID_NEXT_LIST session variable is not visible. As a result, there is no way to recover from Bug #69045, so make sure you use MySQL 5.6.12, not any older version.
Bug #69095 - replication in 5.6 (including 5.6.12) may break with GTIDs enabled and master changes from SBR to RBR. Bug is still "Open" and similar problem could happen with 5.5.31 it seems, but still make sure you review this case if you plan to switch to GTIDs in 5.6.
Bug #69059 - for many real life use cases it's not possible to turn down and restart the entire database topology simultaneously in order to enable GTIDs. So, how to start using this feature in production, while you upgrade from older 5.x.y to 5.6? Still no public answer since April 24 to this question from Facebook, unfortunately.
Bug #68953 - some binlog write errors, namely those originating in MYSQL_BIN_LOG::do_write_cache, are silently ignored. This may be considered a regression compared to 5.5 (as new code is affected) and, in any case, is not good.
Bug #68892 - Invalid use of GRANT command breaks replication. Surely DBA can break it using some other way, but writing to the binary log something that just can not be executed is wrong in any case, I think.

Let's stop at this stage. The goal of this issue was not to provide a complete list of replication bugs in 5.6.12, but rather to give yet another partial answer to the question of what users upgrading to MySQL 5.6 should care about. To put it simply: do not assume that concurrency improvements, MTS, GTIDs, crash-safe replication and other new replication features just work well together by default.

Previous partial answer was given here. Few more issues on InnoDB, installation or upgrade/downgrade problems and PERFORMANCE_SCHEMA are still needed to get the entire picture...

Sunday, June 23, 2013

Fun with Bugs #12 - MySQL Cluster 7.3 GA

I had always tried to avoid all kinds of clusters, from Oracle RAC to MySQL NDB Cluster and Percona XtraDB Cluster, as much as possible. But these days clusters become common and it seems new developments in this area can not be just ignored. So, I decided to devote this issue of "Fun with Bugs" to MySQL Cluster 7.3, that was released as GA this week and still is in the news.

The release was mostly about adding foreign key support (one of the features that some users had been missing for years compared to InnoDB and other cluster database solutions). At the same time, MySQL Cluster is now based on MySQL Server 5.6 code. I've decided to quickly check how the community adopted 7.3 and what it means in terms of bug reports.

If one just searched for active bugs in version "7.3" in the public bugs database using its own advanced search form, she would find only 8 bugs, and of them only the following are recent and/or seem serious:

Bug #69528 - ORDER BY with JOIN may produce wrong results in 7.3.2. This is probably related to one of optimizer regression bugs in MySQL 5.6 (check previous issue for some of them) and is not specific to NDB storage engine, but still may become a regression that one can easily hit.
Bug #69510 - LIKE just does not work as expected with NDB tables. It is really weird, if you ask me, to see this bug in a new GA release. Read that report to check what others think... It seems to be a well known old problem, probably with a known solution, that somehow ends up not fixed in too many MySQL Cluster releases. When I ask what's going on with MySQL QA these days, it's exactly this kind of bug that makes me worry about it, no matter how much Oracle really invests into the process. It seems that some procedures are just not there or still not followed.
One of the new features of MySQL Cluster 7.3 is "NoSQL JavaScript Connector for Node.js". Unfortunately it is not bug free, check Bug #69509. It's good that work on it is already in progress and that it was reported by an Oracle MySQL engineer in the public bugs database. It shows that engineers still care, if not to prevent the bug then to inform about it as soon as it is found.

Other bug reports are either old enough or not of wide importance. There are minor problems with the installer on Windows (check Bug #69120 and Bug #69112 from Shane), and ndbd does not report failure status when it runs as a Windows service (check Bug #69063). Problems with table name case (in)sensitivity on Windows, which used to hit InnoDB, seem to have been reported many months ago for the new MySQL Cluster foreign key feature (check Bug #67354), and it is still not clear if anybody is going to do anything about it. That's all the "Verified" bugs for now in 7.3.

Does it mean that MySQL Cluster 7.3 GA is nearly bug free and safe to upgrade to? Unfortunately no, and the reason (besides the 3 bugs above) is simple: it is based on MySQL 5.6, moreover, on 5.6.11 at the moment, so you should expect to see dozens of bugs that are still not fixed in MySQL 5.6. So, it's surely time to test this release, and it's time to check active bugs in MySQL 5.6, but, IMHO, it's hardly time to think that an upgrade to MySQL Cluster 7.3 may be seamless and not prone to regressions.

Saturday, June 22, 2013

Fun with Bugs #11 - Top 10 Optimizer Regression Bugs in MySQL 5.6

I got a question from a colleague last night on what bugs users should take into account if they plan to upgrade to MySQL 5.6 now. The simple answer is: it depends. If one of the new features or scalability improvements is really important, then bugs in other features or clearly identified problematic use cases may be just ignored or avoided.

But to be on the safe side, users should at least check if they are (or may be) affected by known regression bugs, where the new version is slower, produces wrong results or crashes in cases that worked without problems before.

The list of bugs in MySQL 5.6 that can be formally considered regressions compared to previous major versions would be too long for a single post. So I'd like to concentrate on regression bugs in the Optimizer here:

Bug #69471 - "UNION of derived tables returns wrong results with "1=0/false"-clauses". Even current code of 5.6.13 is affected. Unfortunately all kinds of programs that generate SQL based on user input often add these "1=0" or "1=1" clauses to the list of conditions, and MySQL 5.6 may start producing wrong results for such queries with default settings.
Bug #69410 - it's not the first time that queries with a LIMIT clause actually perform worse than queries without LIMIT, especially for InnoDB tables with many indexes defined and an ORDER BY clause in the query. Here the index condition pushdown optimization comes into play and leads to slower execution compared to 5.5.x. Work is in progress already, so maybe MySQL 5.6.13 will fix this.
Bug #68979 (with recent Bug #69390 considered a duplicate by Miguel but no way for us to check or confirm as user's schema is private). A new feature, "Delayed materialization of derived tables", may lead to a different plan compared to 5.5.x and to performance regressions. It looks like users who use derived tables a lot in their SQL statements should be very careful while testing with MySQL 5.6.
Bug #69350 - Shane found that stored procedures doing things as simple as select 1 into `j`; in a loop perform notably worse in MySQL 5.6 compared to MySQL 5.5. I am not sure if "Optimizer" is the proper category for this bug, but still this is something to take into account if this use case is typical for your application.
Bug #69268 - another public bug report from an Oracle engineer. It seems that a simple query like SELECT * FROM a LEFT JOIN b ON a.id = b.id GROUP BY a.id; may start to return different results in 5.6 compared to 5.5. One may speculate on how correct it is to rely on the extended GROUP BY feature of MySQL, or whether this can be considered an extended GROUP BY if the right table has only one row, but changes in results for simple queries are always surprising...
Bug #69233 - this bug is probably related to fraction of seconds now supported for date/time in 5.6 (this was a problem for Percona data recovery tools as well, by the way) and may affect ODBC-based applications. Anyway, inconsistent results of queries are never good.
Bug #69219 - I am happy to see Oracle MySQL engineers still reporting bugs at public bugs database, but hardly users are happy when statement like CREATE TEMPORARY TABLE ... SELECT is 4+ times slower on 5.6.12 than on their good old 5.1.
Bug #69005 - check this if you use ORDER BY LOWER(column) in any query. It may produce wrong results in MySQL 5.6.x easily. As simple as that, and still not fixed.
Bug #68919 - yet another great case when new optimizer feature (DS-MRR in this case) leads to performance regression, again LIMIT involved. Fortunately you can just disable this feature with optimizer_switch="mrr=off";
Bug #68897 - one more case of wrong results in MySQL 5.6 when user variables, derived tables, GROUP BY and ORDER BY are involved.

I think we can stop now, to be able to add nice "Top 10" clause to the title.

To summarize, if you plan to upgrade to 5.6 now and use derived tables, user variables, GROUP BY, ORDER BY or LIMIT clauses in your queries, please review the bug reports above and other active optimizer bugs affecting 5.6 (I hope you know how to find them), and test your applications carefully. You should expect both performance regressions and wrong results from MySQL 5.6.12 in these cases, and not all of them are easy to work around. You may have to disable new optimizations or rewrite your queries.

Saturday, June 15, 2013

Fun with Bugs #10 - recently reported bugs affecting MySQL 5.6.12

MySQL 5.6.12 has been available to the community for more than a week already, so people have started to test and use it. And, no wonder, new bug reports have started to appear. Let's concentrate on them in this issue.

I'd like to start with a funny one. Bug #69413 had scared some of my Facebook readers to death, as we see kernel mutex mentioned clearly in the release notes for 5.6.12. What, kernel mutex comes back again? No, it's just a result of null merge and, probably, copy/paste from the release notes for 5.5.32.

It seems recent bug reports for 5.6.12 are mostly related to small details that may not be of any importance to a typical user. For example, Bug #69419, reported by my colleague almost immediately after the release, questions the way mtr is used in the release process. A change related to the fix for another bug had broken a few tests, but it seems the tests were neither updated nor temporarily disabled. This is strange, at best, and can mean many things (from a simple mistake, to "nobody cares", to a switch to some other tools for internal regression testing).

"Nobody cares" does NOT apply though, as during this week Shane Bester had reported 2 public bugs related to potential performance improvements possible in 5.6.12. Check Bug #69420 and Bug #69422. Looks like he tries to find and eliminate reasons for even less than smallest slowdown in benchmarks.

He is not the only one. Check Bug #69451. Even the smallest chunk of redundant code cannot hide from careful users these days...

One topic for bug reports is ages old: MySQL still does not use proper data types for integers in many parts of the code. Bug #69431 from Shane is one recent example. Bug #69469 (more or less a duplicate of Bug #69249, reported for 5.6.11 a month ago) is another one, but related to a new feature introduced in 5.6. It seems that the topic is as valid for new code as for the older code that Monty and Sinisa were reviewing a decade ago. Let's hope that a review of the entire code base is planned for MySQL 5.7 GA, with the aim to find and fix all problems of this kind (among others).

Unfortunately it's not only about minor and cosmetic things. If you use raw devices with InnoDB and plan to upgrade to 5.6, check Bug #69424. It's not yet verified, and the previous bug of this kind, Bug #68860, was set to "Not a bug" in two days... But, well, how should one upgrade with an existing raw device containing data, when the code of the srv_file_check_mode() function clearly says:

/*********************************************************************//**
Check if a file can be opened in read-write mode.
@return true if it doesn't exist or can be opened in rw mode. */
static
bool
srv_file_check_mode(
/*================*/
        const char*     name)           /*!< in: filename to check */
{
        os_file_stat_t stat;

        memset(&stat, 0x0, sizeof(stat));

        dberr_t         err = os_file_get_status(name, &stat, true);

        if (err == DB_FAIL) {

                ib_logf(IB_LOG_LEVEL_ERROR,
                        "os_file_get_status() failed on '%s'. Can't determine "
                        "file permissions", name);

                return(false);

        } else if (err == DB_SUCCESS) {

                /* Note: stat.rw_perm is only valid of files */

                if (stat.type == OS_FILE_TYPE_FILE) {
                        if (!stat.rw_perm) {

                                ib_logf(IB_LOG_LEVEL_ERROR,
                                        "%s can't be opened in %s mode",
                                        name,
                                        srv_read_only_mode
                                        ? "read" : "read-write");

                                return(false);
                        }
                } else {
                        /* Not a regular file, bail out. */

                        ib_logf(IB_LOG_LEVEL_ERROR,
                                "'%s' not a regular file.", name);

                        return(false);
                }
        } else {

                /* This is OK. If the file create fails on RO media, there
                is nothing we can do. */

                ut_a(err == DB_NOT_FOUND);
        }

        return(true);
}

That is, if the file is not a regular file we unconditionally return false, and in all places where this function is used a false return value is simply treated as an error. (I have to check this myself eventually, as I have no raw device at hand for an immediate test, but code like this is not present in MySQL 5.5, so it seems the good old manual page just cannot be used any more.)
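The effect can be sketched outside of the server. This is only a minimal illustration, assuming a POSIX system: classify_path(), strict_check() and raw_aware_check() are hypothetical names and not InnoDB functions, the permission test from the quoted code is left out, and the relaxed variant is just one conceivable alternative, not Oracle's fix. The point is that a raw device is reported by stat() as a block or character device, so a check that accepts only regular (or missing) files has to reject it:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/stat.h>

// Hypothetical classification of a data file path (not an InnoDB type).
enum class PathKind { Missing, RegularFile, Device, Other };

// Ask the OS what kind of object the path points to.
PathKind classify_path(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        // Only a missing path is treated as "may be created later";
        // any other stat() failure is lumped into Other.
        return errno == ENOENT ? PathKind::Missing : PathKind::Other;
    }
    if (S_ISREG(st.st_mode)) return PathKind::RegularFile;
    if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) return PathKind::Device;
    return PathKind::Other;
}

// Same decision structure as the quoted function (permission test omitted):
// anything that exists and is not a regular file is rejected.
bool strict_check(const char* path) {
    PathKind kind = classify_path(path);
    return kind == PathKind::Missing || kind == PathKind::RegularFile;
}

// Hypothetical relaxed variant that also lets block/character devices pass.
bool raw_aware_check(const char* path) {
    return strict_check(path) || classify_path(path) == PathKind::Device;
}
```

On Linux, calling strict_check("/dev/null") returns false while raw_aware_check("/dev/null") returns true; a raw partition would be classified as a device in the same way.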

It seems Oracle MySQL engineers should pay more attention to testing upgrade procedures (and reading community bug reports). Even if eventually this may not be the case, currently community QA efforts (and public bugs database) are still important and sometimes lead to findings that seem new and unexpected to Oracle MySQL engineers.

Another serious enough bug from recently reported and verified, Bug #69444, is related to replication. It seems to be not really crash safe when DDL statement is involved. Potentially when crash happens during "wrong" time, DDL is going to be executed again upon slave restart.

That's all for now. MySQL 5.6.12 is going to be the best release ever for 6+ more weeks it seems, so we all have plenty of time to check it and contribute to public bugs database...

Tuesday, June 4, 2013

Fun with Bugs #9 - MySQL 5.6.12, quick review

So, it seems we have MySQL 5.6.12 released officially. We have great Changes in MySQL 5.6.12 page already widely shared and people already blogging about a feature implemented by my dear friend Sinisa.

Quick scroll over changes shows 130+ bugs fixed and it will surely take time to understand the impact of all these fixes. We have 2 months for this till next release, so eventually we'll find out what's good in MySQL 5.6.12 and should we immediately switch to it from all other older 5.x.y versions.

But we have to start with something, and I'd like to start with bugs that I've mentioned in issue #7 and issue #6 of "Fun with Bugs" series here. Let's check them one by one, starting with older ones:

Bug #68299 - closed, fixed in 5.6.12
Bug #68251 (and Bug #68569) - closed, fixed in 5.6.12
Bug #68250 (semisync replication) - declared "Not a bug" for reasons I do not completely understand.
Bug #68220 (minor, when replication info is stored in tables) - still "Verified", but it's minor...
Bug #68192 (data types) - it was reported at 5.5.29 and 5.6.9 times and got no visible attention since verification. Nothing changed. This happens more often than I'd like to see with bug reports from Elena/MariaDB. I wonder why...
Bug #68171 (missing manual) - still no comments, since 5.6 RC times. Nothing changed.
Bug #68144 (custom character sets, regression) - no visible progress since 5.6 RC times. Nothing changed. Is this because SkySQL is now MariaDB? Who knows...
Bug #68097 (incomplete manual) - still "Verified". No changes. But this is a documentation request and from me, and I "like" SkySQL and MariaDB often enough at Facebook, so maybe logic is the same. Kidding...
Bug #68079 (lack of scalability for joins) - this report of mine was mentioned by DimitriK during the Conference and I am sure Oracle is working on it, but the public bugs database shows nothing new.
Bug #68041 (zero date, regression) - from Elena. Still no comments
Bug #67982 (partitioning) - from Elena. Still no comments, since last year
Bug #68350 - closed, fixed in 5.6.12!
Bug #68525 - replication bug verified since February 28. No visible progress.
Bug #69005 - regression bug in optimizer. Still "Verified" since April 19.

Now, more recently reported or updated:

Bug #69095 - replication fails with GTID enabled and master changes from SBR to RBR. Now back to "Open", but Giuseppe proved it is repeatable (his way, using MySQL Sandbox) even with 5.6.12. So, just some time wasted.
Bug #69135 - mysql.slave_master_info is not updated. Still "Verified", but not so much time passed since May 6, so let's hope for 5.6.13 to fix it.
Bug #69236 - still "Open". But it seems Mark Callaghan is sure that Oracle works on single-thread performance issues he reported, so we can stop caring much about this and just wait for outcomes...
Bug #68354 - the last but not the least, my favorite Federated storage engine keeps providing interesting ways to crash server. Bug was reported for 5.6.10 and is still "Verified".

That is, of 18 bugs I "cared about" for whatever reasons as of 5.6.11, only 3 or 4 were fixed in 5.6.12. Let's hope you are luckier.

In any case, 5.6.12 looks like a good step forward, with many serious bugs fixed. I see only 17 bugs explicitly mentioned as affecting 5.6.12 here, and none of them looks like a recent regression (correct me if I am wrong). So, MySQL 5.6 is moving in the right direction, no matter what I write or do.

MySQL 5.6 Experiences - .mylogin.cnf and mysql_config_editor

Having basic ideas of how I am going to describe new features explained, I can proceed with some real (and I hope useful) content. As I read this page about new features from top to bottom, let's start with security improvements...

.mylogin.cnf and mysql_config_editor

Details:

you can store authentication credentials encrypted in an option file named .mylogin.cnf (in user's home directory or in %APPDATA%\MySQL on Windows)
password is no longer stored in plain text (like in .my.cnf) and still is not exposed in the command lines...
you have to use mysql_config_editor utility to create the .mylogin.cnf file
but if someone can read .mylogin.cnf, they have your MySQL password
~/.mylogin.cnf is not a more secure place to store your password than ~/.my.cnf

Links:

(!) http://mysqlblog.fivefarmers.com/2012/08/16/understanding-mysql_config_editors-security-aspects/ - Todd explains that "ease of use" was the goal of this feature, not actually security
(!?) http://serge.frezefond.com/2013/02/mysql-5-6-credentials-securely-stored/ - some explanations on what this new feature is about and, most importantly, code of the mysql_showconfigpwd.cc program from Serge to expose the credentials stored
(!?) http://mysqldump.azundris.com/archives/104-.mylogin.cnf-password-recovery.html - Kristian explains file format and provides PHP code to expose the credentials stored
(?#) http://www.skysql.com/blogs/kolbe/mysql-56-security-through-complacency - Kolbe explains (among other rambling on MySQL 5.6 security features) how to "hack" the mask_password_and_print() function from client/mysql_config_editor.cc file in the source code to build mysql_config_editor version that will show stored password in plain text. Bug #68602 mentioned in this post is not related to this file and tool (but related to security, still "Verified" as of today and will be discussed later in this series).

Bugs:

New MySQL features always come with related bugs. This simple enough tool and new file caused several related reports:

Bug #68680 - manual had some wrong details on where the .mylogin.cnf file is located on Windows, now fixed
Bug #68034 - while it is declared a "Duplicate", in the bug report you can read how this feature can be used to workaround Bug #66546 that nobody is going to fix (see Todd’s comment)
Bug #68277 - build problem affecting mysql_config_editor, fixed in 5.6.11
Bug #66546 - some people are not happy that there is no way to hardcode password in the script, without getting warning. But who cares? Only Todd, who explained how to use one new security improvement to workaround problem created by another security improvement...

If you have any new links or bug reports related to the topic of this post, please, send them in comments.

Summary

While some more secure approach to store and provide authentication credentials for scripts was requested many times and by many users, I am not sure this new feature adds anything to security. IMHO it just may simplify work for a lazy DBA who does not remember proper permissions for .my.cnf file...

At the same time, this new feature probably caused additional work for developers and QA of MySQL Connectors and Workbench (as I have reasons to think it was probably decided that this time, for 5.6 GA, connectors and tools should really support all new GA features at the moment of GA release). I had found only one related bug in the public bugs database to prove this point (Bug #68356 - problems in Workbench tests that cover both MySQL 5.5 and 5.6), but who knows how many of them were reported internally and then fixed...

I share concerns that this feature may cause wrong feeling of improved security, that's why I talked about it during my presentation and had written this post. I agree with the following comment to Kolbe's post from Sergei Golubchik though:

"There's not much one can do to improve password handling. The mysql client needs to store some bit of information somewhere that allows to identify a user to the server. So if a bad guy would have access to this bit of information - no matter how obfuscated or encrypted it is - he will be able to impersonate the user just as well by repeating whatever mysql client does."
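Sergei's argument can be illustrated with a toy model. The sketch below is an assumption-laden stand-in and not the real .mylogin.cnf format or the algorithm used by mysql_config_editor: StoredCredential, store_password() and recover_password() are hypothetical names, and a trivial XOR replaces whatever cipher a real tool would use. It only shows that when the key has to sit next to the obfuscated data so that the client can use it, anyone who can read the file can repeat what the client does:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical on-disk record: the key lives next to the obfuscated blob,
// because the client must be able to recover the password unattended.
struct StoredCredential {
    std::string key;
    std::string blob;
};

// Toy reversible transformation (XOR); applying it twice restores the input.
std::string xor_with_key(const std::string& data, const std::string& key) {
    if (key.empty()) return data;  // nothing to mix in
    std::string out(data);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}

// What the "editor" does: obfuscate and store the key alongside.
StoredCredential store_password(const std::string& password,
                                const std::string& key) {
    return StoredCredential{key, xor_with_key(password, key)};
}

// What the client does at connect time, and what any reader of the file
// can do just as well.
std::string recover_password(const StoredCredential& cred) {
    return xor_with_key(cred.blob, cred.key);
}
```

The stored blob no longer looks like the password, which protects against a casual glance at the file, but recover_password() needs nothing that is not already in the record. File permissions remain the real protection, the same as for .my.cnf.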

Sunday, June 2, 2013

Fun with Bugs #8 - what's wrong with Oracle's way of handling public MySQL bugs database

Many people seem unhappy with the way Oracle develops MySQL. I am not one of them. I think very few really important things are missing and in this post I'd like to concentrate on one of them: having internal and public bugs databases not in sync in too many cases.

Let me quote myself to explain where problem starts:

"Now the most important thing you should know about MySQL bugs processing the way it is done now in Oracle. When bug is "Verified" and(!) considered serious enough, it is copied to the Oracle internal bugs database and further processing, including status changes etc, is done there. All further comments to the public bug report are then copied to internal bug report automatically, but no comments or status changes in internal bug report are copied to public bug in any other way but by explicit action of some Oracle engineer."

Why is this a problem? Check Bug #42415, for example. It had a lot of activity and even patches pushed until December 2010, but it is still "Verified". The problem is real and there is even a workaround implemented for cases like this in Percona Server. In October 2012 Sherii asked for a status update:

"It says a patch has been committed, but the "status" is "patch pending". Was this actually put in a version of MySQL? If so, which version?

I am interested in this fix, and put in my "vote" to get this fixed."

Then bug became just "Verified" and we know it is not fixed. But why, what prevents this or when it will be fixed? Nobody knows. Well, I have a hint based on a comment from former colleague upon explicit request... But what if you are NOT hanging around on my Facebook page on a regular basis and you are NOT an Oracle customer? You may have to just wait fishing in the dark or keep asking in public (or investing time into workarounds for this problem in your software, or keep telling others that Oracle "kills" MySQL and creating foundations to "save" it, up to you to decide).

Anything wrong with this? I am not sure it's totally wrong in general. There should be some reason for you to become Oracle customer, and having access to more information may be a good enough reason (having a way to request and even demand fixes you need is even better reason). But I think this is wrong if in the corresponding bug in the internal Oracle database there is public comment explaining current status of the bug.

This leads to my point: public bug reports are usually NOT in sync with corresponding internal bug reports.

Status may be not in sync and useful public (and not any confidential or security-related) comments in the internal bugs database are not accessible to the community until some kind soul in Oracle closes the public bug and documents the fix. We know from some examples that this may not happen for months after the bug is really closed. Public bug may be declared a "Duplicate" of something that we know nothing about, check Bug #65326 (here I've got no hints after public request unfortunately), and so on.

How can this be solved (assuming Oracle is going to solve this and still plans to rely on the public bugs database and community QA efforts - who knows if they really want these and for how long)? I see two options:

Automatically copy back (with a delay, with some manual approval work if needed) all public comments and status changes from internal bug to corresponding public bug report. Technically this is probably easy and there should be scripts almost ready (as forwarding from public bugs database is automated by scripts). Exceptions like security bugs are easy to identify. Delayed replication should help to prevent mistakes. I asked for this more than once when I still worked in Oracle...
Manually copy back all essential comments and status changes. This is probably a full time job for an engineer or technical writer (or maybe even several), or member of Bugs Verification team, or some independent contractors with non-disclosure agreement signed and trainings passed. I'd happily participate in this kind of activity as an independent contractor in my free time and, I hope, Percona would not be against this.
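The first option boils down to a simple filter. Here is a minimal sketch of it, under stated assumptions: InternalEvent and events_to_publish() are hypothetical names, the fields are my guess at the minimum such a script would need, and nothing here reflects how Oracle's internal bugs database is actually structured. Only events explicitly marked public, not security-related and older than the chosen delay would be copied back:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of a comment or status change in the internal bug.
struct InternalEvent {
    long created_at;        // seconds since some epoch
    bool is_public;         // marked as safe for the community to see
    bool security_related;  // never copied back automatically
    std::string text;
};

// Select events that may be copied to the public bug report:
// public, not security-related, and at least delay_seconds old, so that
// a mistake can still be caught before it is published.
std::vector<InternalEvent> events_to_publish(
        const std::vector<InternalEvent>& events,
        long now,
        long delay_seconds) {
    std::vector<InternalEvent> out;
    for (const InternalEvent& e : events) {
        if (!e.is_public || e.security_related) continue;
        if (now - e.created_at < delay_seconds) continue;
        out.push_back(e);
    }
    return out;
}
```

The delay plays the same role as delayed replication: it gives somebody time to retract a comment that was marked public by mistake.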

If nothing like this (investment of time or costs from the Oracle side to increase the value of the public bugs database, to put it simply) happens any time soon, it will be a clear indication (for me at least) of decreasing interest in community bug reports and QA efforts in general. A small step in a direction I personally do not like.