Network World
Thursday, August 21, 2008
DNSstuff.com
Get information about your IP
IP Information
50+ On-demand DNS and network tools

A World of Bytes

Navigation

Should you ever use MySQL when Postgres is a superior DBMS?

By most measures, PostgreSQL and other members of the Postgres family are superior database management systems to MySQL.  They're more functional, more scalable, and easier to program.  Even so, MySQL has much more market/mind share than Postgres, and the reasons aren't solely mass insanity.

Let's start by reviewing some of the leading knocks against MySQL:

  • MySQL 4 and any version of MySQL over the MyISAM storage engine are missing some basic SQL standard transactional capability.  They should not be used for serious DBMS applications.
  • InnoDB, the usual transactional storage engine for MySQL, is of dubious scalability.  What's more, it's now owned by Oracle.
  • Falcon, the replacement transactional storage engine for MySQL, isn't generally available yet.  When it does come out, it will be immature.
  • Analytic functionality (even some basic kinds of join) has been problematic in MySQL.

But there are also some strong plusses on MySQL's ledger:

  • Many applications -- including Wordpress, on which I run all my blogs except this one -- are only available on MySQL.  (I would guess that this blog, which runs on Drupal, also runs over MySQL, but I haven't asked.)
  • MySQL is easy to set up.
  • Many people know how to run MySQL.
  • Some Very Large Web Databases run on MySQL.
  • Speciality data warehouse engines (e.g., Infobright Brighthouse, Kickfire) have been developed for MySQL.

As for Postgres:

  • PostgreSQL has been around for a long time. It has a large fraction of the features market-leading DBMS have, and in fact was the original research testbed for many of them.
  • EnterpriseDB's Postgres Plus adds more cool features, some free and some chargeable, including Oracle compatibility.
  • There are Postgres-specific data warehouse engines too.

Frankly, I like advanced DBMS features, such as robust support for various kinds of datatype.  So whenever it makes reasonable sense, I'd favor using the Postgres family.

Related links:

Virtually all of these "Advantages" aren't...

Useful answer?
0
  • MySQL is easy to set up

    So is PostgreSQL; I find it way easier to set up than MySQL. Neither involve "rocket surgery," and PostgreSQL has an advantage for cases where you want to use transactional capabilities, in that there's no need to install or configure any extra add-ons that mayn't come from MySQL AB.

    To characterize it accurately, You are more familiar with how to set up MySQL.

  • Many people know how to run MySQL.

    Again, this is not true for PostgreSQL?

    It actually isn't obvious that, if you drop out the set of people with very shallow understandings, that the relative quantities will differ all that much...

  • Some Very Large Web Databases run on MySQL.

    And this may be taken to imply that no large "Web Databases" run on PostgreSQL?

    The largest "Web Database" known is Yahoo's multi-Petabyte database, which is based on PostgreSQL. (It seems to be the largest database of any sort that has been publicly publicized.) It's not exactly PostgreSQL as we know it, but it certainly seems like some form of counterexample, particularly in that this one database probably outweighs the size of all MySQL databases in the world put together...

  • Speciality data warehouse engines (e.g., Infobright Brighthouse, Kickfire) have been developed for MySQL.

    Licensing considerations mean that the vendors that have built "specialty DW engines" atop PostgreSQL haven't had to say so. Yahoo's database is certainly an example; ExtenDB, GreenPlum, and Truvisio are others. There is indication (difficult to document, because vendors aren't always declaring themselves) that plenty of vendors out there are using PostgreSQL's parser to connect to other kinds of backends.

    The base capabilities of PostgreSQL are sufficiently interesting that people are using it directly for DW applications; it's only if you've got wackily-large amounts of data, getting to the dozens of terabytes range, where it starts to be necessary to do special things with hardware partitioning.

The only "plus" you can really retain is the "there are a bunch of applications, such as WordPress, specifically coded for MySQL."

I'[ve written about most of those points at length on DBMS2 :)

Useful answer?
0

Christopher,

I'd encourage you to visit one of my other blogs, DBMS2, where most of your points are considered at length (including the Yahoo example).

You're right that most of the plusses cited for MySQL apply to Postgres too.

As for your point on data warehouse engines:

  • Greenplum really is a version of PostgreSQL.
  • ExtenDB is now owned by EnterpriseDB, the Postgres Plus folks. It's not really competitive with the high-end data warehouse engines, although I'm sure there are scenarios where it's a lot better than nothing.
  • Netezza, ParAccel, Vertica and others used some level of Postgres APIs, Postgres code, or both. Even so, I'd question the extent to which Postgres knowledge would carry over to any of those individual products.

CAM

My personal Postgres cons

Useful answer?
0

Postgresql cons (i have years of exp with MySql and two months exp with Postgres):

  • can't do COUNT()-s fast enough.
  • can't turn off transactions (to get rid of the overhead)
  • requires trick syntax to do multiple inserts in one query
  • has no REPLACE INTO
  • has no built in backup/replication mechanism
  • gets ridiculously huge and slow if not VACUUMed often.
  • requires you to list in GROUP BY clause all fields you want to use in SELECT clause, even though you already have primary key listed in GROUP BY.
  • requires you to list in SELECT clause all fields you want to use in ORDER BY clause.
  • ignores indexes until you manually do ANALYZE
  • does not allow to turn off indexes temporarily to accelerate INSERTS and UPDATES
  • no support for User-Defined variables (mysql style) that allow doing interesting things in single query, like picking random subset of rows fast:
    SET @toGet=10; SET @left=(SELECT COUNT(*) FROM offers); SELECT *, @toGet:=@toGet-1
    FROM offers WHERE (@left:=@left-1)>0 AND RAND()<@toGet/@left;

From my point of view MySql was just trying to get people what they wanted while PostgreSql was trying to uphold the sql standards, and frankly people like getting what they want.

In most cases db is just a bag to throw data in, and to get them from it very fast. Things Postgres gives you are often unnecessary, and hoops it make you jump through are tiresome.

I also don't like how Postgres is encouraging you to put you logic inside db. You end up with tangled mess where part of logic is inside source files and part is magically enclosed in various triggers, rules and stored procedures in db away from you project and version control system.

...

Useful answer?
0

Self correction of example of use of user-defined variables ;-)

SET @toGet=10; SET @left=(SELECT COUNT(*) FROM offers)+1; SELECT *, @toGet:=@toGet-1
FROM offers WHERE (@left:=@left-1)>0 AND RAND()<@toGet/@left;

And dont forget

Useful answer?
0

the main reason is that 99.9 % of shared hosts all use mysql !

Which is one of the main reasons people dont know / use postgre

@Kamil: "does not allow to

Useful answer?
0

@Kamil: "does not allow to turn off indexes temporarily to accelerate INSERTS and UPDATES"

Well, how does an INSERT get accelerated if you *temporarily* turn off the index? When you turn it back on, the data must be checked against the index anyways...

@Kamil: "User-Defined variables (mysql style) that allow doing interesting things in single query"

Those are multiple queries, not a single query. Please note the semi-colon that indicates you are performing three queries.

@Kamil: "I also don't like how Postgres is encouraging you to put you logic inside db"

I don't like how some developers feel that logic in a database that guarantees basic data integrity is an annoyance. Also, projects I have been on have that "logic" worked out in a way that they are under VCS. The build system is the way to get the DB built, not anyone muddling around directly in the DB. Poor project logistics is not Postgres' fault.

You may have to spend more than two months on Postgres. You may broaden your MySQL-only perspective...

more

Useful answer?
0

Well, how does an INSERT get accelerated if you *temporarily* turn off the index? When you turn it back on, the data must be checked against the index anyways...

Usually bulk operations are faster than processing items one by one. So bulk index update after large amount of inserts can be (and usually is) faster than updating index after each inserted record.

Those are multiple queries, not a single query. Please note the semi-colon that indicates you are performing three queries.

If you actually read the queries not just counted semicolons you would notice that in my piece of code there are only two actual queries (third is just "pseduoquery" setting sessionwide user variable) of wich only one contains interesting things that I am referring to in my comment. Sorry for being little impolite on this but you did the same to me by assuming that I don't know the meaning of semicolon in my own code.

The build system is the way to get the DB built, not anyone muddling around directly in the DB. Poor project logistics is not Postgres' fault.

DB built by build system when it is needed is something I am yet to witness. For most projects I've seen, people were inserting code directly into db. And I just don't like that. I blame Postgres for encouraging this behavior because when people get "full featured db" they are tempted to use this features in most straightforward way possible.

Some of what you said is wrong in the latest versions

Useful answer?
0

* can't do COUNT()-s fast enough.

This was very true in earlier versions of PostgreSQL, but the latest versions greatly decrease the time it takes for COUNT() operations. Actually all aggregate operations such as SUM(), AVG(), etc.

* can't turn off transactions (to get rid of the overhead)

While many people would consider this a problem, personally I see this as a feature. I WANT my data to be safe, even at the expense of some speed. And honestly, the transaction overhead in PostgreSQL isn't large.

* gets ridiculously huge and slow if not VACUUMed often.

With modern versions of PostgreSQL there is an autovacuum daemon running that does this constantly. If your disk space is getting eaten up, it means you don't have your free spac e map configured properly for the amount of data you're updating/deleting. The default settings are ok for most people, but not appropriate for larger/busier systems. This is just a case of you not knowing enough about PostgreSQL and knowing a ton about MySQL.

more

Useful answer?
0

the latest versions greatly decrease the time it takes for COUNT() operations. Actually all aggregate operations such as SUM(), AVG(), etc.

Great to hear that. I noticed this issue on 8.2 but we are currently migrating to 8.3 and I hope to see improvements you mentioned.

can't turn off transactions (to get rid of the overhead)

While many people would consider this a problem, personally I see this as a feature. I WANT my data to be safe, even at the expense of some speed. And honestly, the transaction overhead in PostgreSQL isn't large.

Sometimes (often in case of popular web applications) performance is much more important than consistency. When you have such need and you use Postgres as your db you can't do much about it.

With modern versions of PostgreSQL there is an autovacuum daemon running that does this constantly.

Autovacuum triggered at undetermined moments may add load to already heavily loaded database.

This is just a case of you not knowing enough about PostgreSQL and knowing a ton about MySQL.

You probably overestimate my knowledge in both cases. ;-)

more

Useful answer?
0

the latest versions greatly decrease the time it takes for COUNT() operations. Actually all aggregate operations such as SUM(), AVG(), etc.

Great to hear that. I noticed this issue on 8.2 but we are currently migrating to 8.3 and I hope to see improvements you mentioned.

can't turn off transactions (to get rid of the overhead)

While many people would consider this a problem, personally I see this as a feature. I WANT my data to be safe, even at the expense of some speed. And honestly, the transaction overhead in PostgreSQL isn't large.

Sometimes (often in case of popular web applications) performance is much more important than consistency. When you have such need and you use Postgres as your db you can't do much about it.

With modern versions of PostgreSQL there is an autovacuum daemon running that does this constantly.

Autovacuum triggered at undetermined moments may add load to already heavily loaded database.

This is just a case of you not knowing enough about PostgreSQL and knowing a ton about MySQL.

You probably overestimate my knowledge in both cases. ;-)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

About Curt Monash

Curt Monash is a leading analyst of and strategic advisor to the software industry. Praised by Lawrence J. Ellison for his "unmatched insight into technology and marketplace trends," Curt was the software/services industry's #1 ranked stock analyst while at PaineWebber, Inc., where he served as a First Vice President until 1987. He subsequently co-founded Evernet, Inc., a $40 million networking systems integrator. Since 1990, he has owned and operated Monash Research, an analysis and advisory firm covering software-intensive sectors of the technology industry. In that period he also has been co-founder, president, or chairman of several other technology startups.

Curt has served as a strategic advisor to many well-known firms, including Oracle, Microsoft, SAP, AOL, CA, and Netezza. Curt earned a Ph.D. in mathematics (Game Theory) from Harvard University. He has held faculty positions in mathematics, economics and public policy at Harvard, Yale, and Suffolk universities.

RSS feed

A World of Bytes archive.

Advertisement: