Stupidest excuse for slow app performance? The Magic Threshold.

It seems as though every aspect of IT has this imaginary point where it ceases to work simply because it gets tired.

As a production DBA I see different problem scenarios all the time. And in every one of these scenarios there’s always this blame game going on. The apps people think it’s the database; the database people think it’s the disk; the disk folks think it’s the server; the server folks think it’s the network; and the network peeps may not point fingers at anyone because they have no idea what it is, they just know it’s not the network. And as fun as that game usually is, nobody ever bothers actually pulling numbers to solve the problem, beyond seeing that it can’t possibly be on their own turf. What’s even better is their justification for their diagnosis. I’ll talk about the database since that’s the world I live in the most.

Tell me how many times you’ve heard this conversation:

DBA: “What makes you think it’s the database?”

AppGuy: “Well, it’s a database app, so it stands to reason the database is the problem since we haven’t changed anything in the app.”

DBA: “But it was working fine a couple days ago, even earlier today. And we haven’t made any changes on our side either.”

AppGuy: “Maybe the database reached a threshold and now it’s not able to keep up.”

DBA: “Really, that’s what you wanna go with?”

I’ve always wondered about this magic threshold that seems to plague IT. Because it’s not only in DBs; it plagues networks, and servers, and OSs too. It seems as though every aspect of IT has this imaginary point where it ceases to work simply because it gets tired. Now, we know thresholds exist, but to declare that the database has hit a threshold when it’s still under a TB (terabyte) is just ridiculous and shows everyone how little you know. And even if the DB is over a TB, the idea that in a single afternoon it’s gone from running to crawling because an extra 50 rows were entered just doesn’t make sense.

What does make sense, though, is for an index to become too fragmented and stop performing well in the middle of the day. That happens all the time. It also makes sense that you could be taxing the disks with an extra workload and that’s making things move slower. So are these thresholds? No, not really. They’re a normal part of operations, and they occur all the time. So what we’ve got here are extremely uninformed IT guys grasping at straws to find a solution. I’ve actually stopped meetings before with conversations very similar to this:

AppGuy: “It’s got to be the database.”

Me: “Why?”

AppGuy: “Well I noticed that we have over 200 million rows in this table so it must have crossed some kind of threshold. Maybe we need to create another table and connect both of them with a view.”

Me: “Actually, it’s the system colors. Blue is much harder for Windows to process and that can have a severe performance impact.”

AppGuy: “What are you talking about?”

Me: “Oh, I’m sorry, I thought we were having a stupidest excuse contest.”

And I’m picking on the apps guy right now, but the truth is I’ve seen plenty of this kind of thing from DBAs too. Listen, ‘threshold’ can’t be the default diagnosis that explains every problem you can’t solve.
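The irony is that the fragmentation and disk-load scenarios I mentioned earlier aren’t guesses; they’re things you can measure in a couple of minutes. Here’s a minimal sketch of the kind of check I mean, assuming SQL Server 2005 or later; the numbers in the filter are common rules of thumb, not magic thresholds of their own.

    -- Rough sketch: list indexes in the current database that have become badly fragmented.
    SELECT
        OBJECT_NAME(ips.object_id)        AS table_name,
        i.name                            AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30   -- rule-of-thumb rebuild candidate
      AND ips.page_count > 1000                   -- ignore tiny indexes
    ORDER BY ips.avg_fragmentation_in_percent DESC;

Two minutes with a query like this tells you whether the “threshold” everyone is arguing about is really just an index that needs a rebuild.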

The Cure

There has to be an answer though, right? Of course there is… stop believing in superstition and do the obvious but time-consuming thing: pick up a book. Hell, pick up 3 or 5 while you’re at it. Of course I realize that simply picking up a book and reading it isn’t the actual cure. You have to actually work the examples and understand why things are being done the way they are. Strive to really understand what makes it work, and then you can start to reason out the answers for yourself. Knowledge is king. If books aren’t your thing, go into forums and ask questions. And above all, instead of boldly stating what the problem is, ask the DBA if he’s seen anything wrong with the database. And then let him answer and help educate you. Ask the question: is there a threshold that would keep the database from processing a table this size? Here’s another good one: what kinds of things could make the database performance tank so quickly?
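Both of those questions have concrete answers you can go get. As a rough sketch (the table name is a placeholder, and your DBA will have his own favorite version of this), here’s how you find out how big that “too big” table really is and what the server is actually busy doing right now:

    -- How big is the table, really? (table name is hypothetical)
    EXEC sp_spaceused 'dbo.BigTable';

    -- What is actually running and waiting on this instance right now?
    SELECT
        r.session_id,
        r.status,
        r.wait_type,
        r.wait_time,
        r.cpu_time,
        r.logical_reads,
        t.text AS running_sql
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.session_id <> @@SPID
    ORDER BY r.wait_time DESC;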

Of course, it’s always an excellent idea to benchmark your servers. How can you possibly know what’s normal or what really is too much if you’ve never run a workload that mimics yours and increased it until the server screamed? Everyone should know what the limits of their server are and how close they are to them.
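“Benchmark” doesn’t have to mean a week in a load-testing lab, either. A proper tool (Microsoft’s ostress from the RML Utilities, or a replayed trace of your own workload) is the right way to do it, but even a crude throwaway loop like the sketch below, run against a test copy of the server while you watch your usual counters, tells you more about where the box falls over than any amount of speculation. Everything in it, the table, the row count, the payload, is a made-up placeholder.

    -- Crude load sketch. Run against a TEST server, never production.
    SET NOCOUNT ON;

    IF OBJECT_ID('dbo.LoadTest') IS NULL
        CREATE TABLE dbo.LoadTest
        (
            id      int IDENTITY PRIMARY KEY,
            payload char(200) NOT NULL,
            created datetime  NOT NULL DEFAULT GETDATE()
        );

    DECLARE @i int, @start datetime;
    SET @i = 0;
    SET @start = GETDATE();

    WHILE @i < 100000                  -- keep raising this until response times fall over
    BEGIN
        INSERT dbo.LoadTest (payload) VALUES (REPLICATE('x', 200));
        SET @i = @i + 1;
    END

    SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms, @i AS rows_inserted;

Run it at increasing sizes and concurrency until something screams, and write down the point where it does. That number is your real threshold, not something you guessed at in a meeting.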

I’ve actually got one of the biggest examples of thresholds gone wild in my shop right now. A vendor of ours decided that SQL Server couldn’t handle databases over 60GB, so they’ve capped their databases at that size. Now, instead of having one large DB, we’ve got several 60GB DBs, and the frontend has to figure out where the data it’s looking for is stored and then go query the right database. I have this vision in my head of some C# guy writing this app, having never picked up a SQL book of any kind, deciding that his app was somehow special, that it had instantly crossed this magic threshold, and that a database engine that handles many TBs of live data everywhere else suddenly breaks down in deference to his genius. Now, I haven’t looked at the schema, but I’d be willing to bet the problem is something glaring in his queries, his indexes, or his design. I can almost guarantee it’s a simple beginner mistake.
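For the record, when a single table really does get huge, SQL Server already has an answer that doesn’t involve slicing the data into a pile of 60GB databases: table partitioning (an Enterprise Edition feature in the current versions). This is only a bare-bones sketch with made-up names, not the vendor’s schema, but it shows the general shape of it.

    -- Minimal partitioning sketch; all names are hypothetical.
    CREATE PARTITION FUNCTION pf_OrderYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01', '2010-01-01');

    CREATE PARTITION SCHEME ps_OrderYear
    AS PARTITION pf_OrderYear ALL TO ([PRIMARY]);   -- real setups spread partitions across filegroups

    CREATE TABLE dbo.Orders
    (
        OrderID    int IDENTITY NOT NULL,
        OrderDate  datetime     NOT NULL,
        CustomerID int          NOT NULL,
        Amount     money        NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
    )
    ON ps_OrderYear (OrderDate);

One table, one object to query, and the engine decides which partition the data lives in; nobody has to write routing logic into the frontend.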
