Thursday, April 12, 2018

Architecture

When you're the DBA in a shop undergoing vigorous software development, one of the knottier challenges is deciding which production SQL instances to order and how to allocate your databases across those instances. The decisions in this process fall under the general rubric of "data architecture". Several design aspects interplay, pulling your decision-making in conflicting directions. The big four are:

- cost
- execution efficiency
- ease of ongoing instance management
- security

You won't be able to provide a clear-cut recommendation, nor is there a process that magically defines a "best" answer. The parameters also change over time, as new projects create new databases and advances in hardware change the optimal choices. You need to weigh the pros and cons of all the options and come up with a "gut feel" configuration that appears to strike the best balance.

Let's delve into each of the important factors to arrive at a happy medium. The process plays out like a game of ping-pong, where you smack the instance count back and forth between high and low until you settle into a reasonable compromise.

1) Cost: Fairly early on you need to plan for the full operational workload and the database engine features you will require. "Enterprise" editions of SQL Server, although considerably more expensive, often provide the crucial functionality that a large implementation requires, such as data compression, advanced encryption, and more thorough utilization of modern hardware and blended cloud resources.

Not every database will require this complete functionality, however: many vendor-supplied packages possess less severe operational requirements and work fine in a "sandboxed" engine. Often you can host "reporting" databases on simpler engines as well.

For future flexibility it's helpful to have a minimum of three instances, one of which is Enterprise. Normally you would purchase more instances depending upon the additional factors discussed below. So we've started by ping-ponging the low end to three. Yes, you could throw hundreds of databases on each instance, but that certainly isn't optimal.
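
To see why the edition split matters to the budget, here's a minimal back-of-the-envelope sketch in Python. The per-core figures are placeholders rather than actual Microsoft pricing, so substitute whatever your licensing agreement specifies.

    # Placeholder per-core license figures -- NOT actual pricing;
    # plug in the numbers from your own licensing agreement.
    ENTERPRISE_PER_CORE = 7000
    STANDARD_PER_CORE = 1800

    def license_cost(cores_by_edition):
        """cores_by_edition: dict like {'enterprise': 16, 'standard': 8}."""
        rates = {"enterprise": ENTERPRISE_PER_CORE, "standard": STANDARD_PER_CORE}
        return sum(rates[ed] * cores for ed, cores in cores_by_edition.items())

    # Option A: every database on one big Enterprise instance.
    option_a = license_cost({"enterprise": 32})
    # Option B: a smaller Enterprise instance plus Standard instances for the
    # vendor-supplied and reporting databases.
    option_b = license_cost({"enterprise": 16, "standard": 16})

    print(option_a, option_b)   # 224000 140800

Even with made-up numbers, the exercise makes the trade-off visible: pushing the databases that don't need Enterprise features onto cheaper editions is usually where the ping-pong toward a higher instance count begins.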

2) Execution efficiency: The multiple databases that reside on an instance naturally compete for resources: CPU threads, IO throughput, disk space, and tempdb are all shared, and that sharing can lead to contention. Each individual database would prefer to have the instance all to itself, but if you follow that route it quickly becomes prohibitively expensive.

Besides, outside of small vendor-supplied systems, most software needs to facilitate reads, updates, and data movement across various company databases; keeping databases with heavy non-batch cross-references on the same server speeds up those operations because it reduces network traffic.

Therefore, to establish an upper bound on the number of instances, group your databases by non-batched cross-reference activity. You can also group databases onto an instance by other shared attributes, such as common departmental access or parallel functionality (email-related databases clump together, reporting databases clump together, etc.). Avoid putting large databases on the same server, though, if you anticipate high contention for CPU threads, IO throughput, disk space, or tempdb.
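
One way to make that grouping concrete is to treat heavy non-batched cross-references as edges between databases and pull out the connected components. Below is a minimal Python sketch using a small union-find; the database names and cross-reference pairs are hypothetical, purely for illustration.

    from collections import defaultdict

    def group_by_cross_reference(databases, cross_refs):
        """Group databases joined by heavy non-batched cross-references.

        databases  -- iterable of database names
        cross_refs -- (db_a, db_b) pairs with frequent cross-database activity
        Returns a list of sets; each set is a candidate instance grouping.
        """
        parent = {db: db for db in databases}

        def find(db):
            while parent[db] != db:
                parent[db] = parent[parent[db]]  # path compression
                db = parent[db]
            return db

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

        for a, b in cross_refs:
            union(a, b)

        groups = defaultdict(set)
        for db in databases:
            groups[find(db)].add(db)
        return list(groups.values())

    # Hypothetical databases and cross-reference pairs.
    dbs = ["Sales", "Inventory", "Shipping", "Payroll", "HR", "MailArchive"]
    refs = [("Sales", "Inventory"), ("Inventory", "Shipping"), ("Payroll", "HR")]
    for group in group_by_cross_reference(dbs, refs):
        print(sorted(group))
    # ['Inventory', 'Sales', 'Shipping'] / ['HR', 'Payroll'] / ['MailArchive']

The number of groups that comes out is your rough upper bound; you'd then split any group that would overload a single server's CPU, IO, disk, or tempdb.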

3) Ease of ongoing instance management: working through the last section likely left you with a dozen or so instances (or more). Now it's time to think about the operational overhead of maintaining them all. For one thing, you will occasionally need to take an instance down to apply a service pack or patch rollup. You will also build and maintain the automated backup and index optimization plans on each server. Some time in the future you may even consider upgrading each engine to a higher release version. And once in a while you want to review the SQL error logs on each box.

Clearly then "ease of management" pushes you toward a smaller, further consolidated set of instances. Helpful guides for shrinking the instance count include whether workloads are similar (reporting, batch, or OLTP) and whether the large jobs you anticipate running can be scheduled to avoid stepping on one another (and on your scheduled maintenance plans).
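
A quick way to sanity-check a proposed consolidation is to list the nightly windows of the big jobs you'd be co-locating and flag any pairs that overlap. The sketch below does just that; the job names and time windows are hypothetical.

    from datetime import time

    def windows_overlap(w1, w2):
        """True if two (start, end) windows on the same night overlap."""
        return w1[0] < w2[1] and w2[0] < w1[1]

    def find_conflicts(jobs):
        """jobs: dict of job name -> (start, end). Returns conflicting pairs."""
        names = sorted(jobs)
        conflicts = []
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if windows_overlap(jobs[a], jobs[b]):
                    conflicts.append((a, b))
        return conflicts

    # Hypothetical nightly windows for jobs proposed for one shared instance.
    jobs = {
        "full backup":        (time(1, 0), time(3, 0)),
        "index optimization": (time(2, 30), time(4, 0)),
        "warehouse ETL":      (time(4, 0), time(6, 0)),
    }
    print(find_conflicts(jobs))   # [('full backup', 'index optimization')]

If the conflicts list comes back empty, or the clashing jobs can be rescheduled, the consolidation is at least workable from a scheduling standpoint.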

4) Security: Now that you've narrowed down and consolidated your instances as much as possible, one final slight expansion might be appropriate to accommodate specialized security considerations. For example, most businesses are especially sensitive about protecting their financial data from prying eyes, so it is often a sensible choice to host the finance ERP on its own server.

In some settings the customer data may warrant exceptional auditing, protection, or encryption safeguards, so it's not unusual for that to be segmented off to the side as well. Externally supported vendor-supplied systems are often best positioned on their own server too, not only to more clearly separate update and maintenance responsibilities but also to delineate a security environment for the VARs with offsite access.

It may feel vaguely unsatisfying to arrive heuristically at a solution to such an important decision. Yet after you've ordered, set up, and configured the instances and allocated the databases, you have a brief window of a year or so where you still have the flexibility to shift things around (before the effort to reconfigure becomes too overwhelming). Once lots of new production software gets implemented with connection strings pointing at various locations, the architecture gets fairly tightly "baked in".
