Data Center Capacity Planning - Balancing IT, communications, and facilities management capacity planning
April 30, 2019
Data Center Capacity Planning Defined - What are we talking about?
"Data center capacity planning is the establishment of a strategy that ensures an IT organization's computing resources, power load, footprint and cooling capacity will be able to meet the workload demands of its users and customers."
Data center capacity planning not only helps to achieve efficient use of the physical infrastructure but also helps in pointing out potential issues, predicting failures, improving efficiency, and ultimately providing quality business service. Capacity planning is an ongoing process because the influx of business-critical services is ever increasing.
Data center managers have to make sure their data center strategy stays ahead of organizational expansion needs while watching for sudden peak requirements that could overwhelm current systems. The way to achieve that is through data center capacity planning.
When organizations lose sight of what is happening or what might happen in their environment, performance problems and capacity shortfalls can arise, which can result in the loss of revenue, reduced productivity, and an unacceptable customer experience.
Data center capacity planning also helps to provide accurate predictions of future workload resource requirements to meet business objectives and help to balance IT risk and health.
Data Center Capacity Planning Disciplines
There are six primary data center capacity planning disciplines. The purpose of each discipline is briefly described below.
- Network: Measure of internet and wireless connectivity, point connection speeds, number of physical connections available, and available IP addresses.
- Power: Measure of current-carrying capacity of the conductors, trip settings / arc fault, redundancy, upgradability, and available utilities.
- Cooling: Measure of cooling capacity of the units, ability to move air, and ability to create pressure, redundancy, and upgradability.
- Computing: Calculating program runtimes, conducting analytical profiling, analyzing process delay, categorizing acceptable behavior, and identification of bottlenecks.
- Data Storage: Measurement of total storage capacity of the system, total storage capacity of the process (specific system requirements), internal server storage, and network storage (e.g., the cloud).
- Space: Measurement of available space, current space utilization, optional build-out space, and future layout space.
Objectives / Phases of Data Center Capacity Planning
Like any other project, IT or otherwise, there are some basic objectives or phases as part of the planning and implementation.
- Identify a capacity planning champion - Who is responsible?
- Get buy-in from the business - Highlight benefits of capacity planning and management
- Determine your requirements - Establish formal performance and availability requirements
- Analyze your resources - What is your current infrastructure and capacity situation?
- Plan for future needs - Forecast business demands and future capacity requirements
- Influence your audience - Engage those both within IT and within the business at large.
8 Steps for Effective Data Center Capacity Planning
Successful and efficient capacity planning can be achieved by following these steps with the help of DCIM software.
Step 1: Determine all of the components required for a new piece of equipment to be provisioned. Capacity planning requires data centers to consider utilization and existing capacity of rack space, rack power, UPS power, upstream breaker or panel power, cooling, fiber or data port connectivity at the rack, patch panels, and switches.
Step 2: Determine the current usage level of the required components and whether each one of them is fully utilized or has additional capacity. Look to leverage stranded capacity for space, power, and networking.
Step 3: Review the list of capacity requirements. Ensure that you can meet them and that resources remain available in a failover situation.
Step 4: Create a plan for provisioning the new equipment. Share this plan with the IT, network, facilities, and other teams to gain alignment and ensure that your plan works for everyone involved.
Step 5: Use data to ensure that your capacity planning is accurate. For example, use what-if analysis to determine the net impact of additions and decommissions on your data center.
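A what-if analysis like the one in Step 5 can be reduced to a simple balance of planned additions against decommissions. The sketch below is a hypothetical illustration: the device wattages and the UPS budget are invented, not drawn from any real DCIM inventory.

```python
# Hypothetical what-if sketch: net impact of planned additions and
# decommissions on rack power load. All figures are illustrative.

def net_power_impact(current_load_kw, additions_kw, decommissions_kw):
    """Return the projected load after applying planned changes."""
    return current_load_kw + sum(additions_kw) - sum(decommissions_kw)

projected = net_power_impact(
    current_load_kw=42.0,
    additions_kw=[3.5, 3.5, 1.2],   # two new servers and a switch
    decommissions_kw=[5.0],         # one legacy server being retired
)
print(projected)  # about 45.2 kW against, say, a 60 kW UPS budget
```

The same pattern applies to rack units, data ports, or cooling load: sum the planned changes per capacity type and compare against the remaining budget before committing the plan.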
Step 6: Make reservations for all the capacities required to provision the new equipment. Use DCIM software to automatically reserve connections for each device planned for installation.
Step 7: Issue work orders to physically provision the new equipment. Creating work orders in your DCIM software saves you time and reduces manual error by pulling data for the work order from the asset's details. You can also add multiple requests to a work order to reduce truck roll time, and then check the status for each work order from creation to completion.
Step 8: Audit and accept the work, thereby updating the production DCIM database to reflect the new state of capacity utilization.
Data Center Capacity Planning Methodology
- Calculate the Current Capacity: The capacity of the system (at its weakest point) that exists within the critical space for the attribute being managed.
- Calculate the Current Loading: The load that exists within the critical space for the attribute being managed.
- Account for Redundancy: Redundancy is a multiplier that must be applied to account for the design capabilities of the system. This can be changed, but must be understood and reported correctly. Examples include N, N+1, N+2, and 2N.
- Account for Design Conditions: Refers to terminology such as “fault tolerant” or “concurrently maintainable.”
- Forecast Future Loading: An attempt to make changes before capacity reaches its limit. This is a balancing act between budget and requirements; efficiency counts, and making a change too early can cost large sums in efficiency loss.
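As a rough illustration of how the methodology combines capacity with redundancy, the sketch below computes usable capacity for a bank of identical units under the common redundancy schemes. The unit sizes are hypothetical, and real systems with mixed unit sizes need a more careful calculation.

```python
def usable_capacity(unit_capacity_kw, num_units, redundancy="N"):
    """Usable capacity after reserving units for the redundancy scheme.

    Assumes identical units: 'N+1' reserves one unit, 'N+2' reserves
    two, and '2N' reserves a fully mirrored system.
    """
    total = unit_capacity_kw * num_units
    if redundancy == "N":
        return total
    if redundancy == "N+1":
        return unit_capacity_kw * (num_units - 1)
    if redundancy == "N+2":
        return unit_capacity_kw * (num_units - 2)
    if redundancy == "2N":
        return total / 2
    raise ValueError(f"unknown redundancy scheme: {redundancy}")

# Four 100 kW cooling units in an N+1 design leave 300 kW usable:
print(usable_capacity(100, 4, "N+1"))  # 300
```

Current loading is then reported against this redundancy-adjusted figure, not against the raw installed total.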
Resource Assessment: Server Rack Optimization
An assessment of all equipment is a necessity. The assessment should include the following:
- Server name, make/model, OS, speeds
- Memory/disk space
- Total number of CPUs
- CPU queue
- Percentage of the CPU and memory used
- Memory paging
- Disk I/Os
- Network bandwidth and data rates
- Electrical power consumption
- Heat generated (cooling load)
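Power figures from such an assessment can feed a quick rack-level sanity check. The sketch below is illustrative only: the inventory entries are invented, and the 80% continuous-load derating is a common rule of thumb (from North American electrical practice) rather than a universal requirement.

```python
# Sketch: check measured server power draw against a rack circuit,
# applying the common 80% continuous-load derating rule.
# The inventory and circuit rating are hypothetical.

servers = [
    {"name": "web-01", "power_w": 350},
    {"name": "web-02", "power_w": 340},
    {"name": "db-01",  "power_w": 620},
]

circuit_capacity_w = 208 * 30        # 208 V, 30 A rack circuit
usable_w = circuit_capacity_w * 0.8  # 80% continuous-load derating

draw_w = sum(s["power_w"] for s in servers)
print(f"draw {draw_w} W of {usable_w:.0f} W usable")
print("headroom:", usable_w - draw_w, "W")
```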
Major site considerations should include the following:
- Space including logistical footprint (i.e., inventories, spares, etc.)
- Overall bandwidth
- Budget and funding
- Cooling with redundancies
- Regulations and standards
- Tools (test equipment, software, etc.)
- Misc: security, safety, alarms, fault tolerance, natural risks due to location, key infrastructure suppliers and contract support…
Additional considerations include:
- Outsourcing options including rentals, leasing, and SLAs with additional data centers
- Timeliness of contingent resources and additional equipment
- Redundancy of space, electrical power, and cooling solutions
- Virtualization – beware of the zombie servers
- Open architectures (e.g., Open19, SDDC, etc.)
- Modeling and simulation (M&S) tools for network monitoring (e.g., OPNET, EXata, etc.) for both simulation and emulation with “live” data capture
- Electrical power monitoring software with predictive analytics and load balancing (e.g., ETAP, Metasys, etc.)
- Building information modeling (BIM) for space, layout, piping, electrical, networks, etc. in 3D graphics
- Machine learning software tools (e.g., SAS, R, S, TensorFlow, scikit-learn, Torch, etc.)
Successful Outcome - a Data Center Capacity Plan
Data center operations require a careful balance of resources, personnel, and planning.
Assessing the data center resources is very important to ascertain current capacity and capabilities to meet the organizational mission, customer demands, and as a starting point towards defining the data center capacity plan.
The data center capacity plan may include the following:
- Data Center Infrastructure Management (DCIM) plan
- IT Service Management (ITSM) framework (e.g., ITIL)
- Definition of the mission and digital constructs, requirements, metrics, and key performance indicators (KPIs)
Components of the data center capacity plan include:
- Identify key stakeholders
- Plan ownership – assign overall ownership and include owners for sub-sections or key areas
- Define each metric – must be obtainable, measurable, and able to stand alone
- Accurate assessment of current infrastructure regarding performance
- Compare the existing maximum capacity against actual utilization
- Identify workload forecasts
- Perform a cross-walk of requirements with utilization
- Provide historical and future scenarios
- Forecast linear/non-linear projections of utilization against capacity and contingencies*
*The plan should not be a single event or based on a snapshot in time; instead, it should be a living document that adapts and becomes further defined throughout the continuum.
In addition to linear customer demand, overall capacity must be capable of exceeding surge requirements, and alternative resources must be brought online in time to stay ahead of any potential outages.
Business capacity, service capacity and component and resource capacity must meet current and future business requirements in a cost-effective manner.
Capacity Planning Requires Reliable Answers
Planners will need to know the answers to questions such as these:
- What will happen to the response time if the transaction rate for this application doubles?
- Can I migrate this workload from a legacy platform to our new standard platform without hurting performance?
- Do we need to be concerned about available capacity if I add a new tenant to a shared service next month?
Data Collection Impacts on Capacity Planning
While capacity planning has always been important, its star has risen in the era of virtualization, cloud computing, BYOD, mobility and Big Data. To cope with this, Gartner analyst Will Cappelli says capacity planning needs to be supported by predictive analytics technology.
“Infrastructures are much more modular, distributed and dynamic,” Cappelli says. “It is virtually impossible to use traditional capacity planning to effectively ensure that the right resources are available at the right time.”
This entails being able to crunch vast amounts of data points, inputs and metrics in order to analyze them, quantify the probabilities of various events and predict the likelihood that certain occurrences will happen in the future.
Therefore, data center managers are advised to lean toward capacity planning tools that enable them to conduct that analysis in such a way that they can run a variety of “what if” scenarios. This allows them to determine their precise requirements, thereby reducing both cost and risk.
The challenge for organizations is to understand how they can slice and dice all of the data coursing through the data center and the organization. By compartmentalizing all this data into actionable information, capacity planners can share this in the form of a dashboard with metrics that the business can understand and use to make strategic business decisions.
The typical data center server operates at 12 percent to 18 percent of capacity.
“The standard method for adding capacity is to use resource utilization thresholds as triggers to purchase more hardware, but this results in excess hardware purchases as it does not factor in the requirements of the workloads (the applications) running on the infrastructure,” says Harzog. “The trick is to be able to drive up utilization without risking application response time and throughput issues.”
"Manual" Data Collection - a Thing of the Past
Traditionally, data center management was characterized by Excel and Visio files, strictly focused on infrastructure in the on-premises data center. The top priority was keeping an eye on servers, storage, and network bandwidth to ensure resources were properly utilized.
IT companies and businesses built data center capacities to last for the next 10 years. However, this did not prove to be a cost-effective solution. Overprovisioning introduced idle assets, waste, and increased costs. The hardware installed in these deployments could be difficult to replace due to limited space, power, and networking capacity.
Energy was not managed efficiently. Most importantly, it was difficult for data center managers to quickly provision IT equipment to reduce costs and respond to demand with agility.
It is time to move from data collection to data forecasting. Many data center managers move to software to streamline forecasting, automate the process and heighten accuracy. This makes it possible for forecasts and reports to be made available and updated weekly and daily if necessary. That enables the data center to move out of reactive mode, understand changes as they happen and take action to ensure its systems are not overwhelmed.
Capacity forecast inputs are combined with a variety of business metrics and data gathered from a collection of Java tools. This can then be translated into projections for CPU and business growth, dollar cost per server, forecasts relevant to different lines of business and executives, and even ways to check the accuracy of earlier forecasts.
The point here is not to try to predict the future based on one or two metrics. Instead, extract a wide range of parameters from a variety of sources:
- Database information such as server configuration (current and historical)
- Resources consumed (CPU, memory, storage)
- Business transactions (via user agents)
- Platform-specific metrics: in a UNIX AIX environment, for example, rPerf (relative performance) can help the data center understand whether it needs to add or remove CPUs to improve performance
Data Forecasting Tips
Base data center capacity forecasts on both cyclical growth as well as linear projections. Example: Calculate annual growth but apply a cyclical pattern to that forecast based on monthly usage. This approach to data center strategy accounts for potential leaps in demand due to seasonal peaks, or campaign launches.
A linear projection, for example, may show that a purchase should be made in June, but cyclical data highlights where surges in business usage may occur. This allows the data center to defer capital expenditures or speed up purchases based on actual business needs instead of just projecting usage forward as an orderly progression.
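One minimal way to combine a linear projection with a cyclical pattern is to compute the trend first and then scale each month by a seasonal index. The growth rate and index values below are hypothetical placeholders, not derived from real usage data.

```python
def cyclical_forecast(current, annual_growth, seasonal_index):
    """Monthly forecasts for the next 12 months.

    Applies linear annual growth, then scales each month by a
    seasonal index (1.0 = average month). Index values are
    illustrative, not from real usage data.
    """
    monthly_growth = annual_growth / 12
    forecasts = []
    for month in range(1, 13):
        trend = current * (1 + monthly_growth * month)
        forecasts.append(trend * seasonal_index[month - 1])
    return forecasts

# 10% annual growth with a hypothetical Q4 seasonal peak:
index = [0.9, 0.9, 1.0, 1.0, 1.0, 1.0, 0.95, 0.95, 1.0, 1.1, 1.2, 1.2]
f = cyclical_forecast(current=500.0, annual_growth=0.10, seasonal_index=index)
print(max(f))  # the December peak, well above the flat linear projection
```

Comparing the peak months against capacity, rather than the average trend line, is what lets purchases be deferred or pulled forward based on actual business cycles.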
By implementing capacity planning in this way, data centers can dramatically reduce resource time commitments, automate the forecasting process, and implement daily/weekly reporting.
Develop a standardized forecasting strategy to conduct historical forecast tracking and to identify areas of improvement.
Data Forecasting Challenges
Watch out for exceptions that can trip up forecasting when working on data center strategy.
- Historical data that is incomplete or non-existent for a new server, which can result in a fairly new server being forecast at 300% growth.
- The need to remove anomalous data points – watch out for baseline jumps, such as shifts in resource consumption without corresponding changes in growth rates.
- Understanding how the business is currently driving the resources being consumed in the data center.
- Understanding how business market shifts might overhaul internal resource requirements.
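A forecasting pipeline can screen for the first two exceptions automatically before trending. The sketch below uses invented thresholds for "too little history" and "baseline jump"; real tools would tune these against their own data.

```python
def flag_forecast_exceptions(history, min_points=6, jump_ratio=1.5):
    """Flag servers whose history is too short or shows a baseline jump.

    A 'jump' here is any month-over-month increase above jump_ratio,
    which usually signals a migration or workload move rather than
    organic growth. Thresholds are illustrative.
    """
    if len(history) < min_points:
        return "insufficient history - exclude from trend"
    for prev, curr in zip(history, history[1:]):
        if prev > 0 and curr / prev > jump_ratio:
            return "baseline jump - split series before trending"
    return "ok"

print(flag_forecast_exceptions([40, 42]))                  # too new
print(flag_forecast_exceptions([40, 42, 44, 90, 92, 94]))  # jump at month 4
print(flag_forecast_exceptions([40, 42, 44, 46, 48, 50]))  # ok
```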
Capacity Planning Emerging Trends
The affordability of hardware means that more organizations have access to faster, smarter devices for use in their data centers. These devices are housing mission-critical data and applications that end users need to drive business innovation.
Data center managers need to change the way they manage resources in order to quickly meet demand for both physical and virtual assets.
Emerging trends, such as Data Center Infrastructure Management (DCIM) software and Software Defined Data Centers (SDDC), have taken center stage. These technologies help data center managers better prepare for and provision the incoming volume and velocity of server, storage, and network equipment. They also facilitate the management of the physical infrastructure needed to support new equipment through data center capacity planning.
Adoption of virtualization eliminated the fixed tie between an application and the infrastructure. The capacity planning process now had to take things like workload placement and resource pools into consideration.
More recently, organizations are moving workloads into the cloud. The elasticity of IaaS means you can worry less about capital expenditures and provisioning lead times, but you still need to plan ahead and optimize your allocation strategy. Capacity in the cloud may seem infinite, but the cost for allocations needs to fit your budget.
Capacity Planning Methods - Examples
A simple way to do capacity planning is to set performance or capacity thresholds. Once those thresholds are hit, actions need to be taken.
An example of this would be setting a threshold that all systems with a CPU utilization above 50 percent should be upgraded.
Using capacity thresholds is a good method to use if you only need basic capacity planning. The thresholds will help you mitigate issues as they arise.
But you can't take proactive measures, like provisioning new resources, based on capacity thresholds alone. That's because provisioning new resources usually involves a lead time. And you would need to account for potential continued growth during that lead time when defining your thresholds.
Using capacity thresholds is also error-prone and lacks the precision of other capacity planning methods. Setting accurate thresholds requires insight into individual workloads. If you're uncertain, you'll add safety margins, which leads to inefficiency and slack capacity.
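Accounting for growth during procurement lead time can be as simple as lowering the raw threshold by the growth expected before the new hardware arrives. The figures in this sketch are hypothetical.

```python
def effective_threshold(raw_threshold, monthly_growth, lead_time_months):
    """Lower a utilization threshold to absorb growth during lead time.

    If hardware takes lead_time_months to arrive and utilization grows
    by monthly_growth points per month, the alert must fire early
    enough that the raw threshold isn't crossed before delivery.
    All figures are illustrative.
    """
    return raw_threshold - monthly_growth * lead_time_months

# Upgrade at 50% CPU, ~2 points growth/month, 3-month procurement lead:
print(effective_threshold(50.0, 2.0, 3))  # alert at 44% instead of 50%
```

This simple adjustment is exactly the kind of safety margin that inflates over time when the per-workload growth rate is guessed rather than measured.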
Linear trending involves looking at historical data over time and creating a trend line to predict future needs. It's a commonly used method that is simple to implement and use.
Linear trending is a useful capacity planning method for workloads that increase at a steady rate. Your historical data will help you create a trend line and make sure you have enough capacity down the line.
The major limitation with linear trending is it assumes that workloads increase at a steady rate, something that is not always the case.
Many times you may want to consolidate workloads or plan for events or new conditions that are not represented in the trail of historical data. In such cases, trending provides very little guidance and the whole process is reduced to guesswork.
Linear trending also falsely assumes that system performance is linear. The reality is that once you hit a bottleneck, the performance of the application drops exponentially. A capacity planning tool needs to assess more than just past performance to make accurate predictions about the future. Having the ability to determine when bottlenecks will occur is crucial.
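For comparison, here is what a bare linear-trend projection looks like: a least-squares fit over monthly utilization and the months remaining until a limit is hit. It inherits exactly the steady-growth assumption criticized above, so treat the output as a rough bound, not a prediction. The sample data is hypothetical.

```python
def months_until_limit(history, limit):
    """Fit a least-squares trend over monthly utilization and return
    the months until the limit is reached, or None if the trend is
    flat or falling. Assumes steady linear growth, which is exactly
    the assumption that makes trending unreliable near bottlenecks.
    """
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) \
        / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None
    current = mean_y + slope * (n - 1 - mean_x)  # fitted value at last month
    return (limit - current) / slope

# Utilization growing ~2 points/month toward an 80% ceiling:
print(months_until_limit([60, 62, 64, 66, 68, 70], limit=80))
```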
Capacity modeling helps you understand system behavior and predict if the current resources will be sufficient in different scenarios.
This capacity planning method requires analytical tools and data from monitoring solutions. Algorithms like queuing theory are typically built into capacity modeling tools. These formulas help you calculate processing times and delays, so you can predict the behavior of a system under varying loads.
The best part about capacity modeling is that you can do it without physically testing it on your infrastructure.
Here are some questions you can use capacity modeling to answer:
- How many VMs running this workload can safely run on each physical server?
- Which of my applications are in danger of failing to meet service levels within the next six months?
- Where will my future bottlenecks be?
A key differentiator of capacity modeling is that it can identify risk and prescribe solutions to the problem. Capacity modeling also has some distinct advantages over linear trending and performance thresholds.
Capacity Modeling vs. Linear Trending
It gives you the ability to define your own what-if scenarios. You're not bound to what is reflected in the historical data.
Capacity Modeling vs. Capacity Thresholds
Capacity modeling is also workload-agnostic. It automatically adapts to changing circumstances. That means you don't have to define custom thresholds based on the specific characteristics of each application.
There are at least two modeling methods used by capacity planning software to predict performance: simulation modeling and analytic modeling.
A good simulation modeling tool will create a queuing network model based on the system being modeled and simulate running the incoming workloads on that network model.
These simulations can be highly accurate, but a lot of work is needed to adequately describe the systems with enough detail to produce dependable results.
It makes plenty of sense to use flexible simulation models to plan for those “what-if” scenarios.
For example, you might use it to determine how long a proposed investment in CPU infrastructure, i.e. headroom, will last so that you can construct a business case for management.
This is still the preferred method for networks, but it’s so resource-intensive that it’s practically impossible to use as a capacity planning tool for servers.
For your most critical capacity planning needs, you’ll need something that utilizes queuing theory.
While analytic modeling also takes queuing into account, it doesn’t simulate the incoming workloads on the model.
In a good analytic modeling tool, formulas based on queuing theory are used to mathematically calculate processing times and delays (throughputs and response times).
This type of modeling is much quicker—and not nearly as tedious to set up. The results can be just as accurate as simulation modeling results.
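As a taste of the queuing-theory formulas such tools build on, the classic M/M/1 model gives mean response time as R = S / (1 - rho), where S is service time and rho is utilization. Real analytic modeling tools use far richer multi-server, multi-class models; this is only a sketch of the underlying idea.

```python
def mm1_response_time(service_time, utilization):
    """M/M/1 mean response time: R = S / (1 - rho).

    A textbook single-server queuing formula of the kind analytic
    modeling tools build on; production tools use far richer models.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

# Response time explodes as a 10 ms service approaches saturation:
for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.2f} -> {mm1_response_time(10.0, rho):.0f} ms")
```

The non-linear blow-up near rho = 1 is why these models can predict bottlenecks that a linear trend line misses entirely.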
It’s important to pick the right data (e.g. peak online transactions) to model and ensure that it represents the appropriate situations.
When the process of selecting and contextualizing data isn’t automated, you must rely heavily on the skills of the analyst doing the work and run the risk of making mistakes.
Without automation, it becomes extremely easy to miss important data and get inaccurate projections of future needs.
Capacity Management Tools - Factors to Consider
No matter what your organization needs most, there are six major factors to consider when choosing a capacity management tool, and the right tool should account for all of them. The priority of each factor will depend on your business needs.
1. Time It Takes to Use
This is fairly self-explanatory—but nevertheless important. After all, it doesn’t matter how accurate your prediction for tomorrow is if it will take all of today to make. A performance monitoring tool provides current and historical reporting but doesn’t allow you to make future predictions. So it doesn’t even qualify for this factor.
Simulation modeling takes time and is resource-intensive. So, it should be used strategically in the capacity management process.
The other three types—trending, workload stacking, analytical modeling—are quicker to use. So, if the time it takes to use matters most to your organization, steer clear of performance monitoring and simulation modeling.
2. Capacity Planning Accuracy
Capacity planning means nothing if it isn’t accurate. Trending tools give imprecise forecasts as systems do not perform linearly. Capacity planners typically use trending tools only for conservative estimates to avoid bottlenecks and downtime based on KPI thresholds. But that results in overprovisioning and overspending—which leads to a bloated, inefficient IT budget.
Workload stacking tools tend to lack accuracy as well. And performance monitoring tools, as mentioned above, don't provide capacity planning.
The other two types—simulation modeling and analytical modeling provide accurate results. So, either of these two would make a smart choice if capacity planning accuracy matters to you.
3. Automated Capacity Management
Even the most skilled IT professionals can make mistakes. This is especially true when it comes to manual data entry. Automation minimizes the risk. Tools that excel in automated capacity management automatically run the numbers. That way, IT professionals can focus on the work they do best: analyzing complex sets of data and converting data into actionable insights.
Performance monitoring tools can’t do capacity planning—much less automated capacity management. Simulation modeling tools tend to be pretty manual. And so do most analytic modeling tools.
If you’re looking for a tool that can do automated capacity management, your best ways forward are trending, workload stacking, or analytical modeling.
4. Scalability
As IT infrastructure gets larger and more complicated, scalability is of the utmost importance. A tool that is limited to hundreds of servers at a time just won't cut it for large companies.
A scalable tool needs to be able to monitor and make predictions about thousands of servers at once. Simulation modeling can only handle tens of servers at a time. If you’re at a large organization, those tools just won’t do.
Performance monitoring, trending, workload stacking, and analytical modeling tools offering automation will be much more effective for large organizations.
5. Actionable Answers
Even if you get an accurate prediction of future workload demands, it can be difficult to know how to prepare for them.
Performance monitoring and trending tools only give you raw data. It’s up to you to interpret it. So, if you’re looking for fast answers or want to avoid relying on individuals with very specific expertise, it isn’t a good idea to place your trust in these methods.
Workload stacking tools might help you identify servers that are underutilized as candidates for consolidation. But these tools use imprecise methods. So you don’t get definitive answers about workload distribution. It’s better than performance monitoring or trending tools at giving answers. It’s just not your best option.
Tools like simulation modeling and analytical modeling will give you real, easy-to-understand answers. These tools are the best way to solve potential problems in the near and distant future.
6. Simplified Comprehensive Reporting
Easy yet thorough capacity reporting is a must if you need to communicate to business leaders. Unfortunately, four out of five types of capacity tools fail to provide this.
Only analytical modeling gives you the holistic view of your entire infrastructure that:
- Makes accurate predictions
- Finds when a risk will occur
- Identifies the resource constraint
- Offers valuable solutions
- Communicates IT metrics in business terms
If this factor matters most, you ought to choose an automated analytic modeling tool.
Changing Landscape - Planning for the Future
While capacity planning has long been important for minimizing the risk of downtime and maximizing uptime, reliability, and efficiency, the best ways to optimize service delivery have changed over time.
Regardless of technology, you need to know about future business demands and how they will impact capacity requirements over time. Only then can you optimize your sourcing strategy and cost.
- Paradigm shifts are rarely binary, nor do they happen overnight.
- Services running on older technologies still need to be managed.
- Your capacity planning process has to cover all technologies used by your organization.
Learn more by viewing our Webinar on Data Center Capacity Planning