What's The Cost of Data Locality : @VMblog

In evaluating modern virtualization and storage solutions, you've likely encountered the term data locality. Vendors highlight it as a key feature, promising improved performance. But what is data locality, why do vendors use it, and more importantly, what hidden costs and complexities does it bring?

Understanding Data Locality

Data locality refers to storing a virtual machine's primary data on the same node where the VM is actively running. It is used in converged infrastructures, such as HCI, where storage and virtual machines reside within the same server rather than on separate tiers, as in classic three-tier architectures. The idea is simple and initially intuitive: if your data is physically close to your VM, reads don't need to traverse the network, which reduces latency and boosts performance.
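
To make that intuition concrete, here is a minimal sketch of how the share of locally served reads might affect average read latency. The function name and all input values are hypothetical placeholders rather than measurements, and the model assumes every read pays the storage media latency while only remote reads add one network round trip.

```python
def mean_read_latency_us(local_fraction: float, media_us: float, network_rtt_us: float) -> float:
    """Average read latency in microseconds for a given share of local reads.

    Simplified model (an assumption, not a vendor formula): every read pays
    the media latency, and only remote reads add a network round trip.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return media_us + remote_fraction * network_rtt_us


# Placeholder inputs for illustration only; substitute values measured
# in your own environment.
MEDIA_US = 100.0        # hypothetical storage media latency
NETWORK_RTT_US = 50.0   # hypothetical inter-node round trip

all_local = mean_read_latency_us(1.0, MEDIA_US, NETWORK_RTT_US)
all_remote = mean_read_latency_us(0.0, MEDIA_US, NETWORK_RTT_US)
print(f"all local: {all_local} us, all remote: {all_remote} us")
```

How much the gap matters depends entirely on the ratio between the two inputs, which is why the answer differs from one generation of hardware to the next.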

Historically, data locality made sense in environments with slow networking infrastructures and costly, performance-limited storage media, such as spinning hard disk drives (HDDs). Under these circumstances, minimizing network hops and latency resulted in a notable performance improvement.

Why Do Vendors Use Data Locality?

Today, vendors promote data locality for hyperconverged infrastructure (HCI) solutions. The premise remains the same: reducing data retrieval times and network congestion. Vendors advocating this strategy argue that it provides better VM performance by having primary storage physically close to compute resources, theoretically minimizing latency and enhancing responsiveness.

However, today's data center is drastically different from the one in which data locality emerged. NVMe flash drives, RAM, and high-bandwidth networking at 10GbE and beyond are fast, plentiful, and affordable, which reduces the advantages of data locality.

The Hidden Costs and Problems of Data Locality

While its benefits are debatable, data locality introduces several critical complexities and hidden costs that can negatively impact infrastructure performance and cost-efficiency:

1. Performance Degradation During Node Failures

Data locality assumes the VM's data is predominantly local. When a node fails and VMs migrate to other nodes, their data is suddenly remote, dramatically increasing latency and temporarily reducing performance. Even after recovery, migrating data back to the original node generates additional network traffic, prolonging degraded performance. Vendors that depend on data locality are also less likely to invest in networking protocols designed specifically for inter-node communication, so their internal network performance tends to lag behind that of vendors that do.

2. Increased Network Congestion

Ironically, the data locality strategy, initially designed to reduce network usage, can amplify network congestion during failures or migrations. Continuous background processes moving data to re-establish locality generate sustained network traffic, impacting infrastructure performance.
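
As a rough illustration of why this background traffic is sustained rather than momentary, the sketch below estimates how long it could take to move a VM's data across the network. The function name, the decimal units, and the idea of a fixed bandwidth share reserved for background moves are all assumptions made for this example; real systems throttle and schedule this work in their own ways.

```python
def relocalization_seconds(data_gb: float, link_gbps: float, bandwidth_share: float) -> float:
    """Estimated time to copy data_gb gigabytes over a link.

    Assumptions for this sketch: decimal units (1 GB = 1e9 bytes,
    1 Gbps = 1e9 bits/s), no protocol overhead, and a constant share
    of the link given to background data movement.
    """
    if not 0.0 < bandwidth_share <= 1.0:
        raise ValueError("bandwidth_share must be in (0, 1]")
    total_bytes = data_gb * 1e9
    bytes_per_second = link_gbps * 1e9 / 8 * bandwidth_share
    return total_bytes / bytes_per_second


# Hypothetical example: 1,000 GB of VM data, a 10 Gbps link,
# half of the link reserved for background moves.
seconds = relocalization_seconds(1000, 10, 0.5)
print(f"{seconds / 60:.1f} minutes of sustained transfer")
```

Under these assumed inputs, a single VM occupies half the link for well over twenty minutes, and several VMs displaced by the same failure would compete for that same bandwidth.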

3. Resource Inefficiency and Complexity

To maintain data locality, the infrastructure must continually track and balance data placement to ensure optimal performance. This overhead consumes CPU cycles, memory, and network resources, adding complexity and potentially degrading VM performance.

4. Storage Overhead

Maintaining data locality temporarily increases storage demands because of data replication, duplication, and ongoing rebalancing tasks. Over time, these inefficiencies drive up hardware expenditures and operational complexity.

5. Complexity in Scalability and Management

Data locality adds management complexity as infrastructure scales. The continual need to manage data placement, monitor locality status, and address performance bottlenecks complicates operations, increasing administrative overhead.

VergeOS: A Modern Architecture Without Data Locality

Modern solutions address these challenges by eliminating data locality, opting instead for high-performance storage networking protocols explicitly designed for internode communications. These protocols leverage automatic, active-active port utilization to ensure optimal data transfer performance and availability across nodes.

Rather than migrating data, these architectures distribute data evenly across all nodes, ensuring consistent performance regardless of VM location. This strategy reduces the complexity, network congestion, and overhead traditionally associated with data locality.
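
One generic way to picture even distribution is hash-based placement, sketched below. This is an illustrative technique only and is not a description of how VergeOS or any other product places data; the function names, the VM identifier, and the block and node counts are hypothetical, and redundancy copies are ignored for simplicity.

```python
import hashlib
from collections import Counter


def node_for_block(vm_id: str, block_index: int, node_count: int) -> int:
    """Pick a node for a block from a hash of its identity (illustrative only)."""
    key = f"{vm_id}:{block_index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def placement_shares(vm_id: str, block_count: int, node_count: int) -> list[float]:
    """Fraction of a VM's blocks that lands on each node."""
    counts = Counter(node_for_block(vm_id, i, node_count) for i in range(block_count))
    return [counts[node] / block_count for node in range(node_count)]


# Hypothetical VM with 10,000 blocks spread across a 4-node cluster.
shares = placement_shares("vm-example", 10_000, 4)
for node, share in enumerate(shares):
    print(f"node {node}: {share:.1%} of blocks")
```

Because each node holds roughly the same share of the VM's blocks, the mix of local and remote reads stays about the same whichever node the VM runs on, so moving the VM does not require moving its data. Note that plain modulo placement like this would reshuffle most blocks if the node count changed, which is why production systems typically rely on more sophisticated placement schemes.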

Additional features commonly found in these solutions, including global inline deduplication, intelligent caching, and native NVMe storage support, further optimize performance by minimizing latency and maximizing throughput, without added management complexity.

Because these architectures don't rely on data locality, VM mobility is simplified, improving day-to-day operational efficiency and responsiveness during hardware issues, such as drive or node failures. Administrators can confidently migrate VMs without concerns about data rebalancing or performance degradation.

For example, solutions like VergeOS demonstrate how removing reliance on data locality can streamline operations, simplify scalability, and provide a stable and predictable infrastructure environment.

Conclusion: Reconsidering Data Locality

While data locality once provided substantial performance benefits in older storage and network environments, technological advancements have reduced its advantages to the point where it can be counterproductive. Modern solutions, such as VergeOS, that leverage high-performance networking, advanced storage media, and intelligent data distribution deliver superior performance without the hidden costs and complexity associated with data locality.

Data locality is one aspect to consider when evaluating the storage capabilities of a VMware alternative. Register to attend VergeIO's upcoming webinar, "Comparing HCI Architectures," to discover how HCI solutions compare with traditional three-tier solutions and a modern Ultraconverged Infrastructure.

Published Monday, May 12, 2025 4:13 PM by David Marshall