When evaluating modern virtualization and storage solutions, you've likely
encountered the term data locality. Vendors highlight it as a key feature, promising improved
performance. But what is data locality, why do vendors use it, and more
importantly, what hidden costs and complexities does it bring?
Understanding Data Locality
Data locality refers to storing a virtual machine's primary data on the same
node where the VM is actively running. It is used in converged
infrastructures, such as hyperconverged infrastructure (HCI), to leverage the
fact that storage and virtual machines reside within the same server, rather
than being separate, as in classic three-tier architectures. The idea is
simple and initially intuitive: if a VM's data is physically close to the VM,
reads avoid a trip across the network, which reduces latency and boosts
performance.
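One way to make the definition concrete is to measure how much of a VM's data actually sits on the node where the VM runs. The sketch below is illustrative only: the function name, the node names, and the block-to-node map are hypothetical and do not reflect any vendor's API.

```python
from typing import Mapping


def locality_ratio(vm_node: str, block_nodes: Mapping[int, str]) -> float:
    """Return the fraction of a VM's blocks stored on the node running the VM."""
    if not block_nodes:
        return 0.0
    local = sum(1 for node in block_nodes.values() if node == vm_node)
    return local / len(block_nodes)


# Hypothetical VM with four data blocks, three of them on node-a.
blocks = {0: "node-a", 1: "node-a", 2: "node-b", 3: "node-a"}

ratio_on_a = locality_ratio("node-a", blocks)  # VM runs where most data lives
ratio_on_b = locality_ratio("node-b", blocks)  # same VM after moving to node-b
print(ratio_on_a, ratio_on_b)
```

A data-locality design tries to keep this ratio close to 1.0 for every running VM. Moving the VM to another node lowers the ratio until data is moved to follow it.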
Historically, data locality made sense in environments with
slow networking infrastructures and costly, performance-limited storage media,
such as spinning hard disk drives (HDDs). Under these circumstances, minimizing
network hops and latency resulted in a notable performance improvement.
Why Do Vendors Use Data Locality?
Today, vendors promote data locality for hyperconverged
infrastructure (HCI) solutions. The premise remains the same: reducing data
retrieval times and network congestion. Vendors advocating this strategy argue
that it provides better VM performance by having primary storage physically
close to compute resources, theoretically minimizing latency and enhancing
responsiveness.
However, today's data center is drastically different from
when data locality emerged. NVMe flash drives, RAM, and high-bandwidth
networking at 10GbE and beyond are fast, plentiful, and affordable, reducing the
advantages of data locality.
The Hidden Costs and Problems of Data Locality
While its benefits are debatable, data locality introduces
several critical complexities and hidden costs that can negatively impact
infrastructure performance and cost-efficiency:
1. Performance Degradation During Node Failures
Data locality assumes the VM's data is predominantly local.
When a node fails and VMs migrate to other nodes, their data is suddenly
remote, dramatically increasing latency and temporarily reducing performance.
Even after recovery, migrating data back to the original node generates
additional network traffic, prolonging degraded performance. Vendors that
depend on data locality also have less incentive to invest in networking
protocols purpose-built for inter-node communication, so their inter-node
network performance may lag behind that of vendors that do.
2. Increased Network Congestion
Ironically, the data locality strategy, initially designed
to reduce network usage, can amplify network congestion during failures or
migrations. Continuous background processes moving data to re-establish
locality generate sustained network traffic, impacting infrastructure
performance.
3. Resource Inefficiency and Complexity
To maintain data locality, the infrastructure must
continually track and balance data placement to ensure optimal performance.
This overhead consumes CPU cycles, memory, and network resources, adding
complexity and potentially degrading VM performance.
4. Storage Overhead
Maintaining data locality temporarily increases storage demands because data
is replicated, duplicated, and rebalanced in the background. Over time, these
inefficiencies translate into higher hardware expenditures and greater
operational complexity.
5. Complexity in Scalability and Management
Data locality adds management complexity as infrastructure
scales. The continual need to manage data placement, monitor locality status,
and address performance bottlenecks complicates operations, increasing
administrative overhead.
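The failover and rebalancing costs described in items 1 and 2 can be sketched with a simple back-of-the-envelope model. It treats a remote read as a local read plus one network round trip, and treats the data needed to restore locality as whatever is not already local. The function names and every figure used here (100 µs media latency, 50 µs round trip, 500 GB of VM data) are placeholder assumptions chosen for illustration, not measurements from any product.

```python
def mean_read_latency_us(local_fraction: float, media_us: float, network_us: float) -> float:
    """Average read latency when only part of a VM's reads are served locally.

    Every read pays the media latency; remote reads also pay a network round trip.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return media_us + (1.0 - local_fraction) * network_us


def relocalization_traffic_gb(vm_data_gb: float, local_fraction: float) -> float:
    """Data that must cross the network to make the VM's data fully local again."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return vm_data_gb * (1.0 - local_fraction)


# Placeholder figures (assumptions, not benchmarks).
MEDIA_US = 100.0    # assumed flash read latency
NETWORK_US = 50.0   # assumed inter-node round trip
VM_DATA_GB = 500.0  # assumed size of the VM's data

before_failover = mean_read_latency_us(1.0, MEDIA_US, NETWORK_US)  # all reads local
after_failover = mean_read_latency_us(0.0, MEDIA_US, NETWORK_US)   # all reads remote
traffic = relocalization_traffic_gb(VM_DATA_GB, 0.0)

print(before_failover, after_failover, traffic)
```

With these placeholder inputs, the restarted VM goes from fully local to fully remote reads, and the whole data set has to be copied across the network before locality is restored. Substituting measured values from your own environment shows how large the effect is in practice.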
VergeOS: A Modern Architecture Without Data Locality
Modern solutions address these challenges by eliminating
data locality, opting instead for high-performance storage networking protocols
explicitly designed for inter-node communications. These protocols leverage
automatic, active-active port utilization to ensure optimal data transfer
performance and availability across nodes.
Rather than migrating data, these architectures distribute
data evenly across all nodes, ensuring consistent performance regardless of VM
location. This strategy reduces the complexity, network congestion, and
overhead traditionally associated with data locality.
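As a rough illustration of even distribution, the sketch below places each block on a node chosen by hashing the block's identity, so placement does not depend on where the VM happens to run. This is a generic hash-based placement technique shown under stated assumptions, not a description of how VergeOS or any other product places data. The function name, VM ID, and node names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Sequence


def place_block(vm_id: str, block_index: int, nodes: Sequence[str]) -> str:
    """Pick a node for a block from a hash of its identity, ignoring VM location."""
    key = f"{vm_id}:{block_index}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and map it onto a node.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-a", "node-b", "node-c", "node-d"]

# Spread 10,000 blocks of one hypothetical VM across the cluster.
counts = Counter(place_block("vm-42", i, nodes) for i in range(10_000))
print(counts)
```

Each node ends up with roughly a quarter of the blocks, and the result is the same whichever node the VM runs on, so a migration does not trigger any data movement. One known limitation of plain modulo placement is that changing the node count remaps most blocks. Production systems typically use techniques such as consistent hashing to limit that.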
Additional features commonly found in these solutions,
including global inline deduplication, intelligent caching, and native NVMe
storage support, further optimize performance by minimizing latency and
maximizing throughput, without added management complexity.
Because these architectures don't rely on data locality, VM
mobility is simplified, improving day-to-day operational efficiency and
responsiveness during hardware issues, such as drive or node failures.
Administrators can migrate VMs without concerns about data rebalancing or
performance degradation.
For example, solutions like VergeOS demonstrate how removing
reliance on data locality can streamline operations, simplify scalability, and
provide a stable and predictable infrastructure environment.
Conclusion: Reconsidering Data Locality
While data locality once provided substantial performance
benefits in older storage and network environments, technological advancements
have made its advantages minimal and, in some cases, made the approach
counterproductive. Modern solutions, such as VergeOS, that leverage
high-performance networking, advanced storage media, and intelligent data
distribution deliver superior performance without the
hidden costs and complexity associated with data locality.
Data locality is one aspect to consider when evaluating the storage
capabilities of a VMware alternative. Register to attend VergeIO's upcoming
webinar, "Comparing HCI Architectures," to discover how HCI solutions compare
with traditional three-tier solutions and a modern Ultraconverged
Infrastructure.