I am looking into the 'RPC wait' for node getData with a named node. I am trying to understand how this error message works, and the internals of what it means.
Update 10/13/25:
Well, it's kinda opaque how this all works. There also seem to be a couple of possible reasons why this could be happening, and it was not clear to me whether there is a way to tell, in the abstract, which one applies without looking at node logs.
The most likely reason seems to be RDMA issues, probably caused by network problems where communications are not getting through as they should, which then shows up as 'RPC wait'.
If you are like me and are trying to learn GPFS (IBM Spectrum Scale), the best advice I can give is to get yourself a copy and then pull it apart. Seriously, you will spend hours reading documentation and still miss a lot of the details, just because the documentation is so dense. Keeping the documentation open next to the binaries and scripts, so you can disassemble and read them as you go, will make the learning process more efficient.
Also don't forget your notes. Take notes in your own words and try not to quote. With these three things:
Documentation
Binaries and Scripts to read and disassemble
Good notes
You at least have a fighting chance of understanding what is going on, even if you are going in blind.
You can actually try IBM Storage Scale for free. Obviously there are limits, but you can certainly get a sense of how it all works. To that end, I was pulling apart the installer the other day and noticed that I couldn't find mmdiag.
I was confused by this until I unpacked the .deb file and looked at the post-install script. As it turns out, IBM makes liberal use of symlinks, and mmdiag is just a symlink to tsdiag. I wanted to document that here so I don't forget, and in case anyone else finds themselves needlessly puzzled. I also posted a list of the symlinks I found in the GPFS Base .deb file here.
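If you want to see the same thing on an installed system rather than in the package, a small script can list every symlink in a directory along with its target. This is only a sketch: `list_symlinks` is a name I made up, and the directory you point it at is up to you (GPFS binaries commonly live under /usr/lpp/mmfs/bin, but treat that path as an assumption for your install). The demo at the bottom runs against a throwaway directory that mimics the mmdiag/tsdiag layout, so it works without GPFS present.

```python
import os
import tempfile


def list_symlinks(directory):
    """Return {link_name: target} for every symlink directly inside directory."""
    links = {}
    for entry in os.scandir(directory):
        if entry.is_symlink():
            # readlink gives the raw target stored in the link, unresolved
            links[entry.name] = os.readlink(entry.path)
    return links


if __name__ == "__main__":
    # Throwaway directory standing in for the real bin directory.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "tsdiag"), "w").close()
        os.symlink("tsdiag", os.path.join(tmp, "mmdiag"))
        for name, target in sorted(list_symlinks(tmp).items()):
            print(f"{name} -> {target}")
```

To use it for real, replace the temporary directory with the directory that holds the mm* commands on your node.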
I am currently trying to understand the various diagnostic tools built into GPFS, and I wanted to look at how things work internally.
Minority Quorum
Uses both nodes and tiebreaker drives to achieve quorum, even if there is only one surviving quorum node. Each of the quorum nodes needs to have access to all tiebreaker drives. You can have up to three tiebreaker drives.
Majority Quorum
Uses an odd number of quorum nodes to achieve quorum. It is the only viable option if you want distributed quorum. For example, if you have three GPFS clusters, you may want to distribute your quorum across all three. In that case you can't use minority quorum, because there is no way for every quorum node to access every tiebreaker drive without going through another cluster's nodes.
Just documenting the summary of quorum types for future reference, and to put it in my own terms for comprehension and retention.
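To make the two rules concrete for myself, here is a rough sketch of the quorum checks as I understand them. The function names are mine and nothing here is GPFS code. The majority rule is plain "more than half of the quorum nodes". For the tiebreaker case I am assuming that a surviving node needs to reach a majority of the tiebreaker drives; that part is my reading, not something stated in the summary above, so verify it against the documentation.

```python
def has_majority_quorum(total_quorum_nodes, reachable_quorum_nodes):
    """Majority quorum: more than half of the defined quorum nodes are reachable."""
    return reachable_quorum_nodes >= total_quorum_nodes // 2 + 1


def has_tiebreaker_quorum(reachable_quorum_nodes, total_tiebreakers,
                          accessible_tiebreakers):
    """Quorum with tiebreaker drives (sketch).

    Assumption: one surviving quorum node is enough, provided it can
    reach a majority of the tiebreaker drives.
    """
    if not 1 <= total_tiebreakers <= 3:
        raise ValueError("expected between one and three tiebreaker drives")
    needed_drives = total_tiebreakers // 2 + 1
    return reachable_quorum_nodes >= 1 and accessible_tiebreakers >= needed_drives


if __name__ == "__main__":
    # Three quorum nodes, only one still up.
    print("majority quorum held:", has_majority_quorum(3, 1))
    print("tiebreaker quorum held:", has_tiebreaker_quorum(1, 3, 2))
```

The example at the bottom is the case that makes the difference visible: the same single surviving node fails the majority check but passes the tiebreaker check.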
My current area of investigation is General Parallel File System (GPFS). Up until now I have had almost no exposure to this sort of filesystem beyond being aware that they exist. In my previous role we didn't use GPFS and had a single level file system with archive.
I will post more about GPFS as I dig into it, but for now it suffices to say that GPFS seems excessively fragile and I can understand why people say troubleshooting it is as much art as it is science.
HPE is to sell and support IBM’s venerable Spectrum Scale parallel file system for high performance computing (HPC) and AI workloads running on its ProLiant and Apollo servers. The company has added IBM Spectrum Scale to its HPE Parallel File System Storage offering, which is part of its HPC portfolio, and positioned under its ClusterStor […]
HPE to sell GPFS to counter competition in HPC
HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems
by M S Nirmala, "HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems"
Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-5, August 2018
URL: http://www.ijtsrd.com/papers/ijtsrd18211.pdf
Direct URL: http://www.ijtsrd.com/engineering/electronics-and-communication-engineering/18211/hba-distributed-metadata-management-for-large-cluster-based-storage-systems/m-s-nirmala
An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely Bloom filter arrays with different levels of accuracy, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, whereas the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and implementation in Linux. Simulation results show our HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or super clusters) and with the amount of data at the petabyte scale or higher. Our implementation indicates that HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.
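As a rough illustration of the idea in the abstract, the sketch below keeps one Bloom filter per metadata server and asks each filter whether it may hold a given filename. It is not the paper's implementation. The class and function names, the filter sizes, the SHA-256-based hashing, and the "use the small accurate array first, then fall back to the large one" lookup order are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a Python int used as a bit set."""

    def __init__(self, num_bits=4096, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        # False positives are possible, false negatives are not.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BloomFilterArray:
    """One Bloom filter per metadata server (server ids are hypothetical)."""

    def __init__(self, server_ids, num_bits=4096, num_hashes=4):
        self.filters = {sid: BloomFilter(num_bits, num_hashes) for sid in server_ids}

    def record(self, filename, server_id):
        self.filters[server_id].add(filename)

    def candidates(self, filename):
        return [sid for sid, bf in self.filters.items() if filename in bf]


def lookup(filename, hot_array, full_array):
    """Assumed two-level lookup: trust a unique hit in the accurate 'hot'
    array, otherwise fall back to the coarser array covering all files."""
    hits = hot_array.candidates(filename)
    if len(hits) == 1:
        return hits
    return full_array.candidates(filename)


if __name__ == "__main__":
    servers = ["mds0", "mds1", "mds2"]
    hot = BloomFilterArray(servers, num_bits=8192)   # higher accuracy, recent files
    full = BloomFilterArray(servers, num_bits=1024)  # lower accuracy, all files
    full.record("/projects/run42/output.dat", "mds1")
    hot.record("/projects/run42/output.dat", "mds1")
    print(lookup("/projects/run42/output.dat", hot, full))
```

A lookup can return more than one candidate server because Bloom filters admit false positives; a real system would need some fallback for that case, which this sketch leaves out.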