GraphChi:Large-ScaleGraphComputationonJustaPC
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. Inthiswork,wepresentGraphChi,adisk-basedsystem for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into smallparts,andanovelparallelslidingwindowsmethod, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs thatevolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updatespersecond,whilesimultaneouslyperformingcomputation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
1