What Is Distributed Computing?
Introduction
No device is an island: Your daily computational needs depend on more than just the microprocessors inside your computer or phone. Our modern world relies on "distributed computing," which shares the computational load among multiple machines. The technique passes data back and forth in an elaborate choreography of digital bits — a dance that has shaped the internet's past and present, and will likely shape its future.
In 1973, Xerox engineers invented Ethernet, which allowed the first personal computers to communicate with a shared printer. Ethernet gave rise to local area networks (LANs), which by the end of the 1970s let users share files within homes or offices.
Around that same time, the Advanced Research Projects Agency (ARPA) had been developing a more expansive distributed network. ARPANET, as it was called, could distribute information over phone lines, promising a much vaster network than a LAN. But it had limitations: The machines involved had to be compatible, and all connections were hardwired. Defense officials wanted tanks, planes and ships to communicate wirelessly, and scientists wanted to open networking up to the masses. The challenge was to establish rules that standardized how any kind of machine talks to any other over any kind of connection.
In 1974, Vinton Cerf and Robert Kahn documented their proposal for a set of rules they called Transmission Control Protocol/Internet Protocol, or TCP/IP. These instructions allowed information to be broken into small “packets,” sent through a network, and reassembled at the destination.
To this day, internet data transfers begin with TCP's three-way handshake. The sender's machine sends a "synchronization" packet to the receiver, as if to say "listen up." The receiver responds with an acknowledgment, and the sender confirms with its own acknowledgment. The message then begins in earnest. The TCP part of the protocol splits the data to be sent into a sequence of packets, numbering them carefully so that they can be reassembled later. IP then handles addressing and routing, so each packet can take its own path through the network. At the receiver's machine, TCP reassembles the information and checks it for accuracy, requesting retransmission of anything lost along the way. The process ensures that no packet arrives out of order or goes missing.
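The core idea — split data into numbered packets, let them travel independently, then sort and reassemble them at the destination — can be illustrated with a toy sketch in Python. This is not real TCP (which also handles handshakes, acknowledgments and retransmission inside the operating system); the function names here are invented for illustration:

```python
import random

def packetize(data: bytes, size: int) -> list[tuple[int, bytes]]:
    # Split data into fixed-size chunks, each tagged with a sequence
    # number -- the role TCP's sequence numbers play.
    count = (len(data) + size - 1) // size
    return [(i, data[i * size:(i + 1) * size]) for i in range(count)]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    # Sort by sequence number and concatenate, so out-of-order
    # arrival doesn't matter.
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"No device is an island."
packets = packetize(message, size=5)
random.shuffle(packets)           # simulate packets taking different routes
restored = reassemble(packets)
assert restored == message        # the message survives the scrambling
```

Because each packet carries its own sequence number, the network is free to route them however it likes — the receiver can always put the pieces back in order.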
In the early 1980s, physicists at the European particle physics laboratory CERN showed that distributed computing could be important for far more than just communication. Their computational problems were too complex for a single device to handle efficiently. Hundreds of CERN scientists would collaborate on a single experiment, and the giant detectors that they used created an enormous volume of raw data that had to be analyzed. Distributed computing allowed these researchers to efficiently share the computational load between giant mainframes and individual workstations, even if those machines came from different manufacturers and used different operating systems. This type of work paved the way for reliable intercontinental collaboration and, eventually, the World Wide Web, which was invented at CERN in 1989.
Decades later, as global business moved to the internet, the ultimate distributed computing system was born: the cloud. Cloud servers are, basically, other people's computers that you rent to store and process your data. With these systems in place, tech companies no longer needed their own servers; they could simply rent processing power scattered across a distributed network. Cloud servers are a bit like taxis: You only use one when you need it, which is often more efficient than owning a car that sits unused 95% of the time.
Distributed computing also allows for better cryptographic systems. A cryptographic key can be more secure when it is split across a network of computers, so that no single device ever holds the entire secret. This collaborative approach also prevents tampering with data on the blockchain, a technology that stores transactions redundantly across many nodes of a network.
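One simple way to split a secret so that no single machine knows it is XOR-based secret sharing: generate random shares, and choose the final share so that combining all of them reproduces the key. The sketch below is a minimal illustration of that idea, not a production scheme (real systems use richer constructions, such as threshold secret sharing); the function names are invented here:

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # XOR two equal-length byte strings position by position.
    return bytes(x ^ y for x, y in zip(a, b))

def split_secret(secret: bytes, n: int) -> list[bytes]:
    # Generate n-1 purely random shares; the final share is chosen so
    # that XOR-ing all n shares together reproduces the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    final = secret
    for s in shares:
        final = xor_bytes(final, s)
    return shares + [final]

def combine_shares(shares: list[bytes]) -> bytes:
    # XOR every share together to recover the original secret.
    result = shares[0]
    for s in shares[1:]:
        result = xor_bytes(result, s)
    return result

key = secrets.token_bytes(16)
shares = split_secret(key, 3)          # hand one share to each machine
assert combine_shares(shares) == key   # all three together recover the key
```

Any single share is statistically indistinguishable from random noise, so a machine holding one share — or even two of the three — learns nothing about the key unless every participant cooperates.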
These benefits — efficiency, security and reliability — suggest that our computers are likely to keep distributing their computations for the foreseeable future.