The Covalent Network Data Pipeline

A lot happens between the extraction and indexing of blockchain data and a developer querying the unified Covalent API. Each of the three key processes in the Covalent Network is run by Network Operators, whose responsibilities vary with their role.

Extraction and Export

The Covalent Network is built upon a foundation of historically complete and accurate replicas of the source blockchain data, including Ethereum, various Layer 2 chains, Cosmos Zones, Subnets, and more.

Pivotal to this are Block Specimen Producers (BSPs) (see Network Operators), who extract and export block specimens, a 1-to-1 secure representation of a block and its constituent elements. Together these form a canonical representation of a blockchain's full historical state. This allows the Covalent Network and data consumers to efficiently retrieve historical blockchain data with confidence in its accuracy.

Once created, Block Specimens are uploaded to a storage instance. The BSP then announces to the network that it has uploaded the Block Specimen by publishing a proof to the ProofChain Contract. The proof is just the hash of the Block Specimen, along with the (IPFS) access URL where other operators can retrieve it.
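
As a rough illustration of what such a proof carries, the sketch below hashes an encoded Block Specimen and pairs the digest with an access URL. The SpecimenProof class, its field names, and the choice of SHA-256 are assumptions made for this example; the text above only says that the proof is "the hash" plus the access URL.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenProof:
    """Hypothetical proof shape; field names are illustrative only."""
    chain_id: int
    block_height: int
    specimen_hash: str  # hex digest of the encoded Block Specimen
    storage_url: str    # where other operators can fetch the specimen


def make_proof(chain_id: int, block_height: int,
               specimen_bytes: bytes, storage_url: str) -> SpecimenProof:
    # SHA-256 is an assumption; the real hash function is not named here.
    digest = hashlib.sha256(specimen_bytes).hexdigest()
    return SpecimenProof(chain_id, block_height, digest, storage_url)


# Placeholder bytes and URL; a real BSP would use the exported specimen.
proof = make_proof(1, 17_000_000, b"encoded block specimen", "ipfs://<cid>")
```

Because the digest depends only on the specimen bytes, two BSPs that export the same block correctly would publish the same hash.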

By comparing proofs, one can detect any deviations in the data, whether accidental or malicious.

Refinement

Taking the base data (block specimens at the time of writing), the refinement and storage layer performs validated data transformations according to blueprints broadcast across the network.

At the center of this are Refiners (see Network Operators), who host a data processing framework. The Refiner locates a source, applies a transformation rule to it, and outputs the object generated by that rule. Both the source and the stored output are available through a decentralized storage service such as IPFS. The Refiner then emits a transformation-proof transaction confirming that it has done this work, along with the output (IPFS) access URL.
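
The Refiner flow described above (locate a source, apply a rule, store the output, announce it) can be sketched as follows. Everything here is hypothetical: the dictionary stands in for decentralized storage, the JSON specimen layout and the count_transactions rule are invented for the example, and the returned record only mimics what a transformation proof might reference.

```python
import hashlib
import json
from typing import Callable, Dict

# Stand-in for decentralized storage, keyed by content hash (assumption).
Store = Dict[str, bytes]


def put(store: Store, data: bytes) -> str:
    """Store data under its SHA-256 digest and return that key."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def count_transactions(specimen: bytes) -> bytes:
    """Example rule: summarize a JSON-encoded block (layout is invented)."""
    block = json.loads(specimen)
    result = {"height": block["height"], "tx_count": len(block["transactions"])}
    return json.dumps(result, sort_keys=True).encode()


def refine(store: Store, source_key: str,
           rule: Callable[[bytes], bytes]) -> Dict[str, str]:
    """Apply a rule to a stored source and record where the output lives."""
    output = rule(store[source_key])
    output_key = put(store, output)
    # In the network this record would be a proof transaction carrying an
    # IPFS access URL; here it is just a dictionary of storage keys.
    return {"source": source_key, "output": output_key}
```

A caller would first put() a specimen into the store, then pass its key and a rule to refine(); the returned keys let anyone fetch both the input and the output.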

Furthermore, Refiners can perform arbitrary transformations over any binary file concurrently with other transformations. This enables simultaneous data indexing, with any consumer of the data slicing and dicing it as they see fit. It also significantly strengthens the Covalent Network, which benefits from the ability to run parallel re-executions on blocks. While other indexers can re-execute blocks, they do so in a centralized system and cannot match the concurrency of a distributed model.
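
To make the idea of independent, concurrent transformations concrete, here is a small single-machine sketch that runs several rules over the same binary input in parallel threads. The three rules and the run_all helper are invented for illustration; in the network, the concurrency comes from many operators working independently rather than from a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def byte_length(data: bytes) -> int:
    return len(data)


def zero_bytes(data: bytes) -> int:
    return data.count(0)


def checksum(data: bytes) -> int:
    return sum(data) % 256


# Hypothetical rule set; each rule reads the same input and shares no state.
RULES: Dict[str, Callable[[bytes], int]] = {
    "byte_length": byte_length,
    "zero_bytes": zero_bytes,
    "checksum": checksum,
}


def run_all(data: bytes, rules: Dict[str, Callable[[bytes], int]]) -> Dict[str, int]:
    """Submit every rule at once and collect the results by rule name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(rule, data) for name, rule in rules.items()}
        return {name: future.result() for name, future in futures.items()}
```

Since no rule depends on another's output, adding a new rule does not require re-running or coordinating with the existing ones.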

Indexing and Query

Up to this point, all data, either from the base layer or the refinement stage, has been made available via distributed storage and announced on-chain.

Query Operators, the set of Network Operators responsible for responding to API queries over the network, observe these events. Running a local data warehouse, Query Operators pull the data objects that most interest them (based on API user demand) from storage and apply any additional internal database indexing on top. With historical and real-time data available to them, Query Operators can respond to API queries so long as they have the appropriate data. For doing so successfully, they are compensated in CQT.

Furthermore, to fetch data from the network, Query Operators have to pay, in CQT, into a "network fund". The amount they pay is proportional to the amount of data they have fetched from the network. This fund in turn pays out network rewards to production operators such as Block Specimen Producers and Refiners.
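
The arithmetic of a proportional fee can be sketched as below. The per-byte rate, the integer token units, and the weighted split among production operators are all assumptions made for this example; the text above states only that the fee is proportional to data fetched and that the fund pays out to production operators.

```python
from typing import Dict


def fetch_fee(bytes_fetched: int, rate_per_byte: int) -> int:
    """Fee in smallest token units, linear in the data fetched."""
    return bytes_fetched * rate_per_byte


def distribute(fund: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split the fund by hypothetical weights (e.g. accepted proofs)."""
    total = sum(weights.values())
    if total == 0:
        return {operator: 0 for operator in weights}
    # Integer division: any remainder simply stays in the fund.
    return {operator: fund * w // total for operator, w in weights.items()}


fee = fetch_fee(2_500_000, 2)                       # 5_000_000 units
payouts = distribute(fee, {"bsp-1": 3, "refiner-1": 1})
```

With these inputs, bsp-1 would receive 3,750,000 units and refiner-1 would receive 1,250,000 units.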

Decentralized Storage

Across each of the three key processes, data is either pushed to or fetched from the network. Network Operators that produce data, such as BSPs and Refiners, push their outputs to a decentralized storage instance. They can run this storage instance locally and make it publicly available, or make use of external storage options and pinners. A Network Operator who wishes to pull from a public storage repository can observe proofs and fetch the corresponding data through IPFS using the access URL attached to each proof. Storage for these activities is therefore delegated to Network Operators rather than to the network itself. This allows loose coupling between nodes, as long as each uploads its work to the decentralized storage layer and others can pull from it for their own.
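
From the consuming operator's side, pulling data by proof might look like the sketch below. The dictionary stands in for an IPFS gateway, and the re-hashing step is an assumption: it is a natural check given that proofs carry a hash, but the text above does not state that operators perform it. SHA-256 is likewise assumed.

```python
import hashlib
from typing import Dict


def fetch_and_verify(storage: Dict[str, bytes], access_url: str,
                     expected_hash: str) -> bytes:
    """Fetch an object by access URL and check it against the proof's hash."""
    if access_url not in storage:
        raise KeyError(f"nothing stored at {access_url}")
    data = storage[access_url]
    if hashlib.sha256(data).hexdigest() != expected_hash:
        raise ValueError("stored object does not match the published proof")
    return data
```

Because the check needs only the proof and the bytes, the fetching operator does not have to trust whoever hosts the storage instance.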

Auditing Proofs (Validation)

As mentioned, when a Network Operator performs a piece of work on the Covalent Network, a production proof is created and published to the Covalent proof contract. A number of scenarios can arise from this. Let's take the production of a block specimen proof as an example:

  1. Every proof matches and thus every BSP has produced the same Block Specimen.
  2. Some proofs mismatch but there is a majority that match.
  3. There are no matching proofs.
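
The three scenarios above can be told apart by counting identical proof hashes, as in this sketch. The function name and return labels are made up for the example, and "majority" is assumed to mean strictly more than half of the submitted proofs. The list does not cover the case where some proofs match but no hash reaches a majority, so the sketch reports that separately.

```python
from collections import Counter
from typing import Dict


def classify(proofs: Dict[str, str]) -> str:
    """Classify one block's proofs, given operator -> proof hash."""
    if not proofs:
        raise ValueError("no proofs submitted")
    counts = Counter(proofs.values())
    _, top_count = counts.most_common(1)[0]
    if top_count == len(proofs):
        return "all_match"        # scenario 1
    if top_count * 2 > len(proofs):
        return "majority_match"   # scenario 2
    if top_count == 1:
        return "no_match"         # scenario 3
    return "no_majority"          # not listed above; added for completeness
```

For example, classify({"a": "h1", "b": "h1", "c": "h2"}) falls under scenario 2, since two of three proofs agree.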

To determine which scenario has occurred and who should be rewarded per epoch, a check is done off-chain (initially by Covalent). Critical to this check is the role of the auditor, which is to examine an epoch of proofs, be it historic or present. Think of auditors as Validators. Rewards are not calculated or generated until the auditors confirm or refute that a quorum was attained by the independent, distinct set of operators. To communicate this, the auditor(s) message the Covalent proof contract.

Auditors are selected at random from a base pool of operators and play only the role of auditor for that epoch. For every audit that passes, they are awarded the staking block rewards of the operator for which they successfully provide the malfeasance proof. They resubmit the proof for every block and, at the end of every epoch, operators found to have invalid proofs are slashed accordingly.

Note:

  • Until this function is developed, Covalent will act as the source of truth, given that Covalent will be producing valid Block Specimens and publishing their respective proofs. Proofs that BSPs publish can thus be compared against Covalent’s own. This also mitigates the risk of collusion between Operators.
  • Slashing will not be live initially on the Covalent Network. Rather, if there is an invalid proof, no reward will be distributed for that epoch.
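
The interim arrangement in the notes above can be sketched as a comparison against reference hashes. This is one possible reading, stated as an assumption: an operator earns the epoch reward only if every proof it submitted matches Covalent's reference, and otherwise earns nothing (with no slashing). The function name, the flat per-epoch reward, and the data shapes are hypothetical.

```python
from typing import Dict


def epoch_rewards(reference: Dict[int, str],
                  submitted: Dict[str, Dict[int, str]],
                  reward: int) -> Dict[str, int]:
    """Map operator -> reward, given block height -> proof hash data."""
    rewards = {}
    for operator, proofs in submitted.items():
        # An operator with no proofs, or with any mismatch, earns nothing.
        valid = bool(proofs) and all(
            reference.get(height) == proof_hash
            for height, proof_hash in proofs.items()
        )
        rewards[operator] = reward if valid else 0
    return rewards
```

Note that nothing is deducted from a misbehaving operator here; the only consequence modeled is the missing reward.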

In Sum

Block Specimens are the first data objects created, produced by Block Specimen Producers. Refiners take these and transform them into Block Results, which will eventually evolve to support decoded contract states (Block Trace Results). While observing the on-chain announcements made by both of these operators, Query Operators load data into their local data warehouses from the storage instances used by BSPs and Refiners. Query Operators are then able to respond to API queries.

While the Covalent Network will initially support only Ethereum, further chains will be rolled out onto it gradually, most likely based on demand and community involvement. Meanwhile, Covalent will continue to provide data for the 104+ blockchains currently supported until the Covalent Network can manage demand.