Despite all the progress we've made in crypto tooling the last couple of years, one difficult task remains unsolved: extracting data from blockchains. Blockchain data is messy, disorganized and time-consuming even for an experienced engineer. Today, we are excited to announce a software development kit (SDK) to query data from the Ethereum blockchain – a problem we've been trying to solve for many years. We've been dogfooding the SDK internally for the last few months and used it to instrument over 20 DeFi protocols supporting over a dozen customers and thousands of users.
If you're a builder in the space, you are spoilt for choice. Over the last few weeks, our friends in the community have released kits to build with DeFi money legos without writing a single line of solidity code. Some examples of these kits:
The above are excellent choices and are general purpose kits to not only read your data but also to modify your data. One of the many features they come with is a way to access structured data (balances and positions) on the blockchain. These kits run completely on the blockchain and therefore only rely on a Web3 node provider like Infura.
We've made a different bet.
Our team at Covalent from the start has been laser-focused on solving one and only one problem: reading structured data from the blockchain. We've ignored shiny objects. We will either succeed in solving this problem or die trying.
The Challenge with Querying Blockchain Data
What's the status quo to get blockchain data?
A blockchain node software like Geth in theory already has the blockchain data. It has a handy JSON-RPC layer to pull out data. But there are four problems:
- Expensive. If you want any kind of historical data, you need to run Geth in a special configuration mode known as "full archive mode" which currently takes up hundreds of gigabytes of storage space and has other special hardware requirements.
- Slow. The JSON-RPC interface is what's known as a "point query" interface, i.e., you can only ask for a single object (block, transaction, etc.) at a time. What you need is a way to batch export the data – which means rethinking the JSON-RPC layer.
- Incomplete data. The most interesting DeFi data is actually the data structures inside the contract state and not visible outside. It's currently too hard or simply impossible to reconstruct these data structures through the JSON-RPC layer.
- Too niche. Hundreds of query languages have come and gone over the years, but nothing has stood the test of time like plain old SQL. SQL has been around for 40 years and will be around for the next 40. It's the Lindy effect in action. In our opinion, a niche query interface is a mainstream adoption blocker.
This is why every Ethereum DeFi portfolio tracker falls short because they hit against one of the above problems. It's simple to get account balances through the JSON-RPC (you don't even need an archive node), but anything to do with historical data is very difficult.
At Covalent, our team has been chipping away at the problem for quite some time now leaving no stone unturned. Our internal benchmark is highly ambitious – 5 lines of code to pull out a piece of data. Any approach that requires a user to write more than 5 lines of code is considered burdensome and gets mercilessly chucked out. Constraints are important for innovation.
The best way to understand the SDK is to experience the benefits of an end-product resulting from building with the SDK. As a case-study, we'll build a Uniswap integration for liquidity providers that is a simple problem to articulate but hard to solve for without a tool like the Covalent SDK.
To many devs, this is a glimpse of the future.
Case study: Building a Uniswap connector for LP balances and positions
Note: The rest of this blog post is fairly technical, but you don't have to be a programmer to follow along. That's the beauty of our SDK – as long as you understand the data structures on the blockchain, you can query them quite simply using SQL.
Objective: Build a connector for Uniswap that can show a liquidity provider their balances and contributions (positions) to the liquidity pool. Extra credit to calculate the cost basis and annualized ROI of their contributions to the pool.
1. How Uniswap basically works
You can skip this section if you already know how Uniswap works.
Uniswap has two sets of users:
- traders who come to the protocol to avail of the liquidity in order to perform swaps, and
- liquidity providers (LP) who provide the liquidity and are compensated with 0.3% of every trade as a trading fee that goes through the exchange.
For the purposes of our case study – we focus on the liquidity provider. When an LP wants to participate in a pool, they make an initial contribution of ETH and the corresponding ERC20 token. They’ll have to make sure the ratio of their contributions reflect market prices. If the ratio is not accurate, it attracts arbitrageurs to extract easy profits. When an LP contributes to the pool, they receive a liquidity token that represents the pro-rata share of the pool.
With each trade, the trading fee is added to the liquidity pool. So the pool continues to grow with each trade. The growth rate of the pools grow in proportion to the number of trades and the size of each trade. At any point, the LP can withdraw their contribution and they are returned the ETH and the ERC20 token in proportion to the pro-rata share of the pool represented by the liquidity token.
That's basically how Uniswap works.
2. How to manually lookup Uniswap exchange balances on Etherscan
###Correction from Noah Zinsmeister
In V2, LPs are technically entitled to a share of the reserves of a pair, not whatever the pair's balances of tokens happens to be. The reason for this is technical and has to do with oracle security, see the whitepaper/audit report for more. The long and the short of it is that as an LP you're entitled to a share of reserves not balances. Reserves are public state variables as seen in the contracts.
Editor's note: Because Covalent persists the entire state, switching from balances to reserves is a simple variable change.
Figure 1: Total liquidity locked in the WETH-DAI Uniswap exchange.
Each exchange on Uniswap is deployed as a separate individual contract. For example, the DAI-WETH exchange is represented by the contract at the address 0xa478c2975ab1ea89e8196811f51a7b7ade33eb11
and its associated liquidity token. If you look on Etherscan, you can pull out all the relevant information as of block 10194612:
- Total liquidity (~ 2.496 million DAI + ~10,200 WETH)
- Total liquidity tokens "total supply" at 158,494.250 UNI-V2
Of course, when you check Etherscan it's going to be different because the liquidity changes with each trade and LPs adding/removing their positions.
Figure 2: Total liquidity tokens held by 0x49a2...1936.
Let's pick a particular LP with the address: 0x49a2dcc237a65cc1f412ed47e0594602f6141936
. This LP has 21,114 liquidity tokens out of a total of 158,494 tokens and their share of the pool rewards:
$$ \frac{21,114}{158,494} = 13.32% $$
This particular LP has accumulated their liquidity token with three investments as seen in the Etherscan screenshot below:
Figure 3: Total liquidity tokens accumulated by 0x49a2...1936.
3. Programmatically querying for Uniswap exchange balances
The lookups we performed on Etherscan above are also queries, but they are manual, tedious, and are of limited usefulness. We'll now use Covalent to programmatically query data on the blockchain so that you can re-construct all of the above information in a format that is much more useful.
Covalent's secret power is that it has indexed all of the data on the blockchain: every block, every transaction, every log event, and every state update. We've worked very hard over the last couple of years to scale this multi-billion row data store spanning terabytes updated with a latency of 30s (~2 blocks.) We'll share our systems architecture in a future post.
Query 1: Query for add/remove liquidity events and token balances
In Uniswap V2, Mint
and Burn
events are emitted whenever an LP adds or removes liquidity to/from an exchange. The full signature of these events are:
Mint (index_topic_1 address sender, uint256 amount0, uint256 amount1)
Burn (index_topic_1 address sender, uint256 amount0, uint256 amount1, index_topic_2 address to)
One missing piece of data from this log event is the actual number of liquidity tokens. You can probably triangulate the number of tokens from a Transfer
event, but you have to be careful to pick the right Transfer
event – which is becoming increasingly hard with all of the DeFi composability going on. Fluster not, we have also solved this problem elegantly.
Continuing on, our Mint
query simply looks like this:
-- lp_address = 0x49a2dcc237a65cc1f412ed47e0594602f6141936 -- exchange_address = 0xa478c2975ab1ea89e8196811f51a7b7ade33eb11 SELECT e.tx_hash FROM block_log_events e WHERE e.topics[1] = keccak('Mint(address,uint256,uint256)') AND e.topics[2] = :lp_address AND e.sender = :exchange_address
We'll have a similar query for Burn
events. These events together will give a list of transaction hashes that modify the LP's liquidity token balances. From there you can JOIN
against the table that contains token balances. Specifically, we will be joining against these three contracts:
- The
WETH
contract at0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2
- The
DAI
contract at0x6b175474e89094c44da98b954eedeac495271d0f
- The
WETH-DAI
contract at0xa478c2975ab1ea89e8196811f51a7b7ade33eb11
Now, back to getting the number of liquidity tokens minted/burned.
We've modified the Ethereum Virtual Machine (EVM) to capture state transitions which include changes to token balances amongst other data structures. Querying all token balance updates for the WETH-DAI
exchange contract is simply this:
-- contract_address = 0xa478c2975ab1ea89e8196811f51a7b7ade33eb11 -- lp_address = 0x49a2dcc237a65cc1f412ed47e0594602f6141936 SELECT data FROM trace_sstore_events tse WHERE tse.account = :contract_address AND tse.key_path = '"[balanceOf_slot, :lp_address]"'
The data
column is the new balance written into the smart contract with each transaction. We'll have to format the balance with the right number of contract decimals. You can't always assume that tokens have 18 decimals. Some popular contracts like USDC (6 decimals) and WBTC (8 decimals) are different and it's important to format them right.
Fortunately, we also have a table contract_currencies
with the metadata (token name, token ticker symbol, decimals) of all tokens on the Ethereum blockchain (currently about 200K ERC20 tokens.) Joining with this table will help us format the balance correctly.
At this point, we know how much DAI
and WETH
the exchange has in liquidity. To get the current value of the positions, we simply calculate the share:
$$ \frac{\text{LP's liquidity tokens}}{\text{Exchange's liquidity supply}} * (\text{WETH balance} + \text{DAI balance})$$
Query 2: Query historical token price
Now that we have a list of events that modify our balances, our ledger entries can be thought of something like this:
Timestamp | Event | Liquidity token balance | DAI balance | WETH balance |
---|---|---|---|---|
2020-05-19 18:00:54 | Add liquidity | +XX | -YY | -ZZ |
2020-05-19 22:23:03 | Add liquidity | +XX | -YY | -ZZ |
2020-05-21 19:19:33 | Add liquidity | +XX | -YY | -ZZ |
The next step is to JOIN
with market prices to get historical prices of WETH
(which is actually 1:1 ETH
) and DAI
. We've indexed about 40,000 markets in 12 fiat currencies – so you have localization out of the box.
SELECT price FROM prices p WHERE p.base = 'USD' AND p.ticker = 'WETH' AND p.date = :timestamp
Query 4: Extending to all Uniswap exchanges
We started with a hardcoded value of the WETH-DAI
exchange. What if we don't know what exchanges the user has invested in?
Luckily for us, we have access to the entire blockchain, so we've indexed all of the current as well as future exchanges. There's no need to hardcode the exchanges supported.
In the new Uniswap V2 factory contract, the PairCreated
event is emitted every time a new exchange is created. We can query for this event signature and also parse out what token pairs are involved in the exchange as well.
-- UNISWAP_FACTORY_V2 0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f SELECT extract_address(abi_field(e.data, 0)) as exchange, extract_address(topics[2]) as token_0, extract_address(topics[3]) as token_1 FROM block_log_events e WHERE e.topics[1] = keccak('PairCreated(address,address,address,unit)') AND e.sender = :UNISWAP_FACTORY_V2
A JOIN
with the above query will get you all of the balances across all current and future Uniswap exchanges!
Don't sweat it – SafeKeep integrates Uniswap for you with 0 lines of code
Figure 4: SafeKeep's Uniswap Liquidity balances and transaction history.
Do you think 5 lines of code is too much? No problem, you can use SafeKeep today to see your Uniswap balances and transaction history. Here's a live example of the aforementioned liquidity provider who has invested in 10 Uniswap pools: