Common Aggregate Functions
An aggregate function performs a calculation on a set of values, and returns a single, aggregated value. Aggregate functions are often used with the GROUP BY
clause of the SELECT
statement. Click here for a full list of aggregate functions.
any()
SELECT distinct value
is a common practice to find unique values. However, the distinct
statement is extremely inefficient as it compares every value with every other value. The any()
function is a solution to this inefficiency as it just returns the first unique value encountered - no comparisons needed. Using any()
in your SQL query can significantly improve it's performance. The code sample below compares the performance differences.
-- Query Time: 2 mins 5 seconds SELECT distinct chain_name FROM blockchains.all_chains -- Query Time: 27 seconds SELECT any(chain_name), chain_id FROM blockchains.all_chains GROUP BY chain_id
Both queries return a list of chains supported on Increment.
Example
Let's assume you want to find the daily sum of gas paid on Ethereum - denominated in ETH. Due to the structure of the all_chains
table, each log event has a separate row; meaning the transaction information connected to that log event gets printed each time. For example, here's the data for a transaction with 5 log events.
tx_hash | tx_gas_spent | tx_gas_price | topic0 |
---|---|---|---|
3F16273C6106... | 233932 | 9157901614 | 49995E5DD615... |
3F16273C6106... | 233932 | 9157901614 | 4C209B5FC8AD... |
3F16273C6106... | 233932 | 9157901614 | 4CDDE6E09BB7... |
3F16273C6106... | 233932 | 9157901614 | 804C9B842B27... |
3F16273C6106... | 233932 | 9157901614 | DDF252AD1BE2... |
3F16273C6106... | 233932 | 9157901614 | DDF252AD1BE2... |
3F16273C6106... | 233932 | 9157901614 | DDF252AD1BE2... |
To calculate the gas fee for this transaction, use the following equation:
tx_gas_spent/pow(10, 18)* tx_gas_price
However, if you wanted to calculate the total gas fees paid on Ethereum you couldn't just wrap this equation in a sum()
function. Reason being, it would sum the transaction gas fee for every log event. Use the any()
function to return a single row of transaction data for each tx_hash
. Your subquery would look something like this:
SELECT any(tx_hash), tx_gas_spent, tx_gas_price, signed_at FROM blockchains.all_chains WHERE chain_name = 'eth_mainnet' AND signed_at > '2022-10-10' GROUP BY tx_gas_spent, tx_gas_price, signed_at
To understand how the any()
function works, insert the transaction hash from the table above into the WHERE
statement.
SELECT any(tx_hash), tx_gas_spent, tx_gas_price, signed_at FROM blockchains.all_chains WHERE chain_name = 'eth_mainnet' AND tx_hash = unhex('13F018913C86C2D842C3634DC478992CBB15A45F4E1B38BF597F23D48A1C48EC') GROUP BY tx_gas_spent, tx_gas_price, signed_at
tx_hash | tx_gas_spent | tx_gas_price | topic0 |
---|---|---|---|
3F16273C6106... | 233932 | 9157901614 | 49995E5DD615... |
Now you can use this query as a subquery to calculate the total fees paid each day.
SELECT date_trunc('day', signed_at) as date , sum(tx.tx_gas_spent/pow(10, 18)* toFloat64(tx.tx_gas_price)) as aggregate_gas_cost FROM ( SELECT any(tx_hash), tx_gas_spent, tx_gas_price, signed_at FROM blockchains.all_chains WHERE chain_name = 'eth_mainnet' AND signed_at > '2022-10-10' GROUP BY tx_gas_spent, tx_gas_price, signed_at ) tx GROUP BY date
argMax()
argMax(arg, val)
calculates the arg
value for the maximum value of val
.
Example
Lets assume you want to add a metric chart to your dashboard to display the transaction count for the month. To start, the query below counts the number of unique transactions per month over the last 3 months.
SELECT uniq(tx_hash) as count, date_trunc('month', signed_at) as month FROM blockchains.all_chains WHERE signed_at > now() - interval '3 month' GROUP BY month
The query returns two columns - count
and month
- with 4 rows of data.
count | month |
---|---|
368852269 | 2022-10-01 |
438483951 | 2022-11-01 |
220947524 | 2022-12-01 |
28527288 | 2023-01-01 |
However, all you want to display is the transaction count for this month. To do so, use the argMax()
function which finds the transaction count for the maximum (i.e latest) month. Your query would look something like this:
SELECT argMax(count, month) FROM ( SELECT uniq(tx_hash) as count, date_trunc('month', signed_at) as month FROM blockchains.all_chains WHERE signed_at > now() - interval '3 month' GROUP BY month )
The result:
count |
---|
28527288 |
min()
The min()
function calculates the minimum across a group of values.
Example
Lets assume you want to find new transaction senders (tx_sender
) on Avalanche. To start, write a query that finds the first time each address sent a transaction. Your query would look something like this:
SELECT min(signed_at) as signed_at , tx_sender as address FROM blockchains.all_chains WHERE chain_name = 'avalanche_mainnet' GROUP BY address
Next, transform the query above into a subquery where the final SELECT
statement counts the number of new addresses each day.
SELECT date_trunc('day', signed_at) as date , uniq(address) as new_addresses ( SELECT min(signed_at) as signed_at , tx_sender as address FROM blockchains.all_chains WHERE chain_name = 'avalanche_mainnet' GROUP BY address ) WHERE signed_at > '2023-01-01' GROUP BY date
The result:
count | month |
---|---|
2023-01-02 | 312 |
2023-01-08 | 64 |
2023-01-04 | 916 |
2023-01-05 | 536 |
2023-01-03 | 377 |
2023-01-07 | 1278 |
2023-01-06 | 860 |
2023-01-01 | 642 |
uniq()
SELECT count(distinct value)
is a commonly used statement to find a count of unique values. However, the distinct
statement is extremely inefficient as it compares every value with every other value. The uniq()
function is a solution to this inefficiency as it returns a very accurate approximate count of unique values. Using 'uniq(x)` in your SQL query can significantly improve it's performance.
The uniq(x)
aggregate function approximates the number of unique values with an average margin of error between 0.1-1%. The margin of error can be bigger or smaller depending on your query.
Example
The code sample below compares the speed and accuracy of the uniq(x)
function against count(distinct x)
.
-- Query Time: 40.65s -- Result: 127,604 SELECT count(distinct tx_hash) FROM blockchains.all_chains WHERE signed_at > now() - interval '1 week' -- Query Time: 0.91s -- Result: 127,456 SELECT uniq(tx_hash) FROM blockchains.all_chains WHERE signed_at > now() - interval '1 week'