Common Aggregate Functions

An aggregate function performs a calculation on a set of values, and returns a single, aggregated value. Aggregate functions are often used with the GROUP BY clause of the SELECT statement. Click here for a full list of aggregate functions.

any()

SELECT distinct value is a common practice to find unique values. However, the distinct statement is extremely inefficient as it compares every value with every other value. The any() function is a solution to this inefficiency as it just returns the first unique value encountered - no comparisons needed. Using any() in your SQL query can significantly improve it's performance. The code sample below compares the performance differences.

-- Query Time: 2 mins 5 seconds
SELECT distinct chain_name 
FROM blockchains.all_chains 

-- Query Time: 27 seconds
SELECT any(chain_name), chain_id 
FROM blockchains.all_chains
GROUP BY chain_id

Both queries return a list of chains supported on Increment.

Example

Let's assume you want to find the daily sum of gas paid on Ethereum - denominated in ETH. Due to the structure of the all_chains table, each log event has a separate row; meaning the transaction information connected to that log event gets printed each time. For example, here's the data for a transaction with 5 log events.

tx_hashtx_gas_spenttx_gas_pricetopic0
3F16273C6106...233932915790161449995E5DD615...
3F16273C6106...23393291579016144C209B5FC8AD...
3F16273C6106...23393291579016144CDDE6E09BB7...
3F16273C6106...2339329157901614804C9B842B27...
3F16273C6106...2339329157901614DDF252AD1BE2...
3F16273C6106...2339329157901614DDF252AD1BE2...
3F16273C6106...2339329157901614DDF252AD1BE2...

To calculate the gas fee for this transaction, use the following equation:

tx_gas_spent/pow(10, 18)* tx_gas_price

However, if you wanted to calculate the total gas fees paid on Ethereum you couldn't just wrap this equation in a sum() function. Reason being, it would sum the transaction gas fee for every log event. Use the any() function to return a single row of transaction data for each tx_hash. Your subquery would look something like this:

SELECT any(tx_hash), tx_gas_spent, tx_gas_price, signed_at
FROM blockchains.all_chains
WHERE chain_name = 'eth_mainnet'
AND signed_at > '2022-10-10'
GROUP BY tx_gas_spent, tx_gas_price, signed_at

To understand how the any() function works, insert the transaction hash from the table above into the WHERE statement.

SELECT any(tx_hash), tx_gas_spent, tx_gas_price, signed_at
FROM blockchains.all_chains
WHERE chain_name = 'eth_mainnet'
AND tx_hash = unhex('13F018913C86C2D842C3634DC478992CBB15A45F4E1B38BF597F23D48A1C48EC')
GROUP BY tx_gas_spent, tx_gas_price, signed_at
tx_hashtx_gas_spenttx_gas_pricetopic0
3F16273C6106...233932915790161449995E5DD615...

Now you can use this query as a subquery to calculate the total fees paid each day.

SELECT date_trunc('day', signed_at) as date
       , sum(tx.tx_gas_spent/pow(10, 18)* toFloat64(tx.tx_gas_price)) as aggregate_gas_cost
FROM (
		SELECT any(tx_hash), tx_gas_spent, tx_gas_price, signed_at
	    FROM blockchains.all_chains
  		WHERE chain_name = 'eth_mainnet'
        AND signed_at > '2022-10-10'
	    GROUP BY tx_gas_spent, tx_gas_price, signed_at
    ) tx
 GROUP BY date

argMax()

argMax(arg, val) calculates the arg value for the maximum value of val.

Example

Lets assume you want to add a metric chart to your dashboard to display the transaction count for the month. To start, the query below counts the number of unique transactions per month over the last 3 months.

  SELECT uniq(tx_hash) as count, date_trunc('month', signed_at) as month
  FROM blockchains.all_chains
  WHERE signed_at > now() - interval '3 month'
  GROUP BY month

The query returns two columns - count and month - with 4 rows of data.

countmonth
3688522692022-10-01
4384839512022-11-01
2209475242022-12-01
285272882023-01-01

However, all you want to display is the transaction count for this month. To do so, use the argMax() function which finds the transaction count for the maximum (i.e latest) month. Your query would look something like this:

SELECT argMax(count, month)
FROM (
  SELECT uniq(tx_hash) as count, date_trunc('month', signed_at) as month
  FROM blockchains.all_chains
  WHERE signed_at > now() - interval '3 month'
  GROUP BY month
  )

The result:

count
28527288

min()

The min() function calculates the minimum across a group of values.

Example

Lets assume you want to find new transaction senders (tx_sender) on Avalanche. To start, write a query that finds the first time each address sent a transaction. Your query would look something like this:

SELECT min(signed_at) as signed_at
       , tx_sender as address
FROM blockchains.all_chains
WHERE chain_name = 'avalanche_mainnet'
GROUP BY address     

Next, transform the query above into a subquery where the final SELECT statement counts the number of new addresses each day.

SELECT date_trunc('day', signed_at) as date
       , uniq(address) as new_addresses
(
  SELECT min(signed_at) as signed_at
        , tx_sender as address
  FROM blockchains.all_chains
  WHERE chain_name = 'avalanche_mainnet'
  GROUP BY address             
)
WHERE signed_at > '2023-01-01'
GROUP BY date

The result:

countmonth
2023-01-02312
2023-01-0864
2023-01-04916
2023-01-05536
2023-01-03377
2023-01-071278
2023-01-06860
2023-01-01642

uniq()

SELECT count(distinct value) is a commonly used statement to find a count of unique values. However, the distinct statement is extremely inefficient as it compares every value with every other value. The uniq() function is a solution to this inefficiency as it returns a very accurate approximate count of unique values. Using 'uniq(x)` in your SQL query can significantly improve it's performance.

info

The uniq(x) aggregate function approximates the number of unique values with an average margin of error between 0.1-1%. The margin of error can be bigger or smaller depending on your query.

Example

The code sample below compares the speed and accuracy of the uniq(x) function against count(distinct x).

-- Query Time: 40.65s
-- Result: 127,604
SELECT count(distinct tx_hash)
FROM blockchains.all_chains
WHERE signed_at > now() - interval '1 week'

-- Query Time:  0.91s
-- Result: 127,456
SELECT uniq(tx_hash)
FROM blockchains.all_chains
WHERE signed_at > now() - interval '1 week'