Throughput & scalability

Everyone claims to be scalable. Here we use simple napkin math to show that Headjack can handle billions of accounts and anchor unlimited amounts of off-chain content tied to identity.

How big is a Headjack transaction

Applications post anchors to off-chain content with an IPFS CID hash and a merkle root. IDMs also anchor off-chain content (mainly user preferences & updates to social graph), but they also post authorizations to other accounts (applications) to post on behalf of users as integer pairs.

So the fields for a transaction by an application/IDM (which will be the majority) are:

version: 4 bytes
signature: 65 bytes
blob IPFS address: 32 bytes
blob merkle root: 32 bytes
nonce: 4 bytes auto-increment integer associated with the account - to prevent reordering of anchored off-chain blobs (which would mess up internal addressing based on that nonce)
value: 4 bytes amount of native token paid to validators for transaction inclusion

So far that is 141 bytes which almost every transaction by an application or IDM contains. IDMs also submit a list of authorizations (or revocations) as integer pairs. For example, 1000 accounts authorizing 15 different applications to post on their behalf would be 1000 integer pairs. Assuming 8 byte integers (up to 2^64) that would be 8 * 2 * 1000 = 16,000 bytes.
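The byte counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the field names in `BASE_FIELDS` and the helper `transaction_size` are illustrative, not part of any Headjack API, and the 8-byte index width is the assumption stated in the text.

```python
# Field sizes in bytes, as listed above for an application/IDM transaction.
BASE_FIELDS = {
    "version": 4,
    "signature": 65,
    "blob_ipfs_address": 32,
    "blob_merkle_root": 32,
    "nonce": 4,
    "value": 4,
}
INDEX_BYTES = 8  # assumed width of one account index (up to 2^64)


def transaction_size(num_authorizations: int = 0) -> int:
    """Bytes for one transaction carrying `num_authorizations` integer pairs."""
    base = sum(BASE_FIELDS.values())
    return base + num_authorizations * 2 * INDEX_BYTES


print(transaction_size())      # 141
print(transaction_size(1000))  # 16141 = 141 + 16000
```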

Naive scenario

The initial version will target block bandwidth of up to 100 kb/s. This is not a problem for ZK validiums as there are already DA solutions that offer 10 mb/s or even much more.

Assuming:

1 MB block size & 10 second block time (100 kb/s of block bandwidth)
1000 applications posting in every block
100 IDMs authorizing as many users as possible - filling the remaining block space
no on-chain actions such as keypair & name changes, account creation & direct interaction with the chain by end users

We get:

1100 actors (1000 applications + 100 IDMs) that post in every block at least 141 bytes for their transactions, which is 155100 bytes
the remaining 893476 bytes (1048576 (1MB) - 155100) can be filled with authorizations and since an authorization is 16 bytes (8 * 2) that would be 55842 authorizations/revocations every 10 seconds or 5584 authorizations/revocations per second
for 1 billion accounts that would be 0.482 authorizations/revocations per person per day (5584 * 86400 seconds / 1 billion) which is actually quite good - people on average do way less single sign-ons per day
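The naive scenario can be reproduced step by step. The sketch below only restates the assumptions listed above (1 MB blocks, 10 second block time, 1100 actors, 16-byte authorizations); the variable names are made up for illustration.

```python
BLOCK_SIZE = 1024 * 1024   # 1 MB block, in bytes
BLOCK_TIME = 10            # seconds per block
BASE_TX = 141              # bytes per application/IDM transaction
AUTH_SIZE = 2 * 8          # one authorization: a pair of 8-byte integers
ACTORS = 1000 + 100        # applications + IDMs posting in every block
ACCOUNTS = 1_000_000_000

fixed_bytes = ACTORS * BASE_TX                 # 155100
remaining = BLOCK_SIZE - fixed_bytes           # 893476
auths_per_block = remaining // AUTH_SIZE       # 55842
auths_per_second = auths_per_block // BLOCK_TIME  # 5584
auths_per_day = auths_per_second * 86_400      # 482457600
per_account_per_day = auths_per_day / ACCOUNTS

print(fixed_bytes, remaining, auths_per_block, auths_per_second)
print(auths_per_day, round(per_account_per_day, 3))  # 482457600 0.482
```

Changing `ACTORS` or `BLOCK_SIZE` shows how sensitive the result is to each assumption: the fixed per-actor overhead is only about 15% of the block in this scenario.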

completely different goals - comparing the 2 protocols just to put things into perspective	Headjack	Ethereum
block size	1 MB	~80 kb
block time	10 seconds	~13 seconds
blockchain bandwidth per second	100 kb/s (x16 more than Ethereum)	~6.15 kb/s
blockchain bandwidth per day	8640 mb/d	~528 mb/d
transactions/authorizations per second	5584 APS	~14 TPS
transactions/authorizations per day	482,457,600	1,209,600
transactions/authorizations per person per day for 1 billion accounts	0.482 (x400 more than Ethereum)	0.0012096

Realistic scenario

The naive scenario does not include on-chain actions for specific accounts such as:

keypair changes (new pubkey (32 bytes) + signature (65 bytes) if there is an older key)
account creation (if done by an IDM then this is just a few bytes - no pubkey)
name registration & ownership changes (see the dedicated page for more details)
updating account fields such as a URI pointing towards an off-chain account directory (which could point to archived posts) or pointing to another account index for such services
signed transactions by individual accounts that want to directly interact with the chain
- authorizing an IDM, rotating keys, or even publishing off-chain content as an application

However, the realistic scenario will not be far from the naive because:

Only a fraction of all accounts will have keypairs (even though 100% could) and will make just a few signed actions per year - leaving most block throughput for authorizations through IDMs.
A large share of accounts will rarely even be authorizing new applications - many people don't sign in to new services through SSO every single day. There could also be 2 types of log-ins: passive (viewing only - nothing on-chain) and authorized (allowing services to post on behalf of users).
Many applications that don't generate a lot of off-chain activity will publish less often than on every block in order to minimize on-chain block space costs.
The chain throughput can be further optimized & scaled by multiple orders of magnitude.

Optimizations & scaling

Throughput of 100 kb/s is just the start & can easily go to 1-10 mb/s as a ZK rollup.
The chain & state can be trivially sharded - there aren't problems such as fracturing liquidity or preventing composability because accounts don't care about each other - they mostly contain authorization block numbers & keypair history.
Integer indexes that only need 4 bytes can be compressed/batched together - it'll take many years to go beyond 4 billion accounts so the actual throughput is 2x of what is listed here.
A fee market can develop that tunes the cost of different actions so that actors don't just pay for on-chain bytes - the ways the system is used can be guided through incentives.
Other optimizations not listed here - this is just the starting point.
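To illustrate the point about 4-byte indexes, here is a rough calculation under the same block assumptions as the naive scenario. The function `auths_per_block` is a hypothetical helper, and the sketch ignores any framing or batching overhead a real encoding might add.

```python
def auths_per_block(index_bytes: int,
                    block_size: int = 1024 * 1024,
                    actors: int = 1100,
                    base_tx: int = 141) -> int:
    """Authorizations that fit in one block for a given index width."""
    remaining = block_size - actors * base_tx
    return remaining // (2 * index_bytes)  # one authorization = 2 indexes


wide = auths_per_block(8)    # 8-byte indexes, as in the naive scenario
narrow = auths_per_block(4)  # 4-byte indexes, enough for ~4 billion accounts

print(wide, narrow, narrow / wide)  # 55842 111684 2.0
```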

State growth

Headjack's main value proposition is keeping historical records of the sequence of authorizations, key changes & off-chain content anchors and being able to generate proofs for any specific piece of off-chain content.

TODO: finish this

https://ethereum.stackexchange.com/questions/268/ethereum-block-architecture

numbers - state - one difference from other cryptos is that this one is append-only and could be designed to be easier on memory access patterns

One difference with other blockchains is that accounts in Headjack are numbers and thus the state tree could be different.

on eth state growth: https://twitter.com/SalomonCrypto/status/1587983584471633921 https://hackmd.io/@vbuterin/state_size_management

All on-chain changes just append data to one of the few attributes of:

accounts:
- public keys: a map of keys and block height integer ranges (non-overlapping)
- authorizations: a map of indexes and arrays of block height integer ranges
- nonces: an array that maps autoincrement indexes to block numbers
  - appended only when publishing off-chain content (usually an application/IDM)
names:
- owners: a map of owner indexes and block height integer ranges (non-overlapping)
- nonces: an array that maps autoincrement indexes to account index & nonce pairs
  - appended only when publishing off-chain content (usually an application/IDM)
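One possible way to picture the per-account part of this layout is sketched below. It is an assumption, not the actual Headjack state format: the `Account` class, its method names, and the encoding of ranges as a flat ascending list of block heights (`[start, end, start, end, ...]`, where an odd length means the last range is still open) are all hypothetical. The encoding was picked so that every change is a pure append.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Account:
    # public key -> flat list of block-height boundaries (hypothetical encoding)
    public_keys: Dict[bytes, List[int]] = field(default_factory=dict)
    # authorized account index -> flat list of block-height boundaries
    authorizations: Dict[int, List[int]] = field(default_factory=dict)
    # list position = autoincrement nonce, value = block number of the anchor
    nonces: List[int] = field(default_factory=list)

    def toggle_authorization(self, app_index: int, height: int) -> None:
        """Append a boundary: opens a range if closed, closes it if open."""
        self.authorizations.setdefault(app_index, []).append(height)

    def is_authorized(self, app_index: int, height: int) -> bool:
        """True if `height` falls inside a range [start, end)."""
        bounds = self.authorizations.get(app_index, [])
        return bisect_right(bounds, height) % 2 == 1

    def anchor(self, height: int) -> int:
        """Record an off-chain content anchor; returns the nonce it received."""
        self.nonces.append(height)
        return len(self.nonces) - 1


account = Account()
account.toggle_authorization(app_index=7, height=100)  # authorize app 7
account.toggle_authorization(app_index=7, height=250)  # revoke app 7
print(account.is_authorized(7, 180))  # True
print(account.is_authorized(7, 250))  # False
```

Because boundaries are only ever appended in ascending order, a historical query such as "was application 7 authorized at block 180?" is a binary search over a short list.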

TODO: should IPFS hashes & merkle roots be saved in the state?

- no?

TODO: light clients? in addition to merkle proofs for inclusion of content they would need merkle proofs for the state of which applications a user has authorized to post on their behalf in a given block

state growth: https://twitter.com/keoneHD/status/1574451986501623808

Off-chain content

There are no limits for off-chain content as it is all just anchored with merkle roots - it could be as high as hundreds of terabytes per second. There isn't a more minimal design that can link unbounded amounts of off-chain data to billions of identities that can change keys & names and yet still provide the guarantees & mental model simplicity of Headjack - it achieves consensus on the absolute bare minimum.
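A minimal sketch of why the on-chain cost stays constant: any number of off-chain items can be committed to with a single 32-byte merkle root, and a short proof shows that one item belongs to the batch. The construction below (SHA-256, binary tree, an odd node paired with itself) is a common textbook choice used here as an assumption; the tree layout Headjack actually uses is not specified in this section, and all function names are illustrative.

```python
import hashlib
from typing import List, Tuple


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root of a binary merkle tree; an odd node is paired with itself."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each flagged if it sits on the left."""
    level = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    node = sha256(leaf)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


posts = [f"post-{i}".encode() for i in range(5)]  # stand-in off-chain content
root = merkle_root(posts)                         # the only part anchored on-chain
proof = merkle_proof(posts, 3)
print(len(root), len(proof))                      # 32 3
print(verify(posts[3], proof, root))              # True
print(verify(b"tampered", proof, root))           # False
```

The root is 32 bytes whether the batch holds five posts or five million; only the proof grows, and it grows logarithmically with the batch size.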