
Risk assessment methodology: How probability of loss turns documented failures into a risk score

by Dmytro Zap
10m

Intro

In our public communication, we often say that the way risk assessment is done in crypto right now is broken (because it isn't measured, bummer), doesn't work, and leads to catastrophic losses from project treasuries and users' wallets. CORE3 offers an alternative: a comprehensive risk infrastructure. But how do you know whether it's credible, complete, and covers enough to flag failures in advance and enable risk-aware decisions in Web3? That's what we're going to talk about in this article:

  • The logic behind how projects are assessed for risk with the Probability of Loss framework
  • Why each domain has earned a closer eye before the next failure, not after
  • Which exact failures forced us to include certain metrics in the assessment set
  • How we compare the risk of a memecoin to a privacy blockchain and still get it right

Why does CORE3 assess projects as it does?

Before CORE3, risk assessment in crypto was done manually: a number of risk proxy metrics were gathered, stitched together, and weighted according to how someone felt they should be. If you had an astronomical number of projects to study and a geological amount of time, you could maybe find out how TVL/MCAP/Volume/Backers correlate with whether a project will fail, or how well a smart contract audit protects against team phishing.

But crypto doesn't have that luxury: the industry is too young and moves too fast for purely empirical weight-fitting across thousands of variables. So at CORE3, we worked backward from the other end. Instead of asking "which metrics predict failure?" with not enough data to answer statistically, we took the data points that signal "this will drain a project if not remediated" or "these economics will collapse under their own weight", and translated them into probability language. 

Spoiler: we also use TVL/MCAP/volume/team credibility as metrics, but they are just part of the full risk picture, with 1-3% influence on the final score.

Some metrics are super critical, like smart contract audit coverage, and are weighted accordingly. Others appear minor individually but enable larger-scale attacks when combined with other vulnerabilities. For example, a missing checkLiquidity() call in Euler Finance was one harmless line of code on its own. Then a hacker combined it with a flaw in the liquidation logic, and it drained $197M in a single block.

So, to state the logic clearly: we take retrospective failure angles and apply them to a forward-looking assessment frame. Historically, when a project lacked a specific set of risk practices, it failed. And the more practices are missing, the bigger the probability that the project will fall hard, loud, and painfully. This benchmark is applied to current projects to assess their survivability based on the risks they address and the risks they miss.
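
To make that concrete, here is a minimal sketch of that logic in Python. The metric names, weights, and the compounding rule are hypothetical stand-ins for illustration, not CORE3's production assessment set:

```python
# Sketch: missing risk practices translate into a probability-of-loss index.
# All names and numbers below are illustrative, not CORE3's real weights.

PRACTICE_WEIGHTS = {
    "audit_full_coverage": 0.18,   # super-critical metrics carry large weights
    "findings_remediated": 0.12,
    "bug_bounty":          0.06,
    "onchain_monitoring":  0.06,
    "tvl_concentration":   0.03,   # proxy metrics like TVL sit at 1-3%
}

def pol_index(missing: set[str]) -> int:
    """Map the set of missing practices to a 1-99 Probability of Loss index."""
    raw = sum(w for name, w in PRACTICE_WEIGHTS.items() if name in missing)
    # Gaps compound: several small vulnerabilities enable larger-scale attacks,
    # so the mapping is convex rather than a plain weighted sum.
    compounded = raw * (1 + 0.25 * max(len(missing) - 1, 0))
    return max(1, min(99, round(compounded * 100)))

print(pol_index({"bug_bounty"}))                        # one small gap -> low index
print(pol_index({"audit_full_coverage", "findings_remediated",
                 "bug_bounty", "onchain_monitoring"}))  # stacked gaps -> high index
```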

Six apocalyptic domains: How each one breaks a project apart

Security beyond smart contracts

Let's look at how crypto risk management works in the security domain. People always ask if a project has an audit, but that question alone is useless. Euler had ten audits and still got hacked. The CORE3 methodology tracks whether an audit is present, what it covers, and whether the findings were fixed.

Other metrics answer questions like: Is there a bug bounty program continuously hunting for code vulnerabilities? Will any on-chain monitoring spot an attack in real time?

Beanstalk is one example of why this layered approach is necessary.

In April 2022, someone flash-borrowed over $1B, captured 70% of the voting power, and pushed through a malicious proposal via an emergency commit function. Auditors hadn't spotted the feature because it was outside their coverage.

The other safety measures that could have stopped the attack at some point were missing, too: no bounty, no monitoring, no protection against a known attack type. The result: $182 million vanished. The protocol died.

Our conclusion reflects the methodology: if one of these measures is missing, the protocol will probably survive thanks to the others. But with all security practices absent, the probability of becoming a hacker's target rises exponentially.
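
That "rises exponentially" claim can be put in rough numbers. Assuming, purely for illustration, that each independent control catches some fraction of attacks, the chance an attack slips through every layer is the product of the misses:

```python
# Back-of-the-envelope for layered defenses. Per-control catch rates are
# made up for illustration; the multiplicative structure is the point.

CATCH_RATES = {
    "audit_coverage":     0.7,
    "bug_bounty":         0.4,
    "onchain_monitoring": 0.5,
}

def p_attack_slips_through(present: set[str]) -> float:
    """Probability an attack evades every control the project actually has."""
    p = 1.0
    for control, catch in CATCH_RATES.items():
        if control in present:
            p *= 1 - catch  # each present layer independently shrinks the odds
    return p

print(p_attack_slips_through({"audit_coverage", "bug_bounty",
                              "onchain_monitoring"}))  # ~0.09
print(p_attack_slips_through({"audit_coverage"}))      # ~0.30
print(p_attack_slips_through(set()))  # 1.00 -- Beanstalk's position in April 2022
```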

The financial metrics that flash before failure

Surprisingly, in crypto, which is a financial sector, financial health metrics work pretty well as risk signals. We assess treasury quality, income sources, revenue dependencies, liquidity concentration, and related factors to predict how well tokenomics will perform under stress.

Terra is still worth revisiting as a case study. Here's the situation:

Anchor Protocol paid 20% on UST, costing $6 million a day from a shrinking reserve. 75% of all UST was locked in one protocol. The peg was backed with Bitcoin bought during a bull run, correlated to the exact conditions that would trigger a depeg. And the stabilization mechanism contained an unlimited LUNA mint trigger. Once that was activated, the death spiral was arithmetic. 

Each of those maps to a PoL sub-metric: revenue sustainability, TVL concentration, treasury quality, and inflation triggers. All four signals were visible months in advance, but the industry didn't connect them to risk. Consequently, $45 billion was erased.
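
The reserve math alone was a countdown. A toy calculation, using round figures from public reporting at the time (the actual reserve size moved around):

```python
# Toy runway calculation of the kind the revenue-sustainability sub-metric
# encodes. Figures are rounded from public reporting, not exact.

reserve_usd       = 450_000_000  # Anchor's yield reserve after its Feb 2022 top-up
daily_subsidy_usd = 6_000_000    # yield paid to depositors minus yield earned

runway_days = reserve_usd / daily_subsidy_usd
print(f"Runway: {runway_days:.0f} days")  # ~75 days before the 20% promise breaks
```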

Another case demonstrates why measuring finances matters, from a different angle.

Celsius advertised 18% yields. That crazy number was funded by deploying customer deposits into illiquid DeFi strategies. The other issue was the treasury: it consisted largely of stETH, which couldn't be redeemed when the bank run started.

So the financial domain assesses whether a project's economic structure contains the conditions that historically precede collapse, and whether those conditions are quantifiable right now.

Operations opacity to observe 

The operations domain measures wash trading, certifications (ISO 27001 / CCSS), the founders' track record, documentation, liquidity, and more.

A case: 

BitConnect reached the top 10 by market cap even though its GitHub was dead. The project also advertised a trading bot with no documentation. The team was pseudonymous, and on-chain data showed wash trading on its own platform. Then, on a regular sunny day, BitConnect's price collapsed from $430 to under a dollar in 24 hours.

Another case:

The Squid Game token was smaller ($3.38M), but the execution was lazier. Zero GitHub commits after launch. The white paper claimed a Netflix IP tie-in, yet no legal documents confirmed it. The smart contract had the sell function disabled, which was readable on-chain, but that didn't stop investors from believing. You probably already know the result: the holders became victims of a true Squid Game, left with nothing but worthless tokens after the liquidity was withdrawn.

In retrospect, all these indicators look obvious. But in the moment, emotions push people to dismiss them as FUD. That's why CORE3's operational domain is designed to make them obvious before the rug.
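
A hypothetical rule set for this domain might look like the sketch below. The signal names and thresholds echo the two cases above but are illustrative, not CORE3's production checks:

```python
# Illustrative operational red-flag checks. Signal names and thresholds
# are hypothetical, not CORE3's real rules.

from dataclasses import dataclass

@dataclass
class OpsSignals:
    days_since_last_commit: int   # dead GitHub
    team_identified: bool         # pseudonymous founders
    claims_verified: bool         # e.g. a "Netflix tie-in" with no contract behind it
    sell_function_enabled: bool   # readable straight from the token contract
    volume_to_liquidity: float    # sustained extreme ratios suggest wash trading

def red_flags(s: OpsSignals) -> list[str]:
    flags = []
    if s.days_since_last_commit > 90:
        flags.append("dead GitHub")
    if not s.team_identified:
        flags.append("pseudonymous team")
    if not s.claims_verified:
        flags.append("unverified partnership claims")
    if not s.sell_function_enabled:
        flags.append("sell function disabled")
    if s.volume_to_liquidity > 20:
        flags.append("wash-trading pattern")
    return flags

squid = OpsSignals(120, False, False, False, 35.0)
print(red_flags(squid))  # every check fires months before the rug
```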

How can reputation ruin revenue?

Reputational risk is where past behavior becomes a forward-looking metric. It assesses how a project responds to incidents, signs of social fraud, the auditors' reputation, and insurance coverage.

In January 2022, Multichain suffered a $3 million exploit rooted in a centralized MPC architecture where one person, the CEO, held sole control over all signing keys. Once this became public, no remediation followed. Eighteen months later, $126 million was drained (the project's CEO had been arrested six weeks earlier). Cointelegraph later revealed he had allegedly used a fake identity to register the company. Meanwhile, official channels continued to reassure users that operations were normal. By shutdown, $1.5 billion in TVL was inaccessible.

If we have seen a compromise before with no remediation following, it will probably happen again.

Cream Finance makes the point even cleaner: three flash-loan hacks in ten months ($37M, $29M, $130M), all the same attack class, yet no verified fix followed any of them. TVL collapsed from over $1 billion to zero. Prior audits by PeckShield and CertiK didn't prevent the recurrence. Three hacks of the same type in ten months, with no public post-mortem between them, is why past incident response is a standalone metric in PoL.
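
In PoL terms, the same incident weighs differently depending on what followed it. A sketch with purely illustrative multipliers:

```python
# Sketch: unremediated incidents compound, remediated ones don't.
# Multipliers are illustrative only.

def incident_penalty(incidents: list[dict]) -> float:
    total = 0.0
    for inc in incidents:
        weight = 1.0
        if not inc["post_mortem_published"]:
            weight *= 1.5
        if not inc["fix_verified"]:
            weight *= 2.0  # the same attack class will likely recur
        total += weight
    return total

# Three same-class flash-loan hacks with no verified fix between them:
cream_2021 = [{"post_mortem_published": False, "fix_verified": False}] * 3
print(incident_penalty(cream_2021))  # 9.0, vs 3.0 had each hack been remediated
```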

Cosmetic compliance checked

Compliance risk measures whether a project operates within established legal frameworks or is structured to avoid them. Projects that choose jurisdictions and disclosure practices to minimize oversight consistently show higher rates of fraud and customer loss. 


For example, Tornado Cash, which processed over $7 billion, mixed $455 million that North Korea's Lazarus Group stole in the Ronin hack. In August 2022, OFAC designated the protocol for this. As a result, a developer was arrested, and the co-founders were indicted by the DOJ on charges of money laundering and sanctions violations.

If assessed under CORE3's current methodology, the compliance signals would have been flagged before the designation: no registered entity, no MSB registration despite FinCEN guidance, and a single compliance measure, a front-end block on OFAC-listed wallets, that was pretty cosmetic. OFAC lifted the sanctions on the smart contracts in March 2025 following a Fifth Circuit ruling, while the criminal charges remain active.

But we're not here to judge; CORE3's goal is to indicate whether the user will incur losses due to legal restrictions.

How does CORE3 score 29 different types of projects with different metrics on the same scale?

Imagine you have two projects: one is a DeFi yield aggregator on Ethereum, and the other is a privacy blockchain. You have a set of metrics that define "will it fail," but they differ between the two projects. The CORE3 cryptocurrency risk score lives on a 1-99 scale, so how do we score both projects on it? This is where category-specific assessment sets come in, designed for 29 different categories of projects.

The idea is that if you weight the different conditions in the different assessment sets correctly, you get a comparable benchmark. The domains stay the same: security, financial, operational, reputational, regulatory, and dependency. But the conditions within each domain flex to match what actually matters for that project type.

A yield aggregator gets assessed on oracle dependency, revenue source sustainability, and TVL concentration, because those are the failure modes that killed Celsius ($4.7B in customer claims, treasury locked in illiquid stETH) and drained Mango Markets ($114M via oracle manipulation in under 30 minutes). A privacy blockchain doesn't face those same risks, but it does face dependency risks around its consensus mechanism, regulatory exposure around jurisdiction choices, and operational risks around the verifiability of its core cryptographic claims. 


Even when the conditions assessed are different, the scale remains the same: the final output is a 1-99 PoL index with a breakdown. And this is how the two projects can be compared: each is scored against the failure modes that killed its own predecessors, which makes a PoL of 40 for a DEX directly comparable to a PoL of 50 for a GameFi project.
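
One way to see how category-specific sets still yield one scale: if the condition weights inside each set sum to 1, the weighted share of failed conditions is directly comparable, whatever the category. The conditions and weights below are hypothetical stand-ins, not CORE3's real sets:

```python
# Hypothetical category-specific assessment sets. Weights inside each set sum
# to 1, so the weighted share of failed conditions lands on one 1-99 scale.

ASSESSMENT_SETS = {
    "yield_aggregator": {
        "oracle_dependency":        0.40,  # what drained Mango Markets
        "revenue_sustainability":   0.35,  # what killed Celsius
        "tvl_concentration":        0.25,
    },
    "privacy_chain": {
        "consensus_dependency":     0.40,
        "jurisdiction_exposure":    0.30,
        "crypto_claims_verifiable": 0.30,
    },
}

def pol_index(category: str, failed: set[str]) -> int:
    weights = ASSESSMENT_SETS[category]
    share = sum(w for cond, w in weights.items() if cond in failed)
    return max(1, min(99, round(share * 99)))  # same scale for every category

print(pol_index("yield_aggregator", {"oracle_dependency"}))   # 40
print(pol_index("privacy_chain", {"jurisdiction_exposure"}))  # 30
```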

Conclusion: Challenge our Web3 risk methodology

While the Probability of Loss is a crypto risk score on a 1-99 scale where lower means fewer risk factors are verifiably present, the methodology itself isn't radically different from how the Web3 industry has approached risk assessment before.

What's different is that we scoped the metrics from documented failures, assigned category-specific assessment sets for 29 project types, and backtested the weights against real incident data. The result is the first open benchmark for crypto risk, not a proprietary black box, but a framework you can inspect, question, and build on. The difference between CORE3's Probability of Loss and competitors' offerings is that CORE3's standard is open, debatable, and self-regulatory.


In practice, this means:

  • We build custom solutions for specific needs on top of the existing framework; CORE3's architecture supports this by design, so if you need one, drop us a line.
  • When the market needs more coverage, we deliver it. We've already received queries about mapping stablecoin risks, and we're addressing them now. If any metric feels odd or missing for your project category, that's a conversation we want to have.
  • You know every domain, every metric, and every weight we use to produce a Probability of Loss index.


Measure risk. Improve your PoL. Build trust.