The Science of 70%
The Science of 70% is a technique that is integral to Transition Economics’ (TE) Proof Charts (TEPs).
Similar to the Pareto 80/20 Rule and Six Sigma quality control sciences, The Science of 70% is an easy approach to finding important measures within very large data sets.
The Science of 70% permits us to determine causality from a very large, complex dataset. We take the data value found at the 70% position of a known-important indicator’s sorted data, use it as a threshold, and then compare the rankings of the amplitudes of the frequency distribution charts built from that threshold.
If this sounds complicated, it’s not; in fact, it’s a very simple technique. It just takes a few steps to mine value from even the largest dataset using this approach.
To explain how this works, let’s take a real-world example. You can follow along with a Jupyter Notebook session or use a spreadsheet as well if you like. I assume here that you have a pretty good understanding of spreadsheets, so look for tutorials online if you need spreadsheet help …
Skill Level: Beginner
Fun Fact: Eskimos have 50 words for “snow”, and the same level of detail is important when working with data. The terms below are used throughout this tutorial:
- APIs – allow us to see and pull down the most recent data online whenever we wish. An API is reached through a URL, much like “www.google.com”, and typically returns data in a format called JSON or XML. Our spreadsheets use file formats like .csv, .xlsx, .xls, and similar.
See an example API here: https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?format=json
- Columns versus Rows – data columns are vertical (run up and down) while rows are horizontal (run left to right or right to left). Fun Fact: 30% of the earth’s population reads right to left
- Data – is any information that we need to build a chart with – including Country name, Indicator name, Indicator Value
- Datapoints – are the number of values available
- Data Quality – Governments lie about embarrassing stats, countries fail to report stats consistently or fully, military spending can be spread across many reports, suicides can be reclassified as unknown, COVID, or fentanyl deaths, and so on. Inconsistent reporting and the challenges of consolidating and distributing quality data affect almost all countries, so context and multi-measure validations are essential. Read how this is accomplished in the Global Leadership Book of Knowledge (GL-BOK) and the WAOH Econometric Library
- Data Set – Think of a Data Set as the Column’s 215 datapoints in the example below
- Data Types – Data can be numeric (floating point decimals, integers, etc.), characters/strings, etc.
- Data Value – the actual value within each datapoint
- Metadata – is information about the data. The data type, a description of the data, and the source of the data can all be documented in metadata; metadata can also include revision histories, etc.
- TEP Score – Score is the amplitude of a TEP chart; the difference between its maximum value and its minimum value
- Threshold Value – is the data value contained within our 70% datapoint. This value becomes the threshold for Advance or Collapse for this indicator. For an explanation of how Threshold Indicators are used to determine causality (importance) of any measure, read Transition Economics
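To make the API entry above a little more concrete, here is a minimal sketch of reading that kind of JSON. World Bank v2 responses arrive as a two-element list: paging metadata first, then a list of records. The sketch parses a small hand-written sample in that shape instead of calling the network; the sample values are copied from the 2019 savings column used later in this tutorial, and the helper name `records_to_rows` is hypothetical.

```python
import json

# Hand-written sample in the shape of a World Bank v2 JSON response:
# element 0 is paging metadata, element 1 is the list of records.
SAMPLE = """
[
  {"page": 1, "pages": 1, "per_page": 50, "total": 3},
  [
    {"indicator": {"id": "NY.GDS.TOTL.ZS", "value": "Gross domestic savings (% of GDP)"},
     "country": {"id": "AO", "value": "Angola"}, "date": "2019", "value": 32.04094604},
    {"indicator": {"id": "NY.GDS.TOTL.ZS", "value": "Gross domestic savings (% of GDP)"},
     "country": {"id": "AL", "value": "Albania"}, "date": "2019", "value": 8.231647145},
    {"indicator": {"id": "NY.GDS.TOTL.ZS", "value": "Gross domestic savings (% of GDP)"},
     "country": {"id": "AF", "value": "Afghanistan"}, "date": "2019", "value": null}
  ]
]
"""


def records_to_rows(payload):
    """Return (country, year, value) tuples, skipping unreported (null) values."""
    _metadata, records = payload
    return [
        (rec["country"]["value"], int(rec["date"]), rec["value"])
        for rec in records
        if rec["value"] is not None
    ]


rows = records_to_rows(json.loads(SAMPLE))
print(rows)
```

Each tuple maps directly onto the glossary terms: country name, datapoint year, and data value. Nations that did not report (Afghanistan here) are dropped.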
Step 1: Determining the 70% value for National Savings %GDP
- First, let’s find data for a causal indicator – to make this easier later on. Open this hyperlink – https://csq1.org/info/NY.GDS.TOTL.ZS.htm – which you can also find as “Gross Domestic Savings (% of GDP)” in the searchable table on the Data Science page at the WAOH Library. The link will take you to a Sheet of TEP charts that are all drawn using this Science of 70%
- Next, download the data that comprises this chart. Do an internet search for “World Bank NY.GDS.TOTL.ZS”
- We want the most recent data, so click on CSV or Excel under the Download tab (see the green circle above).
Many computers will open this data file into your spreadsheet application automatically, but using any spreadsheet to open the file from your browser’s download directory will work similarly.
| Country Name | 2017 | 2018 | 2019 |
|---|---|---|---|
| Aruba | 22.51873515 | → | 22.51873515 |
| Afghanistan | | | |
| Angola | 29.88168324 | 33.16398899 | 32.04094604 |
| Albania | 8.893553742 | 9.818071383 | 8.231647145 |
| Andorra | | | |
| Arab World | 30.87885253 | 33.90737467 | 32.82091963 |
| United Arab Emirates | 49.48114701 | 49.87897504 | 47.80923381 |
| Argentina | 15.56353276 | 17.77413958 | 19.61479264 |
| Armenia | 7.651454985 | 8.712420363 | 4.088341312 |
| American Samoa | | | |
| Antigua and Barbuda | 26.28307018 | 27.43485333 | 27.43485333 |
| Australia | 24.64969067 | 24.90992989 | 25.70835979 |
The blue text above (for example, Aruba’s 2019 value, carried over from 2017) shows how to clean up the data by using older stats to fill any gaps. Is it more accurate to use years with very few or no missing values? Yes. So only do this cleanup when a handful of stats need to be carried over. Choose the most recent year in which 90% or more of the column’s data points are populated (not empty). Not every nation reports every statistic, so rows (nations) that never post a statistic for this measure can be ignored.
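The cleanup rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are a subset of the table excerpt above, `None` stands for an empty cell, and the helper names `coverage` and `carry_forward` are hypothetical.

```python
YEARS = [2017, 2018, 2019]

# Subset of the table excerpt above; None marks an empty cell.
raw = {
    "Aruba": [22.51873515, None, None],
    "Afghanistan": [None, None, None],
    "Angola": [29.88168324, 33.16398899, 32.04094604],
    "Antigua and Barbuda": [26.28307018, 27.43485333, None],
    "Australia": [24.64969067, 24.90992989, 25.70835979],
}


def coverage(table, col):
    """Share of rows with a reported value in column `col` (0.9 means 90%)."""
    return sum(values[col] is not None for values in table.values()) / len(table)


def carry_forward(table, col):
    """Fill gaps in `col` with the latest older value; drop rows with no data."""
    cleaned = {}
    for nation, values in table.items():
        reported = [v for v in values[: col + 1] if v is not None]
        if reported:
            cleaned[nation] = reported[-1]
    return cleaned


col_2019 = YEARS.index(2019)
print(coverage(raw, col_2019))
cleaned_2019 = carry_forward(raw, col_2019)
print(cleaned_2019)
```

In this tiny subset the 2019 column is far below the 90% guideline, so it only demonstrates the mechanics; on the full download you would check `coverage` first and carry values forward only when a handful are missing.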
- Sort the table from “least-important row” first to “most-important row” last. In the example above, the cleaned-up data in the column labeled “2019” is the dataset that you will want to re-sort into ascending order.
If the measured indicator’s most-important stats are also its smallest values (“Infant Mortality Rates” is an example of a stat where lower numbers are better), sort the dataset in descending order instead (highest values first, lowest values last).
- Locate the value that is at 70% in the list. Gross Domestic Savings had approximately 215 values in 2019, so you are looking for the value at about cell #150 (215 cells × 0.70 = 150.5); the exact cell depends on how many rows actually hold a value, and this example uses cell 149. Note that the more-important stats will now comprise the bottom 30% of datapoints. The 2019 column’s row 149 should show a value of approximately 27.5 – and you might also want to note whether neighbouring cells have values that are the same, close, or very different.
- Sweet Spots – But what about TEPs with Sweet Spots? A “Sweet Spot” indicator looks like the IMF Finance Industry Index, which shows that Finance Industries are helpful up to a certain size (when valuation equals GDP) and harmful above that size. Its frequency distribution chart looks like a McDonald’s restaurant arch, with low values at both the start and the end of the data. Will the rule above provide us with the 30% Advance and 70% Collapse data split that we need to find this threshold?
Yes. For Sweet Spots, our process works just the same. Remember that we are looking for the 70/30 threshold “line” within the dataset. Nations above this threshold are Advancing, and nations below this threshold are collapsing – for purposes of this exercise.
- With this threshold value of 27.5 in hand, we are now going to assume that nations with Savings >= 27.5% (greater than or equal to) are Advancing, and nations with Savings < 27.5% (less than) are trending toward collapse – based on Domestic Savings.
That’s it, you’re done. By the Science of 70%, you have just created a new way to confirm the causality of any measured national indicator.
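For readers following along in a Jupyter Notebook, the sort, locate, and classify steps above can be sketched as follows. This is a minimal sketch and not the WAOH or MEMS implementation: the function names are hypothetical, the cell number is rounded down (an assumption, since the text only gives an approximate cell), and the sample values are the nine populated 2019 cells from the table excerpt above.

```python
def threshold_at(values, percent=70, higher_is_better=True):
    """Return the value found `percent` of the way through the sorted list."""
    # Least-important values first: ascending when higher is better,
    # descending when lower is better (e.g. infant mortality rates).
    ordered = sorted(values, reverse=not higher_is_better)
    # 1-based cell number, as in a spreadsheet; rounding down is an assumption.
    cell = max(1, len(ordered) * percent // 100)
    return ordered[cell - 1]


def classify(value, threshold, higher_is_better=True):
    """Label a nation relative to the threshold, for this one indicator."""
    meets = value >= threshold if higher_is_better else value <= threshold
    return "Advancing" if meets else "Collapsing"


# 2019 savings values from the table excerpt above (gaps already carried over).
savings_2019 = [22.51873515, 32.04094604, 8.231647145, 32.82091963,
                47.80923381, 19.61479264, 4.088341312, 27.43485333,
                25.70835979]

threshold = threshold_at(savings_2019)  # 9 values, so cell 6 of the sorted list
print(threshold)
print(classify(47.80923381, threshold), classify(25.70835979, threshold))
```

On these nine sample values the threshold comes out at 27.43485333. The `percent` parameter is there so that the same helper can be reused with a different position, and `higher_is_better=False` covers indicators where lower numbers are better.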
Many types of TEP Reports
In the following example, we see a TEP (Transition Economics Proof) Frequency Distribution Report created using Trade Balance with a threshold value of 0%. Above a $0 or 0% Trade Balance, a nation is Advancing; below it, a nation is collapsing (by this indicator).
TEP Sheets stack different types of TEP Reports for advance or collapse comparisons using Social Contract Product (SCP), GINI, change in measured values over 10 and 20 years, per-capita values, and other measured comparisons (see https://csq1.org/info/NY.GDS.TOTL.ZS.htm for a full TEP sheet at WAOH).
In 2022, “Threshold Analytics” was added to permit analysis of hundreds of the highest-scoring indicator thresholds for detailed comparisons between advancing and collapsing nations.
Why does this approach work?
This works because TEP reports show which measured indicators have strong advance/collapse separation and causality – as the chart above shows – and which indicators do not.
TEP frequency distribution charts showing little causality (low scores/amplitudes in their TEP Charts) will comprise the lion’s share of reports, and only those reports with truly high causality will rise to the top of the rankings.
If you find too many high-scoring 1.0 TEP reports (there should only be a few dozen from 1600 indicators at maximum), you have set your threshold too low – so try the value located at the 75% datapoint of the survey data.
Alternatively, if you find that too few TEP Reports have an amplitude score of 1.0, then lower your threshold datapoint to create more 1.0 scoring TEPs. You should only be thresholding causal indicators so that the comparisons of all reports are as valid as possible.
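The tuning rule in the last two paragraphs could be sketched as a small helper. Everything here is an assumption made for illustration: the function name is hypothetical, the acceptable band of 12 to 24 perfect scores borrows the “dozen or two dozen” guideline given later in this tutorial, and the 5-point step mirrors the move from the 70% to the 75% datapoint.

```python
def suggest_percent(scores, percent=70, low=12, high=24, step=5):
    """Suggest the next threshold position from the count of 1.0 scores."""
    perfect = sum(1 for score in scores if score >= 1.0)
    if perfect > high:
        return percent + step  # threshold too low: try e.g. the 75% datapoint
    if perfect < low:
        return percent - step  # too few 1.0 scores: lower the datapoint
    return percent             # within the band: keep the current position


# Hypothetical score lists for 1600 indicator reports.
too_many = [1.0] * 30 + [0.4] * 1570
too_few = [1.0] * 5 + [0.4] * 1595
about_right = [1.0] * 18 + [0.4] * 1582

print(suggest_percent(too_many), suggest_percent(too_few), suggest_percent(about_right))
```

The helper only proposes a position; you would still rebuild the TEP reports at the new threshold and check the result by eye.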
There are 200 nations measuring this indicator; so there are 200 comparisons to a threshold value (of 70%) in each TEP Chart, 20 to hundreds of TEP Charts per indicator (TEP Sheet), and 1600+ measures/indicators. So, you begin to see the value of automating all of this simple but repetitive addition, subtraction, multiplication, and division work. Now consider the number of computations required to build a chart of the “trend change” in TEP scores or ranks over time for 60 or 100-years – for every TEP and indicator as well; that’s a lot of number-crunching – that a computer can accomplish in just seconds.
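As a rough, back-of-the-envelope check on that workload, the figures in the paragraph above can simply be multiplied together. The sketch uses the low end of the “20 to hundreds” range for charts per indicator, so the result is a lower bound.

```python
nations = 200               # comparisons to the threshold in each TEP Chart
charts_per_indicator = 20   # low end of "20 to hundreds" per TEP Sheet
indicators = 1600

comparisons = nations * charts_per_indicator * indicators
print(f"{comparisons:,} threshold comparisons")  # 6,400,000
```

That is before any trend-over-time charts are added, which multiply the total again by the number of years covered.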
This is the reason that we say evidence-based economics is a computer science, and why Transition Economics discourages mathematical guesswork and obfuscation. As Isaac Newton put it, science should be simple.
The value at the 62% data point ($0.00 or 0.0% GDP) works well as a threshold for some datasets like Trade Balance, based on my experience.
So our Science of 70% is more a Science of around 70% really.
The actual scores matter less than the rankings, but to get rankings that are consistent between different indicator comparisons, you will want to be consistent with the number of reports given a 1.0 score across all of your indicator comparisons. Admittedly, this last point is a nuanced nice-to-have, but it also shows experience in working with the Science of 70% data tables.
How good is our automation at present?
When we say that evidence-based economics is a computer science, we aren’t kidding – but we also always want to give you the tools to validate any automated number crunching for yourself on a spreadsheet too.
Software tools like WAOH and MEMS take around four hours to recreate 1600 TEP sheets with scoring and ranking data (assuming that all data is current in our cache). If you are old enough to remember Windows 2.0, it was groundbreaking, and we are further along than that – probably at Windows 5.0 at this point.
Getting us to Windows 12 is just a matter of time, funding, and development, and that comes along with adoption.
So do your part and get the word out there, that evidence-based socioeconomic science is essential, the future, and that we need to be teaching and building tools that can double economies reliably – as a national priority.
Important Parts of a TEP Chart
The Red Line: 28 is the minimum number of countries needed to create a TEP Chart Frequency Distribution, and we try to provide a minimum of four data points on a TEP chart.
The Blue Line: shows how many countries participated in each survey at the indicated value. A minimum survey size of 7 countries is considered necessary for a meaningful survey.
The Dashed Red Line: is a Linear Regression line that averages the ups and downs of the survey line to provide an approximate trend. It’s a guide only.
An Orange Line: is used in manual TEP Charts, while a Red Line is portrayed on program-driven charts.
XY Scatter Plot Charts: A TEP is an XY Chart with an equalized horizontal axis (X-axis). Arguably an XY Chart is more accurate, but with only 200 countries keeping stats, and not every nation keeping every statistic, we need to maximize the benefit of the available data. WAOH and MEMS’s TEP Sheets show both TEP and XY Charts for the same calculations for this reason.
The following two charts plot the same data with a TEP on the left and XY Scatter Plot on the right. As TE report scoring is based on amplitude (y-axis) only, x-values don’t affect scores nor report rankings.
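The dashed red trend line described above is an ordinary least-squares fit. A minimal sketch, assuming evenly spaced x positions for the chart’s data points and hypothetical y-values (share of nations advancing at each point):

```python
def trend_line(xs, ys):
    """Least-squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical TEP chart points: position on the x-axis, share advancing.
xs = [0, 1, 2, 3]
ys = [0.0, 0.5, 0.5, 1.0]

slope, intercept = trend_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

As the text says, the fitted line is a guide only: TEP scoring uses the amplitude of the survey line itself, not the slope of the trend.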
I chose a Causal indicator (Domestic Savings) for the example in Step 1. Causal indicators have high-scoring TEP frequency distributions (Score is 1.0 in the example chart below). They show us that nations with desirable values advance 100%, and nations with poor values collapse 100%.
As an aside, the Domestic Savings indicator was identified as a causal indicator using another high-scoring indicator, Trade Balance. Trade Balance’s threshold value is located at 62% of its data sample.
How did I know that Trade Balance was an important measure? I observed that it was a factor in other evidence-based reports … Notice in these two charts how Income Equality reduced Health and Social Problems in nations with positive trade balances, where trade deficit nations were highest in Health and Social Problems.
By testing whether Trade Balance is greater or less than zero (0), “Advancing” was assigned to all nations with trade balances greater than zero, and “Collapsing” was assigned to nations with negative trade balances (less than zero). By this rule, Germany and Japan are “Advancing”, but Canada and the U.S. are “Collapse Trending”.
Transition Economics calls the Trade Balance TEP-Chart’s red line “Advancing Economies”, but any Causal indicator would let you build new criteria to determine Advance or Collapse.
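The labelling rule above can be sketched as follows. The trade balance numbers are placeholders, not real figures: only their sign matters, and the signs follow the text (Germany and Japan positive, Canada and the United States negative). The savings values are the ones quoted in this tutorial, and the function name is hypothetical. The text does not say how a balance of exactly zero is labelled, so the sketch leaves it unassigned.

```python
# PLACEHOLDER trade balances: only the sign is meaningful here.
trade_balance = {"Canada": -1.0, "United States": -1.0, "Germany": 1.0, "Japan": 1.0}

# Gross Domestic Savings (% of GDP), as quoted in this tutorial.
savings = {"Canada": 21.5, "United States": 18.2, "Germany": 27.2, "Japan": 24.5}


def status_from_trade(balance):
    """Label a nation by the sign of its trade balance (threshold of zero)."""
    if balance > 0:
        return "Advancing"
    if balance < 0:
        return "Collapsing"
    return None  # exactly zero is not covered by the rule in the text


table = [(nation, status_from_trade(trade_balance[nation]), savings[nation])
         for nation in savings]
for row in table:
    print(row)
```

Each row pairs a status from one indicator (Trade Balance) with a value from another (Domestic Savings), which is the input a frequency distribution TEP Chart needs.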
Now we have a table of chart-able values for 187 countries.
In line 1 of the table we see that Canada is a collapse-trending nation with a Gross Domestic Savings value of 21.5% of its GDP.
| Nation | Status (based on Trade <> 0.0) | Gross Domestic Savings (%GDP) |
|---|---|---|
| Canada | Collapsing | 21.5 |
| United States | Collapsing | 18.2 |
| Germany | Advancing | 27.2 |
| Japan | Advancing | 24.5 |

… for 187 countries
This frequency distribution TEP Chart now shows at left that 30 nations with Savings less than -13% were 100% collapsing; while 15 nations at right had Savings greater than 43% (to 79%) and were 100% advancing.
What does this data tell us?
Note from the TEP Chart’s frequency distribution of 187 nations, that 100% of high-Domestic Savings nations are advancing and 100% of low-savings nations are collapsing. The amplitude of this chart is 100% from bottom to top; we can calculate this because it has a maximum value of 100% (= 1.0) minus a minimum value of 0.0 – to equal the highest amplitude possible – 1.0.
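The amplitude calculation above can be sketched in code. This is a simplified illustration, not the MEMS implementation: grouping nations into fixed-width bins is an assumption (the text does not specify how the horizontal axis is equalized), the function names are hypothetical, and the four rows are the ones quoted in this tutorial. The default minimum of 7 nations per data point comes from the Blue Line description; it is relaxed to 1 here only so that four rows produce output.

```python
from collections import defaultdict


def tep_distribution(rows, bin_width, min_size=7):
    """Map each bin's lower edge to the share of its nations that are Advancing."""
    bins = defaultdict(list)
    for value, status in rows:
        lower = (value // bin_width) * bin_width  # fixed-width bins: an assumption
        bins[lower].append(status == "Advancing")
    return {
        lower: sum(flags) / len(flags)
        for lower, flags in sorted(bins.items())
        if len(flags) >= min_size
    }


def tep_score(distribution):
    """Amplitude of the chart: maximum share minus minimum share."""
    shares = list(distribution.values())
    return max(shares) - min(shares)


# (Gross Domestic Savings %GDP, status) for the four nations quoted above.
rows = [(21.5, "Collapsing"), (18.2, "Collapsing"),
        (27.2, "Advancing"), (24.5, "Advancing")]

dist = tep_distribution(rows, bin_width=5, min_size=1)
print(dist)
print(tep_score(dist))
```

With these four rows the shares run from 0.0 in the lowest bin to 1.0 in the highest, so the amplitude is 1.0. A real TEP Chart needs at least 28 countries, so treat this only as a demonstration of the arithmetic.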
This would rank Domestic Savings as either #1, or as one of the highest-amplitude indicators, because the very great majority of measures – for other economic and social indicators – rank much lower than 1.0.
This TEP Chart report is now telling us either that Savings is a causal indicator, or that high-savings nations simply export more than they import – as a coincidence.
Here is where science, correlating reports, and observation come into play. By observation, is the standard of living and production in Germany higher than in Japan, Canada, and the U.S.? Perhaps a comparison between the highest-savings nations (Ireland, Qatar, Brunei, Luxembourg, and UAE) and the lowest-savings nations (Somalia, Haiti, West Bank and Gaza, Zimbabwe, Central African Republic) confirms coincidence or causality.
Trade Surplus nations have higher Social Contracts (lower social problems and inequity) similarly – shown here by a report created by an Edinburgh University Research Team in 2013. Individuals and nations with more savings and fewer social problems have the potential to be more productive, and almost all of the countries on the lower half of this list have trade surpluses consistently.
On the TEP Sheet for Domestic Savings below, we see how Domestic Savings %GDP looks based on a frequency distribution TEP Chart of SCP values >5.0 (5.0 is a value determined by the Science of 70%). The TEP Chart’s amplitude is 0.7 (70% minus 0), so it’s not as high as the amplitude above; it is therefore a lower-ranking TEP Chart, but it still holds valuable information.
I use TE’s data science tool MEMS to find highest-ranking TEP Charts and then the WAOH library posts this as searchable sheet scores.
Step 2: Building “National Savings” TEP Charts
- To build this TEP Chart yourself, you can start with a basic Excel template or upload a much larger sample pack on the About WAOH page – under Contributions (or click here). There are a lot of examples in the larger pack, and you can easily create your own frequency distribution charts there as well.
- Here in the spreadsheet is a TEP chart that I created using the process described above
- Learn to use your new Threshold definition. As you work with the TEP charts created by the data, you might decide to tweak the threshold to use the value at 65% or 75%. This chart had a too-low “Average Advancing” of 24%, so I reduced the Domestic Savings threshold to 25 (from 27.5), and this improved Average Advancing to 32%.
38% Average Advancing is a good target, so I could have lowered the threshold again, and this would have lifted the score of this and all 1600 other indicator reports. Again, as long as only a dozen or two dozen TEP Reports per 1600 indicators are showing scores of 1.0, your threshold value for advance is valid.
Our MEMS tool analyzes 1600 indicators and does all of this heavy number-crunching and chart work over approximately four-hours, with present configuration assumptions.
As you can see, this can be a time-consuming one-chart-at-a-time analysis, but the Science of 70% works well to find causality within a very large dataset using this testing method.
By measuring the amplitudes of TEP Charts created by National Savings, we can now also easily rank correlations from highest to lowest using this causal indicator.
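Ranking by amplitude is just a descending sort. A minimal sketch, using only the two scores quoted in this tutorial (1.0 for Domestic Savings against the Trade Balance threshold, 0.7 against SCP > 5.0); the report labels are informal descriptions, not official WAOH names.

```python
# TEP scores quoted in this tutorial; labels are informal descriptions.
scores = {
    "Domestic Savings vs SCP > 5.0": 0.7,
    "Domestic Savings vs Trade Balance > 0": 1.0,
}

ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for rank, (report, score) in enumerate(ranked, start=1):
    print(rank, report, score)
```

With 1600 indicators the dictionary would simply be much longer; the sort itself does not change.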