Data Science and Cryptocurrency arbitrage: How to profit from it

Lately I have come across many discussions of “arbitrage”: people who say they are doing it, plan to do it, or have made substantial profits from it, specifically cryptocurrency arbitrage with bots they built from YouTube tutorials.

I have also come across numerous ICOs that raised funds for arbitrage but failed to address crucial aspects of the process, and whose teams lacked the necessary expertise in the field.

With a background in finance and engineering, I’ve seen and heard people from Silicon Valley, Hong Kong, and New York discussing “arbitrage” and the profits they have made through it. As someone with a deep understanding of arbitrage, high yield trading, and financial engineering, I decided to write this article to explain what arbitrage is, its different forms, and the opportunities available in the world of cryptocurrencies. My experience in this field comes from both my academic training (an MBA in Finance and an engineering background) and practical experience structuring proprietary arbitrage CDO deals at Deutsche Bank and Lehman Brothers, the largest of which was worth $3bn. I was even interviewed by the Wall Street Journal in reference to Lehman Brothers’ bankruptcy in 2008.

To put it briefly, the cryptocurrency market offered ample opportunities for “deterministic arbitrage” until around late 2017 and still holds potential for “statistical arbitrage” and “regulatory arbitrage.” As a quant and data scientist in the crypto assets field, I find the most thrilling opportunity to be in “hashing arbitrage,” which encompasses elements of all the aforementioned forms of arbitrage but stands on its own as a distinct concept.

Despite the presence of opportunities in the crypto space, those who are knowledgeable about them remain tight-lipped, avoiding publishing papers or openly sharing their code through methods like Kaggle kernel competitions. The attitudes towards knowledge sharing in Wall Street and Silicon Valley are diametrically opposed.

Merriam-Webster defines arbitrage as follows:

The nearly simultaneous purchase and sale of securities or foreign exchange in different markets in order to profit from price discrepancies

I won’t discuss what I call “hashing arbitrage” here. As for regulatory arbitrage, I will simply quote the definition from Investopedia, which is pretty good:

Regulatory arbitrage is a practice whereby firms capitalize on loopholes in regulatory systems in order to circumvent unfavorable regulation. Arbitrage opportunities may be accomplished by a variety of tactics, including restructuring transactions, financial engineering and geographic relocation. Regulatory arbitrage is difficult to prevent entirely, but its prevalence can be limited by closing the most obvious loopholes and thus increasing the costs associated with circumventing the regulation.

Before delving into the distinction between deterministic and statistical arbitrage and discussing the type of arbitrage that most cryptocurrency traders are referring to, it’s important to first touch on “Market Making.”

To purchase or sell a financial product, individuals must go to an exchange where buyers and sellers come together.

The price at which a financial product can be traded depends on supply and demand for the product at a given time. This determines the bid price (the highest price buyers are currently willing to pay) and the ask price (the lowest price at which sellers are currently willing to sell). When there are few potential buyers or sellers, it may become difficult to trade the product, and the product is then considered illiquid. This concept is depicted in the accompanying chart.
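To make the terms concrete, here is a minimal Python sketch that computes the spread and mid price from a best bid and a best ask. The function name and the sample quotes are hypothetical and only serve to contrast a tight (liquid) market with a wide (illiquid) one.

```python
def quote_stats(best_bid: float, best_ask: float) -> dict:
    """Return the absolute spread, the mid price, and the spread in basis points."""
    if best_bid <= 0 or best_ask < best_bid:
        raise ValueError("expected 0 < best_bid <= best_ask")
    spread = best_ask - best_bid
    mid = (best_ask + best_bid) / 2
    return {"spread": spread, "mid": mid, "spread_bps": 1e4 * spread / mid}


# Hypothetical quotes: same mid price, very different liquidity.
liquid = quote_stats(9999.0, 10001.0)
illiquid = quote_stats(9900.0, 10100.0)
print(liquid)
print(illiquid)
```

The wider the spread relative to the mid price, the more it costs to buy and immediately sell again, which is one practical way to read "illiquid."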

Exchanges require professional participants to maintain liquidity by offering a continuous bid-ask spread in the market.

The professionals who provide a constant bid-ask spread to the market to ensure liquidity are referred to as market makers, hence the name.

Market makers are neutral with regard to the price movement of a financial product; their profits come from the difference between the bid and ask prices, also known as the spread.

In order to offset the risk taken when trading on either side of the spread, market makers will look for ways to balance their position, such as by hedging with a different product.

A market maker must have a comprehensive understanding of the financial product they’re making markets in as well as its relationship with other similar products to effectively offset any associated risks.

The role of market makers has evolved over time due to increased competition and advancements in technology. To maintain a competitive edge, market makers now rely on computer algorithms and electronic exchange connections to continuously offer quotes on multiple products across various exchanges.

Major investment banks like JPMorgan, Morgan Stanley, and Goldman Sachs are leaders in market making, but have limited involvement in the crypto market due to regulations. However, it’s rumored that Goldman Sachs and JPMorgan have done some private crypto transactions and may increase their presence in the market soon.

Market makers must continuously invest in technology and human resources in order to stay competitive and maintain efficient financial markets. The evolving technology and competition have made the work of market makers more intricate.

Market makers play a crucial role in providing liquidity to the financial market and lowering the cost of buying or selling securities by narrowing the bid-ask spread.

With an understanding of liquidity and the influence of technology, we can now distinguish between two main forms of arbitrage: Deterministic Arbitrage and Statistical Arbitrage.

a) Deterministic Arbitrage happens when an investor takes advantage of a price discrepancy by purchasing and selling an asset at the same time. This approach aims to level out price disparities and maintain fair market value for the security involved. The investor plays a role in regulating the market through this arbitrage technique.
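As a rough illustration of what a price discrepancy looks like once fees are included, the sketch below scans quotes from several exchanges and reports the best buy-here, sell-there pair. Everything in it is an assumption made for the example: the `Quote` fields, the proportional taker fee, and the exchange names. It ignores transfer times, withdrawal fees, and order book depth, all of which matter in practice.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    exchange: str
    bid: float        # best price at which we could sell on this exchange
    ask: float        # best price at which we could buy on this exchange
    taker_fee: float  # proportional fee, e.g. 0.002 for 0.2% (assumed fee model)


def best_cross_exchange_edge(quotes):
    """Return (buy_exchange, sell_exchange, net_edge) for the best pair, or None."""
    best = None
    for buy in quotes:
        for sell in quotes:
            if buy.exchange == sell.exchange:
                continue
            cost = buy.ask * (1 + buy.taker_fee)        # cash paid per unit bought
            proceeds = sell.bid * (1 - sell.taker_fee)  # cash received per unit sold
            edge = proceeds / cost - 1                  # net return after fees
            if edge > 0 and (best is None or edge > best[2]):
                best = (buy.exchange, sell.exchange, edge)
    return best


# Hypothetical quotes for the same pair on two exchanges.
quotes = [
    Quote("exchange_a", bid=9990.0, ask=10000.0, taker_fee=0.002),
    Quote("exchange_b", bid=10300.0, ask=10310.0, taker_fee=0.002),
]
print(best_cross_exchange_edge(quotes))
```

A positive edge here only means the quoted prices cross after fees; whether it can actually be captured depends on how quickly both legs can be executed.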

Given the advancements in technology for trading traditional securities, it has become extremely challenging for individual investors to profit from market inefficiencies. Market leaders such as JPMorgan, Morgan Stanley, and Goldman Sachs invest heavily in IT infrastructure and computerized trading systems to monitor price fluctuations in similar financial instruments. These sophisticated financial institutions act quickly on any pricing inefficiencies, often eliminating opportunities in a matter of seconds. I previously worked on a team of quant traders who structured these transactions using proprietary capital. Casual traders face significant competition when large institutional players, with substantial capital and cutting-edge technology, dominate the market.

The use of technology in stock trading can provide a competitive advantage, a trend that dates back to the early 20th century when hot railroad stocks were being traded. During times of volatility, savvy traders who had access to cutting-edge technology such as private telephones and telegraphs could receive real-time information on prices of certain railroad stocks in California and New York, allowing them to execute risk-free transactions and profit from the price difference between the two exchanges. The integration of capital and technology has allowed visionaries to take advantage of market inefficiencies.

Arbitrage in the cryptocurrency market is similar to that in traditional securities. Prior to late 2017, there was little to no institutional involvement in this asset class, offering opportunities for those with basic knowledge of finance, Python, and data analysis to potentially profit through what could be considered “deterministic arbitrage.”

In the past, I was part of a team that used to exploit arbitrage opportunities for my previous employer. I also attempted to take advantage of such opportunities in the cryptocurrency market by forming my own team. I teamed up with a former analyst from the Federal Reserve who holds a Ph.D. in Quant and has strong knowledge of C++, Python, and its scientific stack. Additionally, I enlisted the help of a talented Python and C++ coder from India who has assisted me on multiple projects. With our combined expertise in different domains, we started this project in Palo Alto while working on an AI project for a company that required quants with time-series analysis skills.

To invest our funds effectively, we first assessed the market and its associated risks by gathering real-time data for every cryptocurrency exchange. This involved obtaining tick-by-tick price and volume information for over 100 currency pairs, as well as a vast array of other data points related to the underlying blockchains and exchanges trading the coins. All this information was then stored in a NoSQL database.

The left graph displays the graphical user interface of the arbitrage tracker and execution code that we created in its early stages (January 2017).

I labeled each occurrence of arbitrage an “Arb Event” based on its timing, and created a table with some of the data we gathered. Our study reveals that, in terms of both frequency and magnitude, the most significant arbitrage opportunities in the past year occurred in the BTC-USD/USDT and BTC-ETH pairs.

The table shown below displays tick-by-tick data for the BTC-USD/USDT pair on various exchanges, averaged by minute, covering the period from 7/22/2017 to 8/23/2017.

For the period from 7/22/2017 to 8/23/2017, our analysis indicates that the average duration of an arbitrage opportunity was around 11 minutes, yielding an average profit of about 6%. Exchanges Exmo, OKCoin, and LakeBTC dominated with over 2/3 of all arbitrage opportunities during the selected time frame.
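The article does not spell out how an Arb Event was delimited, so the following is only one plausible reading: treat an event as a run of consecutive minutes in which the relative price gap between two exchanges stays above a threshold, then measure each run's duration and peak gap. The function name, the threshold, and the sample gaps are all hypothetical.

```python
def arb_events(gaps, threshold):
    """Find runs of consecutive minutes where the gap exceeds the threshold.

    gaps: per-minute relative price gaps between two exchanges (0.03 means 3%).
    Returns a list of (start_index, duration_minutes, peak_gap) tuples.
    """
    events = []
    start = None
    for i, gap in enumerate(gaps):
        if gap > threshold:
            if start is None:
                start = i  # a new event begins
        elif start is not None:
            events.append((start, i - start, max(gaps[start:i])))
            start = None
    if start is not None:  # an event still open at the end of the series
        events.append((start, len(gaps) - start, max(gaps[start:])))
    return events


# Hypothetical minute-averaged gaps.
sample_gaps = [0.0, 0.02, 0.05, 0.03, 0.0, 0.04, 0.04]
events = arb_events(sample_gaps, threshold=0.01)
average_duration = sum(duration for _, duration, _ in events) / len(events)
print(events, average_duration)
```

Averaging the durations and peak gaps over all events is the kind of calculation that produces summary figures like an average duration in minutes and an average percentage gap.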

But we were curious and wanted to determine the reason behind these opportunities. Our analysis found that in more than 90% of the trades, a Chinese exchange was involved as one of the parties in the arbitrage.

What caused these arbitrage opportunities to occur? Did they persist or decrease over time?

On September 4, 2017, the Chinese government made an announcement that banned initial coin offerings (ICOs) within the country, resulting in a market correction that lasted several days. This ban was eventually expanded to include exchanges and some mining operations, with some operators being given a 30-day deadline to shut down their operations.

The Chinese government’s ban on ICOs, exchanges, and mining operations caused a significant decrease in the value of virtually all cryptocurrencies. However, it also led to a spike in profit opportunities in the crypto markets being monitored.

This decision by the Chinese government was driven by the high cost of the energy subsidies that Chinese miners were taking advantage of by operating in rural areas of the country. Energy consumption had risen significantly in recent years without a matching rise in industrial production, which prompted the government to investigate and take action. As a result, companies like Bitmain (which dominates the market for ASIC mining machines) began exploring relocation to other regions, such as elsewhere in Asia, North America, and Europe. This created arbitrage opportunities because Chinese miners, who were responsible for a large portion of cryptocurrency mining, had to sell their production at discounted prices on local Chinese exchanges, allowing operators of Chinese exchanges to profit from price discrepancies with exchanges outside of China.

So, let’s move on to explain “statistical arbitrage”.

b) Statistical arbitrage is a highly quantitative and computationally intensive form of trading that involves data mining, statistical analysis, and automated trading systems. Prominent hedge funds such as Quantbot Technologies, Bridgewater Associates (with roughly 150 billion USD in assets), and Two Sigma are leaders in this field. They have invested heavily in technology, hiring top quants from Wall Street and retraining programmers and computer scientists from Silicon Valley who have little or no experience with time series analysis or finance. However, these firms have limited involvement, if any, in the cryptocurrency market.

Statistical arbitrage has its roots in the pairs trading strategy, where stocks are paired based on their fundamental or market similarities. The strategy involves buying the underperforming stock and selling the outperforming one, with the expectation that the underperforming stock will eventually catch up to its counterpart. The goal is to find a pair of assets with strong cointegration.
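A bare-bones version of the pairs idea can be sketched in Python as follows: estimate a hedge ratio by ordinary least squares, form the spread between the two price series, and trade when the latest spread is unusually far from its mean. This is an illustration under simplifying assumptions, not a production strategy. The function names and the entry threshold of 2 are hypothetical, and the sketch does not test for cointegration, which would normally be done with a dedicated statistical test before trading a pair.

```python
from statistics import mean, pstdev


def hedge_ratio(x, y):
    """Ordinary least squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    return beta, my - beta * mx


def pairs_signal(x, y, entry_z=2.0):
    """Return (signal, z) where z is the z-score of the latest spread."""
    beta, alpha = hedge_ratio(x, y)
    spread = [b - (alpha + beta * a) for a, b in zip(x, y)]
    sd = pstdev(spread)
    if sd == 0:
        return "flat", 0.0
    z = (spread[-1] - mean(spread)) / sd
    if z > entry_z:
        return "short y / long x", z  # y looks rich relative to x
    if z < -entry_z:
        return "long y / short x", z  # y looks cheap relative to x
    return "flat", z


# Hypothetical price series: y tracks 2*x + 1 until the last observation jumps.
x = [float(i) for i in range(1, 21)]
y = [2 * a + 1 + (0.1 if i % 2 else -0.1) for i, a in enumerate(x)]
y[-1] += 3.0
print(pairs_signal(x, y))
```

In this toy series the last value of `y` has run ahead of `x`, so the signal is to sell the outperformer and buy the underperformer, in line with the strategy described above.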

In statistical arbitrage, constructing portfolios involves a scoring phase where each asset in the market is assigned a numeric score or rank that reflects its desirability. This process is similar to Google’s PageRank, with high scores indicating a “go long” position and low scores indicating a “go short” position.
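The mechanics of turning scores into positions can be shown without knowing any particular scoring formula. In the sketch below the scores are made-up numbers and the function name is hypothetical; the only point is the ranking step, where the top of the list goes long and the bottom goes short.

```python
def long_short_from_scores(scores, n_each):
    """Rank assets by score; go long the top n_each and short the bottom n_each."""
    if n_each < 1 or 2 * n_each > len(scores):
        raise ValueError("need 1 <= n_each and at least 2 * n_each assets")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {"long": ranked[:n_each], "short": ranked[-n_each:]}


# Hypothetical scores; a real scoring formula is proprietary and not shown here.
scores = {"BTC": 0.8, "ETH": 0.5, "LTC": 0.1, "XRP": -0.2, "DOGE": -0.7}
book = long_short_from_scores(scores, n_each=2)
print(book)
```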

The portfolio construction in statistical arbitrage includes a risk reduction phase, in which assets are blended into a portfolio in precise ratios to eliminate risks. Nevertheless, it is important to be mindful of these risks, and that is where casual cryptocurrency traders tend to fall short.
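One simple example of blending assets in precise ratios is a beta-neutral pair: size the long and short legs so that their exposures to a common market factor cancel. The sketch below assumes each asset has a known, positive beta to some benchmark, which is an assumption of this example rather than something the article specifies. It removes only that one source of risk, and the function name is hypothetical.

```python
def beta_neutral_weights(beta_long, beta_short):
    """Return (w_long, w_short) with zero net beta and gross exposure of 1.

    w_long is positive, w_short is negative, and
    w_long * beta_long + w_short * beta_short == 0.
    """
    if beta_long <= 0 or beta_short <= 0:
        raise ValueError("this sketch assumes positive betas")
    total = beta_long + beta_short
    return beta_short / total, -beta_long / total


# Hypothetical betas: the long asset is three times as sensitive as the short one.
w_long, w_short = beta_neutral_weights(beta_long=1.5, beta_short=0.5)
net_beta = w_long * 1.5 + w_short * 0.5
print(w_long, w_short, net_beta)
```

Exchange credit risk, liquidity risk, and operational risk are untouched by this kind of hedge, which is part of why they are easy to overlook.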

The scoring mechanism used by quant shops and hedge funds is intriguing and kept confidential. Each firm has its own unique scoring formula, which is carefully guarded. I also have my own scoring formula, which I have used in cryptocurrency trading and mining, as well as in other assets, over varying time frames.

Statistical arbitrage is a trading strategy that leverages statistical and econometric methods to generate trade signals. This approach has become a central focus of both hedge funds and investment banks, many of which have built proprietary operations around statistical arbitrage trading. The specifics of the strategy and its application are highly guarded and vary among institutions.

In my opinion, the traders who claim to have made profits from cryptocurrency arbitrage opportunities have not truly achieved arbitrage profits in the traditional sense.

According to the definition of arbitrage, it involves buying and selling an asset simultaneously. However, with the current process of verifying transactions in various blockchains, the execution is not “nearly simultaneous”. The best-case scenario is a 10-15 minute window of risk, which in the cryptocurrency market is much riskier compared to the stock market when adjusting for median volatility.

Although some traders have seen profits, they took on several risks to get them: market risk exposure during the transfer window, credit risk from the exchanges involved, and operational risk. These traders may not have been aware of these risks or known how to quantify them, which creates the illusion of arbitrage profits. In reality, any profits they’ve seen were likely due to luck and to liquidity risk not working against them. Everyone appears successful in a bull market.
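One rough way to quantify the market risk carried during a 10-15 minute window is to scale an annualized volatility down to the window length with the square-root-of-time rule, then ask how likely an adverse move is to exceed the apparent edge. The sketch below does this under strong assumptions: independent, normally distributed returns with zero drift. The volatility inputs are placeholders chosen for illustration, not measured values, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

CRYPTO_MINUTES_PER_YEAR = 365 * 24 * 60   # crypto trades around the clock
EQUITY_MINUTES_PER_YEAR = 252 * 6.5 * 60  # assumed: 252 sessions of 6.5 hours


def window_volatility(annual_vol, window_minutes, minutes_per_year):
    """Scale annualized volatility to a short window (square-root-of-time rule)."""
    return annual_vol * math.sqrt(window_minutes / minutes_per_year)


def prob_edge_wiped_out(edge, window_vol):
    """Probability that an adverse move over the window exceeds the edge,
    assuming zero-drift normally distributed returns."""
    return NormalDist().cdf(-edge / window_vol)


# Placeholder inputs for illustration only.
crypto_15m = window_volatility(0.90, 15, CRYPTO_MINUTES_PER_YEAR)
equity_15m = window_volatility(0.15, 15, EQUITY_MINUTES_PER_YEAR)
print(f"15-minute volatility: crypto {crypto_15m:.4%}, equity {equity_15m:.4%}")
print(f"P(1% edge wiped out in crypto): {prob_edge_wiped_out(0.01, crypto_15m):.2%}")
```

Real crypto returns have fatter tails than a normal distribution, so a calculation like this is better read as a lower bound on the risk than as an estimate of it.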

To maximize the chances of realizing profits from arbitrage opportunities in the cryptocurrency market, it is recommended to target at least two of the three types of arbitrage outlined, namely deterministic, statistical, and regulatory arbitrage.

Our custom code analyzes the order books and transactional accounts of cryptocurrency exchanges, giving us valuable insight into the market. Taking various other factors into account as well, it is built to:

  1. Indicate low-risk entry and exit points,
  2. Detect outliers in price and volume data,
  3. Detect a high probability of changes in volatility,
  4. Build optimal portfolios of assets to hold for a given time frame.

All of this is designed to outperform benchmarks on a risk-adjusted basis.
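As an example of the second item, a common textbook way to flag outliers in price or volume data is the modified z-score based on the median absolute deviation (MAD), which is less distorted by the outliers themselves than a mean and standard deviation would be. This is a generic technique shown for illustration, not a description of our code; the function name, the 3.5 cutoff, and the sample data are assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (MAD-based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # more than half the values are identical; the score is undefined
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical minute prices with one suspicious print.
prices = [100, 101, 99, 100, 102, 98, 100, 150]
print(mad_outliers(prices))
```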

So how well does an AI-managed crypto portfolio perform compared to, say, BTC buy and hold, or to a passive index strategy such as Bitwise’s cryptocurrency index fund? The chart below (you can find the Tableau version here) shows the growth of $10,000 invested in each of the strategies.

If you invested $10,000 in Bitcoin on January 1st, 2017, it would have grown to approximately $69,000 by April 5th, 2018. However, if you invested the same amount in Bitwise’s Top 10 Index, it would have grown to around $97,000. And, if you invested in an AI-managed portfolio using a statistical arbitrage approach, the same $10,000 would have grown to about $170,000. It’s important to note that these figures take into account factors like slippage and transaction costs.
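To put the three ending values on a comparable footing, the short calculation below converts each into a growth multiple and an annualized rate over the same holding period. It uses only the figures quoted above. The function name is hypothetical, and the annualization assumes a 365-day year.

```python
from datetime import date


def growth_stats(start_value, end_value, start, end):
    """Return (multiple, annualized_return) for a single investment."""
    days = (end - start).days
    multiple = end_value / start_value
    annualized = multiple ** (365 / days) - 1
    return multiple, annualized


START, END = date(2017, 1, 1), date(2018, 4, 5)
ending_values = {
    "BTC buy and hold": 69_000,
    "Bitwise Top 10 Index": 97_000,
    "AI-managed portfolio": 170_000,
}
for name, value in ending_values.items():
    multiple, annualized = growth_stats(10_000, value, START, END)
    print(f"{name}: {multiple:.1f}x, {annualized:.0%} annualized")
```

Multiples alone say nothing about the risk taken to earn them, so a fair comparison would also look at volatility and drawdowns over the same period.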

I have witnessed venture capitalists financing startups in the fintech sector with theoretically sound models, but I fear that they may not hold up in crisis situations like the 2008 Lehman collapse or the LTCM crisis. Many of the machine learning models created by these startups do not adequately (or at all) account for stressed macroeconomic circumstances and potential operational hazards. I fear that some hedge funds entering the cryptocurrency space with newfound capital may also neglect to take these risks into consideration.

To effectively profit from cryptocurrency arbitrage, it’s crucial to first assess all potential risks and perform thorough out-of-sample testing before investing any capital. Having a solid understanding of the domain also helps, so if you lack professional trading experience, consider partnering with individuals who share your interest and have financial engineering experience in practical settings.
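For time series, out-of-sample testing usually means the test data must come strictly after the data used to fit the model. A minimal walk-forward split can be sketched as follows; the function name and the window sizes are hypothetical.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) as ranges.

    Each test window comes strictly after its training window, and the
    whole scheme rolls forward by test_size observations at a time.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Hypothetical sizes: 10 observations, fit on 4, evaluate on the next 2.
for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train), list(test))
```

A strategy is fitted on each training window and evaluated only on the following test window, so reported performance never relies on data the model has already seen.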
