From the engineering team behind TCPShield

Celebrating 10 years: A decade in review

On April 29th, 2024, TCPShield passed its 10th birthday, officially marking a decade in operation. This blog outlines the trends we have seen in the industry over the last 10 years, how our architecture has evolved over time, and where we stand now. Across the IT industry, the gaming sector has continued to hold the #1 position for DDoS attacks in terms of complexity, size, and scale. Some of you may recall the record 623 Gbps Mirai attacks of 2016, which at the time were the largest ever recorded on the Akamai (Prolexic) platform. Today, attacks of similar scale are a daily occurrence, and attacks under 10 Gbps are increasingly rare.

Going back even further, to 2014, the year of our founding, Cloudflare received a record 300 Gbps DNS amplification attack, which at the time was enough to cause congestion on LINX and other key transit providers.

Unfortunately, gaming continues to outpace virtually every other industry in its share of attack traffic. I believe gaming is where the largest and most complex attacks originate, and that these techniques trickle down to all other industries from there. Attackers may have less incentive to go after more established industries, because their motivation in gaming is typically not a ransom or even financial gain, but rather the entertainment value of seeing a favorite Minecraft server’s community outraged by downtime and an unstable playing experience. As layer 4 mitigation becomes more of a solved problem, intricate and unique layer 7 (application) vectors are starting to outpace layer 4 vectors in frequency and scale.

Attack Landscape

We mention these events because, for Minecraft, multi-terabit attacks have become the norm over the last two years. In Cloudflare’s Q3 2022 DDoS report, Wynncraft received a 2.5 Tbps multi-vector DDoS attack. While we have seen attacks exceeding 1 Tbps, we have also seen attacks that are equally crippling in terms of packet rate. In this blog we discuss the challenges facing traditional mitigation infrastructure designs, where we stand now, and the strides we have made over the last year that allow us to block even the most sophisticated layer 4 attacks.

From an economic point of view, gaming is even harder to optimize for: the attacks aimed at victims are several orders of magnitude larger than those seen in the financial or telecom sectors, yet defenses must still match or exceed the capacity typically deployed in those industries.

For TCPShield, what started as a very simple architecture 3 years ago has evolved into a distributed, global design that required careful consideration of routing, packet processing, and horizontal scaling strategies in order to stay one step ahead of attackers in this landscape. One major shift we have observed internally is the rise of stateful TCP attacks.

The objective of these attacks is to fall under a more favorable security policy by completing a three-way handshake with the victim, sometimes across multiple destination IPs at the same time.

Carpet Bombs

These attacks, sometimes referred to as “carpet bombs,” involve flooding multiple IP addresses on the victim network at the same time. The goal is to evade traditional detection approaches that sample traffic only on a per-/32 destination basis. When building DDoS defenses, carpet bombs often have to be treated as the rule rather than the exception. Since 2023, our assumption has been clear: most attacks serious enough to cause disruption will be stateful and multi-destination in nature.

Carpet bomb attacks send malicious traffic to multiple destinations within a subnet at once, causing significant impact to the organization.

Carpet bomb attacks have a few advantages for attackers:

  • Due to the low bitrate and packet rate per individual target, the traffic often flies under the radar of traditional threshold-based detection.
  • Even if individual targets can be detected, inserting hundreds or even thousands of distinct routes towards a mitigation device can overwhelm legacy mitigation systems.
  • A skilled attacker will ensure that their carpet bombs mimic legitimate traffic as closely as possible while maintaining consistent per-destination throughput.
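To illustrate why per-/32 thresholds miss carpet bombs, here is a minimal sketch that compares the same per-destination samples in two ways: host by host, and summed across a /24. The threshold values, the /24 aggregation width, and the function names are hypothetical assumptions for illustration, not TCPShield's detection logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds in packets per second (assumptions, not
 * production values). */
#define PER_HOST_PPS_THRESHOLD   100000u
#define PER_PREFIX_PPS_THRESHOLD 1000000u

/* Per-destination counter sampled over one second. */
struct dst_sample {
    uint32_t dst_ip;  /* IPv4 address in host byte order */
    uint32_t pps;     /* packets per second observed */
};

/* Classic approach: flag a destination only if its own rate exceeds the
 * per-/32 threshold. Returns the number of flagged hosts. */
size_t detect_per_host(const struct dst_sample *s, size_t n)
{
    size_t flagged = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].pps > PER_HOST_PPS_THRESHOLD)
            flagged++;
    return flagged;
}

/* Carpet-bomb-aware approach: sum the rates of every destination inside
 * the /24 containing prefix24 and compare the aggregate against a prefix
 * threshold. Returns 1 if the prefix as a whole looks under attack. */
int detect_per_prefix(const struct dst_sample *s, size_t n, uint32_t prefix24)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if ((s[i].dst_ip & 0xFFFFFF00u) == (prefix24 & 0xFFFFFF00u))
            total += s[i].pps;
    return total > PER_PREFIX_PPS_THRESHOLD;
}
```

With 256 hosts in 203.0.113.0/24 each receiving 50,000 pps, no single host crosses the per-host threshold, while the /24 as a whole carries 12.8 Mpps and is flagged.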

TCPShield today

Today, we operate a global, high-capacity infrastructure spanning over 10 locations, allowing attacks to be automatically dispersed across a large surface area. This is made possible by highly optimized domestic routing in key markets, which causes not just player traffic but also attack traffic to route to the closest point of presence for scrubbing.

This “divide and conquer” approach provides redundancy on a per-site basis, as well as geo-redundancy in the event of unexpected infrastructure failures such as cable cuts, hardware faults, and congestion. The result is much greater resilience to the uncertainty the internet as a whole brings.

This design, known as anycast, also allows us to optimize routing towards backends where it makes sense. We have carefully chosen these locations to allow for sub-1ms latency in the best case, where customers have backends in the same metros or facilities where our infrastructure is present.

In fact, while carpet bombs may seem novel today, they were conceived as much as 8 years ago in the original Mirai source code, which introduced the STOMP method along with subnet parameters for destinations. I have yet to see any further innovation in layer 4 attack vectors; however, trends in device exploit methodology and malware spread techniques remain prevalent.

```c
// For prefix attacks: shift a random value down to the host-bit range and add it to the base address
if (targs[i].netmask < 32)
    iph->daddr = htonl(ntohl(targs[i].addr) + (((uint32_t)rand_next()) >> targs[i].netmask));
```
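To see what the Mirai excerpt does in isolation, the following self-contained sketch reproduces the same arithmetic: a random 32-bit value shifted right by the prefix length spans exactly the host portion of the subnet, and adding it to the network address yields a new destination per packet. Mirai's `rand_next()` is replaced here by a hypothetical xorshift stand-in, and the base address is masked before the offset is added, which is our own assumption so the result always stays inside the subnet.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* htonl, ntohl */

/* Hypothetical stand-in for Mirai's rand_next(): a 32-bit xorshift PRNG. */
static uint32_t prng_state = 0x12345678u;
static uint32_t rand_next_stub(void)
{
    uint32_t x = prng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return prng_state = x;
}

/* Pick a random destination inside addr/netmask.
 * addr_be is in network byte order, as in the Mirai excerpt.
 * A /32 targets a single host; a /0 would require shifting a uint32_t by 32,
 * which is undefined behavior in C, so both are returned unchanged. */
uint32_t random_dst_in_prefix(uint32_t addr_be, uint8_t netmask)
{
    if (netmask >= 32 || netmask == 0)
        return addr_be;

    uint32_t host_mask = 0xFFFFFFFFu >> netmask;    /* e.g. 0x000000FF for /24 */
    uint32_t network = ntohl(addr_be) & ~host_mask; /* clear host bits first */
    uint32_t offset = rand_next_stub() >> netmask;  /* value in [0, 2^(32-netmask)) */
    return htonl(network + offset);
}
```

For a /24 target, every generated address lands somewhere in the same 256-address block, which is exactly the spread a per-/32 detector fails to aggregate.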

In recent years, the technical barrier to entry for building one's own botnet, rather than simply renting access to one, has also dropped significantly thanks to the abundance of open-source code. This poses a significant threat to victim networks without adequate protection. Although infecting thousands of devices with malware without their owners' consent may seem challenging, engaging in this crime can be astoundingly simple.

There is a common misconception that botnet operators are sophisticated, organized criminals; the reality is quite the opposite! Open-source code from previously successful Mirai variants is proving increasingly effective. With the rise of ChatGPT, AI code generation allows those with even minimal knowledge to create their own malware, and where it will not, open-weight models such as the Llama-3 400B model can fill the gap. These low- or no-code botnets continue to succeed despite their lack of sophistication, which only adds to the problem.

Attackers with very little skill can cause significant damage: much of the botnet marketplace operates over Telegram, and access to these botnets can be rented for cents on the dollar.

Over the last decade, residential broadband speeds have also grown significantly across many regions, which has exacerbated the capacity problem for most DDoS mitigation solutions. What once required as many as 300,000 bots to orchestrate a 600 Gbps assault can now easily be achieved with a tenth as many, and sometimes even fewer with the rise of cloud-based attacks, which offer a 1:5000 ratio in typical output.
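As a rough sanity check on these bot counts, assuming the attack bandwidth is split evenly across bots (a simplification that ignores uneven uplinks and protocol overhead), the required per-bot upstream rate works out as:

$$
\frac{600\ \text{Gbps}}{300{,}000\ \text{bots}} = 2\ \text{Mbps per bot},
\qquad
\frac{600\ \text{Gbps}}{30{,}000\ \text{bots}} = 20\ \text{Mbps per bot}
$$

In other words, a tenfold reduction in bot count requires each device to sustain roughly 20 Mbps of upstream instead of 2 Mbps.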

These trends tend to follow Moore's law: as transistor counts double roughly every two years, the added processing power enables infected devices to direct larger attacks at victims.

For 2023, Cisco’s Annual Internet Report states an average broadband speed of 110 Mbps, a 4.45x increase over the 2015 average of 24.7 Mbps. IoT devices, the main infection surface for botnets today, grew 2.4-fold from 6.1 billion in 2018 to 14.7 billion in 2023.
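Both growth factors follow directly from the report's figures:

$$
\frac{110\ \text{Mbps}}{24.7\ \text{Mbps}} \approx 4.45,
\qquad
\frac{14.7 \times 10^{9}\ \text{devices}}{6.1 \times 10^{9}\ \text{devices}} \approx 2.41
$$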

10 year traffic statistics for IX.br, Brazil's largest internet exchange

We have also seen further evidence of this growth in internet exchange membership and throughput. Take IX.br, for example, which has seen consistent 2x growth year over year. As worldwide broadband speeds maintain this growth, the need for DDoS mitigation capacity to follow a similar trajectory cannot be overstated.

How we blocked a 250Mpps stateful TCP flood

On September 6th, 2023 at 20:13 UTC, TCPShield detected and mitigated a record 250 Mpps assault against a Minecraft community in Eastern Europe. Each burst lasted at most one minute, with several hits throughout the day. We are pleased to report that this attack caused no harm to the end customer or to the network as a whole, demonstrating the state-of-the-art mitigation techniques and architecture we have refined over the last year through rigorous R&D investment in layer 4 and layer 7 mitigation.

This is the largest DDoS attack the TCPShield platform has received on record in terms of packets per second. What makes it particularly unusual is that it is entirely in-session.

This means that, unlike traditional DDoS attacks, which simply open a raw socket and spew packets, this attack vector:

  1. Opens a three-way handshake with the victim on the application port
  2. Convinces the victim's TCP stack that the session is established
  3. Creates state table entries on any relevant mitigation systems
  4. Spews data over the raw socket; because the source port is reused, these payloads are delivered not only to the operating system's TCP stack but to the userland application itself
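The list above explains why handshake validation alone falls short. Below is a deliberately naive connection-tracking sketch, not modeled on any particular vendor's product: a flow is admitted once SYN, SYN-ACK, and ACK have been seen in order, after which every packet on that flow passes. The state names and the `naive_conntrack` function are hypothetical. Because step 3 places the attacker's flow in the ESTABLISHED state, the payloads sent in step 4 are accepted without further checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP flag bits as they appear in the TCP header. */
#define TCP_SYN 0x02
#define TCP_RST 0x04
#define TCP_ACK 0x10

/* Hypothetical, simplified per-flow states. */
enum flow_state { FLOW_NONE, FLOW_SYN_SEEN, FLOW_SYNACK_SEEN, FLOW_ESTABLISHED };

/* Advance the state of one flow given a packet's flags and direction
 * (from_client is true for client-to-server packets).
 * Returns true if the packet is allowed through. */
bool naive_conntrack(enum flow_state *st, uint8_t flags, bool from_client)
{
    switch (*st) {
    case FLOW_NONE:
        if (from_client && flags == TCP_SYN) { *st = FLOW_SYN_SEEN; return true; }
        return false;                       /* no handshake yet: drop */
    case FLOW_SYN_SEEN:
        if (!from_client && flags == (TCP_SYN | TCP_ACK)) { *st = FLOW_SYNACK_SEEN; return true; }
        return false;
    case FLOW_SYNACK_SEEN:
        if (from_client && (flags & TCP_ACK) && !(flags & TCP_SYN)) { *st = FLOW_ESTABLISHED; return true; }
        return false;
    case FLOW_ESTABLISHED:
        if (flags & TCP_RST)
            *st = FLOW_NONE;                /* reset tears the flow down */
        /* Weakness: no sequence-number or rate validation once established,
         * so any payload on this flow is accepted. */
        return true;
    }
    return false;
}
```

A filter built this way stops raw floods that never complete a handshake, but it is exactly the kind of session tracking an in-session flood is designed to satisfy.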

Stateful floods like these are increasingly challenging to mitigate because they aim to appear as close to legitimate traffic as possible, and given their stateful nature, the traditional session-tracking and rate-limiting approaches implemented by current DDoS mitigation vendors are simply not enough.

5-tuple, SEQ/ACK analysis

Packet captures of this attack revealed something particularly interesting. Although individual source IPs sent traffic to multiple destinations, the 5-tuple flow for each destination IP remained the same. While we do not believe the Berkeley socket API was used for the full transmission (as we would rarely see 1409-byte ACK segments), we do believe it was at least used to set up the three-way handshake, so that the same source port could then be reused when generating traffic. This traffic behavior does not resemble what you would see from a compliant TCP stack, such as the Linux kernel's, when using send().

One thing immediately stood out: each arriving packet carried a sequence number whose delta from the previous packet looked nothing like legitimate traffic, despite the flow technically being stateful. For instance, the sequence numbers of a given packet N and its predecessor N-1 differed by as much as 1.8×10^9 in absolute value. Something was clearly off, since in legitimate sessions consecutive in-order packets in the same direction should never see a delta greater than the MSS (maximum segment size) negotiated between client and server.
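A minimal sketch of this heuristic, assuming both packets belong to the same flow and direction: compute the delta with 32-bit serial-number arithmetic (in the spirit of RFC 1982, so wraparound at 2^32 does not produce a huge spurious value) and flag any forward jump larger than the negotiated MSS. The function names, and the `max_back` tolerance for backward moves caused by retransmission or reordering, are our own assumptions rather than a description of TCPShield's filter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed distance from prev to cur in 32-bit sequence space. The unsigned
 * subtraction wraps modulo 2^32; values of 2^31 and above are mapped to
 * negative distances, so 0xFFFFFF00 -> 0x00000100 yields +512 rather than
 * roughly -4.29e9. */
static inline int64_t seq_delta(uint32_t prev, uint32_t cur)
{
    uint32_t d = cur - prev;
    return d < 0x80000000u ? (int64_t)d : (int64_t)d - 0x100000000LL;
}

/* Hypothetical heuristic: an in-order segment should advance the sequence
 * number by at most one MSS. Backward moves (retransmissions, reordering)
 * are tolerated up to max_back bytes, an assumed bound such as the
 * receive window. */
bool seq_jump_suspicious(uint32_t prev, uint32_t cur, uint32_t mss, uint32_t max_back)
{
    int64_t d = seq_delta(prev, cur);
    if (d > (int64_t)mss)
        return true;    /* forward jump larger than one segment */
    if (d < -(int64_t)max_back)
        return true;    /* implausibly far backward */
    return false;
}
```

A legitimate 1448-byte step with an MSS of 1460 passes, while a jump of 1.8×10^9 in either direction is flagged.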

We analyzed sequence number deltas by bucketing individual flows into a hashmap and computing the delta between subsequent sequence numbers of each flow, sampled in order of arrival time. The resulting distribution was interesting: despite the sequence numbers appearing random, the delta hardly resembles a normal distribution and looks more linear, with a standard deviation of about 1.8 billion. Perhaps this is the distribution of C’s pseudo-random rand()?
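The per-flow bucketing described above can be sketched as a small open-addressing hash table keyed on the 5-tuple: each slot remembers the last sequence number seen for its flow, and every subsequent packet yields one delta sample. The table size, the FNV-1a hash, and all names are assumptions for illustration; a production pipeline would also need to expire stale flows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 5-tuple identifying one direction of a flow. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

struct flow_slot {
    bool used;
    struct flow_key key;
    uint32_t last_seq;
};

#define TABLE_SIZE 1024u   /* power of two, assumed large enough for the sample */
static struct flow_slot table[TABLE_SIZE];

/* FNV-1a over a byte range, continuing from hash h. */
static uint32_t fnv1a(const void *p, size_t n, uint32_t h)
{
    const uint8_t *b = p;
    for (size_t i = 0; i < n; i++) { h ^= b[i]; h *= 16777619u; }
    return h;
}

/* Hash field by field so struct padding bytes never affect the result. */
static uint32_t hash_key(const struct flow_key *k)
{
    uint32_t h = 2166136261u;
    h = fnv1a(&k->src_ip, sizeof k->src_ip, h);
    h = fnv1a(&k->dst_ip, sizeof k->dst_ip, h);
    h = fnv1a(&k->src_port, sizeof k->src_port, h);
    h = fnv1a(&k->dst_port, sizeof k->dst_port, h);
    return fnv1a(&k->proto, sizeof k->proto, h);
}

static bool key_eq(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Record a packet in arrival order. If the flow was seen before, store the
 * plain (non-wrapping) difference from its previous sequence number in
 * *delta and return true; the first packet of a flow, or a full table,
 * returns false. */
bool sample_seq_delta(const struct flow_key *k, uint32_t seq, int64_t *delta)
{
    uint32_t i = hash_key(k) & (TABLE_SIZE - 1);
    for (uint32_t probes = 0; probes < TABLE_SIZE; probes++, i = (i + 1) & (TABLE_SIZE - 1)) {
        struct flow_slot *s = &table[i];
        if (!s->used) {                 /* new flow: remember its first seq */
            s->used = true;
            s->key = *k;
            s->last_seq = seq;
            return false;
        }
        if (key_eq(&s->key, k)) {       /* known flow: emit one delta sample */
            *delta = (int64_t)seq - (int64_t)s->last_seq;
            s->last_seq = seq;
            return true;
        }
    }
    return false;                       /* table full */
}
```

If, as speculated above, each attack packet carried a sequence number drawn independently and uniformly from the full 32-bit space (an assumption), the plain difference computed here would follow a triangular distribution on (−2^32, 2^32) with standard deviation 2^32/√6 ≈ 1.75×10^9, consistent with both the roughly 1.8 billion spread and the linear, non-normal shape observed.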

Traffic from legitimate TCP sessions was also sampled for analysis, producing the following sequence number distribution:

Here we have a power-law distribution, with no deltas above 1500 in most cases and with sequence number rollover as the only source of negative values. With this data, we can easily approximate the sequence number range we expect for normal player connections.
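As a minimal sketch of that approximation, one could sort the legitimate delta samples and take a high percentile as the upper bound of the accepted range. The nearest-rank method, the choice of percentile, and the function names are assumptions for illustration, not a description of TCPShield's filter.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for int64_t, written to avoid overflow from subtraction. */
static int cmp_i64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile (0 < pct <= 100) of n > 0 samples; sorts in place.
 * rank = ceil(pct * n / 100), computed in integer arithmetic. */
int64_t delta_percentile(int64_t *samples, size_t n, unsigned pct)
{
    qsort(samples, n, sizeof *samples, cmp_i64);
    size_t rank = (pct * n + 99) / 100;
    if (rank == 0)
        rank = 1;
    return samples[rank - 1];
}
```

Applied to the legitimate samples, the 99th percentile gives an upper bound on normal forward deltas; a matching low percentile, computed after discarding rollover samples, would give the other end of the range.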

Country and ASN breakdown

The geographical breakdown of this botnet was well distributed but overwhelmingly skewed towards sources in Russia, Indonesia, and Ukraine, originating from AS12389, AS9299, and AS13489, respectively. On our network, no metro saw more than 20% of the total attack traffic, with Singapore being the largest ingest site.

The per-AS distribution paints a similar story, with the majority of sources belonging to Rostelecom, from devices compromised via the Zyxel vulnerability CVE-2023-28771. It is unclear how many compromised or otherwise vulnerable Zyxel devices remain today; however, according to Onyphe, there are over 209k publicly exposed devices.

Conclusion

Throughout the last decade, layer 4 attacks have remained relatively simple; however, their size, in both packet rate and bitrate, remains on a sustained upward trajectory. We have seen a considerable increase in in-session carpet bomb attacks, including attacks that leverage cloud providers to achieve their objective. Because high packet rate attacks let attackers overwhelm the network devices between them and the victim, this trend is further validated by recent events reported by industry leaders, such as Akamai’s staggering 900 Mpps attack observed in APAC in March 2023 and OVHcloud’s most recent report of an 840 Mpps attack at the beginning of 2023.

As the battle between defenders and attackers continues, those on the mitigation side must carefully select transit providers and international connectivity, and of course the infrastructure that scrubs high-volume attack traffic. As in most engineering domains, achieving this involves a careful trade-off between horizontal and vertical scaling. As the computational throughput of IoT and network devices increases, this should in turn challenge assumptions about how DDoS infrastructure is designed and deployed. Over the last year we have made significant strides in this respect, and we now have one of the most robust L4 packet processing systems in the gaming industry.

On the vendor side, this should be a wake-up call to rigorously validate devices before they reach consumers' hands, so they do not ship with vulnerabilities capable of widespread damage. In the attack above, the sources were primarily Zyxel devices, and recent remote code execution CVEs such as CVE-2023-28771 and CVE-2022-44877 show how quickly such flaws are weaponized. CVE-2023-28771 involved command injection during key exchange on Zyxel devices, while CVE-2022-44877 allowed attackers to gain remote code execution via Control Web Panel.

To our customers: protecting your online communities over these last 10 years has been a privilege, and your continued support and loyalty are invaluable. We have enjoyed long-term collaboration with customers spanning many geographies and use cases, and it has been a pleasure working with you all. We look forward to serving you for years to come!

Cheers,
Steven