Additional Funding Flows to Canadian AI Inference Hardware

AI inference hardware startup Untether AI has secured $125 million in new funding to bring its architecture to early commercial customers in edge and data center environments.

Intel Capital has been a lead investor in Untether AI since the company's founding in 2018. When we dug into the architecture with the CEO in October 2020, the Toronto-based startup had already raised $27 million and was sampling its runAI200 devices. The team, which includes several former FPGA hardware engineers, was optimistic about the potential of custom ASICs for ultra-low-power inference, and apparently its investors are too.

This latest round, led by Tracker Capital and Intel Capital, also attracted a new investor, the Canada Pension Plan Investment Board (CPP Investments), which contributed $20 million. CPP Investments manages the money for the country's national pension program, a fund totaling more than $492 billion.

The inference startup is still in its early stages, but it has lined up a systems integrator, Colfax, to carry its tsunAImi accelerator cards for edge servers along with its imAIgine SDK. Each board carries four of the runAI200 devices we've described here, which Untether says deliver 2 petaops of peak compute performance. By the company's own benchmarks, that translates to 80,000 frames per second on ResNet-50 (batch size 1) and 12,000 queries per second on BERT.
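To put the vendor-quoted board figures in per-device terms, here is a back-of-envelope calculation. It assumes, without confirmation from Untether, that work splits evenly across the four runAI200 devices on a board. Note that the "mean interval" is simply the inverse of throughput, not per-request latency, and the ops budget is an upper bound at peak rather than a measured utilization figure.

```python
# Back-of-envelope arithmetic on the vendor-quoted tsunAImi board figures.
# Assumption: throughput splits evenly across the four runAI200 devices,
# which the article does not state.

DEVICES_PER_BOARD = 4
BOARD_PEAK_OPS_PER_S = 2e15   # 2 petaops peak, vendor figure
RESNET50_FPS = 80_000         # ResNet-50, batch size 1, vendor benchmark
BERT_QPS = 12_000             # BERT queries per second, vendor benchmark


def per_device(value: float, devices: int = DEVICES_PER_BOARD) -> float:
    """Split a board-level figure evenly across devices (an assumption)."""
    return value / devices


def mean_interval_us(rate_per_s: float) -> float:
    """Average time between completed results, in microseconds.

    This is the inverse of throughput, not per-request latency: several
    requests can be in flight at once.
    """
    return 1e6 / rate_per_s


def peak_ops_budget_per_item(rate_per_s: float) -> float:
    """Upper bound on ops per item if the board ran at peak the whole time."""
    return BOARD_PEAK_OPS_PER_S / rate_per_s


if __name__ == "__main__":
    print(f"Peak per device: {per_device(BOARD_PEAK_OPS_PER_S) / 1e12:.0f} TOPS")
    print(f"ResNet-50 per device: {per_device(RESNET50_FPS):,.0f} frames/s")
    print(f"ResNet-50 mean interval: {mean_interval_us(RESNET50_FPS):.1f} us")
    print(f"BERT mean interval: {mean_interval_us(BERT_QPS):.1f} us")
    print("Peak-ops budget per ResNet-50 frame: "
          f"{peak_ops_budget_per_item(RESNET50_FPS) / 1e9:.0f} GOP")
```

Under that even-split assumption, each device accounts for roughly 500 peak TOPS and 20,000 ResNet-50 frames per second.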

The startup is focused on INT8, low-latency, inference-only serving with small batch sizes in mind (batch 1 was at the heart of their design process). CEO Arun Iyengar (you might recognize his name from senior positions at Xilinx, AMD, and Altera) says they are targeting NLP, recommendation engines, and vision-heavy applications, with fintech topping the list of markets. He was quick to point out that this is less about high-frequency trading and more about broader portfolio balancing (asset management, risk allocation, and so on), because that is where AI has real traction.
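For readers less familiar with what INT8 inference involves, the sketch below shows generic symmetric per-tensor quantization: floats are mapped to 8-bit integer codes with one shared scale factor. This is a textbook technique used only for illustration; it is not Untether AI's documented scheme, and the article does not describe how the imAIgine toolchain quantizes models.

```python
# Generic symmetric per-tensor INT8 quantization, shown only to illustrate
# what "INT8 inference" means. This is NOT Untether AI's documented scheme;
# the article does not describe how its toolchain quantizes models.


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid divide-by-zero
    # round() returns an int; clamp guards against floating-point overshoot
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the INT8 codes."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.82, -1.27, 0.003, 0.5, -0.25]  # made-up example values
    codes, scale = quantize_int8(weights)
    approx = dequantize_int8(codes, scale)
    print("codes:", codes)
    print(f"scale={scale:.5f}")
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, approx)))
```

The round-trip error per value is at most half the scale factor, which is why 8-bit integer arithmetic can stand in for floating point in many inference workloads while cutting memory and power per operation.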

At the heart of the unique in-memory compute architecture is the memory bank: 385 KB of SRAM paired with a 2D array of 512 processing elements. With 511 banks per chip, each device offers about 200 MB of memory, enough to run many networks entirely on a single chip. And with the multi-chip partitioning capability of the imAIgine SDK, larger networks can be split across multiple devices, or even multiple tsunAImi accelerator boards.
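The per-chip totals follow directly from the bank figures, and the idea of splitting a larger network across devices can be illustrated with a toy packer. Both parts rest on assumptions not stated by Untether: "KB" is read as KiB (1,024 bytes), and `partition_layers()` is a hypothetical greedy, in-order scheme that counts only weight bytes and ignores activations and routing. The imAIgine SDK's actual partitioning algorithm is not described in the article.

```python
# Per-chip arithmetic from the quoted bank figures, plus a toy sketch of
# splitting a network across devices. Assumptions (not from Untether):
#  - "KB" is read as KiB (1,024 bytes); the vendor's unit is not stated.
#  - partition_layers() is a hypothetical greedy, in-order packer that only
#    counts weight bytes. The imAIgine SDK's real algorithm is not described.

BANKS_PER_CHIP = 511
SRAM_PER_BANK_BYTES = 385 * 1024  # 385 KB per bank, read as KiB
PES_PER_BANK = 512

CHIP_SRAM_BYTES = BANKS_PER_CHIP * SRAM_PER_BANK_BYTES  # 201,456,640 bytes
CHIP_PES = BANKS_PER_CHIP * PES_PER_BANK                # 261,632 PEs


def partition_layers(layer_bytes: list[int],
                     capacity: int = CHIP_SRAM_BYTES) -> list[list[int]]:
    """Assign consecutive layers to devices, opening a new device when full.

    Returns one list of layer indices per device. Keeping layers in order
    means each device hands activations to the next, pipeline-style.
    """
    devices: list[list[int]] = []
    used = 0
    for idx, size in enumerate(layer_bytes):
        if size > capacity:
            raise ValueError(f"layer {idx} ({size} B) exceeds one device")
        if not devices or used + size > capacity:
            devices.append([])  # start filling a fresh device
            used = 0
        devices[-1].append(idx)
        used += size
    return devices


if __name__ == "__main__":
    print(f"SRAM per chip: {CHIP_SRAM_BYTES / 1e6:.1f} MB")  # ~201.5 MB
    print(f"PEs per chip: {CHIP_PES:,}")
    mb = 1_000_000
    # Hypothetical network: per-layer weight sizes in bytes (invented numbers).
    layers = [90 * mb, 80 * mb, 120 * mb, 60 * mb, 150 * mb]
    print(partition_layers(layers))  # [[0, 1], [2, 3], [4]]
```

Reading KB as KiB gives roughly 201.5 MB per chip; reading it as 1,000 bytes gives about 196.7 MB. Either way it lands near the quoted 200 MB.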

He also says their low-power approach would work well for on-premises centers performing large-scale video aggregation (smart cities and retail operations, for example). He readily admits that they are starting with these use cases rather than aiming straight for a place among the hallowed hyperscalers, but says there is enough of a market for low-power, high-performance devices that they will find their niches.

Even without public customers for its first silicon, the company has appeal beyond its funding and the uniqueness of its architecture. It has some pedigreed people behind its engineering, including Alex Grbic, who leads software engineering and is well known from a long career at Altera. On the hardware engineering side, Untether's Alex Michael, also from Altera, brings decades of experience in integrated circuit design, product development, and manufacturing.

While the vendor line is that there is an explosive opportunity for custom inference devices in the data center and at the edge, it remains to be seen who the winners and losers in the inference chip game will be. From our perspective, the edge opportunity has more wiggle room than the large data centers we tend to focus on here at TNP, where it will be a long and difficult battle to dislodge CPUs and GPUs from their positions with those high-value (high-margin) customers.


Margie D. Carlisle