Semiconductor Engineering sat down to discuss AI and its move to the edge with Steven Woo, vice president of enterprise solutions technology and distinguished inventor at Rambus; Kris Ardis, executive director at Maxim Integrated; Steve Roddy, vice president of Arm’s Products Learning Group; and Vinay Mehta, inference technical marketing manager at Flex Logix. What follows are excerpts of that conversation. To view part one, click here.
SE: Ideally you want to design hardware, software, and algorithms together to maximize efficiency, but if your design is highly specific you’re also closing it off to changes to algorithms or neural networks. How do you account for that?
Ardis: Do you mean a change in the updates to the models and things like that? If I’ve got what I think is the world’s best cat identifier, next month the cat identifier 1.1 comes out?
SE: But some of these devices are going to be in the market for years, at which point many of these systems will be completely outdated. How do you update it if you designed it that tightly?
Ardis: This goes back to the classical problem of an embedded microcontroller at the edge, whether or not it has a CNN accelerator. It’s a reprogramming. You start to get into things like, how do you securely validate that the new weights, rather than the new executable coming over the air, are appropriate. Do they come from a trusted source? A lot of good security knowledge exists to do things like that. But I also have seen through the years that when you’re running to chase the technology, you sometimes forget about the security side of things. The danger is you have people rushing to update models and not building in the proper security infrastructures to help you go to ‘Cat 1.1’ or ‘Cat 1.2.’ The over-the-air part is relatively solved. It is more about people actually implementing it.
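As a rough illustration of the "do the new weights come from a trusted source" check Ardis describes, here is a minimal sketch using only the Python standard library. It assumes a shared secret provisioned on the device and an HMAC-SHA256 tag shipped alongside the weights; production over-the-air schemes more commonly use asymmetric signatures. The key, the weight bytes, and the function names are all hypothetical.

```python
import hashlib
import hmac


def sign_weights(weights: bytes, key: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over the serialized weights."""
    return hmac.new(key, weights, hashlib.sha256).digest()


def verify_weights(weights: bytes, tag: bytes, key: bytes) -> bool:
    """Accept the update only if the tag matches (constant-time compare)."""
    expected = hmac.new(key, weights, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# Hypothetical values for illustration only.
DEVICE_KEY = b"example-shared-secret"
cat_1_1 = bytes([3, 1, 4, 1, 5, 9, 2, 6])  # stand-in for a weight file
tag = sign_weights(cat_1_1, DEVICE_KEY)

accepted = verify_weights(cat_1_1, tag, DEVICE_KEY)
tampered = verify_weights(cat_1_1 + b"\x00", tag, DEVICE_KEY)
print(accepted, tampered)
```

The device would flash the new weights only when the check passes. A single changed byte in transit is enough to make it fail.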
Roddy: From a silicon standpoint, people will architect very scalable, flexible systems. A variety of specialized architectures are available for accelerating neural nets, but those are always paired with one or more different programmable elements. In more complex systems, you’ll have CPUs, GPUs, and some sort of dedicated NPU. Or in an embedded system, it will be, say, an M class microcontroller and a dedicated, maybe even specialized, neural network accelerator. You have that flexibility to reprogram or support a new operator or slightly new topology. Even if your accelerator can’t handle the new layer type, you fall back to the processing element that’s right next to it. And that tends to be the flexibility built in. As long as ‘Cat 1.2’ is roughly similar in approach to ‘Cat 1.0,’ you’re fine. As long as the math underlying it doesn’t radically change in that 5- or 10-year period, you’ll have some resiliency in the system.
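The fallback Roddy describes can be sketched as a simple layer-assignment pass. This is an assumption-laden toy, not any vendor's runtime: the operator names, the `NPU_OPS` set, and `assign_layers` are hypothetical, and the CPU is assumed able to run any operator, only more slowly.

```python
from typing import List, Set, Tuple

# Hypothetical set of layer types the fixed-function accelerator supports.
NPU_OPS: Set[str] = {"conv2d", "relu", "maxpool"}


def assign_layers(layers: List[str], npu_ops: Set[str]) -> List[Tuple[str, str]]:
    """Send each layer to the NPU when supported, else fall back to the CPU."""
    return [(layer, "npu" if layer in npu_ops else "cpu") for layer in layers]


cat_1_0 = ["conv2d", "relu", "maxpool"]
cat_1_2 = ["conv2d", "relu", "new_attention_op", "maxpool"]  # one new layer type

plan_1_0 = assign_layers(cat_1_0, NPU_OPS)
plan_1_2 = assign_layers(cat_1_2, NPU_OPS)
print(plan_1_2)
```

In this sketch the updated model still runs on the deployed hardware; only the unsupported layer moves to the programmable core next to the accelerator.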
Ardis: That’s a good point. How much foresight did the designer put into how much capacity will be needed in the future? That applies to memory storage, neural network acceleration, and whether you need a temperature sensor to make sure it’s a live cat. So there’s foresight required. What do you think you need to do in the future?
Roddy: You may have a network that you have highly optimized. You do 4-bit weights with 8-bit activations, and you’ve got an idiosyncratic specialty accelerator. It may be sitting next to a CPU, like an Arm M class CPU. So just in case a new 16-bit precision operator comes along, and that is what you need for that ‘live cat/dead cat’ delta, you’ve got that flexibility built into the platform.
Mehta: But is this all assuming that your original input image, or your original input data, never changes? So, on the source side, with something like embedded processors and embedded sensor solutions, the hardware never gets updated. Is that a fundamental assumption here?
Roddy: If I’ve deployed a system, and I’ve got a camera and an audio input, I know the bit precision of the camera and the audio input. Something could come along later that’s a different type of image sensor or radar or LiDAR, and that has different precisions, different data rates. But for that, they’ve got a whole different hardware platform. So I get to spin the silicon that goes with that. And that may replace everything, so all my smart cameras might come down off the light poles and something replaces them that’s better or cheaper. But at least it requires a truck roll to make that happen. But how do you do that securely? How do you flash a new image? How do you make sure that it wasn’t hijacked along the way? And how do you make sure no one’s cracking and reading the weights or the model itself? With digital TVs, for example, we see a lot of people wanting to do fancy upscaling of older material to higher resolution. But that content may be owned by a big media company. They want to protect the content, and maybe you want to protect your algorithm for that super resolution enhancement. A lot of those things have to be tackled along the way with a systematic review of the hardware, the software, the trusted firmware, and even the connection back and forth to the net.
Woo: Yes, and when upgrading a system, there’s so much more than just thinking about the actual neural network itself. You have to think about the whole infrastructure with which you’re doing your security. The data, of course, is always valuable. But even the model and the training infrastructure are really valuable, especially for these cloud-based infrastructures that people are going to be sharing. Layered on top of that, you have this problem of how much flexibility you really need to be able to change things. On one hand you see people doing things like FPGAs, like Flex Logix’s embedded FPGA technology. Those things are great for larger-scale changes. They allow you to keep the hardware in the field, so long as it’s the right size and things like that. A lot of it is really about the scale of the changes that are going to come along. Those kinds of things really can cause the technology to become obsolete very quickly. In other cases, things like FPGAs have been around for a long time, and they’re great because they’ve been used to adapt to the changes we’ve seen so far. The one really interesting trend we’re seeing, especially in the data center, is the sizes of the models. The training sets are growing so enormous that you’re starting to see changes in the overall system architecture just to accommodate the data set sizes and the sizes of the models. It was hard to predict 5 or 10 years ago that you would get to this scale. One presentation I saw recently showed the size of models growing about 10X a year. It’s faster than every other technology trend, so that kind of thing drives change and makes deployed systems obsolete much more quickly, at least for training. I’m sure there are other kinds of curves for things like inference, but those kinds of things will drive how quickly those technologies become obsolete in those locations.
Roddy: The largest training sets and training runtimes are exponential curves. They overlay Moore’s Law versus the complexity of training runs for neural nets, and it is just staggering. It’s doubling every three months versus doubling every 18 months. There’s a lot that has to go on to basically keep pace.
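To put the two doubling periods side by side, here is a small worked calculation. The five-year horizon is an assumption chosen for illustration; the three-month and 18-month doubling periods are the figures quoted above.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` with the given doubling period."""
    return 2.0 ** (years * 12.0 / doubling_months)


YEARS = 5.0  # assumed horizon, not from the discussion
training = growth_factor(YEARS, 3.0)     # doubling every three months
moores_law = growth_factor(YEARS, 18.0)  # doubling every 18 months
print(f"training demand: {training:,.0f}x")
print(f"Moore's Law:     {moores_law:.1f}x")
```

Over five years that is 20 doublings against roughly 3.3, or about a million-fold growth in training complexity against about a ten-fold gain from process scaling.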
SE: Does sparsity make it harder or easier to adapt and update systems?
Woo: I’ll cheat here and say it’s actually both. Part of the reason why sparsity is great is the fixed resources that you have, like memory capacity and on-chip storage, can go a lot further in terms of how long they’ll stay relevant in the market. But now you have changes as to how you actually use that storage, so it’s no longer the regular array type computations that you would do in something like a systolic array or a kind of a SIMD engine, because you’ve got these holes where you’re trying not to do some compute. The question is, ‘Do you have enough flexibility inside your compute engine where it really can take advantage of that type of sparsity?’ As the datasets get much larger, you’re going to want the level of sparsity to improve as well, because that will make your fixed capacity stay in the market longer. And then the question starts to become, ‘Is the way that you’re implementing the calculations on this sparse data appropriate for an even higher level of sparsity?’ It makes it easier in some sense to keep the hardware in the field. It makes it harder in some sense to continue to make those calculations efficient and relevant.
Ardis: There’s also a tradeoff between flexibility and power. The more flexible you design something, the more processor involvement, the more power you burn. You must consider what’s the underlying hardware. Is it worth the power tradeoff for having that flexibility in the field?
Roddy: Yes, we’ve seen good results from highly sparse networks, particularly with clustering and pruning to get down to very sparse sets of weights, and then compressing those weights and storing them in compressed form in the edge device. In our architectures, we also store them in on-chip localized SRAM at a compressed level, expanding them back out for the actual consumption in MAC compute arrays. So you get good utilization of scarce memory on a device for a large deployed network. At the end of the day, whether it is the data center or the edge, most machine learning problems are data movement problems, not really density of compute problems. If you’ve got 80 million weights in a model you’re trying to run on the edge, it’s all about how often you thrash that in and out of your DRAM in your device. It’s not about how many MACs per square millimeter you can squeeze in. It’s how much do you hose the DDR connection on your chip over and over and over again because you’re running 60 inferences per second in a car, or something of that nature.
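The prune, compress, store, and expand flow Roddy outlines can be sketched in a few lines. The bitmask-plus-nonzero-values layout below is an assumed, generic scheme chosen for clarity, not Arm's actual compression format, and the weights and threshold are made up.

```python
from typing import List, Tuple


def prune(weights: List[int], threshold: int) -> List[int]:
    """Zero out weights whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0 for w in weights]


def compress(weights: List[int]) -> Tuple[List[bool], List[int]]:
    """Keep a one-bit-per-weight mask plus only the nonzero values."""
    mask = [w != 0 for w in weights]
    values = [w for w in weights if w != 0]
    return mask, values


def expand(mask: List[bool], values: List[int]) -> List[int]:
    """Rebuild the dense weights just before they feed the MAC array."""
    it = iter(values)
    return [next(it) if keep else 0 for keep in mask]


dense = [0, 7, -1, 0, 0, 12, 1, -9]  # hypothetical 8-bit weights
pruned = prune(dense, threshold=2)
mask, values = compress(pruned)
restored = expand(mask, values)

# Size estimate assuming 8 bits per stored weight and 1 mask bit per position.
dense_bits = 8 * len(pruned)
compressed_bits = len(mask) + 8 * len(values)
print(restored == pruned, dense_bits, compressed_bits)
```

The compressed form is what would sit in scarce on-chip memory; the saving grows with the fraction of weights that pruning sets to zero.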
Woo: I’ve worked at Rambus for a long time. When people came to us looking for high-performance memory for AI, we would usually start the conversation by asking how much memory bandwidth they wanted. They would say, ‘You can’t possibly give me what I want, so why don’t you just tell us what you can give us.’
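Roddy's figures of 80 million weights and 60 inferences per second give a feel for the memory bandwidth involved. The sketch below assumes the worst case, in which every weight is re-read from DRAM on every inference with no on-chip reuse, and it ignores activation traffic. The bit widths are assumptions.

```python
def weight_traffic_gb_per_s(num_weights: int, bits_per_weight: int,
                            inferences_per_s: int) -> float:
    """DRAM read traffic in GB/s if all weights are re-fetched per inference."""
    bytes_per_inference = num_weights * bits_per_weight / 8
    return bytes_per_inference * inferences_per_s / 1e9


eight_bit = weight_traffic_gb_per_s(80_000_000, 8, 60)
four_bit = weight_traffic_gb_per_s(80_000_000, 4, 60)
print(eight_bit, four_bit)
```

Under these assumptions the weights alone account for about 4.8 GB/s at 8 bits, and halving the precision halves the traffic, which is one reason narrow weights and compression matter at the edge.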
SE: Does the IoT term still apply, or is everything now part of the edge because we want some level of intelligence in all devices?
Ardis: I always bristled at the term IoT because I thought it was really just the natural evolution of gadgets getting smaller, compute power going up, and the availability of radio. This is the same trend. It’s processing at lower power. Things will run off a battery for longer. But now you’ve got this new idea that you can train these neural networks at relatively low power and so you start to think, what if my thermostat can see me, or even my doorbell? You start to brainstorm because the technology is going to more advanced process nodes with better techniques for accelerating it. But it’s a natural evolution of just pushing technology closer to the interaction points with people.
Roddy: From an overall architecture standpoint, the first generation that we all call IoT was sticking a radio on a sensor of some sort. Now we realize we can put small, efficient machine-learning enabled compute in between the sensor and the radio and only transmit when something useful or interesting has actually occurred. That has lots of benefits to the power characteristics of that device. It’s a lot easier to compute something than it is to beam it across to some sort of aggregation point in your house or in your industrial environment. So, yes, you’re going to see this rapidly transition to where most critical IoT things have some form of AI embedded in.
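A minimal sketch of the "compute between the sensor and the radio" idea: every reading is evaluated locally and only the interesting ones are transmitted. The threshold function stands in for an on-device model, and the readings are invented for illustration.

```python
from typing import Callable, Iterable, List


def frames_to_transmit(readings: Iterable[float],
                       is_interesting: Callable[[float], bool]) -> List[float]:
    """Check every reading locally; only flagged ones reach the radio."""
    return [r for r in readings if is_interesting(r)]


def loud_enough(level: float) -> bool:
    """Hypothetical stand-in for an on-device classifier."""
    return level > 0.8


sensor_levels = [0.1, 0.2, 0.95, 0.1, 0.3, 0.85, 0.2, 0.1]
sent = frames_to_transmit(sensor_levels, loud_enough)
radio_duty = len(sent) / len(sensor_levels)
print(sent, radio_duty)
```

Here the radio wakes for a quarter of the samples instead of all of them. The real saving depends on how the energy of a local inference compares with the energy of a transmission.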
Ardis: The power cost of moving a bit of data increases exponentially from moving it across a transistor to transferring it from San Francisco to Washington. That tells you why you want the AI at the edge, and why you only want to turn on the radio from time to time.
Woo: Yes, AI is a natural next step for pervasive connectivity. When you’ve got just a couple devices in your home, transporting that data all the way up to a data center makes some sense. But once you start having lots of these devices and are generating so much more data, it’s the power and the cost of moving that data that becomes difficult. If you had a petabyte of information to move from, say, Los Angeles to New York, it’s actually cheaper and faster to copy it to a fleet of hard drives, put it in the belly of a 747, and fly it across the country than it would be to transfer it across the Internet.
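The petabyte example is easy to check with back-of-the-envelope arithmetic. The link speeds below are assumptions picked for illustration, and the calculation assumes the link is fully saturated the whole time, which flatters the network.

```python
def transfer_days(num_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to push num_bytes over a link at a sustained rate."""
    seconds = num_bytes * 8 / link_bits_per_s
    return seconds / 86_400


PETABYTE = 1e15  # bytes, decimal definition
one_gbps = transfer_days(PETABYTE, 1e9)    # assumed 1 Gb/s sustained
ten_gbps = transfer_days(PETABYTE, 10e9)   # assumed 10 Gb/s sustained
print(f"{one_gbps:.1f} days at 1 Gb/s, {ten_gbps:.1f} days at 10 Gb/s")
```

That works out to roughly 93 days at 1 Gb/s and a little over 9 days at 10 Gb/s, against a cross-country flight measured in hours, although the time to copy the drives at each end is not counted here.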
SE: So now we’re getting into some fairly sophisticated designs. Are people going to pay for this? And if not, how do we get that cost down?
Ardis: It will depend on each use case. It’s going to have to bring a feature that solves a problem for it to be useful. An example came up recently. There was a pairing problem: which sensor goes with which person. The cost of getting that answer wrong through human error in configuring that system was great enough that somebody was considering putting a camera on each sensor to look at the person and say, ‘Okay, I’m paired with Kris because I see him.’ It’s on a case-by-case basis. It is probably decades out, but I see interest from payment guys looking at facial ID as a form of authentication. You’re solving a big fraud problem there potentially, so there has to be some money that you’re saving in the system or some benefit that someone will pay for to be able to bring this technology in.
Roddy: It’s about overall cost of ownership, or cost of the complete system. I saw an analysis about glass breakage sensors, and you think, ‘What does that mean? Listening for sound? Wiring up, say, an industrial building or an office building for listening for doors opening or glass being broken from a security standpoint?’ You think, ‘Wow, who wants to pay a couple of bucks per module for every hallway or for every pane of glass?’ That’s until you stop and think that in a large environment you’ve got a dozen buildings unoccupied at night and you have to keep four security guards on the payroll so they could patrol around in a golf cart and watch everything. Or, you only need one or maybe two security guards, and they simply check electronically with the sensors. The guards can go to the trouble spots, and maybe they only have to make a loop around the complex once every couple hours instead of every half hour. At that level, suddenly you’re saving an entire two-person payroll, and the sensors pay for themselves. In so many cases it’s things like that coupled with the overall services attached to the devices. It becomes very effective to measure productivity of an entire business operation or business environment and to look at the benefit of these connected, smart AI IoT devices.
Ardis: I would pay extra to have my robot vacuum cleaner not get in a fight with my office chair every time the robot goes on. Another one that bothers me is the in-house consumer security camera system. It tells me every time the sun comes up or goes down because it thinks there’s somebody moving through the house. I would pay an extra few dollars for that thing to also not give me false alarms, like when I’m traveling. In Asia, I’m getting an alarm at all the wrong times of day for me. I will pay an extra $3 to not have to deal with this mess anymore.
Roddy: I had a recent situation where my garage door opener died. It was old and needed to be replaced. I put a new one in, and there’s an option to have a WiFi module on it. My first thought was why would I want that? I’m not going to open and close it with my phone. But then I realized that when I drive away and wonder whether I closed the garage door, I can just look at my phone.
Woo: As an industry we’re used to having Moore’s Law provide us more transistors, and so some of that real estate and some of the silicon has traditionally come for free in terms of area. It is just a question of what you choose to do with it. Implementing things like better security or upgradeability that allow you to keep the device in the field longer does factor into the TCO, as well. There are two sides of the issue. One is about the new features that you can implement, and one is about that upgradeability keeping you in the field a bit longer, contributing to a better TCO. As an industry, we’ve been lucky to be able to ride Moore’s Law to use additional silicon real estate to do those things to improve upgradeability and add new features. Hopefully that will continue into the future and allow us to use the same sort of techniques that we’ve relied on in the past.