The New AAA: APIs, Authentication, and Availability

24.08.2016

Blog by Lori MacVittie, Principal Technical Evangelist at F5

You might have figured out that yes, I’m one of those people playing Pokémon Go. Or, as is often the case of late, not playing Pokémon Go. That’s bad, because it also means our youngest is not playing: as it turns out, we’re both using Pokémon Trainer Club (PTC) accounts to play, not Google accounts, and we can’t get in.

That’s significant, because while the two of us are frustrated by our inability to “authenticate” and log in to play, my husband chose to use his Google account and, well, he’s not having the same issues.

Which got me digging into some technical concerns that should really resonate with every company, regardless of the app they’re launching. Those concerns revolve around the new AAA – APIs, Authentication, and Availability.

Interestingly, this quest began when I was reading an article in Forbes about tracking Pokémon in Pokémon Go. Yes, Forbes. It’s that big. In any case, that led to another article and another, with one speculating that the reason tracking was (at least initially) broken was due to a game update in which an API key was inadvertently left out of the tracking calls back to Niantic’s servers.

Whether this is the case or not, such a faux pas would, indeed, break APIs. But the thing I kept coming back to was that if I couldn’t log in to my PTC account and play, why was it that I could switch to my Google account and get in easy peasy, lemon squeezy?

The API-Authentication Connection

Well, that got me digging around GitHub and pawing through Pokémon Go APIs (I prefer Java, but Python is out there too, go crazy) and that finally made the ‘aha’ light go on. See, just about every API call in those repositories handles the same exception: LoginFailedException.

In other words, even a simple call to find nearby Pokémon may result in a LoginFailedException.

Which is really not all that surprising. See, monolithic web applications often track authenticated users via sessions, which often means a cookie that contains a session ID or some other token that the application checks before actually doing anything else (that’s a stateful architecture, for those following along). APIs aren’t that much different, in that each API call has to have a way to ensure the calling application (the user) is actually authorized to make the call in the first place. They have to be “logged in”.
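To make the “logged in on every call” idea concrete, here is a minimal sketch in Python. It is not taken from any of the Pokémon Go repositories; AuthService, nearby_pokemon, and the token format are hypothetical names invented for illustration, and only the exception name LoginFailedException comes from the APIs mentioned above. The point it shows is simply that the authentication back end gets consulted before the call does any real work.

```python
class LoginFailedException(Exception):
    """Raised when a call cannot be tied to an authenticated user."""


class AuthService:
    """Hypothetical in-memory stand-in for an authentication back end."""

    def __init__(self):
        self._tokens = {}  # token -> user name
        self.checks = 0    # how many times this service was consulted

    def login(self, user):
        # Assumption: a real service would issue an opaque, expiring token.
        token = f"token-for-{user}"
        self._tokens[token] = user
        return token

    def verify(self, token):
        self.checks += 1
        try:
            return self._tokens[token]
        except KeyError:
            raise LoginFailedException("unknown or expired token") from None


def nearby_pokemon(auth, token):
    """Hypothetical API call: authorization happens before any real work."""
    user = auth.verify(token)
    return {"user": user, "nearby": ["Pidgey", "Rattata"]}
```

Every call to nearby_pokemon bumps auth.checks by one, which is exactly the load that tends to get left out of capacity planning.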

Now, APIs often use API keys to achieve this. The key is generally checked against a user profile to ensure the call is authorized, and in a stateless architecture that check happens on every call. There are various reasons for such a decision, including the ability to rate-limit calls. Which is a big deal. Apigee’s State of APIs 2016 report noted that 68% of APIs were taking advantage of quota management (which is also known as rate limiting, metering, etc.). In order to do that technical trick, one has to first know how many calls have been made in the past minute/hour/day, and thus that count must be tracked somewhere safe (so users can’t manipulate it and trick the application into allowing more calls per time period).
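A rough sketch of what quota management per API key can look like, in Python. This is a simple fixed-window counter held in server-side memory, which is only one of several ways to do it; FixedWindowQuota, QuotaExceeded, and the limits are illustrative assumptions, not how Niantic or Apigee implement it. A real deployment would more likely keep the counts in a shared store so every server sees the same numbers.

```python
import time


class QuotaExceeded(Exception):
    """Raised when an API key has used up its calls for the current window."""


class FixedWindowQuota:
    """Counts calls per API key in fixed time windows, kept server-side."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._counts = {}   # api_key -> (window index, calls in that window)

    def check(self, api_key):
        """Record one call; return how many calls remain in this window."""
        window_index = int(self.clock() // self.window)
        seen_index, count = self._counts.get(api_key, (window_index, 0))
        if seen_index != window_index:
            count = 0  # a new window has started, so the count resets
        if count >= self.limit:
            raise QuotaExceeded(f"{api_key} exceeded {self.limit} calls")
        self._counts[api_key] = (window_index, count + 1)
        return self.limit - (count + 1)
```

Because the count lives on the server and is keyed by the API key, the client has nothing it can tamper with, but it also means one more lookup and one more write on every single call.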

In other words, APIs can be very taxing on authentication infrastructure because they have to verify status, authorization, and potentially apply rate limiting. That’s a lot of work.

Yet our often still-traditional app architecture mindsets don’t consider the impact of those extra calls on capacity. Those extra calls to verify and authorize, even if made on a periodic basis to “refresh” a session, are going to put considerable stress on the authentication infrastructure. The same authentication infrastructure that is supporting login. It’s the same kind of stress we saw when browser limitations on connections per user were increased from 2 to 8. A single user could now consume 4x the connections to access an application. Similarly, when considering the capacity needs for an app that relies on authorization on a per-API call basis, one has to do some math and figure out just how many Xs more that individual user is going to consume.
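That math can be sketched in a few lines of Python. The function name, its parameters, and any numbers fed into it are assumptions for illustration only; the real figures depend entirely on how an app behaves.

```python
def auth_requests_per_second(active_users,
                             logins_per_user_per_hour,
                             api_calls_per_user_per_minute,
                             auth_checks_per_call=1.0):
    """Estimate the load on the authentication tier, split by source.

    Returns (login_rate, per_call_rate, total), all in requests per second.
    """
    login_rate = active_users * logins_per_user_per_hour / 3600
    per_call_rate = (active_users * api_calls_per_user_per_minute
                     * auth_checks_per_call / 60)
    return login_rate, per_call_rate, login_rate + per_call_rate
```

As a made-up example, 3,600 active users who each log in once an hour and make 60 API calls a minute generate 1 login request per second, while per-call checks add another 3,600 per second on top. Sizing the authentication tier for logins alone would miss nearly all of the load.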

Failure to do so leads to, well, angry 8-year-olds (and angry Loris, too, for that matter) when overwhelmed login services are what’s standing between them and that Pikachu they desperately want to catch.

Scaling Identity and Access Is Critical for Availability

Identity and access are critical app services. We’ve seen their importance rising in our State of Application Delivery surveys for the past two years, and without giving away too much, we’re already seeing increases in our 2017 data for both identity federation and application access control. And it’s not just because of apps; it’s because of things, too, and the growing need to scale out the entire breadth of identity services infrastructure to support more things, more users, and more apps using APIs to interact with back-end applications.

Availability is often based solely on a measure of downtime. If the servers are up and working correctly, they’re available. It’s an inside-out perspective. But like security, we need to turn that measurement around and view it from the outside-in. Capacity counts, and merely being “up” and “available” isn’t enough. Services need to be “up” and “available” to everyone who wants to consume them. That means scaling as fast as your Python scripts can execute.

It also means understanding the relationship between the various back-end services that actually implement the functionality presented by your APIs. Identity and access services are as critical to availability as the actual application itself. Availability, like security, is only as good as its weakest link. And if your identity services aren’t as scalable (or more scalable if your model is per-API call authentication) as the rest of your application, you’re going to find that availability is a significant problem, even if all your dashboards read “green” inside.

Because from the outside, we’re seeing “red.” Literally and figuratively.
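For those who want to put numbers on the weakest-link point, here is a small Python sketch. It assumes every request has to pass through each service in the chain and that the services fail independently; the function names, service names, and figures are all hypothetical.

```python
from math import prod


def chained_availability(availabilities):
    """Availability of a request that needs every service in the chain.

    Assumes independent failures, so the result is the product and can
    never be higher than the least available service.
    """
    return prod(availabilities)


def users_servable(capacity_rps, demand_rps_per_user):
    """Find the bottleneck service and how many users it can carry.

    capacity_rps: service name -> requests per second it can handle.
    demand_rps_per_user: service name -> requests per second one user sends.
    """
    limits = {name: capacity_rps[name] / demand_rps_per_user[name]
              for name in capacity_rps}
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck]
```

If the app tier can handle 10,000 requests per second and the identity tier 2,000, and per-call authentication means each user sends 0.5 requests per second to both, the identity tier caps out at 4,000 users while the app tier could have carried 20,000.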