A shifting cloud architecture
One of the most interesting things to observe within IT is how cyclical it can be - especially with regard to architecture. Like fashion, what was once considered old or outdated seems to always find a way back into the mainstream in one form or another - usually with a new method, process, or procedure married to it.
Take WAN architecture, for example. Historically, networks were built with one thing in mind: getting user traffic to the data center, where business applications were housed. We did this through hub-and-spoke networks in which the data center(s) served as the hub. Traffic from spoke sites was backhauled to the hub in order to reach its intended destination - be it the internet or an internal application.
Fast forward a few years and this architecture came to be seen as inefficient. As Software as a Service (SaaS) and Infrastructure as a Service (IaaS) vendors were on the rise, more and more traffic was destined for the internet, so diverting this traffic through a hub site hurt user experience and application performance. A direct-to-internet WAN architecture (Software-Defined WAN) was then introduced, in which internet traffic could break out locally instead of diverting through a hub site. While this greatly improved application latency and overall experience, security took a bit of a backseat since traffic no longer had an easy-to-control, easy-to-monitor chokepoint. Instead, organizations were forced to secure dozens, hundreds, or even thousands of spoke internet connections.
With security incidents on the rise, organizations sought ways to move back to a hub-centric architectural model in which traffic funnels through a chokepoint - making it easier to enforce security policy, but (hopefully) without latency or experience penalties.
Today, with the introduction of colocation facilities, we’re back to a hub-and-spoke model, but with more efficient redirection. Using colocation facilities, customers can pin internet-bound traffic to regionally proximate colo sites that offer high-speed interconnects with top IaaS and SaaS providers. One could think of this new blended architecture as a multi-hub-and-spoke approach. Though internet-bound traffic has to incur a slight diversion penalty to flow through the nearest colo, we can outweigh the penalty with increased security, regionally proximate hubs, and high-speed connections to the intended destination.
So how does this relate to a shifting cloud architecture? Funnily enough, cloud architectures have undergone the same kinds of transitions and, in many ways, have mirrored the trials that traditional WAN architectures endured - albeit a bit backward. Several years ago, if your organization embarked on a cloud migration, you likely would have found yourself building multiple VNets or VPCs and servicing each one with its own set of dedicated resources, such as firewalls, proxies, and load balancers. Each VNet/VPC was an island unto itself - not unlike a branch or spoke site.
Administrators quickly figured out, however, that this model was operationally expensive as well as difficult to scale out as cloud migration expanded. So, naturally, a hub-and-spoke concept was introduced to allow organizations to centralize shared resources (like firewalls) in a hub VNet/VPC - reducing operational burden and complexity. But even this was fraught with challenges since spoke-to-hub peerings often came with bandwidth limitations and added cost or complexity in their own right - particularly when organizations spanned multiple cloud regions.
Cloud providers, however, saw this challenge as an opportunity to improve. Thus, dedicated hub networking services - such as AWS Transit Gateway and Microsoft Azure Virtual WAN (vWAN) - were born. You may notice that these services look eerily similar to the multi-hub-and-spoke WAN architecture discussed earlier. In the cloud model, customers can instantiate virtual hubs that are regionally proximate to the workloads they will service. Each hub is then interconnected to other hubs as well as the cloud provider’s global backbone, making it easy to provide high-speed transport around the globe and implement security in a scalable manner.
At present, many Zscaler customers are moving to such models as they continue their cloud journey. But how does one implement scalable zero trust security in this architecture?
Implementing zero trust with Zscaler Workload Communications
Zscaler Workload Communications is implemented in the form of Cloud Connector and Branch Connector appliances. These appliances are lightweight VMs that run within the cloud or on-premises network and provide a highly reliable transport service to the Zscaler Zero Trust Exchange. They’re easy to deploy, scalable, operationally efficient, and provide intelligent traffic forwarding for any workloads that use them as a gateway. They can be deployed anywhere within the cloud network, but quite often are deployed in a shared services model (hub-and-spoke) as this typically provides the best bang for your buck.
Likewise, for those customers just starting their journey with smaller footprints or more niche use cases, distributed Cloud Connectors are also supported:
In both models, Cloud Connector and Branch Connector appliances build dynamic UDP/443 (DTLS) tunnels to the Zero Trust Exchange. Traffic directed to the appliances is encrypted with AES-256, encapsulated, and sent on to the nearest Zscaler point of presence for processing. While this article focuses mostly on internet-bound traffic, Cloud Connector and Branch Connector appliances are also heavily leveraged to handle east-west (laterally moving) traffic. In this scenario, traffic moving between workloads in the cloud or branch is funneled through the connector appliances instead of going direct. In a future blog post, we’ll explore how Zscaler Private Access (ZPA), coupled with Cloud Connector and Branch Connector, can be provisioned to provide policy-driven, highly segmented, end-to-end security for this traffic.
Implementing Zscaler Workload Communications within Microsoft vWAN
Recently, Microsoft has been pushing many of its customers toward vWAN since it has now reached a maturity and feature level that warrants production use. For most customers, this generally means instantiating the vWAN service and building vHubs in each of the regions in which their cloud workloads reside. From there, spoke VNets are peered with their respective regional vHub.
There are two approaches to implementing Zscaler Workload Communications in this architecture. In both models, Cloud Connector appliances live in a VNet that is peered with the vHub; they are not instantiated directly within the vHub.
In the first model (loosely resembling the distributed WAN model described earlier), Cloud Connector appliances are placed adjacent to the workloads that they will service within the same VNet:
One might consider this model when granular control is needed on a per-VNet basis, such as when controlling laterally moving traffic or having a granular internet access policy. You’ll want to read up on your options when implementing VNet peering as well to account for how cloud workloads will (or will not) leverage Cloud Connector when contacting other VNet resources. Furthermore, it’s important to plan and account for growth in this model as compute and native resources (such as Standard Load Balancer, NAT Gateway, etc.) can become burdensome quickly when implemented across dozens, or hundreds, of VNets.
This model will also likely require two separate route tables: one for the workload(s), whose default route points to the load balancer, and one for the Cloud Connector, whose default route points at the “internet” (NAT Gateway):
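The two route tables described above can be sketched as plain data with a longest-prefix-match lookup. This is only an illustration: the prefixes, the next-hop labels, and the `next_hop` helper are hypothetical and are not Azure or Zscaler API objects.

```python
import ipaddress

# Hypothetical route tables for the co-located model.
# Workload subnet: default route points at the load balancer in front of Cloud Connector.
WORKLOAD_ROUTES = [
    ("0.0.0.0/0", "cloud-connector-load-balancer"),
    ("10.1.0.0/16", "local-vnet"),  # assumed VNet address space
]
# Cloud Connector subnet: default route points at the "internet" (NAT Gateway).
CONNECTOR_ROUTES = [
    ("0.0.0.0/0", "nat-gateway"),
    ("10.1.0.0/16", "local-vnet"),
]


def next_hop(route_table, destination):
    """Return the next hop of the most specific route that matches destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in route_table
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest prefix wins, so 10.1.0.0/16 beats 0.0.0.0/0 for local addresses.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# The same internet destination resolves differently in each subnet.
print(next_hop(WORKLOAD_ROUTES, "203.0.113.10"))
print(next_hop(CONNECTOR_ROUTES, "203.0.113.10"))
```

Keeping the tables separate is what prevents the Cloud Connector's own outbound traffic from being sent back to the load balancer.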
Second is a shared services (hub-and-spoke) model. This is the most common deployment model for customers as it is more operationally efficient and scalable, and keeps costs at a minimum:
Here, spoke VNets send traffic to the vHub, which then directs the traffic toward the service VNet (Cloud Connector) for processing. In this model, we generally rely on the vHub route table to steer traffic from spoke VNets to the Cloud Connector appliances by propagating a default route. Unlike the first model, however, user-defined route tables are not required in the spoke VNet or the hub/shared services VNet. The spoke VNet simply inherits its default route via vHub propagation, and the hub/shared services VNet uses the “default” internet route that Microsoft installs directly. Be mindful, however, that you must disable default route propagation on the hub/shared services VNet. Otherwise, the vHub default route will override the Microsoft-installed default route and create route asymmetry.
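To see why propagation matters, here is a minimal sketch of default-route selection using only the precedence this article describes: a user-defined route beats a vHub-propagated route, which beats the Microsoft-installed default. The source labels, next-hop names, and the `effective_default_route` helper are hypothetical and purely illustrative.

```python
# Precedence as described in this article, highest priority first.
PRECEDENCE = ["user-defined", "vhub-propagated", "platform-default"]


def effective_default_route(candidates):
    """Given {route_source: next_hop}, return the (source, next_hop) that wins."""
    for source in PRECEDENCE:
        if source in candidates:
            return source, candidates[source]
    return None


# Spoke VNet: the propagated default route overrides the platform default,
# so internet-bound traffic is steered to Cloud Connector.
spoke = {
    "platform-default": "internet",
    "vhub-propagated": "cloud-connector-load-balancer",
}

# Shared services VNet with propagation disabled: Cloud Connector egresses directly.
service_ok = {"platform-default": "internet"}

# Shared services VNet with propagation left on: Cloud Connector's own egress
# would be pointed back at the load balancer in front of it.
service_misconfigured = {
    "platform-default": "internet",
    "vhub-propagated": "cloud-connector-load-balancer",
}

for name, routes in [
    ("spoke", spoke),
    ("service (ok)", service_ok),
    ("service (bad)", service_misconfigured),
]:
    print(name, effective_default_route(routes))
```

The misconfigured case is the one to avoid: the shared services VNet ends up with the same next hop as the spokes instead of the direct internet route.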
You may also wish to take a blended approach to these architectures (which is fully supported, by the way). Here, some Cloud Connector appliances are placed within the workload VNet (to service a specific use case or nuance), while other Cloud Connector appliances are placed within the hub/shared services VNet:
NOTE: Zscaler’s legacy “Secure vHub” solution has been deprecated and is no longer supported. Moving forward, we recommend that you proceed with Cloud Connector. Although Cloud Connector does not technically fall within Microsoft’s definition of “Secure vHub,” it offers higher performance and better resiliency than the legacy solution.
Things to keep in mind:
Regardless of which Microsoft vWAN architecture you deploy, there are a few items that tend to cause trouble if you aren’t careful:
Route tables - In a co-located model, route tables are required in spoke VNets in order to direct traffic through the local Cloud Connector appliances. In the second (hub-and-spoke) model, user-defined route tables are optional rather than required. If you do define your own route tables, use caution when installing routes, as user-defined routes will override the vHub-propagated routes.
Route propagation - In a hub-and-spoke model, when configuring the default route on vHub, be selective about where to propagate this route. Propagating it to the Cloud Connector VNet (hub/shared services) could cause a route loop.
Region selection - Ensure your vHub is in the same region as the Standard Load Balancer and Cloud Connector appliances it will forward traffic to. Microsoft does not prevent you from peering across regions, but when you do, the default route will no longer forward traffic to the load balancer correctly, which breaks outbound internet access.
Network security groups (NSGs) - By default, Cloud Connector NSGs allow inbound traffic only from the local VNet and block traffic from remote VNets. While this is acceptable in a co-located model (where Cloud Connector appliances are in the same VNet as the workloads), it will cause issues in a hub-and-spoke model, so you will need to adjust the NSG to allow traffic from spoke VNets.
Domain Name System (DNS) - Consider how outbound DNS requests will be handled. Zscaler Private Access heavily leverages DNS to influence workload-to-workload traffic through the Cloud Connector. To do this, the Cloud Connector must have visibility into the original DNS request from the workload. In a co-located model, this may happen naturally as the DNS request passes through the Cloud Connector to reach the cloud DNS server. In a hub-and-spoke model, this can get tricky and may require implementing native tools like Azure DNS Private Resolver.
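Several of the items above lend themselves to a quick pre-deployment sanity check. The sketch below is one hypothetical way to encode the region, NSG, and route propagation checks for a hub-and-spoke plan; the `HubSpokePlan` fields and the `check_plan` helper are invented for illustration and do not query Azure or Zscaler in any way.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class HubSpokePlan:
    """Hypothetical description of a planned hub-and-spoke deployment."""
    vhub_region: str
    load_balancer_region: str
    connector_region: str
    spoke_prefixes: list          # address spaces of the spoke VNets
    nsg_allowed_sources: list     # source prefixes the Cloud Connector NSG allows
    default_route_propagated_to_service_vnet: bool = False


def check_plan(plan):
    """Return a list of warnings for the pitfalls listed above."""
    warnings = []

    # Region selection: vHub, load balancer, and Cloud Connector should match.
    regions = {plan.vhub_region, plan.load_balancer_region, plan.connector_region}
    if len(regions) != 1:
        warnings.append(f"region mismatch: {sorted(regions)}")

    # Network security groups: every spoke prefix must fall inside an allowed source.
    allowed = [ipaddress.ip_network(p) for p in plan.nsg_allowed_sources]
    for prefix in plan.spoke_prefixes:
        spoke = ipaddress.ip_network(prefix)
        if not any(spoke.subnet_of(source) for source in allowed):
            warnings.append(f"NSG does not allow inbound traffic from spoke {prefix}")

    # Route propagation: the service VNet should not learn the vHub default route.
    if plan.default_route_propagated_to_service_vnet:
        warnings.append("default route is propagated to the Cloud Connector VNet")

    return warnings


plan = HubSpokePlan(
    vhub_region="eastus",
    load_balancer_region="eastus",
    connector_region="eastus",
    spoke_prefixes=["10.2.0.0/16"],
    nsg_allowed_sources=["10.0.0.0/16"],  # local VNet only, as in the default NSG
)
for warning in check_plan(plan):
    print(warning)
```

Running this against the sample plan flags the NSG item, since the default local-only rule does not cover the spoke prefix.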
Conclusion
In this article, we explored some of the most common architectures our customers are deploying within their cloud environments. Specific to Microsoft Azure vWAN, there are two (three, if you count the blended option) primary architectures that we recommend: co-located Cloud Connector appliances and centralized Cloud Connector appliances. Which option to choose depends heavily on the customer’s use cases and security requirements. Watch some of our educational videos and take Cloud Connector for a spin through our hands-on lab sessions.