Network Automation and Configuration Management
The way applications are deployed today is very different from the way they were deployed 10 to 15 years ago; almost everything about the application has changed. The problem is that the network has not been tightly coupled with these developments. Network policies and their corresponding configurations are provisioned separately from the application: most of the time the two are loosely coupled, and the network side is reactive. Analyzing the firewall rules on older security devices to produce a network assessment is nearly impossible. There are typically hundreds, if not thousands, of outdated rules still in place for application services that are no longer required. Another example is unused VLANs left configured on access ports, which pose a security risk. The problem lies in the process: how we change and provision the network is not tied to the application, and it is not automated. Because human intervention is required to tidy things up, inconsistent configurations accumulate. People move on and change roles, so you cannot guarantee that the engineer who creates a firewall rule will be the one who deletes it once the corresponding application is decommissioned or changed. Unless you have a very strict change control process, deprecated configurations will be left idle on active nodes.
The network is critical for business continuity, which puts real pressure on uptime. Operational uptime is directly tied to the success of the business, and this pressure pushes teams toward doing everything by hand: people are scared to automate anything that interacts with network equipment. The culture that results is manual and slow, and that manual culture of network provisioning and operation is the true bottleneck.
Virtualization – Beginning the Change
Virtualization vendors are changing the manual approach. Consider basic MAC address learning on a traditional switch. The switch examines the source MAC address of each incoming Ethernet frame. If the address is already known, the switch simply refreshes the existing entry; if it is not, the switch adds the MAC to its table and notes the port on which the frame entered. The result is a MAC-to-port mapping. The table is continually maintained, with MAC addresses added as frames arrive and removed through aging timers and flushing.
A vswitch operates differently. Whenever a VM spins up and its vNIC attaches to the vswitch, the hypervisor programs the vswitch with everything it needs to know to forward that traffic. There is no MAC learning. When you spin down the VM, the hypervisor does not need to wait for a timer: it knows the source is no longer there and, as a result, no longer needs to keep that state. Less state in a network is definitely a good thing. The key point is that provisioning the application/virtual machine is tightly coupled with provisioning network resources, and tightly coupling the two means less “garbage collection.”
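To make the traditional process concrete, here is a minimal Python sketch of a learning switch's MAC table. It is a simplified model, not any vendor's implementation: it assumes a single VLAN, a 300-second aging timer, and hypothetical method names (`learn`, `lookup`, `age_out`, `flush`). The optional `now` argument lets you simulate the passage of time.

```python
import time
from typing import Dict, Optional, Tuple


class MacTable:
    """Sketch of a traditional switch's MAC table (single VLAN, illustrative only)."""

    def __init__(self, aging_seconds: float = 300.0) -> None:
        # 300 s is a common default aging time; treat it as an assumption here.
        self.aging_seconds = aging_seconds
        # MAC address -> (port the frame entered on, time it was last seen)
        self._entries: Dict[str, Tuple[int, float]] = {}

    def learn(self, src_mac: str, in_port: int, now: Optional[float] = None) -> None:
        """Examine an incoming frame's source MAC.

        Unknown MACs are added; known MACs have their timer refreshed
        (and their port updated if the host has moved).
        """
        now = time.monotonic() if now is None else now
        self._entries[src_mac] = (in_port, now)

    def age_out(self, now: Optional[float] = None) -> None:
        """Remove entries whose aging timer has expired."""
        now = time.monotonic() if now is None else now
        expired = [mac for mac, (_, seen) in self._entries.items()
                   if now - seen > self.aging_seconds]
        for mac in expired:
            del self._entries[mac]

    def flush(self) -> None:
        """Clear the whole table."""
        self._entries.clear()

    def lookup(self, dst_mac: str, now: Optional[float] = None) -> Optional[int]:
        """Return the egress port for a destination MAC, or None (flood)."""
        self.age_out(now)
        entry = self._entries.get(dst_mac)
        return entry[0] if entry else None


if __name__ == "__main__":
    table = MacTable(aging_seconds=300)
    table.learn("aa:bb:cc:00:00:01", in_port=1, now=0)
    print(table.lookup("aa:bb:cc:00:00:01", now=10))   # 1: known, forward out port 1
    print(table.lookup("aa:bb:cc:00:00:01", now=400))  # None: aged out, so flood
```

All of the state here exists only because the switch has to discover hosts from traffic and then guess, via timers, when they have gone away.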
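For contrast, here is a sketch of a hypervisor-programmed vswitch. The `attach_vnic`/`detach_vnic` hooks are hypothetical stand-ins for whatever calls a real hypervisor makes. The point is that entries are written when the VM is provisioned and deleted the moment it goes away, with no learning and no timers.

```python
from typing import Dict, Optional


class VSwitch:
    """Sketch of a vswitch whose forwarding table is programmed, not learned.

    attach_vnic()/detach_vnic() are hypothetical hooks invoked by the
    hypervisor; there is deliberately no learning and no aging timer.
    """

    def __init__(self) -> None:
        self._fdb: Dict[str, str] = {}  # vNIC MAC -> virtual port ID

    def attach_vnic(self, mac: str, port_id: str) -> None:
        # VM spins up: the hypervisor already knows the vNIC's MAC and port.
        self._fdb[mac] = port_id

    def detach_vnic(self, mac: str) -> None:
        # VM spins down: the state is removed immediately, no timer involved.
        self._fdb.pop(mac, None)

    def lookup(self, dst_mac: str) -> Optional[str]:
        return self._fdb.get(dst_mac)

    def __len__(self) -> int:
        return len(self._fdb)


if __name__ == "__main__":
    vs = VSwitch()
    vs.attach_vnic("00:50:56:00:00:01", "vport-1")
    print(vs.lookup("00:50:56:00:00:01"))  # vport-1
    vs.detach_vnic("00:50:56:00:00:01")
    print(len(vs))                          # 0
```

There is no aging logic at all: the lifecycle of each forwarding entry is exactly the lifecycle of the VM.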
Box Mentality Vs Big Switch Networks
Network agility is held back by the current box mentality, or, as Ivan Pepelnjak calls it, “box hugging.” Once the high-level and low-level designs (HLD/LLD) are complete and you move to the configuration stage, the implementation-specific details are applied on a per-box basis. The commands are defined on individual boxes and are vendor specific. Functionally this works, and it is how the Internet was built, but it lacks agility and proper configuration management. Carrying out the same repetitive tasks box by box destroys your ability to scale. Businesses are mainly concerned with agility and continuity, and you cannot have both with manual provisioning. You need to look at your network as a system, not as individual boxes. This is what Big Switch Networks is doing with its product sets: the company’s motto is that you manage your network like “One Big Switch.” Its products include Big Tap Monitoring Fabric and Big Cloud Fabric. For more detail, see my recent post on Big Switch SDN.
When you look at how applications are scaling, the current box-by-box implementation method neither scales nor keeps pace with them. The solution is to move to network automation and automated interaction with the network.
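One way to treat the network as a system is to keep a single intent model and generate each box's configuration from it. The sketch below assumes a hypothetical inventory (`leaf1`, `leaf2`) and two platforms; the output is simplified IOS-style and Junos-style syntax for illustration, not a complete or validated configuration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Vlan:
    vlan_id: int
    name: str


# One intent model for the whole fabric (hypothetical example data).
FABRIC_VLANS: List[Vlan] = [Vlan(10, "WEB"), Vlan(20, "APP")]

# Device inventory: hostname -> platform (assumed values for illustration).
INVENTORY: Dict[str, str] = {"leaf1": "ios", "leaf2": "junos"}


def render(platform: str, vlans: List[Vlan]) -> str:
    """Translate the shared model into simplified, platform-style CLI lines."""
    lines: List[str] = []
    for v in vlans:
        if platform == "ios":
            lines += [f"vlan {v.vlan_id}", f" name {v.name}"]
        elif platform == "junos":
            lines.append(f"set vlans {v.name} vlan-id {v.vlan_id}")
        else:
            raise ValueError(f"unsupported platform: {platform}")
    return "\n".join(lines)


def render_fabric() -> Dict[str, str]:
    """Generate every box's config from the single model."""
    return {host: render(platform, FABRIC_VLANS) for host, platform in INVENTORY.items()}


if __name__ == "__main__":
    for host, config in render_fabric().items():
        print(f"--- {host} ---\n{config}")
```

Adding a VLAN becomes a one-line change to `FABRIC_VLANS` rather than a login to every box; vendor differences are confined to `render`.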
The Move To Automation
We must move away from a manual approach and toward an automated one. Focus initially on low-hanging fruit and easy wins. What takes engineers the longest to do? Do VLAN and subnet allocation sheets ring a bell? We should size according to demand and stop caring which VLAN or internal subnet is allocated. Microsoft Azure is a perfect example: it does not matter which private address is assigned to an internal system, because IP allocation and gateway assignment are automated so that systems can communicate locally. Designing optimum networks to last and scale is not good enough anymore. The network needs to evolve and be programmed to keep up with application placement. The configuration approach needs to change, and we should move to proper configuration management and automation.
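VLAN and subnet allocation sheets are an easy first target. Here is a minimal sketch using Python's standard `ipaddress` module: subnets are carved from a pool on demand and the gateway is derived automatically. The pool (`10.20.0.0/16`), the /24 size, and the first-usable-address gateway convention are all assumptions, and `SubnetAllocator` is a hypothetical helper, not an Azure API.

```python
import ipaddress
from typing import Dict, List


class SubnetAllocator:
    """Sketch: allocate subnets on demand and derive the gateway automatically.

    Assumptions (illustrative only): a private /16 pool carved into /24s, and
    the first usable address of each subnet used as the gateway.
    """

    def __init__(self, pool: str = "10.20.0.0/16", prefix: int = 24) -> None:
        # Lazily walk the pool; subnets() yields /prefix networks in order.
        self._fresh = ipaddress.ip_network(pool).subnets(new_prefix=prefix)
        self._released: List[ipaddress.IPv4Network] = []
        self.allocations: Dict[str, ipaddress.IPv4Network] = {}

    def allocate(self, app: str) -> Dict[str, str]:
        if app in self.allocations:
            raise ValueError(f"{app} already has a subnet")
        if self._released:
            subnet = self._released.pop(0)  # reuse reclaimed space first
        else:
            try:
                subnet = next(self._fresh)
            except StopIteration:
                raise RuntimeError("address pool exhausted") from None
        self.allocations[app] = subnet
        gateway = next(subnet.hosts())      # first usable address
        return {"subnet": str(subnet), "gateway": str(gateway)}

    def release(self, app: str) -> None:
        # Decommissioning the app hands its subnet back to the pool,
        # so the address space is reclaimed rather than left orphaned.
        self._released.append(self.allocations.pop(app))


if __name__ == "__main__":
    alloc = SubnetAllocator()
    print(alloc.allocate("web"))  # {'subnet': '10.20.0.0/24', 'gateway': '10.20.0.1'}
    print(alloc.allocate("db"))   # {'subnet': '10.20.1.0/24', 'gateway': '10.20.1.1'}
```

Because `release` is called when an application is decommissioned, the allocation lives and dies with the application, which is the kind of coupling argued for above.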
Think of your network as a virtualization engineer thinks of servers.
SDN: A Companion To Network Automation?
One benefit of Software Defined Networking (SDN) is that it gives you a holistic, central view of your network. Network automation is not SDN, and SDN is not network automation; the two work side by side and complement each other. SDN lets you abstract away detail so that those who do not need to see it never have to. Application owners do not care about VLANs, and application designers should not care about local IP allocation either if they have designed the application correctly. Centralization is also a goal of SDN, but centralization with SDN is different from control-plane centralization. The control plane should not be fully controlled by central SDN controllers. SDN companies have learned this and now allow some control-plane operations, especially time-sensitive control-plane protocols, to be handled by the network nodes.
Network Interaction And Network Programming
You don’t need to be a programmer, but you should start to think like one. Learning to program will help you adapt to these changes and leave you better equipped to deal with what is coming. Programming networks is a sideways step from what you are doing now: it offers an environment in which to run code and ways to test that code before you roll it out. The CLI is the most dangerous approach to device configuration; you can even lock yourself out of a device. Programming adds a safety net. It’s more of a mental shift. Stop jumping to the CLI and THINK FIRST. Break the task down and create workflows, which can then be mapped to an automation platform.
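“Break the task down and create workflows” can be as simple as an ordered list of steps, each with a pre-check, plus a dry-run mode that reports what would happen before anything touches a device. This is a hypothetical, platform-neutral sketch: `Step` and `Workflow` are illustrative names, and the in-memory `state` dictionary stands in for a real device or inventory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]  # hypothetical stand-in for device/inventory state


def always_ok(state: State) -> bool:
    """Default pre-check: always passes."""
    return True


@dataclass
class Step:
    name: str
    action: Callable[[State], None]              # the change itself
    check: Callable[[State], bool] = always_ok   # runs before the change


@dataclass
class Workflow:
    steps: List[Step] = field(default_factory=list)

    def run(self, state: State, dry_run: bool = True) -> List[str]:
        """Run steps in order and stop at the first failed pre-check.

        With dry_run=True (the default) nothing is changed; the log only
        reports what would be done.
        """
        log: List[str] = []
        for step in self.steps:
            if not step.check(state):
                log.append(f"ABORT before '{step.name}': pre-check failed")
                break
            if dry_run:
                log.append(f"would run '{step.name}'")
            else:
                step.action(state)
                log.append(f"ran '{step.name}'")
        return log


if __name__ == "__main__":
    wf = Workflow([
        Step("reserve VLAN 30", lambda s: s.setdefault("vlans", []).append(30),
             check=lambda s: 30 not in s.get("vlans", [])),
        Step("save config", lambda s: s.update(saved=True)),
    ])
    state: State = {"vlans": [10, 20]}
    print(wf.run(state))                 # dry run first: THINK FIRST
    print(wf.run(state, dry_run=False))  # then apply
```

Making the dry run the default is a deliberate choice: the safe path is the one you get without asking.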
TCL and EXPECT
TCL (Tool Command Language) is a scripting language created in 1988 at UC Berkeley. It aimed to tie together shell scripts and Unix commands. EXPECT is a TCL extension written by Don Libes. It is used to automate Telnet, SSH, and serial sessions to perform repetitive tasks. EXPECT’s main drawbacks are that it is not secure and that it is synchronous only. If a script logs on to a device, the login credentials sit in the EXPECT script itself, and you cannot encrypt that data in the code. It also operates sequentially: you send a command and wait for the output. It is not send, send, send, then wait to receive; it is a send-and-wait, send-and-wait methodology.
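The send-and-wait pattern looks like this. To keep all examples in one language, the sketch is in Python rather than TCL, and it talks to a hypothetical `FakeDevice` instead of a real Telnet/SSH session (Python's third-party `pexpect` library follows the same Expect model for real sessions). Each command blocks until the device prompt returns, and only then is the next one sent.

```python
from typing import List


class FakeDevice:
    """Hypothetical stand-in for a Telnet/SSH/serial session."""

    def __init__(self, hostname: str = "sw1") -> None:
        self.prompt = f"{hostname}#"
        self._pending = ""

    def send(self, line: str) -> None:
        # A real device would execute the command; here the output is faked.
        self._pending = f"{line}\noutput of '{line}'\n{self.prompt}"

    def read(self) -> str:
        data, self._pending = self._pending, ""
        return data


def send_and_wait(dev: FakeDevice, command: str) -> str:
    """Send one command, then block until the prompt comes back."""
    dev.send(command)
    received = ""
    while not received.rstrip().endswith(dev.prompt):
        chunk = dev.read()
        if not chunk:
            # A real EXPECT script would hit its timeout here.
            raise TimeoutError(f"no prompt after {command!r}")
        received += chunk
    return received


def run_commands(dev: FakeDevice, commands: List[str]) -> List[str]:
    # Strictly sequential: command N+1 is not sent until command N returns.
    return [send_and_wait(dev, cmd) for cmd in commands]


if __name__ == "__main__":
    for output in run_commands(FakeDevice("sw1"), ["show version", "show vlan"]):
        print(output)
```

The whole exchange is screen-scraping: the script only knows a command has finished because a prompt string appeared, which is exactly why these scripts break when output or prompts change.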
SNMP Has Failed | NETCONF Begins
SNMP is used for fault handling, monitoring equipment, and retrieving performance data, but very few people use it to set configurations. More often than not, there is no 1:1 translation between a CLI configuration operation and an SNMP “SET,” and that correlation is hard to achieve. As a result, few people use SNMP for device configuration and configuration management. Prior to NETCONF, CLI scripting was the primary approach to making automated configuration changes to the network. CLI scripting has several limitations, including a lack of transaction management, no structured error management, and an ever-changing command structure and syntax that makes scripts fragile and costly to maintain.
People make mistakes; it’s the nature of the beast. Human interaction with the network is a major cause of network outages, and the less often a person logs in to make changes at the CLI, the less likely they are to make a costly mistake.
NETCONF & Tail-f
NETCONF (Network Configuration Protocol) uses XML-based data encoding for both configuration data and protocol messages. It runs over a secure transport (SSH) and is asynchronous, so it is not sequential like TCL and EXPECT, which makes NETCONF more efficient. It separates configuration data from non-configuration items. Tasks such as backup and restore are difficult with SNMP because you have no idea which fields are needed for a restore. Also, because of SNMP’s binary encoding, it is difficult to compare the configuration of one device with another; NETCONF is much better at this. NETCONF also takes a transaction-based approach. A transaction is a set of configuration changes, not a sequence. Configuring through SNMP requires everything to be done in the right sequence, but with a transaction you submit everything at once and the device works out how to roll it out.
What really matters is that operators can write service-level applications that activate service-level changes, without those applications needing to know the full sequence of changes that must be completed before the network can serve application requests and responses.
Check out Tail-f Systems (now part of Cisco), an interesting company that offers a family of NETCONF-enabled products.
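As an illustration of the XML encoding, the sketch below builds an `<edit-config>` RPC aimed at the candidate datastore using Python's standard `xml.etree.ElementTree`. The `rpc`, `edit-config`, `target`, `candidate`, and `config` elements and the base namespace come from the NETCONF specification (RFC 6241); the `vlans` payload and its `urn:example:vlans` namespace are a hypothetical data model. Actually sending the RPC over SSH would be done with a NETCONF client library such as ncclient, and requires a device that advertises the `:candidate` capability.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # NETCONF base namespace (RFC 6241)
VLAN_NS = "urn:example:vlans"                       # hypothetical data model namespace


def build_edit_config(vlans: Dict[int, str], message_id: str = "101") -> str:
    """Build an <edit-config> RPC that targets the candidate datastore.

    The xmlns attributes are written explicitly so the output keeps plain,
    unprefixed element names.
    """
    rpc = ET.Element("rpc", {"xmlns": NC_NS, "message-id": message_id})
    edit = ET.SubElement(rpc, "edit-config")
    target = ET.SubElement(edit, "target")
    ET.SubElement(target, "candidate")
    config = ET.SubElement(edit, "config")
    # Payload in the (hypothetical) VLAN data model.
    vlans_el = ET.SubElement(config, "vlans", {"xmlns": VLAN_NS})
    for vlan_id, name in sorted(vlans.items()):
        vlan = ET.SubElement(vlans_el, "vlan")
        ET.SubElement(vlan, "id").text = str(vlan_id)
        ET.SubElement(vlan, "name").text = name
    return ET.tostring(rpc, encoding="unicode")


if __name__ == "__main__":
    print(build_edit_config({10: "WEB", 20: "APP"}))
```

Because the message is structured text rather than screen output, the same parser that builds it can read a device's configuration back, which is what makes backup, restore, and device-to-device comparison tractable.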
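The transaction idea can be modeled with a toy candidate/running datastore pair: changes accumulate in the candidate in any order, the whole set is validated, and then it is either committed in full or not at all. This is an illustrative Python sketch with hypothetical names and a simplified VLAN-ID-to-name data model, not how any particular device implements NETCONF. On a real device, a failed commit leaves the running configuration unchanged and the candidate in place for the client to correct or drop with `<discard-changes>`.

```python
from typing import Dict, List, Tuple

Config = Dict[int, str]  # VLAN ID -> name (hypothetical, simplified data model)


class Datastore:
    """Toy model of candidate/running datastores (illustrative only)."""

    def __init__(self, running: Config) -> None:
        self.running: Config = dict(running)
        self.candidate: Config = dict(running)

    def edit(self, changes: Config) -> None:
        # Changes accumulate in the candidate; their order does not matter.
        self.candidate.update(changes)

    def validate(self) -> List[str]:
        """Check the candidate as a whole, not change by change."""
        errors = [f"VLAN {v} out of range" for v in self.candidate
                  if not 1 <= v <= 4094]
        names = list(self.candidate.values())
        errors += [f"duplicate name {n}" for n in sorted(set(names))
                   if names.count(n) > 1]
        return errors

    def commit(self) -> Tuple[bool, List[str]]:
        """All-or-nothing: running changes only if the whole set is valid."""
        errors = self.validate()
        if errors:
            # This sketch drops the candidate on failure; a real NETCONF
            # client would decide whether to send <discard-changes>.
            self.candidate = dict(self.running)
            return False, errors
        self.running = dict(self.candidate)
        return True, []


if __name__ == "__main__":
    ds = Datastore({10: "WEB"})
    ds.edit({30: "DB", 20: "APP"})
    print(ds.commit(), ds.running)
    ds.edit({40: "APP", 5000: "BAD"})
    print(ds.commit(), ds.running)  # rejected as a unit; running is untouched
```

The service-level application only declares the end state it wants; it never has to know which change must land first, because nothing reaches the running configuration until the whole set passes.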