#

Saturday, September 23, 2017

Many people misunderstand this concept, mostly because of the topologies they use to learn it.

When the switch has two interfaces connecting to the same switch, and the cost to reach the root bridge is the same, it will use the interface with the lowest number as the root port.

For example, let's say SW-A has the lowest MAC address, so SW-A will be the Root.

What will be the Alternate Port here?
Is it the e0/1 of SW-B?
Yes

By just looking at the port numbers, we can say that..
What will be the Alternate Port here?
Is it the e0/3 of SW-B?
No
E0/1 of SW-B will be the Alternate Port..




This is where most people will answer incorrectly. The port priority that is considered is not the port priority of SW-B. It is the port priority of the BPDU sender, which in this case is the Root Bridge (SW-A). The sender's port priority is what matters.

If we look at the output of show spanning-tree; (I use actual hardware from here)




























Because the port priority is 128 by default and therefore equal on all ports, the decision has boiled down to the port number. The port that receives the BPDU sent from the lower port number on SW-A becomes the root port.

Let's change the port priority manually on SW-A..

SW-A(config-if)#int fa0/2
SW-A(config-if)#spanning-tree port-priority 16
















Now you can see the Alternate Port is changed to Fa0/3 of SW-B..

Additional Note:-
Decision making process of STP is like the following..

(1) Lowest bridge ID: the switch with the lowest bridge ID becomes the root bridge.
(2) Lowest path cost to Root Bridge: when the switch receives multiple BPDUs it will select the interface that has the lowest cost to reach the root bridge as the root port.
(3) Lowest sender bridge ID: when a switch is connected to two switches that it can use to reach the root bridge and the cost to reach the root bridge is the same, it will select the interface connecting to the switch with the lowest bridge ID as the root port.
(4) Lowest sender port ID: when the switch has two interfaces connecting to the same switch, and the cost to reach the root bridge is the same, it will use as the root port the interface that receives the lowest sender port ID (the sender's port priority first, then the sender's port number).
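
The tie-breakers above map naturally onto an ordered tuple comparison. The following Python sketch is only an illustration: the `ReceivedBpdu` class, its field names, the port labels and the example values are assumptions made for the example, not output from a real switch. It compares the BPDUs a non-root switch hears on its ports and picks the root port. Note that the port priority and port number in the comparison key are the sender's, not the receiver's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceivedBpdu:
    """A BPDU as heard on one local port (hypothetical model for illustration)."""
    local_port: str             # port on this switch that received the BPDU
    root_path_cost: int         # cost to reach the root through this port
    sender_bridge_id: tuple     # (bridge priority, MAC address as int) of the sender
    sender_port_priority: int   # port priority configured on the SENDER's port
    sender_port_number: int     # port number of the SENDER's port


def pick_root_port(bpdus):
    """Return the local port whose BPDU wins steps 2 to 4 of the decision process."""
    best = min(
        bpdus,
        key=lambda b: (
            b.root_path_cost,        # (2) lowest path cost to the root
            b.sender_bridge_id,      # (3) lowest sender bridge ID
            b.sender_port_priority,  # (4) lowest sender port ID: priority first,
            b.sender_port_number,    #     then port number
        ),
    )
    return best.local_port


ROOT = (32768, 0x000000000001)  # assumed bridge ID of the root switch

# Two parallel links to the root, equal cost, default sender priority 128.
default_case = [
    ReceivedBpdu("local-1", 19, ROOT, 128, 1),
    ReceivedBpdu("local-2", 19, ROOT, 128, 2),
]

# Same links after the sender's second port is set to priority 16.
tuned_case = [
    ReceivedBpdu("local-1", 19, ROOT, 128, 1),
    ReceivedBpdu("local-2", 19, ROOT, 16, 2),
]

print(pick_root_port(default_case))
print(pick_root_port(tuned_case))
```

With default priorities the port facing the sender's lower port number wins (`local-1`). Once the sender's second port is given priority 16, the port facing it (`local-2`) becomes the root port, even though nothing changed on the receiving switch.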

There are 2 types of Bridge Protocol Data Units (BPDUs)

(1) Configuration BPDUs
(2) Topology Change Notification BPDUs (TCNs)

Using these 2 types of frames, STP does all of its work.

A BPDU frame has the following format..

















Let's see a normal operation..

Configuration BPDUs are generated by the Root Bridge (Root Switch) and flow outward along the active paths, away from the Root Bridge. This means Non-root switches receive them on their Root ports & Alternate ports and forward them out only through their Designated ports.

Each designated switch changes the Root Path Cost, Bridge ID & Port ID of the BPDU it receives before sending it downstream.
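
As a rough illustration of that rewrite step, here is a minimal Python sketch. The `ConfigBpdu` class, the `relay` helper, the port labels and the cost value 19 are assumptions made for this example. It also assumes the usual behaviour that a switch adds the cost of its own root port to the received Root Path Cost; the Root ID is carried through unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConfigBpdu:
    """Only the fields discussed here (hypothetical, simplified model)."""
    root_id: str          # who the root is; never changed by relaying switches
    root_path_cost: int   # cost from the sender to the root
    bridge_id: str        # ID of the switch that sent this BPDU
    port_id: str          # ID of the port it was sent from


def relay(received, my_bridge_id, my_root_port_cost, out_port_id):
    """Build the BPDU a designated switch sends downstream."""
    return replace(
        received,
        root_path_cost=received.root_path_cost + my_root_port_cost,
        bridge_id=my_bridge_id,
        port_id=out_port_id,
    )


# BPDU originated by the root itself: cost 0, sender is the root.
from_root = ConfigBpdu(root_id="S-1", root_path_cost=0, bridge_id="S-1", port_id="p1")

# S-3 receives it on its root port (assumed cost 19) and relays it on port p2.
downstream = relay(from_root, my_bridge_id="S-3", my_root_port_cost=19, out_port_id="p2")
print(downstream)
```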

Following image shows the propagation of the Configuration BPDUs in a switching network..

Let's see what happens when a link goes down..

For example, let's assume a link on S-5 (not the link connected to S-3) goes down.
Here is where the TCNs (BPDUs which can go upstream) were introduced.

TCNs are normally generated by Non-root switches and flow upstream towards the Root Bridge to inform the Root Bridge that the network topology has changed. This means Non-root switches send them out only through their Root ports and receive them only on their Designated ports.

The TCN is a very simple BPDU that carries no information of its own. A bridge that detects a topology change sends it out every Hello Time seconds (this is the locally configured Hello Time, not the Hello Time specified in the Configuration BPDUs, which is set by the Root Bridge).

The designated bridge acknowledges the TCN by immediately sending back a normal configuration BPDU with the topology change acknowledgement (TCA) bit in Flag field set. The bridge that notifies the topology change does not stop sending its TCN until the designated bridge has acknowledged it.

Note that TCA is not a new BPDU.. It's just a Configuration BPDU with a bit changed..

Propagation of TCNs is indicated in red and the propagation of TCAs is indicated in green.

















Once the TCNs hit the Root, it also acknowledges the designated switch and it starts to send out its Configuration BPDUs with the topology change (TC) bit set.

These BPDUs are relayed by every bridge in the network with this bit set. As a result, all bridges become aware of the topology change and each of them can reduce its CAM table Aging Time to Forward Delay.

Bridges receive TC BPDUs on both forwarding and blocking ports because they are actually Configuration BPDUs.

TC BPDUs are propagated in the following way. They are also not a new BPDU type, just Configuration BPDUs with a bit changed.

















These TC BPDUs are sent for a period of Max Age + Forward Delay seconds, which is 20+15=35 seconds by default. Here is why..

We deal with 5 timers in STP..

(1) Hello Time
(2) Max Age
(3) MessageAge
(4) Forward Delay
(5) Aging Time

You can see four of these timers in the show spanning-tree output.













Hello Time is the interval at which the Configuration BPDUs are sent. Default is 2s.
Max Age is the time for which a switch treats a received BPDU as valid; the value is set by the Root switch. Default is 20s.
MessageAge is the BPDU's age since it was originated by the Root switch. At the Root switch it is set to 0, and every other switch increments this value by 1 before forwarding the BPDU further.
Therefore the remaining lifetime of a BPDU after a switch receives it is (MaxAge - MessageAge).
Forward Delay is the time it takes to move from Listening State to Learning State, and again from Learning State to Forwarding State. Default is 15s.
Aging Time is the time for which a CAM table entry stays valid. A switch flushes a MAC address entry from its CAM table after this time expires. Default is 300s.
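
The arithmetic behind these timers is small enough to check by hand, but a short Python sketch makes it explicit. The function and constant names below are made up for this example; the values are the defaults quoted above.

```python
HELLO_TIME = 2       # seconds, default
MAX_AGE = 20         # seconds, default
FORWARD_DELAY = 15   # seconds, default
AGING_TIME = 300     # seconds, default


def remaining_lifetime(max_age, message_age):
    """Seconds a received BPDU stays valid: MaxAge - MessageAge, never below 0."""
    return max(max_age - message_age, 0)


def tc_bpdu_period(max_age, forward_delay):
    """How long the Root keeps sending BPDUs with the TC bit set."""
    return max_age + forward_delay


# A BPDU that has been relayed by three switches carries MessageAge 3.
print(remaining_lifetime(MAX_AGE, 3))
print(tc_bpdu_period(MAX_AGE, FORWARD_DELAY))
```

With the defaults, a BPDU received three switches away from the Root is valid for another 17 seconds, and TC BPDUs are sent for 35 seconds.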

The reason the TC BPDUs are sent for a period of Max Age + Forward Delay is that it forces every switch to flush the old remembered BPDUs and clear the MAC address table.

You can see the CAM Aging Time changes to Forward Delay on show spanning-tree output by entering the following debug command and unplugging a port of a switch..
S-1-ROOT#debug spanning-tree events
















Note:- 

You will not see the receipt of TC BPDUs on debug, but all the switches in the network will receive them silently and change their timers and relearn the MAC addresses..

We found 4 BPDUs in the above STP process.

(1) Config BPDU
(2) TCN
(3) TCA
(4) TC

But there are actually only 2 types which are Config BPDUs & TCNs. TCAs & TCs are just Config BPDUs with only a bit changed in the Flag field of the frame..
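
To make the "2 types, 4 names" point concrete, here is a small Python sketch that classifies a BPDU from its type byte and flag byte. The numeric values are the ones defined in IEEE 802.1D as far as I know them (type 0x00 for Configuration, 0x80 for TCN, flag bit 0x01 for TC, flag bit 0x80 for TCA); the function name and return strings are just for this example.

```python
BPDU_TYPE_CONFIG = 0x00   # Configuration BPDU
BPDU_TYPE_TCN = 0x80      # Topology Change Notification BPDU

FLAG_TC = 0x01            # topology change bit in the Flags field
FLAG_TCA = 0x80           # topology change acknowledgement bit


def describe_bpdu(bpdu_type, flags=0):
    """Name a BPDU the way the text does: Config, TCN, TCA or TC."""
    if bpdu_type == BPDU_TYPE_TCN:
        return "TCN"
    if bpdu_type != BPDU_TYPE_CONFIG:
        raise ValueError(f"unknown BPDU type {bpdu_type:#04x}")
    names = []
    if flags & FLAG_TCA:
        names.append("TCA")
    if flags & FLAG_TC:
        names.append("TC")
    if not names:
        return "Config"
    return "Config (" + "+".join(names) + ")"


print(describe_bpdu(BPDU_TYPE_CONFIG))
print(describe_bpdu(BPDU_TYPE_CONFIG, FLAG_TCA))
print(describe_bpdu(BPDU_TYPE_CONFIG, FLAG_TC))
print(describe_bpdu(BPDU_TYPE_TCN))
```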

Wednesday, September 20, 2017

Default mode of operation for Cisco ASA is the Routed Mode. It can also operate in a mode called "Transparent Mode" which allows ASA to monitor traffic while forwarding in Layer 2 domain.

IP address of the PC: 192.168.10.10
Gateway (e0/0 of R): 192.168.10.1

We are going to put an ASA in between PC and Gateway, which will not change any configuration in Gateway.. This is the use case of Transparent ASA. It will not be discovered by other devices in the network. 

Let's see how it is configured..

Remember to back up your configuration before changing the ASA mode from Routed to Transparent, as the change will clear all the current configuration.

Enter the following command to change the firewall mode to Transparent.
ciscoasa(config)#firewall transparent

Create a BVI, to know more about BVIs please go here.
ciscoasa(config)#interface BVI 1
ciscoasa(config-if)#ip address 192.168.1.150 255.255.255.0

Configure inside interface grouping to the BVI,
ciscoasa(config)#int gig 0/1
ciscoasa(config-if)#security-level 100
ciscoasa(config-if)#nameif INSIDE
ciscoasa(config-if)#bridge-group 1
ciscoasa(config-if)#no shut

Configure outside interface grouping to the BVI,
ciscoasa(config)#int gig 0/0
ciscoasa(config-if)#security-level 0
ciscoasa(config-if)#nameif OUTSIDE
ciscoasa(config-if)#bridge-group 1
ciscoasa(config-if)#no shut

As soon as you enter the above commands, the interfaces will be displayed in the show interface ip brief output like the following.




Now the configuration is over. PC will forward the general traffic to internet via the gateway without knowing there is an ASA in between. But the icmp pings will not work until you configure your ASA to inspect them.. You can do it either by ASDM or CLI..
Following are the commands to do it in CLI,
ciscoasa(config)#policy-map global_policy
ciscoasa(config-pmap)#class inspection_default
ciscoasa(config-pmap-c)#inspect icmp

Note:- 

If there are switches on both sides of the ASA, you will have to use Ethertype ACLs to allow BPDUs. By default the ASA will not forward BPDUs.
If there are routers on both sides which use a routing protocol like OSPF, you will have to allow multicast traffic so that they can form adjacencies.
If you configured DHCP on the Gateway and the PC is a DHCP client, you will have to do additional configuration on the ASA to allow broadcast traffic.
You will not be able to terminate VPNs in this mode of ASA because the interfaces work as L2 interfaces..

You can view the current mode of ASA by the following command..
ciscoasa#show firewall

You can change the ASA  back to routed mode by the following command..
ciscoasa(config)#no firewall transparent

Tuesday, September 19, 2017

Authenticating using Passwords

This is the simplest solution for both eBGP & iBGP neighbors..
Following is an example for eBGP neighbors.









R1(config-router)#router bgp 1
R1(config-router)#neighbor 10.0.0.2 remote-as 2
R1(config-router)#neighbor 10.0.0.2 password cisco

R2(config-router)#router bgp 2
R2(config-router)#neighbor 10.0.0.1 remote-as 1
R2(config-router)#neighbor 10.0.0.1 password cisco

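The optional encryption type (0-7) in this command only controls how the password is stored in the configuration; the session itself is authenticated with an MD5 hash. Choose it as per the corporate security policy.
But the routers are still vulnerable to CPU DoS attacks, because they have to check each and every malformed packet an attacker sends.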

Protecting eBGP Neighbors by Changing TTL

Routers send BGP packets to eBGP neighbors with a TTL of 1 by default which implies they should be connected directly. This is the security mechanism of eBGP. But an attacker can spoof this TTL value easily by fixing the TTL value to be 1 when the packets reach the destination router from a remote location.










In the above example, if the attacker set the TTL to 3, it will appear at R1 as TTL of 1 which means R1 will think the attacker is directly connected..

Additional TTL security command is like the following..

R1(config-router)#neighbor 10.0.0.2 ttl-security hops 1
R2(config-router)#neighbor 10.0.0.1 ttl-security hops 1

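This will change the TTL of outgoing eBGP packets to 255. Each neighbor then accepts a packet only if its TTL on arrival is at least 255 minus the configured hop count (254 for hops 1). A remote attacker cannot forge that, because every router on the path lowers the TTL and 255 is the maximum, so only a neighbor within the configured hop count can attempt a peering.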
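
The difference between the default TTL behaviour and ttl-security can be shown with a few lines of Python. This is a sketch, not router code: the function names are invented, and the acceptance rule assumed here is "TTL on arrival is at least 255 minus the configured hop count", which is how Cisco documents the command.

```python
MAX_TTL = 255


def ttl_on_arrival(initial_ttl, routers_crossed):
    """Each router that forwards the packet lowers the TTL by one."""
    return initial_ttl - routers_crossed


def accepted_by_default(received_ttl):
    """Without ttl-security, any packet that survives the trip is processed."""
    return received_ttl >= 1


def accepted_with_ttl_security(received_ttl, hops):
    """With ttl-security, low TTL values are rejected (assumed rule, see above)."""
    return received_ttl >= MAX_TTL - hops


# An attacker two routers away picks an initial TTL of 3 so that it arrives as 1,
# exactly what a directly connected eBGP neighbor would normally send.
spoofed = ttl_on_arrival(3, routers_crossed=2)
print(spoofed, accepted_by_default(spoofed))

# With ttl-security hops 1, even the maximum TTL is too low after two routers.
best_attempt = ttl_on_arrival(MAX_TTL, routers_crossed=2)
print(best_attempt, accepted_with_ttl_security(best_attempt, hops=1))

# The directly connected neighbor sends 255 and no router lowers it on the way.
neighbor = ttl_on_arrival(MAX_TTL, routers_crossed=0)
print(neighbor, accepted_with_ttl_security(neighbor, hops=1))
```

The point of the design is that a low TTL can be reached from anywhere by choosing a suitable starting value, while a high TTL can only be reached from nearby.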

Note:- 

ebgp-multihop is not a security command. It only raises the TTL of the outgoing eBGP packets to the given number, so that a session can be formed with a peer up to that many hops away.
You can learn more about this command here. A session using it is still vulnerable to the above attack. The ttl-security hops command is actually the reverse logic of the ebgp-multihop command, hence you cannot use both commands together.

Monday, September 18, 2017

BGP FSM (Finite State Machine) has 6 states to help in troubleshooting..

In summary, once you have configured the neighbor IP address, the router will try to reach that neighbor IP address on destination TCP port 179 using its routing table.

When the TCP 3-way handshake completes, the router will send a BGP Open message. (This message is similar to the hello packet that EIGRP & OSPF use.)

When the Open message has been sent and received and all other parameters match (like authentication) then the neighbors will reach the established state..


All the states are described in detail in the following..




Idle
This state is the initial BGP state. In the Idle state, the router refuses all connection requests from neighbors.

The router starts a TCP connection with its BGP peer and goes to the Connect state only after receiving a Start event from the system.

The Start event occurs when an operator configures a BGP process or resets an existing BGP process or when the router software resets a BGP process.

Connect
In this state, router starts the ConnectRetry timer and waits to establish a TCP connection.

If the TCP connection is established, the router sends an Open message to the peer and goes to the OpenSent state.

If the TCP connection fails to be established, the router moves to the Active state.

Active
In this state, the router keeps trying to establish a TCP connection with the peer.

If the TCP connection is established, the router sends an Open message to the peer, closes the ConnectRetry timer, and changes to the OpenSent state.

If the TCP connection fails to be established, the router stays in the Active state.

If the router does not receive a response from the peer before the ConnectRetry timer expires, the BGP device returns to the Connect state.

OpenSent
In this state, the router waits for an Open message from the peer and then checks the validity of the received Open message, including the AS number, version, and authentication password.

If the received Open message is valid, the router sends a Keepalive message and changes to the OpenConfirm state.

If the received Open message is invalid, the router sends a Notification message to the peer and returns to the Idle state.

OpenConfirm
In this state, the router waits for a Keepalive message or Notification message from the peer.

If the router receives a Keepalive message, it goes to the Established state. If it receives a Notification message, it returns to the Idle state.

Established
In this state, the router exchanges Update, Keepalive, Route-refresh, and Notification messages with the peer.

If the router receives a valid Update or Keepalive message, it considers that the peer is working properly and maintains the BGP connection with the peer.

If the router receives an invalid Update or Keepalive message, it sends a Notification message to the peer and returns to the Idle state.

If the router receives a Route-refresh message, it does not change its status.

If the router receives a Notification message, it returns to the Idle state.

Also, if the router receives a TCP connection termination notification, it terminates the TCP connection with the peer and returns to the Idle state.
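
The state descriptions above can be condensed into a transition table. The Python sketch below is a simplification for study purposes: the event names are invented labels for the conditions described in the text, and many events of the real protocol (timers, collisions, error subcodes) are left out.

```python
# (current state, event) -> next state, following the descriptions above.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "tcp_failed"): "Active",
    ("Active", "connect_retry_expired"): "Connect",
    ("OpenSent", "valid_open"): "OpenConfirm",
    ("OpenSent", "invalid_open"): "Idle",
    ("OpenConfirm", "keepalive"): "Established",
    ("OpenConfirm", "notification"): "Idle",
    ("Established", "valid_update"): "Established",
    ("Established", "keepalive"): "Established",
    ("Established", "route_refresh"): "Established",
    ("Established", "invalid_message"): "Idle",
    ("Established", "notification"): "Idle",
    ("Established", "tcp_closed"): "Idle",
}


def run(events, state="Idle"):
    """Feed a list of events to the state machine and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not modelled in state {state}")
        state = TRANSITIONS[key]
    return state


happy_path = ["start", "tcp_established", "valid_open", "keepalive"]
retry_path = ["start", "tcp_failed", "connect_retry_expired",
              "tcp_established", "valid_open", "keepalive"]

print(run(happy_path))
print(run(retry_path))
print(run(happy_path + ["notification"]))
```

A table like this is handy when troubleshooting: a neighbor stuck in Active, for example, can only have arrived there through a failed TCP connection.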


Note:- 

BGP message types (blue colored text) involved are;
1) Open (AS number, version, authentication password)
2) Keepalive
3) Notification
4) Update
5) Route-refresh

Here is a table summary of the state machine


Thursday, September 7, 2017

If you are upgrading your ASA from version 8.2 or older to newer code, you may have to worry about the NAT rules. The reason is that the older versions of ASA depended on NAT rules to forward traffic. The newer versions don't. So you will have to understand how to read those rules and what they were doing before migrating to the new ASA.

Let's see how the different types of NAT are configured in CLI in older ASAs..

Dynamic NAT & PAT

nat (inside) 1 10.0.0.0 255.255.255.0
global (outside) 1 192.168.1.10-192.168.1.100
global (outside) 1 192.168.1.101

1st statement is the match statement for incoming traffic.. It says that "If the source is coming from INSIDE interface & if the source IP is in the 10.0.0.0/24 range, put it into the NAT group 1"

2nd and 3rd statements here are action statements for the outgoing traffic..
2nd statement says the NAT group 1 should be translated to the pool starting at 192.168.1.10 and ending at 192.168.1.100 when it is going to the OUTSIDE interface.
3rd statement says the NAT group 1 should be translated (PAT/NAT overload) to 192.168.1.101 when it is going to OUTSIDE interface..
This 3rd rule applies to the traffic after the dynamic pool is exhausted because the command is entered after the dynamic NAT statement (2nd)..
For the 3rd statement, you can give an interface instead of an IP address too..
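
The "pool first, PAT when the pool is exhausted" behaviour described above can be modelled in a few lines of Python. This is a simplified sketch, not how the ASA is implemented: the `NatGroup` class, the starting PAT port 1024 and the deliberately tiny two-address pool in the demo are assumptions chosen to keep the example short.

```python
import ipaddress
from itertools import count


class NatGroup:
    """Dynamic NAT pool with a PAT address as fallback (simplified model)."""

    def __init__(self, pool_first, pool_last, pat_address, first_pat_port=1024):
        first = int(ipaddress.IPv4Address(pool_first))
        last = int(ipaddress.IPv4Address(pool_last))
        self.free = [str(ipaddress.IPv4Address(n)) for n in range(first, last + 1)]
        self.pat_address = pat_address
        self.pat_ports = count(first_pat_port)
        self.nat = {}   # inside IP -> pool address (one-to-one)
        self.pat = {}   # (inside IP, inside port) -> (PAT address, PAT port)

    def translate(self, inside_ip, inside_port):
        """Return the (address, port) the packet leaves the outside interface with."""
        if inside_ip in self.nat:                 # host already owns a pool address
            return self.nat[inside_ip], inside_port
        key = (inside_ip, inside_port)
        if key in self.pat:                       # connection already uses PAT
            return self.pat[key]
        if self.free:                             # pool not exhausted: dynamic NAT
            self.nat[inside_ip] = self.free.pop(0)
            return self.nat[inside_ip], inside_port
        self.pat[key] = (self.pat_address, next(self.pat_ports))  # fall back to PAT
        return self.pat[key]


# Demo with a two-address pool so that exhaustion is easy to see.
group = NatGroup("192.168.1.10", "192.168.1.11", "192.168.1.101")
for host in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    print(host, "->", group.translate(host, 5000))
```

The first two hosts each take a pool address and keep their source port; the third host finds the pool empty and is overloaded onto 192.168.1.101 with a new port.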

Static NAT

static (dmz,outside) 192.168.1.175 172.16.0.5
static (dmz,inside) 172.16.0.5 172.16.0.5

Above are 2 static NAT rules..

1st one says when the traffic is moving between DMZ interface (source) and OUTSIDE interface (destination), the source IP 172.16.0.5 should be translated to 192.168.1.175..

2nd line translates to the same IP, which is called "Identity NAT"

NAT 0 Policy

In older ASAs, NAT was a mandatory feature. This means there is an implicit NAT rule which NATs all the traffic that has no specific NAT rule configured. (This is somewhat like the implicit deny rule at the end of ACLs.) In some versions you can disable it using the "no nat-control" command. Anyhow, it is no longer there after version 8.4.

In older versions, if you do not disable it, you have to turn NAT off for the specific traffic that should not be translated. As an example, you will need to turn off NAT for the IPs of the hosts which communicate over IPSec site-to-site VPNs.
This can be achieved by using an ACL with nat 0 policy..

access-list NONAT extended permit ip any 57.234.195.128 255.255.255.192
nat (inside) 0 access-list NONAT

1st line is just an ACL which identifies the traffic.
2nd line says that if the traffic coming from the INSIDE source matches the NONAT range, it is put in the NAT group 0, which does not do NAT.

This traffic will be added to the NAT rules section in the ASDM as "Exempt" rules, which means this traffic is exempted from translation by the implicit NAT rule that NATs all the traffic without a specific NAT rule configured. You can see the above NAT rule at the 23rd line.













Translating both Source & Destination of an Incoming Packet

In these old versions of ASAs, you have to use 2 NAT rules to do this. Refer the below illustration..

This packet is coming from inside interface and goes out to outside interface. Source of the incoming packet is 1.1.1.1 and it should be natted to 3.3.3.3 while the destination of the incoming packet which is 2.2.2.2 should be natted to 4.4.4.4

Following static rules will do the job,

static (inside,outside) 3.3.3.3 1.1.1.1
static (outside,inside) 2.2.2.2 4.4.4.4
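
The effect of the two static rules on a packet travelling from inside to outside can be written down as two independent lookups. The Python sketch below only models the address rewrite for the example addresses above; the table and function names are invented for the illustration.

```python
# static (inside,outside) 3.3.3.3 1.1.1.1 : real inside host 1.1.1.1 appears as 3.3.3.3
SOURCE_REWRITE = {"1.1.1.1": "3.3.3.3"}

# static (outside,inside) 2.2.2.2 4.4.4.4 : real outside host 4.4.4.4 is seen
# from the inside as 2.2.2.2, so inside-to-outside traffic to 2.2.2.2 goes to 4.4.4.4
DESTINATION_REWRITE = {"2.2.2.2": "4.4.4.4"}


def translate_inside_to_outside(src, dst):
    """Apply both static rules; addresses without a rule pass through unchanged."""
    return SOURCE_REWRITE.get(src, src), DESTINATION_REWRITE.get(dst, dst)


print(translate_inside_to_outside("1.1.1.1", "2.2.2.2"))
```

Each rule touches only one of the two addresses, which is why two rules are needed to rewrite both.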

If the TCP/UDP port (service) should be translated too, you can do it on ASDM easily.

Monday, September 4, 2017

PAT (Port Address Translation) is a flavor of dynamic NAT that maps multiple private IP addresses to a single public IP address using different ports.



















Let's configure PAT on R1;

Define inside & outside..

R1(config)#int e0/0
R1(config-if)#ip nat outside

R1(config)#int e0/1
R1(config-if)#ip nat inside

Create an access list that matches the private IP range.
R1(config)#access-list 10 permit 192.168.1.0 0.0.0.255
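
The 0.0.0.255 in this access list is a wildcard mask: bits set to 1 are ignored, bits set to 0 must match. As a quick way to check which hosts a statement covers, here is a small Python sketch using the standard ipaddress module; the function name is made up for this example.

```python
import ipaddress


def matches_wildcard(address, base, wildcard):
    """True if address matches base on every bit where the wildcard has a 0."""
    addr = int(ipaddress.IPv4Address(address))
    net = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF  # bits that must match
    return (addr & care) == (net & care)


print(matches_wildcard("192.168.1.77", "192.168.1.0", "0.0.0.255"))
print(matches_wildcard("192.168.2.77", "192.168.1.0", "0.0.0.255"))
```

Any host in 192.168.1.0/24 matches the statement; a host in 192.168.2.0/24 does not and will not be translated.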

Do the mapping..
R1(config)#ip nat inside source list 10 interface e0/0 overload

As soon as you enter the above commands, you will not see anything on nat translations & routing table like in static NAT.. But when the traffic is generated, they will start to populate..
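
To see why nothing appears until traffic flows, it helps to picture the translation table as a dictionary that is filled on demand. The Python sketch below is a rough model, not the router's actual algorithm: the `PatTable` class, the host addresses and the public address 203.0.113.1 are placeholders, and the port selection (keep the source port if it is free, otherwise take the next free one) is a simplification.

```python
class PatTable:
    """On-demand PAT table: many inside local addresses share one inside global IP."""

    def __init__(self, inside_global_ip):
        self.inside_global_ip = inside_global_ip
        self.entries = {}        # (inside local IP, port) -> (inside global IP, port)
        self.used_ports = set()  # ports already taken on the shared public IP

    def translate(self, local_ip, local_port):
        key = (local_ip, local_port)
        if key not in self.entries:          # entry is created by the first packet
            port = local_port
            while port in self.used_ports:   # port clash: try the next one
                port = port + 1 if port < 65535 else 1024
            self.used_ports.add(port)
            self.entries[key] = (self.inside_global_ip, port)
        return self.entries[key]


table = PatTable("203.0.113.1")              # placeholder for the address of e0/0
print(len(table.entries))                    # empty until traffic is generated

print(table.translate("192.168.1.11", 4000)) # first host keeps its source port
print(table.translate("192.168.1.12", 4000)) # second host clashes, gets another port
print(len(table.entries))
```

The port is what lets the router tell the two hosts apart when the replies come back to the same public IP address.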

When PC-1 is pinging the server 203.115.41.221 & PC-2 is pinging the server 203.115.41.221, the following will be the output.


Inside local address – The private IP address assigned to a host in the inside network.
Inside global address – The public IP address which represents a host in the inside network.
Outside local address – The public IP address of a host in the outside network as it is seen to the hosts in the inside network.
Outside global address – The public IP address which represents a host in the outside network.

Above terms are local to the router. Inside and Outside terms are adapted from the router's interface definitions (inside nat interface & outside nat interface).

You will not see a new entry for the public IP address in the routing table pointing to the outside interface, as you do with Static NAT or Dynamic NAT; it would make no sense, because the public IP is already a connected IP.

Dynamic NAT maps a local address to an address from a pool of global addresses.
Need to have one real public IP address for every private IP address.
Cannot permanently bind a public IP address to a host like in static NAT.
When the pool is exhausted, the router cannot create a new translation and drops the packet.


















Let's configure dynamic NAT on R1..

Define inside & outside..

R1(config)#int e0/0
R1(config-if)#ip nat outside

R1(config)#int e0/1
R1(config-if)#ip nat inside

Create an access list that matches the private IP range.
R1(config)#access-list 10 permit 192.168.1.0 0.0.0.255

Create a pool for public IP range..
R1(config)#ip nat pool DYNAMIC 203.115.41.110 203.115.41.120 netmask 255.255.255.0

Do the mapping..
R1(config)#ip nat inside source list 10 pool DYNAMIC

As soon as you enter the above commands, you will not see anything on nat translations & routing table like in static NAT.. But when the traffic is generated, they will start to populate..

When PC-1 is pinging the server 203.115.41.221, the following will be the output.


Inside local address – The private IP address assigned to a host in the inside network.
Inside global address – The public IP address which represents a host in the inside network.
Outside local address – The public IP address of a host in the outside network as it is seen to the hosts in the inside network.
Outside global address – The public IP address which represents a host in the outside network.

Above terms are local to the router. Inside and Outside terms are adapted from the router's interface definitions (inside nat interface & outside nat interface).

You will also see a new entry for the public IP address in the routing table, pointing to the outside interface.
Note that it will clear this entry when you clear ip nat translations..


Sunday, September 3, 2017

Static NAT is a one-to-one mapping between local and global addresses.
Need to have one real public IP address for every private IP address..
Used with servers mostly..



















Let's configure static NAT on R2 where the servers are..

Define inside & outside..

R2(config)#int e0/0
R2(config-if)#ip nat outside

R2(config)#int e0/1
R2(config-if)#ip nat inside

Do the mapping..
R2(config)#ip nat inside source static 192.168.2.21 203.115.41.221
R2(config)#ip nat inside source static 192.168.2.22 203.115.41.222

As soon as you enter the above commands, you will see the following output on nat translations..






If you look into the routing table, you will see that the public IP addresses are installed in the routing table like the following. They will not go away even if you clear the nat translations.
















When the servers generate traffic destined outside their network (e.g. pinging 203.115.41.111, which is actually PC1, from 203.115.41.221), you will see the following output.




But when an outside host tries to reach the servers, you will see something like the following.
(pinging from 203.115.41.111, which is actually PC1, to 203.115.41.221)




As you can see, both the outputs are same..

Inside local address – The private IP address assigned to a host in the inside network.
Inside global address – The public IP address which represents a host in the inside network.
Outside local address – The public IP address of a host in the outside network as it is seen to the hosts in the inside network.
Outside global address – The public IP address which represents a host in the outside network.

Above terms are local to the router.. Inside and Outside terms are adapted from the router's interface definitions (inside nat interface & outside nat interface)

Saturday, September 2, 2017

As a network engineer you may have to have some idea of these basic services running in enterprise environments. If you want to install Windows Server 2012 with a basic understanding about the common terms you may need to go through following posts..


If you haven't changed the server name after installation, go to Server Manager > Local Server 


Click on the Computer Name and give a name of your choice & restart the server..




Before installing the services like DHCP & DNS, you will have to assign an IP address to the network interface like the way you do in your Windows PC.

To install Active Directory, DNS and DHCP; click on the Manage > Add roles & features on the Server Manager dashboard.

It will open the "Add Roles & Features" wizard. Basically you only need to hit Next until you are asked to select Server Roles.


Select the roles and hit Next all the way to Install. 

When adding roles, it will ask about the features, mostly you will have to continue with Next..























After the installation process completes, you will need to do 2 things which are marked in blue color in the results page. Click on Promote this server to a domain controller..
Because this is a clean installation (no domain nor forest), I am selecting Add a new forest & giving Root domain name as roshanznet.local










Give the DSRM password on the next page and click Next..
For the next pages, you will mostly hit Next until you find the page to Install..

After the reboot you will see a yellow flag icon on the Server Manager dashboard asking you to complete the DHCP configuration. For a basic setup it will mostly be just a few clicks on Next.

Adopting a best practice framework is crucial to improve the quality of any IT service delivered to a customer. There are some well known frameworks designed to meet business requirements.

You can use the one that suits your organization and do some adjustments / customization if needed.

The framework we chose to implement is ITIL (Information Technology Infrastructure Library), which was developed in the United Kingdom in the late 1980s and is currently at version 3, which is used by many companies around the world. I have done some customization to this version of ITIL to match the NOC I work in, and we are currently adopting this new framework. If you are working in a small NOC too, you will be able to implement ITIL in your workplace after reading this.

The NOC I work in is a small technical team which consists of about 15 engineers including the Team Leader. We have the Help Desk function & the L1 / L2 support functions. We give onsite support as a 3rd party contractor to the national airline at an international airport. What we do here mostly falls into 2 stages (Service Transition & Service Operation) out of the 5 stages of ITIL. These stages group the processes which we should follow.

Five Stages of ITIL are as following;

(1) Service Strategy
(2) Service Design
(3) Service Transition
(4) Service Operation
(5) Continual Service Improvement

Service Transition is the implementation stage while Service Operation is the monitoring & support stage.. First let's look at the original framework..
Processes of Service Transition stage are like the following.. (click on the image to view in full size)


























The objective of ITIL Service Transition is to build and deploy IT services. The Service Transition lifecycle stage also makes sure that changes to services and service management processes are carried out in a coordinated way.

Processes of Service Operation stage are like the following..
















The objective of ITIL Service Operation is to make sure that IT services are delivered effectively and efficiently. The Service Operation lifecycle stage includes the fulfilling of user requests, resolving service failures, fixing problems, as well as carrying out routine operational tasks.

Because Request Fulfillment is a process handled by Service Desk of the customer in our environment, we could neglect it. However the most important aspect of ITIL is the idea of responsibilities assigned to individuals. Every process needs to be assigned a process owner to ensure that the process activities are carried out smoothly. Many can be assigned responsibilities but only one should be assigned accountability in any process.

Steps of Implementing ITIL?

(1) Study the work currently being done by the employees and identify the current procedures.
(2) Study the ITIL framework. Here is a good resource. Click here
(3) Decide the ITIL stages which the organization / team operate.
(4) Define the processes with necessary adjustments.
(5) Assign the manager roles to selected employees.

Because we are a small team, I merged some roles in Service Transition stage with some roles in Service Operation stage; so that they can cover more work while operating in both the stages. 

So I created 6 designations (manager roles) who are accountable in carrying out  the above processes.

IT Operations Manager (Team Lead)
Operational Stage: Service Transition, Service Operation
Associated Functions: IT Operations Control, Technical Management
Accountable Processes: Change Evaluation, Release & Deployment Management, Knowledge Management
Databases Maintained: n/a
Responsibilities: This guy is accountable for daily work, new implementations, projects coordination, knowledge sharing, supervising other manager roles. 

 - Represent NOC and lead the team to meet business requirements
 - Evaluate the major changes raised from Change Manager
 - Lead the technical team in implementations
 - Share technical knowledge with other team members
 - Plan & implement best practices
 - Give technical solutions for daily technical matters
 - Verify all documents coming through all other processes
 - Execute root cause analysis for deployment issues
 - Follow up L2 support for deployment issues
 - Follow up L3 support for deployment issues
 - Follow up the life cycle of the deployment issues
 - Create & maintain the RACI Matrix
 - Design SKMS
 - Create roster

Incident Manager
Operational Stage: Service Operation
Accountable Processes: Incident Management, Problem Management
Databases Maintained: IRDB, KEDB
Responsibilities: This person is accountable for handling incidents & problems.

 - Log issues in IRDB
 - Deal with Service Desk
 - Handle customer employees
 - Carry out normal changes to the network
 - Escalate issues to relevant parties
 - Create & maintain the Escalation Matrix
 - Analyze and diagnose the problems (recurring issues)
 - Execute root cause analysis for problems
 - Update KEDB with workarounds to problems
 - Follow up L2 support for problems
 - Follow up L3 support for problems
 - Follow up the life cycle of the problems

Event Manager
Operational Stage: Service Operation
Accountable Processes: Event Management
Databases Maintained: AEDB
Responsibilities: This guy is accountable for the proactive monitoring of the network.

 - Maintain monitoring tools
 - Log alerts & events in AEDB
 - Inform NOC about the issues to attend
 - Execute root cause analysis for alerts / events
 - Follow up L2 support for alerts / events
 - Follow up L3 support for alerts / events
 - Follow up the life cycle of the alerts / events

Change Manager
Operational Stage: Service Transition
Accountable Processes: Change Management, Transition Planning & Support
Databases Maintained: n/a
Responsibilities: This guy is accountable for the changes made to the network.

 - Create RFCs/CRs
 - Plan maintenance windows
 - Create change schedules
 - Define & communicate with CAB (Change Advisory Board)
 - Categorize changes (Standard, Normal, Emergency)
 - Create emergency change plans
 - Execute root cause analysis for change issues
 - Follow up L2 support for change issues
 - Follow up L3 support for change issues
 - Follow up the life cycle of the change issues

Test Manager
Operational Stage: Service Transition
Accountable Processes: Service Validation & Testing
Databases Maintained: n/a
Responsibilities: This guy is responsible for the resiliency of the network.

 - Prepare test cases
 - Perform tests
 - Produce test reports
 - Carry out failover tests
 - Prepare user acceptance
 - Execute root cause analysis for test failure issues
 - Follow up L2 support for test failure issues
 - Follow up L3 support for test failure issues
 - Follow up the life cycle of the test failure issues

Configuration Manager
Operational Stage: Service Transition
Accountable Processes: Service Asset & Configuration Management
Databases Maintained: CMDB
Responsibilities: This guy is accountable for everything about network devices.

 - Keep the inventory (CMDB) up to date
 - Deal with RMAs
 - Carry out audits
 - Manage software & hardware licenses
 - Backup configurations
 - Maintain network diagrams
 - Execute root cause analysis for RMA/ VAPT/ ISO issues
 - Follow up L2 support for RMA/ VAPT/ ISO issues
 - Follow up L3 support for RMA/ VAPT/ ISO issues
 - Follow up the life cycle of the RMA/ VAPT/ ISO issues

The roles above are all the IT service management roles we have.
The database components in the SKMS (Service Knowledge Management System) are as follows:


AEDB - Alerts & Events Database

IRDB - Incident Records Database

KEDB - Known Errors Database

CMDB - Configuration Management Database


Communication Protocol within the team?

All written communication will be carried out via email. Every manager will address mails directly to the IT Operations Manager plus all the managers who are directly accountable for the issue raised through their processes. Additionally, all team members (not only managers) should be copied via the mailing list. Because we only have about 15 members in total, it is OK to put everyone on the list so that everyone has an idea about the issue.

RACI Matrix

This document is created by the IT Operations Manager and defines the groups and roles that are responsible for performing each defined activity.

Here is an example matrix format..

R for Responsible: These are the people who execute the work.
A for Accountable: This is the person who is ultimately in charge of the results / outcome, usually an executive.
C for Consulted: These are the people in related fields with whom we should keep two-way communication, to consult on problem solving and improvement.
I for Informed: These are the people who should receive one-way communication (ex:- a report).



Escalation Matrix

This document is created by the Incident Manager to define when and how to escalate issues beyond the operational scope of the team. The escalation procedure will be carried out by the Incident Manager and followed up by the IT Operations Manager.

Framework Customization Summary:

(01) Change Evaluation, Release & Deployment Management and Knowledge Management processes are assigned to the IT Operations Manager, a role handled by the Team Lead Engineer. This is the senior role (IT Manager in ITIL) that supervises and represents the entire team and all other manager roles. The IT Operations Control/Management and Technical Management functions are associated with this role.
(02) An 'Event Manager' role is introduced for the Event Management process, which is dedicated to proactive monitoring. In the original framework this process is handled by the IT Operations Manager, but because a NOC involves a lot of dedicated work for this process, this new role is created.
(03) Incident Management & Problem Management processes are assigned to the Incident Manager.
(04) Change Management, Transition Planning & Support processes are assigned to the Change Manager.
(05) The tasks of maintaining backups and creating backup plans are removed from the IT Operations Manager and added to the Configuration Manager.

Friday, September 1, 2017

IP SLA (Service Level Agreement) allows us to generate traffic which can be used to measure delay/latency, jitter etc. When it is used with object tracking, we can check the reachability of an IP address (by pinging it) or of a certain service (by connecting to it over TCP).
If the IP address/service is unreachable, we can trigger a certain action.

This note explains how to configure IP SLA with track objects to change a route.


















Let's assume our router is R1. We have 2 internet links from 2 service providers.
For this lab, let's assume that circuit 1 runs from R1 to R4 & circuit 2 runs from R1 to R5.

Requirement:-
(1) We want to route all traffic to internet via ISP-1 as the primary path.
(2) If ISP-1 cannot deliver an RTT of 100 ms or less on its circuit, change the path to ISP-2.

Assuming ISP routing & other basic configurations work well;

IP SLA configuration is as follows..

R1(config)#ip sla 1
R1(config-ip-sla)#icmp-echo 172.16.24.4
R1(config-ip-sla-echo)#threshold 100
R1(config-ip-sla-echo)#timeout 200
R1(config-ip-sla-echo)#frequency 1

The commands above do the following, respectively:

The IP SLA entry number is 1
The target to ping is 172.16.24.4
The RTT (Round Trip Time) threshold of the icmp-echo operation is 100 ms
The operation times out after 200 ms without a reply, and the target is then considered unreachable
The operation runs every second

Following command will start the operation from now and will run forever..
R1(config)#ip sla schedule 1 start-time now life forever

Following command will bind track object 10 with ip sla 1's return code..
R1(config)#track 10 ip sla 1

Following command will bind the static route with track object 10..
R1(config)#ip route 0.0.0.0 0.0.0.0 172.16.12.2 track 10

Following command will define the fallback route to ISP-2 with a higher administrative distance (2), i.e. a floating static route..
R1(config)#ip route 0.0.0.0 0.0.0.0 172.16.13.3 2
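
To make the interplay of the two static routes easier to follow, here is a minimal Python sketch of the selection logic. It is only a model, not how IOS is implemented: the `StaticRoute` class and `best_default_route` function are hypothetical names, and the sketch assumes a route is usable whenever its track object (if any) is up, ignoring interface and next-hop state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StaticRoute:
    next_hop: str
    distance: int = 1                # default administrative distance of a static route
    track_id: Optional[int] = None   # None: route is not tied to a track object


def best_default_route(routes: List[StaticRoute],
                       track_state: Dict[int, bool]) -> Optional[StaticRoute]:
    """Pick the usable route with the lowest administrative distance."""
    usable = [r for r in routes
              if r.track_id is None or track_state.get(r.track_id, False)]
    return min(usable, key=lambda r: r.distance) if usable else None


# The two default routes from the configuration above.
routes = [
    StaticRoute("172.16.12.2", distance=1, track_id=10),  # primary via ISP-1
    StaticRoute("172.16.13.3", distance=2),               # floating route via ISP-2
]

print(best_default_route(routes, {10: True}).next_hop)   # track 10 up
print(best_default_route(routes, {10: False}).next_hop)  # track 10 down
```

While track object 10 is up, the tracked route wins because of its lower distance; once the track goes down, that route is withdrawn and the floating route takes over.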

Note that without IP SLA, the route would fail over only if the R1-R2 link itself goes down.

That completes the configuration.

Threshold is a boundary value checked against the operation result (e.g. the RTT or jitter value collected during the operation). Crossing the threshold usually means an SLA contract violation.

Timeout is the maximum time allowed for the SLA operation to complete, for example the time spent waiting for a probe response.

The timeout is used directly to restart the operation. The threshold is used to trigger a response to an IP SLA violation, e.g. sending an SNMP trap, starting a secondary SLA operation, route fallback etc.

Frequency > Timeout > Threshold (note that frequency is configured in seconds, while timeout and threshold are in milliseconds)
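
As a rough illustration of how the three timers relate, the following Python sketch checks the ordering rule and maps a single probe result to a return code. The function names are hypothetical, the return-code strings only mirror what the show commands display, and the exact boundary handling (an RTT equal to the threshold counts as OK here) is an assumption.

```python
from typing import Optional


def timers_are_consistent(frequency_s: int, timeout_ms: int, threshold_ms: int) -> bool:
    """Frequency > Timeout > Threshold, after converting frequency to milliseconds."""
    return frequency_s * 1000 > timeout_ms > threshold_ms


def return_code(rtt_ms: Optional[float], timeout_ms: int, threshold_ms: int) -> str:
    """Classify one probe; rtt_ms is None when no reply arrived at all."""
    if rtt_ms is None or rtt_ms > timeout_ms:
        return "Timeout"
    if rtt_ms > threshold_ms:
        return "Over threshold"
    return "OK"


# Values from the lab: frequency 1 s, timeout 200 ms, threshold 100 ms.
print(timers_are_consistent(1, 200, 100))
for rtt in (1, 150, None):
    print(rtt, "->", return_code(rtt, timeout_ms=200, threshold_ms=100))
```

With the lab values the rule holds, since 1000 ms > 200 ms > 100 ms.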

Important show commands:-
R1#show ip sla summary
R1#show ip sla statistics
R1#show track brief

In normal operation, the following healthy outputs will be visible..











As you can see, the RTT is 1 ms, hence the return code is OK
and the track object remains up.





When the RTT goes over 100 ms, the return code will be displayed as "Over Threshold" and the track object will be "Down". As soon as the RTT drops back below 100 ms, the IP SLA return code becomes OK again and the track object comes back up, changing the route back to ISP-1.
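
The flip-flop described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the track object follows the return code with no delay (matching the behaviour described for `track 10 ip sla 1` in this lab), a missing reply is represented by `None`, and `active_next_hop` is a hypothetical helper, not an IOS feature.

```python
from typing import Optional


def active_next_hop(rtt_ms: Optional[float],
                    threshold_ms: int = 100,
                    primary: str = "172.16.12.2",
                    backup: str = "172.16.13.3") -> str:
    """Return the next hop in use after one probe result."""
    # Track is up only when a reply arrived within the threshold.
    track_up = rtt_ms is not None and rtt_ms <= threshold_ms
    return primary if track_up else backup


# A made-up series of probe results in ms (None = no reply).
for rtt in (1, 40, 150, None, 90):
    print(rtt, "->", active_next_hop(rtt))
```

Each sample is evaluated on its own, so the default route moves to ISP-2 for the slow or missing replies and returns to ISP-1 as soon as a probe is back under the threshold.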

When the R2-R4 link goes down (target unreachable), the outputs will be as follows..
(The 1st show ip route is taken when everything is OK)