Tuesday, December 8, 2015

Multicast Part 2


RP Configuration:

Static:
The following command is used to configure a static RP on the router:
ip pim rp-address <rp-address> [access-list]

If multiple static RPs are configured for the same group, the router selects the one with the highest IP address.
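For example (the addresses and ACL number are illustrative), a static RP at 10.1.1.1 serving only the 239.0.0.0/8 range could be configured on each router as:

access-list 10 permit 239.0.0.0 0.255.255.255
ip pim rp-address 10.1.1.1 10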

Auto RP:
A Cisco-proprietary mechanism for advertising RP info for multicast groups.
It uses multicast itself to distribute the group-to-RP mapping info.

Cisco PIM routers learn the group-to-RP mappings by joining the Cisco-RP-discovery group 224.0.1.40; the mapping agent advertises the mapping info to this group.
The mapping agent learns of the possible RP candidates by joining the Cisco-RP-announce group 224.0.1.39.
Candidate RPs announce their intention to be RP for a group or group range by multicasting RP-announce messages to 224.0.1.39.

Configuring mapping agent:
ip pim send-rp-discovery [interface] scope <ttl>

Configuring candidate RPs:
ip pim send-rp-announce <interface> scope <ttl> [group-list <acl>]
If group-list is not specified, the router announces itself as a candidate RP for the entire 224.0.0.0/4 range.
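A minimal Auto-RP sketch (interfaces, scope values, and ACL number are illustrative), with one candidate RP and one mapping agent:

! On the candidate RP - announce candidacy for 239.1.0.0/16 only
access-list 10 permit 239.1.0.0 0.0.255.255
ip pim send-rp-announce Loopback0 scope 16 group-list 10
! On the mapping agent - advertise the elected mappings
ip pim send-rp-discovery Loopback0 scope 16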

If a mapping agent receives multiple RP announcements for the same group, it caches all of them and selects the RP with the highest IP address.

Multiple mapping agents can be configured in a network; all mapping agents will select the same RP for a given group, so all routers will hold the same set of RP mappings. Only the source of a mapping (which mapping agent it was learned from) may differ between routers.
The RP-announce interval can be tweaked for shorter failover times; however, with the default SPT threshold of zero, all routers will already have switched to the SPT, so the failure of an RP has little effect.

The RP-announce and RP-discovery groups always operate in dense mode. If no RP info is found for a group, that group operates in dense mode.

Security:
To stop Auto-RP messages from leaking across a domain boundary, configure the following on the boundary interface:
ip multicast boundary <access-list>
access-list 10 deny 224.0.1.39
access-list 10 deny 224.0.1.40
access-list 10 deny 239.0.0.0 0.255.255.255
access-list 10 permit 224.0.0.0 15.255.255.255


We can configure the following on the mapping agent to prevent candidate RP spoofing,
ip pim rp-announce-filter rp-list acl [group-list acl]
eg:
access-list 1 permit host 1.1.1.2
access-list 2 deny any
ip pim rp-announce-filter rp-list 1 group-list 2

With the above configuration, filtering is performed only on announcements whose RP address is permitted by the rp-list, i.e. ACL 1.
Here, the RP permitted in ACL 1 is denied from being the RP for the groups referenced in ACL 2.
So 1.1.1.2 is denied from being the RP for all multicast groups.

All the interfaces must be configured to operate in ‘sparse-dense’ mode.

When the interfaces are configured to operate in sparse mode,
‘ip pim autorp listener’ → allows the two Auto-RP groups 224.0.1.39 and 224.0.1.40 to operate in dense mode while all other groups operate in sparse mode.

Misc:

  • If router interfaces are configured in sparse mode, Auto-RP can still be used if all routers are configured with a static RP address for the Auto-RP groups.
  • RPs discovered dynamically through Auto-RP take precedence over statically configured RPs
  • To accept all RPs advertised with Auto-RP and reject all other RPs by default, use the ip pim accept-rp auto-rp command.

PIM V2 Bootstrap Mechanism:
BSR uses hop by hop flooding of special bootstrap messages to distribute all group to RP mapping info.

The combination of hop-by-hop flooding of BSR messages and unicasting C-RP advertisements to the
BSR completely eliminates the need for multicast in order for the BSR mechanism to function.

ip pim rp-candidate interface [group-list acl]
When this global configuration command is added to a router's configuration, the router begins to
unicast PIMv2 C-RP advertisements to the currently elected BSR.


ip pim bsr-candidate <interface> <hash-mask-length> [priority]
After configuration, the router sets its bootstrap timer to the bootstrap timeout value (150 sec) and enters the C-BSR state, waiting to receive BSR messages from the current BSR.

If the router receives a BSR message with a higher priority, it accepts the message, resets the timer, and forwards the message out all other interfaces.
Lower-priority messages are discarded.

If the bootstrap timer expires, the C-BSR becomes the BSR and starts sending BSR messages every 60 sec.
If a higher-priority BSR message is received, it transitions back to the C-BSR state.

In this way, the candidate RP routers learn of the BSR and start unicasting their RP candidacy to it.
The BSR caches all such mappings and sends them in its BSR messages.
Each router thus receives the complete group-to-RP mapping info through the hop-by-hop flooding mechanism and runs a hash algorithm to determine the RP for each group.
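A minimal BSR sketch (the interface, hash-mask-length, and priority values are illustrative):

! Candidate BSR
ip pim bsr-candidate Loopback0 30 100
! Candidate RP for the full multicast range
ip pim rp-candidate Loopback0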

If two routers announce themselves as RP candidates for the entire multicast range, with BSR the routers will share the RP workload for that range.
By changing the hash-mask-length value, it is possible to control the number of consecutive group addresses that map to the same candidate RP.

BSR messages are flooded to the all-PIM-routers group 224.0.0.13 with a TTL of 1. They contain the following info:
·         IP address of the current BSR
·         Group-to-RP mapping cache
·         Priority
·         Hash-mask-length value

Use the ‘ip pim bsr-border’ command to constrain BSR messages at a domain boundary. This command does not affect the flow of other PIM messages (join, prune, etc.).


Forcing groups to remain in Dense mode:

The following command can be used to force certain groups to operate in dense mode:
ip pim accept-rp {<rp-address> | auto-rp} [group-list <acl>]

When the router receives an IGMP join from a local host, it runs the RP and group address against this filter; if the filter permits, the group is created in sparse mode, else it is created in dense mode.
When the router receives a (*, G) join from a downstream router, the RP address in the join message and the group address are run against the filter; if the filter permits, the join is propagated towards the RP, else it is discarded.
When the router receives register messages for a group, the group address and destination address are run through the filter; if the filter permits, the register is processed, else a register-stop is sent.

The ip pim accept-rp command has the following three basic forms:
ip pim accept-rp <rp-address> [group-list acl] → if a matching entry is found, the search terminates. If permitted, sparse mode is used.
ip pim accept-rp auto-rp [group-list acl] → if the group-to-RP cache permits, the group is created in sparse mode. If denied, the wildcard (0.0.0.0) entry is tried.
ip pim accept-rp 0.0.0.0 [group-list acl] → if a matching entry is found, the search terminates. If permitted, sparse mode is used.

Configure ip pim rp-address to force the group to operate in sparse mode.
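For example (the group range and ACL number are illustrative, and the exact keyword form may vary by IOS release), to keep only 239.0.0.0/8 in sparse mode with Auto-RP-learned RPs and let everything else fall back to dense mode:

access-list 10 permit 239.0.0.0 0.255.255.255
ip pim accept-rp auto-rp 10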

MSDP:

MSDP is a mechanism to connect multiple PIM-SM domains. It shares the active multicast sources in one domain with the RPs in other domains.
MSDP is configured between RPs; it runs over TCP port 639.

On receiving a register message from the first-hop router, the RP re-encapsulates the multicast data in Source-Active (SA) messages, which are forwarded to all MSDP peers.

MSDP messages are flooded across MSDP peers.
R1----R2-----R3
R1 & R2 msdp peers
R2 & R3 msdp peers

If R1 sends an SA message to R2, R2 can forward it on to R3.
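A minimal MSDP peering sketch between the RPs of two domains (the addresses and interfaces are illustrative):

! On RP1 (loopback 1.1.1.1)
ip msdp peer 2.2.2.2 connect-source Loopback0
! On RP2 (loopback 2.2.2.2)
ip msdp peer 1.1.1.1 connect-source Loopback0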

SSM:

In SSM, only the router closest to the receiving host needs to have SSM enabled.

access-list 1 permit 232.0.0.0 0.255.255.255
ip pim ssm range 1

When SSM is enabled, only (S, G) state is created; no (*, G) state is created for the groups in the SSM range.

Bi-directional PIM:

ip pim bidir-enable --- This must be enabled
ip pim rp-address 1.1.1.3 bidir

A designated forwarder (DF) is elected for each segment; the DF is the multicast router that forwards (*, G) traffic in both directions on the bidirectional tree.
The router with the lowest cost to the RP is elected as the DF.

IGMP:
In IGMPv2, the router with the lowest IP address becomes the querier for the segment.

The PIM DR is the router with the highest IP address on the subnet, whereas the IGMP querier is the router with the lowest IP address.

The router periodically sends query messages to the all-hosts group 224.0.0.1.
Hosts that want multicast traffic reply with membership reports sent to the group address being joined.
For leaving, IGMPv2 uses group-specific queries to improve leave latency: the host sends a leave message to the all-routers group 224.0.0.2, and the router responds with a group-specific query.

By default, if PIM is enabled on the interface, IGMP v2 is also enabled.

R2#sh ip igmp int fa0/0
  IGMP is enabled on interface
  Current IGMP host version is 2
  Current IGMP router version is 2
  IGMP query interval is 60 seconds → used to discover active multicast group receivers. If two queries are missed, an election for a new querier starts.
  IGMP querier timeout is 120 seconds → if no query is seen for 120 sec, the other router triggers an election to select a new querier
  IGMP max query response time is 10 seconds → tweak to smooth the burstiness of the query responses
  Last member query count is 2 → number of queries sent after receiving a group-specific leave, before forwarding of the multicast traffic is stopped
  Last member query response interval is 1000 ms
  Inbound IGMP access group is not set → access-list to restrict hosts from joining some mcast groups
  IGMP activity: 1 joins, 0 leaves
Interface IGMP State Limit: 0 active out of 2 max → max number of groups that hosts on this interface can join. After two groups are joined, joins for a third group are denied.
  Multicast routing is enabled on interface
  Multicast TTL threshold is 0
  Multicast designated router (DR) is 10.1.100.2 (this system)
  IGMP querying router is 10.1.100.1 → the lower-IP-address router assumes the querier role. This is different from the PIM DR.
  No multicast groups joined by this system

Sunday, November 29, 2015

Multicast - Part1

PIM:
224.0.0.13 is the multicast group for all PIM routers.
Hello messages are sent every 30 sec; the hold timer is 3.5 times the hello interval (105 sec).
The highest-priority router becomes DR for the segment; if priorities are equal, the router with the highest IP address becomes DR. (The DR sends PIM register and PIM join/prune messages toward the RP and sends IGMP host-query messages.)
An RPF check is done to ensure the packet arrived on the correct interface in the direction of the source.
When multiple equal routing-table entries exist, the one with the highest next-hop IP address is selected.
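The notes above assume multicast routing and PIM are already enabled; a minimal baseline (the interface name is illustrative):

ip multicast-routing
interface FastEthernet0/0
 ip pim sparse-dense-mode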

PIM DM:

PIM-DM builds source-based multicast distribution trees that operate on a "flood and prune" principle.

In dense mode, prune messages are sent when
Traffic arrives on non-RPF interfaces
No receivers

Prune override:
On multi-access networks, if a router sees a prune message and still has receivers, it sends a join message to override the prune.
A 3-second prune delay timer is started on receiving a prune message; if no join is received within these 3 seconds, the prune takes effect.

PIM asserts:
On multi-access networks to avoid duplicate multicast traffic.
If a router receives a multicast packet on an interface that is in the outgoing interface list for that source, it sends a PIM Assert message out that interface to resolve which router will continue forwarding the traffic. The router with the better metric to the source wins and continues to relay the multicast traffic.
If the metrics tie, the router with the highest IP address wins.

PIM graft:
To restart the flow of multicast traffic on a previously pruned interface without having to wait for the timers to expire.


PIM State refresh capability
If state refresh is enabled on an interface, the router periodically sends a multicast control packet; if the receiving router has no interfaces in its OIL, it sends a prune back to the sender, refreshing the prune state.


If  PIM dense mode (PIM-DM) is enabled on a router interface, the PIM Dense Mode State Refresh feature is also enabled by default.

General Rules in forwarding multicast traffic:
Multicast traffic is forwarded using the mroute table; the following rules help in understanding mroute tables:
·         When creating an (S, G) entry, create the (*, G) if it doesn't exist
·         The RPF interface is calculated as the interface with the lowest cost to the source for (S, G) entries, and with the lowest cost to the RP for sparse mode (*, G) entries. If multiple interfaces have the same cost, the interface with the highest IP address becomes the RPF interface.
·         When creating an (S, G) entry, its outgoing interface list (OIL) is copied from the parent (*, G)
·         The incoming interface of a multicast forwarding entry must never appear in its OIL
·         The RPF interface of every multicast state entry is recalculated every 5 sec and the OIL is adjusted according to the rules above
·         Additions/deletions to the outgoing interface list of (*, G) are replicated to the associated (S, G) entries


Sparse Mode:
Unlike dense mode, sparse mode uses (*, G) to forward multicast traffic.

Sparse mode (*, G) rules:
·         A (*, G) is created as a result of an explicit join operation: either a directly connected host joining the group, or a (*, G) join request from a downstream router
·         The incoming interface of a (*, G) always points up the shared tree towards the RP.

Sparse mode (S, G) rules:
·         An (S, G) entry is created on receiving an (S, G) join/prune message, or by the last-hop router deciding to switch over to the shortest path, or by the unexpected arrival of (S, G) traffic, or on the RP when it receives a register message
·         When a (*, G) join is received, the interface is added to the outgoing interface list of the (*, G) and subsequently to the (S, G) entries.
When an (S, G) join is received, the interface is added to the OIL of the (S, G) only. The (S, G) join is specific to the SPT for source S and group G and is not applicable to the shared tree.
·         An interface is removed from the OIL on receiving a (*, G) or (S, G) prune, or when its expiration timer counts down to zero.
·         The expiration timer is reset on receiving a (*, G) or (S, G) join or an IGMP membership report. Downstream routers refresh the state by resending their joins every minute.
·         When the last-hop router decides to switch to the shortest path, it no longer needs to receive the traffic via the shared tree. To stop the flow of this redundant traffic down the shared tree,
the router sends an (S, G) prune with the RP bit set. A router receiving this prune treats it as a request to prune the specified (S, G) traffic from this branch of the shared tree.
A router sends an (S, G) RP-bit prune when the RPF interface of the (*, G) is different from the RPF interface of the (S, G), i.e. the shortest path diverges from the shared tree.
·         When a router receives an (S, G) RP-bit prune from a downstream neighbor, it removes the interface from the OIL of the (S, G) and sets the R flag.
·         The RPF interface of an (S, G) is calculated using the IP address of the source, except when the RP bit is set, in which case the IP address of the RP is used (for the incoming interface)





Say R3 and R4 have sent join for the group G1.
At this point, all the routers R2, R3, R4 and RP will have (*, G1) entries.

When the source starts sending traffic, RP will forward the multicast traffic and subsequently it reaches R3 and R4.
R3 sees that it has a shorter path to the source, so it sends an (S, G1) join to R1 and an (S, G1) RP-bit prune to R2.
R2 creates (S, G1) state, sets the R flag, copies the OIL from the (*, G1), and removes interface S1 from it.
R3 now gets the traffic from R1 directly while R4 continues to get the traffic from the shared tree. At this point, the (S, G1) entry in R2 will also carry the T flag, indicating that traffic is being forwarded via the (S, G1) entry.

Prunes are sent up the shared tree to prune off sources whose traffic is being received directly via the SPT along a different path. These (S, G)RP-bit Prunes must continue to be sent periodically along with the associated (*, G) Join to refresh state along the shared tree. When these periodic Joins are sent up the shared tree, both the (*, G) Join and any associated (S, G) RP-bit Prunes are all sent inside of the same PIM Join/Prune message. This leads to the following two categories of (*, G) Joins:
Atomic (*, G) Joins---These are Join/Prune messages that contain both the (*, G) Join and all associated (S, G) RP-bit Prunes for group G.
Nonatomic (*, G) Joins---These are Join/Prune messages that contain only the (*, G) Join, without any associated (S, G) RP-bit Prunes for group G.



If the source appears first, the Register / Register-Stop sequence repeats until some receiver joins.

SPT Switchover:
Once each second, the router computes the total traffic rate flowing down the shared tree.
If this exceeds the SPT threshold, the router sets the J flag on the (*, G) and joins the SPT on arrival of the next packet.
Here are the steps:
·         Set the J flag on the (*, G) and wait for the next (S, G) packet
·         When an (S, G) packet arrives down the shared tree, clear the J flag on the (*, G) and send an (S, G) join.
The J flag is set again on the (*, G) after a 1-sec interval; this avoids multiple (S, G) switchovers.
·         Once the traffic is pruned off the shared tree and arrives via the (S, G), the router continues to calculate the traffic rate on the (S, G).
If the rate falls below the threshold, the router switches back to the shared tree and prunes the flow off the SPT.


Pruning:
Shared Tree: If router no longer wants multicast traffic it will send (*, G) prune.
When (*, G) prune is received, the interface is removed from OIL. If the OIL is null, the P flag is set on the (*, G) and a prune is sent to upstream router.

Shortest-Path Tree: If the router no longer wants the multicast traffic, it sends a (*, G) prune up the shared tree.
No (S, G) prune is sent up the SPT; instead, the P flag is set on the (S, G) entries, their expire timers run, and they are allowed to age out.


R4 will not send an (S, G) prune; it starts the expire timer on the (S, G) and sends a (*, G) prune up the shared tree.
R4 also stops sending the periodic (*, G) and (S, G) joins that refresh the state.

When the (*, G) prune arrives at R2, it sends a (*, G) prune up the shared tree and starts the expiration timer on the (S, G).
The source continues to send (S, G) traffic, and on arrival of the next (S, G) packet, R2 sends an (S, G) prune.

The reason that (S, G) prunes are triggered only by the arrival of data is to optimize the amount of control traffic sent in the network; bandwidth is not wasted sending (S, G) prunes for bursty or other low-rate sources.

Turnaround Router:
The turnaround-router scenario occurs when the SPT and shared-tree paths merge at a router, with traffic flowing in opposite directions.
The router at which the paths merge is called the turnaround router; it is upstream of the shared-tree receivers and downstream of the source on the SPT.




A proxy-join timer is used to handle this scenario; it is associated only with (S, G) entries in the mroute table.
Rules:
A proxy-join timer is started on the RP when an (S, G) entry is created by a register message and the OIL of the (*, G) is not null.
It is also started when a router receives a non-atomic (*, G) join on the incoming interface of an (S, G) entry from a non-RPF neighbor.

The proxy-join timer is reset by the receipt of non-atomic joins;
if non-atomic joins are no longer received, the timer is simply allowed to age out.

                When the proxy-join timer is running on an (S, G) entry,
                the router will send (S, G) joins towards the source and suppress sending (S, G) prunes towards the source.

Say R2 has a receiver and it joins the shared tree.
·         When a source starts sending at R1, R1 sends a register message to the RP.
·         The RP has a (*, G) with a non-null OIL, so it starts the proxy-join timer and starts sending (S, G) joins towards the source. Ordinarily, the OIL of the (S, G) on the RP would be null and it would have been sending prunes instead.
·         Once the SPT is built, R3 hears the non-atomic join sent by R2 and starts its own proxy-join timer.
·         R3 starts sending (S, G) joins towards the source and also sends atomic joins towards the RP. Atomic joins are sent because the RPF interfaces of the (*, G) and the (S, G) are different (PIM-SM rules: prune with RP bit set).
·         The proxy-join timer is reset by non-atomic joins; since the RP is receiving atomic joins, the proxy-join timer on the RP will age out.
·         When the timer expires on the RP, it stops sending periodic (S, G) joins to R3, and the interface is eventually removed from the OIL of the (S, G) on R3.

The turnaround-router functionality will not work if R2 switches over to the SPT, because R2 would then be sending atomic joins.


Thursday, July 30, 2015

ip access-lists

Reflexive ACL:
The idea is to selectively allow return traffic from the outside for a limited time.


    R1------------------------------(fa0/1) R2 (fa0/0)------------------------------R3
(LAN)                                                                                         (INTERNET)


The requirement is R1 and R2 can initiate the traffic towards R3.
R3 cannot initiate traffic to R1 or R2

We will use reflexive acl to achieve this goal,

Reflexive ACL will have 3 components,

The first component matches the interesting traffic from R1/R2 and is applied outbound on interface fa0/0:
R2(config)#ip access-list extended LAN_TO_INTERNET
R2(config-ext-nacl)#permit icmp any any reflect MIRROR_ACL
R2(config)#int fa0/0
R2(config-if)#ip access-group LAN_TO_INTERNET out

The packets matched by the acl LAN_TO_INTERNET will be reflected into the acl MIRROR_ACL

We can see the contents of the reflexive acl (it is dynamically updated on seeing traffic that matches the acl LAN_TO_INTERNET). When I ping R3 from R1, the acl gets updated as:
R2#sh ip access-lists MIRROR_ACL
Reflexive IP access list MIRROR_ACL
     permit icmp host 10.23.1.3 host 10.12.1.1  (15 matches) (time left 138)


To apply this reflexive acl, we need to reference it from an ACL applied inbound on fa0/0.
R2(config)#ip access-list extended INTERNET_TO_LAN
R2(config-ext-nacl)#evaluate MIRROR_ACL (can add permit/deny statements as well)
R2(config)#int fa0/0
R2(config-if)#ip access-group INTERNET_TO_LAN in

With this configuration, R1 will be able to ping R3 but not vice versa
R1#ping 10.23.1.3
Sending 5, 100-byte ICMP Echos to 10.23.1.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 80/99/128 ms
R1#telnet 10.23.1.3
Trying 10.23.1.3 ...
% Destination unreachable; gateway or host down

R1 is able to ping R3 but not able to telnet; this is because we permitted only icmp traffic in the LAN_TO_INTERNET acl. To allow telnet, let's modify the acl as:

R2(config)#ip access-list extended LAN_TO_INTERNET
R2(config-ext-nacl)#permit tcp  any any reflect MIRROR_ACL

If we try to telnet now,

R1#telnet 10.23.1.3
Trying 10.23.1.3 ... Open
User Access Verification
Password:
R3#

The access-list on R2 will be
R2#sh ip access-lists
Extended IP access list INTERNET_TO_LAN
    10 evaluate MIRROR_ACL
    Extended IP access list LAN_TO_INTERNET
    10 permit icmp any any reflect MIRROR_ACL (25 matches)
    20 permit tcp any any reflect MIRROR_ACL (139 matches)
Reflexive IP access list MIRROR_ACL
     permit tcp host 10.23.1.3 eq telnet host 10.12.1.1 eq 15837 (28 matches) (time left 295)
(The above is reflection of tcp flow source-10.12.1.1,destination-10.23.1.3,source port-15837,
destination port-23)
     permit icmp host 10.23.1.3 host 10.12.1.1  (10 matches) (time left 275)

Let’s try the ping from R2,
R2#ping 10.23.1.3
Sending 5, 100-byte ICMP Echos to 10.23.1.3, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

It's failing because locally generated packets are not inspected by outbound access-lists, so they are not reflected into the reflexive access-list. R3's replies to the ping are therefore blocked by the inbound access-list.

We can use local policy routing to fix this issue.
With local policy routing, we will force the traffic to reenter the router and be inspected by the outgoing access-lists

Create an access-list that matches the traffic from R2 to R3
R2(config)#ip access-list extended LOCAL_TRAFFIC
R2(config-ext-nacl)#permit tcp any any
R2(config-ext-nacl)#permit icmp any any

Create a route-map that matches the access-list and set output interface to some loopback
R2(config)#route-map LOCAL_POLICY 10
R2(config-route-map)#match ip address LOCAL_TRAFFIC
R2(config-route-map)#set interface lo100

Apply the route-map in global config
R2(config)#ip local policy route-map LOCAL_POLICY




The ping should be successful now
R2#ping 10.23.1.3
Sending 5, 100-byte ICMP Echos to 10.23.1.3, timeout is 2 seconds:
.!!!!


Saturday, June 20, 2015

MPLS Part1

VRF-lite:

VRFs create an instance of the routing table.
VRF, when used inside a single router or without MPLS is VRF-Lite

We can create VRFs in two ways

Legacy method—supports only ipv4
R6(config)#ip vrf VPN_A
R6(config)#int fa0/0
R6(config-if)#ip vrf forwarding VPN_A
When applied, this removes only the IPv4 address attached to the interface. The interface's IPv6 address remains part of the global routing table, while its IPv4 address becomes part of the corresponding VRF table.

Newer method—supports IPv4 and IPv6; the address families must be declared under the VRF
R6(config)#vrf definition VPN_B
R6(config-vrf)#address-family ipv4
R6(config-vrf)#address-family ipv6
R6(config)#int fa0/0
R6(config-if)#vrf forwarding VPN_B
When applied, this removes both the IPv4 and IPv6 addresses attached to the interface.

Each VRF instance has its own RIB and FIB.
An interface in VRF instance A1 cannot ping an interface in VRF instance A2.

To facilitate inter VRF reachability,
·         ip route vrf <VRF_Name> <prefix> <mask> [interface] [next-hop] → the interface can be in any VRF
·         The other option is to use the “global” keyword at the end of the route statement, to instruct the router to look up the next hop in the global routing table
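A sketch of both options (the addresses and interface names are illustrative):

! Route in VPN_A pointing out an interface that belongs to a different VRF
ip route vrf VPN_A 192.168.2.0 255.255.255.0 FastEthernet0/1 10.1.1.2
! Route in VPN_A whose next hop is resolved in the global routing table
ip route vrf VPN_A 0.0.0.0 0.0.0.0 172.16.1.1 global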



Some useful show commands,
Show vrf
Show run vrf

LDP:
LDP advertises its router-id as the transport address in the hello discovery messages.
So make sure the router-id is reachable: there must be an exact (/32) match for the router-id in the routing table.

The hello messages are sent to 224.0.0.2 on the UDP port 646.
After discovering a neighbor, the tcp connection will be established on 646 and labels are exchanged.

We can change the transport address
R1(config)#int fa0/0
R1(config-if)#mpls ldp discovery transport-address interface

The tcp session will be reestablished on giving the above command.
The TCP connection can be authenticated using an MD5 hash option.
 The hashing key is defined per-neighbor by using the command mpls ldp neighbor <IP> password <password>.
The IP address here is the neighbor’s LDP Router ID. To make the use of passwords mandatory, we need the global command mpls ldp password required.

When an LDP session is established, the hold time used for the session is the lower of the values configured on the two routers.
R1(config)#mpls ldp holdtime 45

To change the neighbor discovery interval and hold time
R1(config)#mpls ldp discovery hello interval 15
R1(config)#mpls ldp discovery hello holdtime 45

To change the router-id
R1(config)#mpls ldp router-id lo0 force → if ‘force’ is not used, the router must be reloaded for the change to take effect
‘force’ will reset the TCP session


Normally, LDP advertises ‘implicit-null’ (i.e. label 3) for connected routes, so the PHP router pops the label before sending the packet.
If the packet carries QoS markings and we don't want the PHP router to pop the top label, we can configure the router to advertise ‘explicit-null’ for connected routes.
In such a case, the router will receive packets with ‘label 0’ for connected routes.

R1(config)#mpls ldp explicit-null [for <prefixes>] [to <ldpPeers>]

Normal trace route from a customer router to other customer site
BB1#traceroute 1.1.1.1
  1 10.1.67.6 72 msec 80 msec 60 msec
  2 10.1.56.5 [MPLS: Label 16 Exp 0] 156 msec 148 msec 152 msec
  3 10.1.35.3 [MPLS: Label 16 Exp 0] 152 msec 148 msec 128 msec
  4 10.1.23.2 [MPLS: Label 27 Exp 0] 104 msec 108 msec 104 msec
  5 10.1.12.1 160 msec 132 msec 132 msec

The network is
R1=====R2-----R3-----R5-----R6=====BB1


In the above output, customer is able to see the routers and transit links in the provider’s network.
If we want to hide these details from the customer, we should configure the following on the edge router (not required on all P routers):
R6(config)#mpls ip propagate-ttl
R6(config)#no mpls ip propagate-ttl forwarded → this stops copying the TTL from the IP header into the MPLS label for forwarded traffic only; for locally generated traffic it works as normal.
So a traceroute from the PE routers will show all the transit links, while from the CE they will be hidden.


Then the trace route output from CE and PE routers will looks as
BB1#traceroute 1.1.1.1
  1 10.1.67.6 84 msec 72 msec 72 msec
  2 10.1.23.2 [MPLS: Label 27 Exp 0] 124 msec 120 msec 124 msec
  3 10.1.12.1 152 msec 132 msec 124 msec

R6(config)#do traceroute 1.1.1.1
  1 10.1.56.5 [MPLS: Label 16 Exp 0] 120 msec 168 msec 140 msec
  2 10.1.35.3 [MPLS: Label 16 Exp 0] 104 msec 112 msec 112 msec
  3 10.1.23.2 [MPLS: Label 27 Exp 0] 80 msec 92 msec 84 msec
  4 10.1.12.1 120 msec 108 msec 104 msec

R6(config)#mpls ip propagate-ttl
R6(config)#no mpls ip propagate-ttl local → this stops copying the TTL from the IP header into the MPLS label for locally generated traffic only; for forwarded traffic it works as normal.
So a traceroute from the CE routers will show all the transit links, while from the PE router they will be hidden.

R6(config)#do traceroute 1.1.1.1
  1 10.1.23.2 [MPLS: Label 27 Exp 0] 120 msec 84 msec 140 msec
  2 10.1.12.1 132 msec 160 msec 108 msec
R6(config)#

BB1#traceroute 1.1.1.1
  1 10.1.67.6 60 msec 56 msec 56 msec
  2 10.1.56.5 [MPLS: Label 16 Exp 0] 172 msec 156 msec 152 msec
  3 10.1.35.3 [MPLS: Label 16 Exp 0] 280 msec 148 msec 124 msec
  4 10.1.23.2 [MPLS: Label 27 Exp 0] 140 msec 112 msec 104 msec
  5 10.1.12.1 124 msec 128 msec 128 msec



LDP targeted hellos:
·         To establish ldp adjacency with devices that are not directly connected
·         Hellos will be unicasted
·         Normally used in TE for LDP session between tunnel endpoints
·         When enabled between directly connected devices, may improve the convergence by retaining the labels even when the link to neighbor is down.


By default, LDP generates and advertises labels for every prefix found in the local routing table.
If we want to change this behavior and generate labels only for specific prefixes, we can use an access-list to select the prefixes eligible for label advertisement.
R4(config)#no mpls ldp advertise-labels → this command must be entered first for the change to take effect
R4(config)#mpls ldp advertise-labels for 10
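For instance (the prefix is illustrative), to advertise a label only for the loopback 5.5.5.5/32:

access-list 10 permit host 5.5.5.5
no mpls ldp advertise-labels
mpls ldp advertise-labels for 10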


 A sample traceroute in a network with LDP not turned on completely
R1#traceroute 10.1.67.7
  1 10.1.12.2 [MPLS: Label 26 Exp 0] 72 msec 52 msec 52 msec
  2 10.1.23.3 48 msec 56 msec 68 msec
  3 10.1.35.5 [MPLS: Label 25 Exp 0] 100 msec 100 msec 44 msec
  4 10.1.56.6 104 msec 120 msec 68 msec
  5 10.1.67.7 120 msec 132 msec 128 msec

Some useful show commands
Sh mpls ldp binding 10.1.67.0 24 → to check the LIB
Sh mpls forwarding-table 10.1.67.0 24 → to check the LFIB
Sh mpls ldp discovery detail
Sh mpls ldp neighbor
Sh mpls ldp parameters


Wednesday, June 17, 2015

DMVPN-Part2

Routing protocols in Phase 1:

The next hop will always be the HUB.

In Phase1, the control plane should be kept as simple as possible because the data plane is always going to be point-to-point hub and spoke tunnels
regardless of the next-hop and routing protocol.

Eigrp:

On enabling EIGRP, the spokes can establish adjacency with the hub.
They cannot establish adjacency with other spokes, because multicast traffic cannot be replicated directly between spokes (in all three phases of DMVPN).

To establish connectivity between spokes, we have two options

First one is, advertise a default route
R5(config)#int tun0
R5(config-if)#ip summary-address eigrp 100 0.0.0.0 0.0.0.0

Second one is, disable split-horizon. Spokes will learn the routes from other spokes but the next-hop will be the HUB.
R5(config-if)#int tun0
R5(config-if)#no ip split-horizon eigrp 100

ODR:
ODR is based on CDP.
CDP is enabled by default from IOS 15.x; just make sure CDP is running on the tunnel interfaces.
R5#show cdp neighbors

Steps to run ODR
First enable cdp on the tunnel interface,
R5(config)#cdp run
R5(config)#int tun0
R5(config-if)#cdp enable

Enable ODR; this must be done on the hub. The hub will announce a default route to the spokes, and the spokes will send their connected-network information to the hub in CDP messages.
R5(config)#router odr

ODR will not run if any other routing protocol is enabled.
It lets us exchange routing information without running a full routing protocol.
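To verify, the spoke networks learned via ODR should appear in the hub's routing table (marked with "o"):
R5#show ip route odr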

BGP:

One of the major advantages of DMVPN is that we can easily add a spoke without changing any configuration on the existing devices.

When using BGP, this advantage is broken: we may have to make BGP configuration and policy changes for every new spoke.
Dynamic BGP neighbor configuration can be used as a workaround.
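A minimal sketch of dynamic BGP neighbors on the hub, assuming the tunnel subnet 14.1.1.0/24 and iBGP in AS 100 (the listen range lets the hub accept sessions from any spoke in that subnet without per-spoke configuration):
R5(config)#router bgp 100
R5(config-router)#bgp listen range 14.1.1.0/24 peer-group SPOKES
R5(config-router)#neighbor SPOKES peer-group
R5(config-router)#neighbor SPOKES remote-as 100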

We can use iBGP or eBGP to speak to the HUB.


RIP:
RIP is enabled with the normal configuration commands.

We can send a default route to the spokes as
R5(config)#route-map DEFAULT  permit 10
R5(config-route-map)#set interface Tunnel0
R5(config)#router rip
R5(config-router)#default-information originate route-map DEFAULT
Or disable split horizon:
R5(config)#int Tu0
R5(config-if)#no ip split-horizon

OSPF:

When OSPF is configured over GRE tunnel interfaces, the OSPF network type defaults to point-to-point.
This is not supported in a DMVPN design, because the hub must maintain multiple adjacencies on the same interface, one for each remote spoke.

In DMVPN Phase 1 with OSPF, the OSPF network type is set to point-to-multipoint on the hub at a minimum. With the hub being OSPF network type point-to-multipoint and the spokes being OSPF network type point-to-point, adjacency is supported, as long as the timer values match.
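A sketch of the settings described above (interface numbers assumed): change the network type on the hub, and on the spokes keep the default point-to-point type but match the hub's timers (point-to-multipoint uses a 30-second hello by default, point-to-point uses 10):
R5(config)#int tun0
R5(config-if)#ip ospf network point-to-multipoint
R4(config)#int tun0
R4(config-if)#ip ospf hello-interval 30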

DMVPN PHASE2:

The main problem with phase 1 is that all spoke-to-spoke traffic must pass through the hub, putting a huge load on the hub's resources.
This limitation is primarily due to configuring the spokes with point-to-point GRE tunnels rather than multipoint GRE tunnels.

Phase2 permits spoke to spoke tunnels, for this we need to configure the spokes as ‘multipoint gre tunnels’

The only configuration change we need is on all the spokes:
R4(config-if)#int tun0
R4(config-if)#no tunnel destination → removing the point-to-point tunnel setting
R4(config-if)#tunnel mode gre multipoint → enabling multipoint GRE on the spokes

No configuration changes on the hub.
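For reference, a complete phase 2 spoke tunnel might look like the following (the interface, NHRP network-id and the hub's NBMA address 155.1.5.5 are assumptions for illustration):
R4(config)#int tun0
R4(config-if)#ip address 14.1.1.4 255.255.255.0
R4(config-if)#tunnel source Ethernet0/0
R4(config-if)#tunnel mode gre multipoint
R4(config-if)#ip nhrp network-id 1
R4(config-if)#ip nhrp map 14.1.1.5 155.1.5.5
R4(config-if)#ip nhrp map multicast 155.1.5.5
R4(config-if)#ip nhrp nhs 14.1.1.5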

Routing tables in phase2,
We now know all the networks behind the spokes, with the spokes themselves as next hops:

R4#sh ip route
Gateway of last resort is not set

      14.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        14.1.1.0/24 is directly connected, Tunnel0
L        14.1.1.4/32 is directly connected, Tunnel0
      150.1.0.0/32 is subnetted, 5 subnets
D        150.1.1.1 [90/28288000] via 14.1.1.1, 00:00:16, Tunnel0
D        150.1.2.2 [90/28288000] via 14.1.1.2, 00:00:16, Tunnel0
D        150.1.3.3 [90/28288000] via 14.1.1.3, 00:00:16, Tunnel0
C        150.1.4.4 is directly connected, Loopback0
D        150.1.5.5 [90/27008000] via 14.1.1.5, 00:00:33, Tunnel0

The implications are
·        Summarization is not allowed on the hub → if summarized, all traffic will take the path spoke-hub-spoke
·        The next hop must always be preserved by the hub


Routing protocols in phase2:

Eigrp:
The following configuration must be done on the hub
R5(config-if)#int tun0
R5(config-if)#no ip split-horizon eigrp 100 → to advertise networks behind the spokes
R5(config-if)#no ip next-hop-self eigrp 100

OSPF:
One of the main requirements in phase 2 is that the routing protocol must preserve the next hop.
OSPF network type point-to-multipoint is not supported in phase 2, because with it the hub always sets itself as the next hop.

Routing table with ospf network type point-to-multipoint
R2#show ip route ospf
O        150.1.1.1 [110/2001] via 14.1.1.5, 00:0:27, Tunnel0
O        150.1.3.3 [110/2001] via 14.1.1.5, 00:0:27, Tunnel0
O        150.1.4.4 [110/2001] via 14.1.1.5, 00:0:34, Tunnel0

To run OSPF in phase 2, we need to use the network type broadcast or NBMA, which preserves the next hop.
It means
·        We must configure the spokes so they never become DR or BDR (spoke-to-spoke direct flooding is not possible: the spokes are all in the same Layer 3 subnet but not in the same Layer 2 segment)
·        No more than two hubs are permitted: one DR and one BDR

By default the network type on tunnel interface is point-to-multipoint, the following configuration must be done on the spokes
R4(config-if)#int tun0
R4(config-if)#ip ospf priority 0 → so that the spokes never attempt to claim DR/BDR; the hub cannot preempt them once they consider themselves DR/BDR
R4(config-if)#ip ospf network broadcast

On the hub,
R5(config-if)#int tun0
R5(config-if)#ip ospf network broadcast

The routing table with ospf network type broadcast/NBMA
R2#show ip route ospf
O        150.1.1.1 [110/2001] via 14.1.1.1, 00:0:27, Tunnel0
O        150.1.3.3 [110/2001] via 14.1.1.3, 00:0:27, Tunnel0
O        150.1.4.4 [110/2001] via 14.1.1.4, 00:0:34, Tunnel0

The next-hop is preserved and when R2 wants to communicate with 150.1.3.3,a spoke-to-spoke tunnel will be established between R2 and R3

A nice and simple explanation of spoke-to-spoke tunnel creation,

Here are the steps,
  1. R2 gets a packet with a next hop R3. There is no NHRP map entry for R3, so an NHRP resolution request is sent to the hub.
  2. The request from R2 will also have the NBMA address of R2. The hub relays the request to R3.
  3. R3 receives the request, adds its own address mapping to it and sends it as an NHRP reply directly to R2.
  4. R3 then sends its own request to the hub that relays it to R2.
  5. R2 receives the request from R3 via the hub and replies by adding its own mapping to the packet and sending it directly to R3
Technically, the requests themselves provide enough information to build a spoke to spoke tunnel but the replies accomplish two things. They acknowledge to the other spoke that the request was received and also verify that spoke to spoke NBMA reachability exists.
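Once the spoke-to-spoke tunnel is up, the dynamic NHRP entries for the other spoke can be checked on either spoke:
R2#show dmvpn
R2#show ip nhrp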
DMVPN PHASE3:

The problem with phase2 is scalability.

·        Summarization is not allowed at hub, as a result all the spokes must have routes to all the subnets. This results in huge routing tables/updates.
·        Scalability suffers as the number of devices increases

Phase 3 solves the main issue of phase 1 in a different way.
When a spoke forwards a packet to the hub, the hub checks whether the destination is reachable via the same tunnel; if so, it redirects the spoke toward the spoke to which the destination is attached.

This is how phase3 works,

1.     R1 and R2 announce their attached subnets to the hub.
2.     The hub can be configured to advertise just a default route to the spokes.
3.     When R1 needs to send traffic to 23.1.1.1, it sends the packet to the hub.
4.     The hub sees that the destination is reachable via the same tunnel, so it sends an NHRP redirect packet to R1.
5.     R1 sends an NHRP resolution request for the IP 23.1.1.1. The hub relays this request to R2.
6.     R2 sends the NHRP reply directly to R1 (the NHRP request packet carries the NBMA address of R1).
7.     R1 installs a route in its routing table for the prefix 23.1.1.0/24 via 22.1.1.1 with an AD of 250.

In phase 3, a new route is installed in the routing table that tells the spoke how to reach the remote spoke.
We can therefore summarize, or use default routes, at the hub.

Configuration:

On the hub,
R5(config-if)#int tun0
R5(config-if)#ip nhrp redirect

On the spokes,
R4(config-if)#int tun0
R4(config-if)#ip nhrp shortcut → make sure tunnel mode is gre multipoint
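After traffic has triggered a shortcut, the NHRP-installed route (AD 250) can be verified on the spoke; on recent IOS releases:
R4#show ip nhrp
R4#show ip route nhrp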


OSPF:
In phase 2, OSPF network type point-to-multipoint is not supported, as the hub does not preserve the next hop and always sets itself as the next hop.
In phase 3, we can use the point-to-multipoint network type, because the hub can send a redirect message for spoke-to-spoke traffic.