Wednesday, 13 May 2015

Cost Savings for Azure Backup

To receive the new pricing, existing EA customers only need to choose the new billing model (see instructions below).


Information on how to enable:

There have been queries from Azure Backup EA customers on how to get the new pricing. The steps are outlined below:


(Note: non-EA and new EA customers should already be on the new pricing by default – this choice is available only to existing EA customers)




1.     Go to the Azure portal.

2.     Go to the backup vault and click Configure.

3.     Change the billing model and save

4.     If there are no items already registered/protected to Azure Backup, you can also change the storage type


Best practices regarding SAN Switch Zoning

Best practices regarding zoning are as follows:

·   Zones should always be two-member zones. That is, a zone should contain a single initiator and a single target.

·   The same initiator or target can be part of multiple zones. For example, if you have to zone one server HBA port with 2 storage ports, create 2 zones: HBA port <-> Storage port 1, and HBA port <-> Storage port 2. Do not create a single zone containing the HBA port and both storage ports.

·   The alias name should be easily identifiable and unambiguous. For example, rather than a name like 'database_port' you can use 'hostname_HBA1_P1', which is less ambiguous.

·   The zone name should be alias1_alias2. This establishes a clear pattern and tells us at a glance which members are present in a particular zone.

·   The common switch administration commands are as follows:

·   switchshow: A frequently used command that shows which devices are plugged into the switch at the time it is run. It gives a port-by-port listing of the WWPNs plugged into the switch, which is useful for noting down the WWPN of a particular device for later use. It also shows the currently effective configuration in the switch.

·   cfgshow: Shows the currently enabled configuration. This is useful to see exactly which zones are currently enabled in the switch.

·   alicreate "aliasname","<WWPN>": Creates an alias, which is an easily understandable name for the WWPN of a device.

·   zonecreate "zonename","member1;member2": Creates a zone. Here member1 and member2 should be aliases.

·   cfgadd "cfgname","member1;member2;member3…": Adds the listed members (in this case, 3 zones) to the configuration.

·   cfgsave: Saves changes to the configuration. The changes are not yet applied at this point.
·   cfgenable cfgname: Applies the changes to the current fabric.
·   cfgremove "cfgname","member1; member2…": Removes zones from the configuration.
·   zonedelete: Deletes a zone.
·   alidelete: Deletes an alias.
·   licenseshow: Lists the licensed features.
·   licenseadd <licensekey>: Adds licenses to the switch.
·   configshow: Shows the fabric configuration in the switch. This command differs from cfgshow in that it shows the fabric parameters as opposed to the zoneset.
·   Remember, after any sequence of zoning operations, you have to run cfgsave and cfgenable to actually commit the changes.

Example: Configuring 2 WWPNs in the fabric:

·   2 WWPNs are available, one as the host port and one as the storage port. The configuration name is KB3001_Config. All these details are obtained from the switchshow output. The 2 WWPNs are 21:00:00:1b:32:86:40:8c and 20:27:00:a0:b8:47:ac:ce.

·   The command sequence is as follows:
    alicreate "Node1_Pci2","21:00:00:1b:32:86:40:8c"
    alicreate "Storage_B1","20:27:00:a0:b8:47:ac:ce"
    zonecreate "Node1_Pci2_Storage_B1","Node1_Pci2;Storage_B1"
    cfgadd "KB3001_Config","Node1_Pci2_Storage_B1"
    cfgsave
    cfgenable KB3001_Config

This will enable communication between the 2 WWPNs.
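
The naming and zoning rules above can be sketched in a few lines of Python. This is only an illustration that prints the command sequence for single-initiator, single-target zones named alias1_alias2; the function name zoning_commands is hypothetical, and the sketch assumes the configuration already exists on the switch (cfgadd adds to an existing configuration).

```python
from itertools import product


def zoning_commands(cfg_name, initiators, targets):
    """Build zoning commands for two-member zones.

    initiators and targets are dicts mapping alias name -> WWPN.
    Assumption: cfg_name already exists, so cfgadd is sufficient.
    """
    commands = []
    # One alias per WWPN.
    for alias, wwpn in {**initiators, **targets}.items():
        commands.append(f'alicreate "{alias}","{wwpn}"')
    # One zone per (initiator, target) pair, named alias1_alias2.
    zone_names = []
    for init, tgt in product(initiators, targets):
        zone = f"{init}_{tgt}"
        zone_names.append(zone)
        commands.append(f'zonecreate "{zone}","{init};{tgt}"')
    members = ";".join(zone_names)
    commands.append(f'cfgadd "{cfg_name}","{members}"')
    # Save first, then enable, to commit the changes.
    commands.append("cfgsave")
    commands.append(f"cfgenable {cfg_name}")
    return commands


if __name__ == "__main__":
    for cmd in zoning_commands(
        "KB3001_Config",
        {"Node1_Pci2": "21:00:00:1b:32:86:40:8c"},
        {"Storage_B1": "20:27:00:a0:b8:47:ac:ce"},
    ):
        print(cmd)
```

With one HBA port and two storage ports the same function yields two separate zones rather than one three-member zone, which is the pattern recommended above.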

SAN Switch - ISL Trunking

The article below describes how to set up an ISL (Inter-Switch Link). We have done it between a Brocade 48000 Director SAN switch and a Brocade Silkworm 3900 SAN switch.

Prerequisites for cascading SAN Switches:

·         The various compatibility criteria regarding the host OS, HBA card, SAN switch model, FOS versions etc. have to be verified.
·         Collect a configuration backup of both SAN switches. (Commands: configupload, switchshow, zoneshow, fabricshow, supportshow)
·         The Domain IDs of both SAN switches should be unique.
·         Change the names of the switches for easier identification using the switchname command.
·         The firmware (FOS) of both SAN switches has to be upgraded to the latest version that compatibility allows.

The procedure:

1.       Check the Domain IDs on both SAN switches using the fabricshow command, and change the Domain ID of the switch that is going to be cascaded. Reboot the switch after changing the Domain ID and check that the Domain ID has changed.
2.       Disable and clear the zone configuration of the switch that is going to be cascaded using the cfgdisable, cfgclear and cfgsave commands.
3.       Disable the switch using the switchdisable command and change the role of the switch from Principal to Subordinate using the fabricprincipal command.
4.       Configure E-ports on both switches using the portcfgeport command and connect the FC cables between the switches through these ports.
5.       Enable the disabled switch using the switchenable command and check the switch role on both switches using the switchshow command.


Detailed activity logs are given in the attachment. Go through it for more insight.


ISL Activity logs


Here I have two switches. One switch was already in production and the other one has to be cascaded to the production switch.

Production Switch Name: IC_Prod_SW1

1. To change the name of the switch, use the commands below on both switches

Swd77: admin> switchname IC_Prod_SW1

Swd77: admin> switchname IC_DR_SW1

I am going to cascade the Second Switch with the Production Switch. 



2. Take configuration backup of both the Switches

Server Name or IP Address [host]: 193.1.1.177

File Name [config.txt]: config_IC_DR_SW1.txt

Configupload complete: All config parameters are uploaded

IC_Prod_SW1: admin> configupload

Server Name or IP Address [host]: 193.1.1.177

File Name [config.txt]: config_IC_Prod_SW1.txt

Configupload complete: All config parameters are uploaded 



3. Check the Domain ID of both the Switches

Switch ID Worldwide Name Enet IP Addr FC IP Addr Name

-------------------------------------------------------------------------

1: fffc01 10:00:00:05:1e:8d:b3:27 193.1.1.185 0.0.0.0 >"IC_Prod_SW1"

Switch ID Worldwide Name Enet IP Addr FC IP Addr Name

-------------------------------------------------------------------------

1: fffc02 10:00:00:05:1e:8b:f5:ef 193.1.1.166 0.0.0.0 >"IC_DR_SW1"



4. Disable the Switch and Change the Domain ID of the DR Switch

Fabric parameters (yes, y, no, n): [no] y

Insistent Domain ID Mode (yes, y, no, n): [no]

Virtual Channel parameters (yes, y, no, n): [no]

F-Port login parameters (yes, y, no, n): [no]

Zoning Operation parameters (yes, y, no, n): [no]

RSCN Transmission Mode (yes, y, no, n): [no]

Arbitrated Loop parameters (yes, y, no, n): [no]

Portlog events enable (yes, y, no, n): [no]

webtools attributes (yes, y, no, n): [no]

WARNING: The domain ID will be changed. The port level zoning may be affected




5. Login and check the domain id of the DR Switch

Switch ID Worldwide Name Enet IP Addr FC IP Addr Name

-------------------------------------------------------------------------

100: fffc02 10:00:00:05:1e:8b:f5:ef 193.1.1.166 0.0.0.0 >"IC_DR_SW1" 



6. Disable and Clear the Zone configurations from DR Switch

You are about to disable zoning configuration. This

action will disable any previous zoning configuration enabled.

Do you want to disable zoning configuration? (yes, y, no, n): [no] y

The Clear All action will clear all Aliases, Zones, FA Zones

and configurations in the Defined configuration.

cfgSave may be run to close the transaction or cfgTransAbort

Do you really want to clear all configurations? (yes, y, no, n): [no] y

Note: Disabling the Zone requires downtime



7. Disable the Switch and Change the Role of the Switch From Principal to Subordinate

IC_DR_SW1: admin> fabricprincipal 0



8. Configure E-ports in both the Switches

IC_DR_SW1: admin> portcfgeport 7 1

IC_Prod_SW1: admin> portcfgeport 7 1

Note: To make a particular port an E port, use portcfgeport <port number> 1 (1 to enable).



9. Connect the FC cables between both the Switches and enable the disabled switch

=====================================

1 1 id N4 Online F-Port 10:00:00:00:c9:86:1b:94

7 7 id N4 Online E-Port 10:00:00:05:1e:8d:b3:27 "IC_Prod_SW1" (upstream)

8 8 -- N8 No_Module (No POD License) Disabled

9 9 -- N8 No_Module (No POD License) Disabled

10 10 -- N8 No_Module (No POD License) Disabled

11 11 -- N8 No_Module (No POD License) Disabled

12 12 -- N8 No_Module (No POD License) Disabled

13 13 -- N8 No_Module (No POD License) Disabled

14 14 -- N8 No_Module (No POD License) Disabled

15 15 -- N8 No_Module (No POD License) Disabled

16 16 -- N8 No_Module (No POD License) Disabled

17 17 -- N8 No_Module (No POD License) Disabled

18 18 -- N8 No_Module (No POD License) Disabled

19 19 -- N8 No_Module (No POD License) Disabled

20 20 -- N8 No_Module (No POD License) Disabled

21 21 -- N8 No_Module (No POD License) Disabled

22 22 -- N8 No_Module (No POD License) Disabled

23 23 -- N8 No_Module (No POD License) Disabled

In the above output, port 7 has changed to an E-Port and shows as upstream. The same port on the first switch will also be an E-Port, shown as downstream. The role of the first switch (IC_Prod_SW1) will be Principal and the role of the second switch (IC_DR_SW1) will be Subordinate.

=====================================

1 1 id N4 Online F-Port 10:00:00:00:c9:86:1f:1d

2 2 id N4 Online F-Port 10:00:00:00:c9:86:20:39

7 7 id N4 Online E-Port 10:00:00:05:1e:8b:f5:ef "IC_DR_SW1" (downstream)

8 8 -- N8 No_Module (No POD License) Disabled

9 9 -- N8 No_Module (No POD License) Disabled

10 10 -- N8 No_Module (No POD License) Disabled

11 11 -- N8 No_Module (No POD License) Disabled

12 12 -- N8 No_Module (No POD License) Disabled

13 13 -- N8 No_Module (No POD License) Disabled

14 14 -- N8 No_Module (No POD License) Disabled

15 15 -- N8 No_Module (No POD License) Disabled

16 16 -- N8 No_Module (No POD License) Disabled

17 17 -- N8 No_Module (No POD License) Disabled

18 18 -- N8 No_Module (No POD License) Disabled

19 19 -- N8 No_Module (No POD License) Disabled

20 20 -- N8 No_Module (No POD License) Disabled

21 21 -- N8 No_Module (No POD License) Disabled

22 22 -- N8 No_Module (No POD License) Disabled

23 23 -- N8 No_Module (No POD License) Disabled

10. Check the fabricshow output in both the switches

Switch ID Worldwide Name Enet IP Addr FC IP Addr Name

-------------------------------------------------------------------------

1: fffc01 10:00:00:05:1e:8d:b3:27 193.1.1.185 0.0.0.0 >"IC_Prod_SW1"

100: fffc64 10:00:00:05:1e:8b:f5:ef 193.1.1.166 0.0.0.0 "IC_DR_SW1"

In the above output, the ">" symbol in front of the name marks the Principal switch.
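
The check in step 10 can also be scripted. The following Python sketch is a rough illustration, assuming the fabricshow output has been captured as text in the column layout shown above; the function names parse_fabricshow and check_fabric are hypothetical. It verifies that the Domain IDs are unique and reports which switch carries the ">" principal marker.

```python
import re

# One switch row: "<domain>: <switch id> <WWN> <enet ip> <fc ip> [>]"<name>""
ROW = re.compile(
    r'^\s*(?P<domain>\d+):\s+(?P<switch_id>[0-9a-f]{6})\s+'
    r'(?P<wwn>(?:[0-9a-f]{2}:){7}[0-9a-f]{2})\s+'
    r'(?P<enet_ip>\S+)\s+(?P<fc_ip>\S+)\s+'
    r'(?P<principal>>?)"\s*(?P<name>[^"]*)"'
)


def parse_fabricshow(text):
    """Return one dict per switch row; header and separator lines are skipped."""
    switches = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            switches.append({
                "domain": int(m["domain"]),
                "wwn": m["wwn"],
                "name": m["name"].strip(),
                "principal": m["principal"] == ">",
            })
    return switches


def check_fabric(switches):
    """Return (domain IDs are unique, names of switches marked principal)."""
    domains = [s["domain"] for s in switches]
    principals = [s["name"] for s in switches if s["principal"]]
    return len(set(domains)) == len(domains), principals


if __name__ == "__main__":
    sample = '''
Switch ID Worldwide Name Enet IP Addr FC IP Addr Name
-------------------------------------------------------------------------
1: fffc01 10:00:00:05:1e:8d:b3:27 193.1.1.185 0.0.0.0 >"IC_Prod_SW1"
100: fffc64 10:00:00:05:1e:8b:f5:ef 193.1.1.166 0.0.0.0 "IC_DR_SW1"
'''
    print(check_fabric(parse_fabricshow(sample)))
```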

Netbackup 7.5 - Backup failed with snapshot error 156

Description

Job details error: ERR - Error encountered while attempting to get additional files for System State:\

On the client in the event viewer: The shadow copies of volume C: were aborted because the shadow copy storage could not grow due to a user imposed limit.


ISSUE:-

Backups failed and in event viewer of the client it showed "shadow copy storage could not grow due to a user imposed limit"





TROUBLESHOOTING:-

Check the writers from command line

vssadmin list writers

All writers should show a stable state with no error. If any are in a "waiting for completion" or "failed" state, a restart of the server may clear the writer and restore it to a stable state.



SOLUTION/WORKAROUND:

After the writers are in a stable state, check the shadow copy limit for each volume.

This should be 10-15% of the actual drive space.

For Windows 2003 clients, open Computer Management:

- In the console tree, right-click Shared Folders, select All Tasks, and click Configure Shadow Copies.

- Click the volume where you want to make changes, and then click Settings.

- In the Settings dialog box, change the limit as appropriate (10-15% of the actual drive space).

For Windows 2008 clients

Double click "my computer"

Right click the C drive and click "configure shadow copies"

Click the first volume and click settings

The "storage area" will show where the cache files are being created for that volume.

Change the "Maximum size" to "Use limit" and set the limit to 10% of the volume's free space. The minimum is 300 MB.

Repeat for the remaining volumes and retry the backup.
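
As a small worked example of the sizing rule, the Python sketch below computes a limit in MB from a size and a fraction, and never goes below the 300 MB minimum mentioned above. The function name shadow_copy_limit_mb is hypothetical, and the size passed in is whichever base you are using (drive size or free space, as described in the steps).

```python
MIN_LIMIT_MB = 300  # minimum limit stated in the steps above


def shadow_copy_limit_mb(size_mb, fraction=0.10):
    """Return the shadow copy storage limit in MB.

    size_mb is the base size in MB; fraction is the share to reserve
    (the text suggests 10-15%). The result is clamped to the 300 MB minimum.
    """
    return max(MIN_LIMIT_MB, round(size_mb * fraction))


if __name__ == "__main__":
    # A 100 GB volume (102400 MB) at 10% and at 15%.
    print(shadow_copy_limit_mb(102400))
    print(shadow_copy_limit_mb(102400, 0.15))
    # A small 2 GB volume falls back to the 300 MB minimum.
    print(shadow_copy_limit_mb(2048))
```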

SAN Switch Initial Troubleshooting

First of all, take a look at the overall health of the switch:


Command | Explanation | Example

switchstatusshow
Provides an overview of the general components of the switch. These all need to show up as HEALTHY and not (as shown here) as "Marginal".
Sydney_ILAB_DCX-4S_LS128:FID128:admin> switchstatusshow
Switch Health Report                        Report time: 06/20/2013 06:19:17 AM
Switch Name:     Sydney_ILAB_DCX-4S_LS128
IP address:    10.129.2.143
SwitchState:    MARGINAL
Duration:    214:29

Power supplies monitor    MARGINAL
Temperatures monitor      HEALTHY
Fans monitor              HEALTHY
WWN servers monitor       HEALTHY
CP monitor                HEALTHY
Blades monitor            HEALTHY
Core Blades monitor    HEALTHY
Flash monitor             HEALTHY
Marginal ports monitor    HEALTHY
Faulty ports monitor      HEALTHY
Missing SFPs monitor      HEALTHY
Error ports monitor      HEALTHY


All ports are healthy
switchshow
Provides a general overview of logical switch status (no physical components) plus a list of ports and their status.

The switchState should always be Online.
The switchDomain should have a unique ID in the fabric.
If zoning is configured it should be in the "ON" state.

As for the ports, these should all be "Online" for connected and operational ports. If you see ports showing "No_Sync" where the port is not disabled, there is likely a cable or SFP/HBA problem.

If you have configured Fabric Watch to enable port fencing, you'll see indications like the one shown here for port 75.

Obviously, for any port to work it should be enabled.
Sydney_ILAB_DCX-4S_LS128:FID128:admin> switchshow
switchName:    Sydney_ILAB_DCX-4S_LS128
switchType:    77.3
switchState:    Online 
switchMode:    Native
switchRole:    Principal
switchDomain:    143
switchId:    fffc8f
switchWwn:    10:00:00:05:1e:52:af:00
zoning:        ON (Brocade)
switchBeacon:    OFF
FC Router:    OFF
Fabric Name:    FID 128
Allow XISL Use:    OFF
LS Attributes:    [FID: 128, Base Switch: No, Default Switch: Yes, Address Mode 0]

Index Slot Port Address Media  Speed        State    Proto
============================================================
   0    1    0   8f0000   id    4G       Online      FC  E-Port  10:00:00:05:1e:36:02:bc "BR48000_1_IP146" (downstream)(Trunk master)
   1    1    1   8f0100   id    N8       Online      FC  F-Port  50:06:0e:80:06:cf:28:59
   2    1    2   8f0200   id    N8       Online      FC  F-Port  50:06:0e:80:06:cf:28:79
   3    1    3   8f0300   id    N8       Online      FC  F-Port  50:06:0e:80:06:cf:28:39
   4    1    4   8f0400   id    4G       No_Sync     FC  Disabled (Persistent)
   5    1    5   8f0500   id    N2       Online      FC  F-Port  50:06:0e:80:14:39:3c:15
   6    1    6   8f0600   id    4G       No_Sync     FC  Disabled (Persistent)
   7    1    7   8f0700   id    4G       No_Sync     FC  Disabled (Persistent)
   8    1    8   8f0800   id    N8       Online      FC  F-Port  50:06:0e:80:13:27:36:30
  75    2   11   8f4b00   id    N8       No_Sync     FC  Disabled (FOP Port State Change threshold exceeded)
  76    2   12   8f4c00   id    N4       No_Light    FC  Disabled (Persistent)
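
The port checks described for switchshow can be sketched in Python as follows. This is only an illustration: it assumes the director-style layout shown above (Index, Slot, Port, Address, Media, Speed, State, Proto), and it treats the text "threshold exceeded" in the last column as the sign of a fenced port, which is inferred from the sample output rather than from any specification. The function name suspect_ports is hypothetical.

```python
def suspect_ports(switchshow_text):
    """Return (slot/port, reason) tuples for ports that deserve a closer look."""
    suspects = []
    for line in switchshow_text.splitlines():
        fields = line.split()
        # Port rows start with three numeric columns: Index, Slot, Port.
        if len(fields) < 8 or not all(f.isdigit() for f in fields[:3]):
            continue
        slot, port, state = fields[1], fields[2], fields[6]
        detail = " ".join(fields[8:])
        if state == "No_Sync" and "Disabled" not in detail:
            # Enabled but no sync: likely cable or SFP/HBA problem.
            suspects.append((f"{slot}/{port}", "No_Sync on an enabled port"))
        elif "threshold exceeded" in detail:
            # Assumed marker of a port disabled by port fencing.
            suspects.append((f"{slot}/{port}", detail))
    return suspects


if __name__ == "__main__":
    sample = """
Index Slot Port Address Media  Speed        State    Proto
============================================================
   1    1    1   8f0100   id    N8       Online      FC  F-Port  50:06:0e:80:06:cf:28:59
   4    1    4   8f0400   id    4G       No_Sync     FC  Disabled (Persistent)
  75    2   11   8f4b00   id    N8       No_Sync     FC  Disabled (FOP Port State Change threshold exceeded)
"""
    for entry in suspect_ports(sample):
        print(entry)
```

Persistently disabled ports such as port 4 are deliberately not reported, since they are expected to show No_Sync.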
sfpshow <slot>/<port>
One of the most important parts of a link, irrespective of mode and distance, is the SFP. On newer hardware and software it provides a lot of information on the overall health of the link.

With older FOS code there could be a discrepancy between what this output displayed and what was actually plugged into the port. The reason was that the SFPs are only polled every now and then for status and update information. If a port was persistently disabled it was not updated at all, so in theory you could plug in another SFP and sfpshow would still display the old info. With FOS 7.0.1 and up this has been corrected, and you can now also see the latest polling time per SFP.

The question we often get is: "What should these values be?". The answer is "It depends". As you can imagine, a shortwave 4G SFP requires less current than a longwave 100 km SFP, so in essence the SFP specs should be consulted. As a rule of thumb, signal quality depends on the TX power value minus the link-loss budget. The result should be within the RX Power specifications of the receiving SFP.

Also check the Current and Voltage of the SFP. If an SFP is broken, the indication is often that it draws no power at all, and you'll see these two dropping to zero.
Sydney_ILAB_DCX-4S_LS128:FID128:admin> sfpshow 1/1
Identifier:  3    SFP
Connector:   7    LC
Transceiver: 540c404000000000 2,4,8_Gbps M5,M6 sw Short_dist
Encoding:    1    8B10B
Baud Rate:   85   (units 100 megabaud)
Length 9u:   0    (units km)
Length 9u:   0    (units 100 meters)
Length 50u (OM2):  5    (units 10 meters)
Length 50u (OM3):  0    (units 10 meters)
Length 62.5u:2    (units 10 meters)
Length Cu:   0    (units 1 meter)
Vendor Name: BROCADE       
Vendor OUI:  00:05:1e
Vendor PN:   57-1000012-01 
Vendor Rev:  A 
Wavelength:  850  (units nm)
Options:     003a Loss_of_Sig,Tx_Fault,Tx_Disable
BR Max:      0 
BR Min:      0 
Serial No:   UAF110480000NYP
Date Code:   101125
DD Type:     0x68
Enh Options: 0xfa
Status/Ctrl: 0x80
Alarm flags[0,1] = 0x5, 0x0
Warn Flags[0,1] = 0x5, 0x0
                                           Alarm                  Warn
                                    low         high       low         high
Temperature: 25      Centigrade      -10        90         -5          85
Current:     6.322   mAmps           1.000      17.000     2.000       14.000
Voltage:     3290.2  mVolts          2900.0     3700.0     3000.0      3600.0
RX Power:    -3.2    dBm (476.2uW)   10.0   uW  1258.9 uW  15.8   uW   1000.0 uW
TX Power:    -3.3    dBm (472.9 uW)  125.9  uW  631.0  uW  158.5  uW   562.3  uW

State transitions: 1
Last poll time: 06-20-2013 EST Thu 06:48:28
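
The rule of thumb for RX power can be worked through with a short Python sketch. It converts the microwatt thresholds from sfpshow to dBm and checks whether TX power minus link loss lands inside the receiver's window. The TX value and RX alarm thresholds are taken from the output above; the link-loss figures are hypothetical, and on a real link the TX value has to come from the SFP at the far end. The function names are illustrative only.

```python
import math


def uw_to_dbm(microwatts):
    """Convert optical power from microwatts to dBm (0 dBm = 1 mW)."""
    return 10 * math.log10(microwatts / 1000.0)


def expected_rx_dbm(tx_dbm, link_loss_db):
    """Rule of thumb: received power is TX power minus the link loss."""
    return tx_dbm - link_loss_db


def rx_within_spec(tx_dbm, link_loss_db, rx_low_uw, rx_high_uw):
    """True if the expected RX power is inside the receiver's low/high window."""
    rx = expected_rx_dbm(tx_dbm, link_loss_db)
    return uw_to_dbm(rx_low_uw) <= rx <= uw_to_dbm(rx_high_uw)


if __name__ == "__main__":
    # RX alarm window from the sample: 10.0 uW to 1258.9 uW.
    print(round(uw_to_dbm(10.0), 1), round(uw_to_dbm(1258.9), 1))
    # TX of -3.3 dBm with a hypothetical 2 dB and 20 dB link loss.
    print(rx_within_spec(-3.3, 2.0, 10.0, 1258.9))
    print(rx_within_spec(-3.3, 20.0, 10.0, 1258.9))
```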
porterrshow
For link state counters this is the most useful command on the switch. However, there is a perception that it provides a "silver bullet" to solve port and link issues, and that is not the case. Basically it provides a snapshot of the content of the LESB (Link Error Status Block) of a port at that particular point in time. It does not tell us when these counters accumulated or over which time frame. So to create a sensible picture of the status of the ports we need a baseline. This baseline can be created by resetting all counters and starting from zero. To do this, issue the "statsclear" command on the CLI.

There are 7 columns you should pay attention to from a physical perspective.

enc_in - Encoding errors inside frames. These are errors that occur at the FC-1 layer when encoding 8 bits to 10 and back or, with 10G and 16G FC, from 64 bits to 66 and back. Since these happen on bits that are part of a data frame, they are counted in this column.

crc_err - An enc_in error might lead to a CRC error; however, this column shows frames that were marked as invalid because of a CRC error earlier in the datapath. According to the FC specifications it is up to the implementer whether to discard the frame right away or to mark it as invalid and send it to the destination anyway. There are pros and cons to both approaches. So if you see crc_err increase in this column, it means the port has received a frame with an incorrect CRC, but the corruption occurred further upstream.

crc_g_eof - This column is the same as crc_err, except that the incoming frames are NOT marked as invalid. When you see these, the enc_in counter most often increases as well, but not necessarily. If the enc_in and/or enc_out column increases as well, there is a physical link issue which could be resolved by cleaning connectors, replacing a cable or (in rare cases) replacing the SFP and/or HBA. If the enc_in and enc_out columns do NOT increase, there is an issue between the SERDES chip and the SFP which causes the CRC to mismatch the frame. This is a firmware issue which could be resolved by upgrading to the latest FOS code. There are a couple of defects listed to track these.

enc_out - Similar to enc_in, this is an encoding error, but one that occurred outside normal frame boundaries, i.e. no host IO frame was impacted. This may seem harmless, but be aware that a lot of primitive signals and sequences that are paramount for Fibre Channel operations travel in between normal data frames. Primitives which regulate credit flow (R_RDY and VC_RDY) and signal clock synchronization are especially important. If this column increases on any port you'll likely run into performance problems sooner or later, or you will see a problem with link stability and sync errors (see below).


Link_Fail - This means a port has received a NOS (Not Operational) primitive from the remote side and it needs to change the port operational state to LF1 (Link Fail 1) after which the recovery sequence needs to commence. (See the FC-FS standards specification for that)

Loss_Sync - Loss of synchronization. The transmitter and receiver side of the link maintain a clock synchronization based on primitive signals which start with a certain bit pattern (K28.5). If the receiver is not able to sync its baud-rate to the rate where it can distinguish between these primitives it will lose sync and hence it cannot determine when a data frame starts.

Loss_Sig - Loss of Signal. This column shows a drop of light, i.e. no light (or insufficient RX power) is observed for over 100 ms, after which the port will go into a non-active state. This counter often increases when the link-loss budget is overdrawn. Suppose, for instance, that a TX side sends out light at -4 dBm and the receiver's lower sensitivity threshold is -12 dBm. If the quality of the cable deteriorates the signal to a value below that threshold, you will see the port bounce very often and this counter increase. Other common culprits are unclean connectors, patch panels and badly made fibre splices. These ports should be shut down immediately and the cabling plant checked. Replacing cables and/or bypassing patch panels is often a quick way to find out where the problem is.



The other columns relate more to protocol issues and/or performance problems, which could be the result of a physical problem but not its cause. In short, look at the 7 columns mentioned above and check that no port shows an increasing value.

============================================
too_short/too_long - indicates a protocol error where SOF or EOF are observed too soon or too late. These two columns rarely increase.

bad_eof - Bad End-of-Frame. This column indicates an issue where the sender has observed an abnormality in a frame or its transceiver while the frame header and portions of the payload were already sent to the destination. The only way for a transceiver to notify the destination is to invalidate the frame. It truncates the frame and adds an EOFni or EOFa to the end. This signals the destination that the frame is corrupt and should be discarded.

F_Rjt and F_Bsy are often seen in FICON environments where control frames could not be processed in time or are rejected based on fabric configuration or fabric status.

c3timeout (tx/rx) - These counters indicate that a port is not able to forward a frame to its destination in time. They show either a problem downstream of this port (tx) or a problem on this port, where it has received a frame meant to be forwarded to another port inside the same switch (rx). Frames are ALWAYS discarded at the RX side (since that's where the buffers hold the frame). The tx column is an aggregate of all rx ports that need to send frames via this port according to the routing tables created by FSPF.

pcs_err - Physical Coding Sublayer - These values represent encoding errors on 16G platforms and above. Since 16G speeds have changed to 64/66-bit encoding/decoding, there is a separate control structure that takes care of this.

As a best practice it is wise to keep a trace of these port errors and create a new baseline every week. This allows you to quickly identify errors and solve them before they become a problem with a prolonged resolution time. Make sure you do this fabric-wide to maintain consistency across all switches in that fabric.
Sydney_ILAB_DCX-4S_LS128:FID128:admin> porterrshow
          frames      enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy  c3timeout    pcs
       tx     rx      in    err    g_eof  shrt   long   eof     out   c3    fail    sync   sig                 tx    rx    err
  0:  100.1m  53.4m   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   
  1:  466.6k 154.5k   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   
  2:  476.9k 973.7k   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   
  3:  474.2k 155.0k   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
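
The baseline approach can be sketched in Python by comparing two captured porterrshow outputs and reporting only the ports whose physical counters grew. This is a rough illustration: the 18-column order is assumed from the header in the sample above (other FOS versions may differ), the "g" suffix is an assumption alongside the "k" and "m" seen in the sample, and the function names are hypothetical.

```python
# Column order assumed from the sample header above.
COLUMNS = (
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
)
# The 7 physical-layer columns discussed in the text.
PHYSICAL = ("enc_in", "crc_err", "crc_g_eof", "enc_out",
            "link_fail", "loss_sync", "loss_sig")
SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}  # "g" is an assumption


def to_number(token):
    """Convert '466.6k' or '100.1m' style counters to a float."""
    mult = SUFFIX.get(token[-1].lower())
    return float(token[:-1]) * mult if mult else float(token)


def parse_porterrshow(text):
    """Return {port index: {column: value}} for every port row in the text."""
    ports = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        values = rest.split()
        if sep and head.strip().isdigit() and len(values) == len(COLUMNS):
            ports[int(head)] = dict(zip(COLUMNS, map(to_number, values)))
    return ports


def increased_physical_counters(baseline, current):
    """Return {port: {column: increase}} for physical counters that grew."""
    report = {}
    for port, now in current.items():
        before = baseline.get(port, {})
        grown = {c: now[c] - before.get(c, 0)
                 for c in PHYSICAL if now[c] > before.get(c, 0)}
        if grown:
            report[port] = grown
    return report
```

Run it against a capture taken right after clearing the counters and another taken a week later, on every switch in the fabric, to see which ports need attention.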