Wednesday, 26 October 2016

IBM DS3500 Storage - Logical Drive not on preferred path due to ADT/RDAC failover




Problem Cause :-

This error appears when a LUN that is owned by its preferred controller fails over to the other controller.



Explanation :- 

Let's explain LUN failover using a real-life example. Say host A loses access to array controller port B0 (which is currently active and preferred across all hosts). It must then select a new path. It may try controller port B1 next, but if that port is also dead (or the ports are working but responding slowly, which can also make a host pick another path), it might select port A0 and move all its I/O over to that path. Because the I/O is switching to a different controller, this causes a trespass, where LUNs may have to move ownership from controller B to controller A. If controller B later becomes available, host A can switch back to its original preferred path, since it was the cause of the original failover.

However, other hosts that had to move to controller A because of host A's actions cannot switch I/O back to controller B. Those hosts do not know whether host A can access controller B again (even if they can access it themselves). Only host A, once it can see the path to port B0 again, can try to switch back to the preferred path.

On the other hand, if another host lost access to controller B and moved its active path to a port on controller A, host A's I/O also has to move to the new active path on controller A. In this case host A cannot pull the LUN back to its preferred path on controller B, even if it still has access to it, because host A did not initiate the failover that made the paths to controller B standby.
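The failback rule described above can be written down as a tiny state model. This is only a sketch of the rule as explained here, assuming two controllers with B as the preferred owner; the variable and function names are made up for illustration and say nothing about how the array firmware or the multipath driver is actually implemented.

```shell
# Toy model of LUN ownership (hypothetical names, two controllers A and B).
preferred=B      # controller that should own the LUN
owner=B          # controller that currently owns it
initiator=       # host whose path loss caused the last failover

# fail_over HOST: HOST lost its paths to the preferred controller,
# so ownership moves to controller A for every host.
fail_over() {
    owner=A
    initiator=$1
}

# fail_back HOST: succeeds only for the host that initiated the failover.
fail_back() {
    if [ "$owner" != "$preferred" ] && [ "$1" = "$initiator" ]; then
        owner=$preferred
        initiator=
        return 0
    fi
    return 1     # every other host has to stay on the current owner
}
```

After "fail_over hostA", a call to "fail_back hostB" is refused and the owner stays A, while "fail_back hostA" returns ownership to B.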

-LUN trespass is common on storage arrays. It is triggered by storage path access issues and handled by the vendor's internal SCSI algorithms in the storage OS. 
-However, if you still want to move your logical drives back, you can do so in Storage Manager under Advanced -> Recovery -> Redistribute Volumes. 
-This sets the storage array status back to Optimal. Do it during a period of low I/O activity; if it does not return the LUN to its original preferred path, get in touch with IBM. 



Solution :-

The message means the logical drive has intermittently moved off its preferred path. 
It is raised when a LUN that is owned by its preferred controller fails over 
to the other controller. 

Manually change the preferred path for these volumes in Storage Manager using either of these menu options:

Change >> Ownership/Preferred Path
Logical Drive >> Change >> Ownership/Preferred Path

Tuesday, 25 October 2016

‘Dirty COW’ Linux Vulnerability


What is the ‘Dirty COW’ Linux vulnerability, and will it impact you?


Dirty COW is a marketing name given to CVE-2016-5195. It describes a bug which allows a malicious actor to increase their level of privilege in a Linux environment up to and including ‘root’. The bug itself is an exploitable race condition. A race condition occurs when the result of an operation depends on the relative timing of two or more threads of execution that access the same program or system state.
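To make the idea of a race condition concrete, here is a hand-interleaved "lost update" written as plain shell arithmetic. It is only an illustration of a timing-dependent result, not the Dirty COW exploit, and the function names are made up for this sketch.

```shell
# Two workers, A and B, each want to add 1 to a shared counter.

# Racy order: B reads the counter before A has written its result back.
racy_increment_twice() {
    counter=0
    a=$counter                   # A reads 0
    b=$counter                   # B reads 0 (stale by the time B writes)
    counter=$(( a + 1 ))         # A writes 1
    counter=$(( b + 1 ))         # B writes 1, overwriting A's update
    echo "$counter"
}

# Safe order: each worker finishes its read-modify-write before the next starts.
serial_increment_twice() {
    counter=0
    counter=$(( counter + 1 ))   # A: reads 0, writes 1
    counter=$(( counter + 1 ))   # B: reads 1, writes 2
    echo "$counter"
}
```

The first function prints 1 and the second prints 2, even though both perform "two increments". The only difference is the order in which the reads and writes happen.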


Impact Statement

The core issue in CVE-2016-5195 has been present in all Linux kernels since version 2.6.22, released in 2007. At the time of writing, the latest long-term supported kernel version is 4.4.26. There are known in-the-wild exploits for CVE-2016-5195. Phil Oester, the security researcher who identified the vulnerability, first found the issue through forensic log analysis of web server traffic. This implies exploit code is or will soon be part of malicious toolkits.

Mitigation

Mitigation of this issue is best accomplished via kernel update. The known exploit has been reported as non-viable for certain Linux distributions, but users of those distributions should minimize any patching delay to reduce the risk of exploit. Due to the nature of race conditions, the potential exists for other viable trigger models than those currently identified.

Does this impact me?

Most users of desktop or server Linux devices are aware that they run a Linux environment. For those users, obtaining an update from their Linux distribution is the ideal path to remediation. Linux has also been used as a core operating system for consumer and industrial devices for much of its 25-year history. This continues today and includes many of the most popular IoT and home automation devices on the market, such as internet routers, WiFi-enabled thermostats and internet cameras. Owners of those devices should proactively contact the vendor to verify when an update will be available for their devices.



How To Patch and Protect Linux Kernel Zero Day Local Privilege Escalation Vulnerability CVE-2016-5195


A very serious security problem has been found in the Linux kernel. A 0-day local privilege escalation vulnerability has been present since Linux kernel version 2.6.22, and a first attempt to fix the underlying race was made eleven years ago, in 2005. The bug affects all sorts of Linux and Android kernels and can be used to escalate privileges: any local user can become root in less than 5 seconds. How do I fix this problem?


This bug, named Dirty COW (CVE-2016-5195), is a privilege escalation vulnerability in the Linux kernel. Exploiting it does not leave any trace of anything abnormal in the logs, so you cannot tell from the logs whether someone has exploited it against your server.


What is the CVE-2016-5195 bug?

From the project:

A race condition was found in the way the Linux kernel’s memory subsystem handled the copy-on-write (COW) breakage of private read-only memory mappings. An unprivileged local user could use this flaw to gain write access to otherwise read-only memory mappings and thus increase their privileges on the system.

A nasty bug for sure: any local user can write to any file they can read, and it has been present since at least Linux kernel version 2.6.22. Linus Torvalds explained:

This is an ancient bug that was actually attempted to be fixed once (badly) by me eleven years ago in commit 4ceb5db9757a (“Fix get_user_pages() race for write access”) but that was then undone due to problems on s390 by commit f33ea7f404e5 (“fix get_user_pages bug”).

In the meantime, the s390 situation has long been fixed, and we can now fix it by checking the pte_dirty() bit properly (and do it better). The s390 dirty bit was implemented in abf09bed3cce (“s390/mm: implement software dirty bits”) which made it into v3.9. Earlier kernels will have to look at the page state itself.

Also, the VM has become more scalable, and what used a purely theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the “yes, we already did a COW” rather than play racy games with FOLL_WRITE that is very fundamental, and then use the pte dirty flag to validate that the FOLL_COW flag is still valid.

A list of affected Linux distros (including VMs and containers that share the same kernel)

Red Hat Enterprise Linux 7.x
Red Hat Enterprise Linux 6.x
Red Hat Enterprise Linux 5.x
CentOS Linux 7.x
CentOS Linux 6.x
CentOS Linux 5.x
Debian Linux wheezy
Debian Linux jessie
Debian Linux stretch
Debian Linux sid
Ubuntu Linux precise (LTS 12.04)
Ubuntu Linux trusty
Ubuntu Linux xenial (LTS 16.04)
Ubuntu Linux yakkety
Ubuntu Linux vivid/ubuntu-core
SUSE Linux Enterprise 11 and 12
OpenWrt

How do I fix CVE-2016-5195 on Linux?

Type the commands for your Linux distro. You will need to reboot the box afterwards. Before you apply the patch, note down your current kernel version:

$ uname -a
$ uname -mrs

Sample outputs:

Linux 3.13.0-95-generic x86_64



Debian or Ubuntu Linux

$ sudo apt-get update && sudo apt-get upgrade && sudo apt-get dist-upgrade



Reboot the server:

$ sudo reboot

Related: Ubuntu Linux users can hotfix this Linux kernel bug without rebooting the server.



RHEL / CentOS Linux 5.x/6.x/7.x

$ sudo yum update
$ sudo reboot



RHEL / CentOS Linux 4.x

$ sudo up2date -u
$ sudo reboot


SUSE Enterprise Linux or openSUSE Linux


To apply all needed patches to the system, type:

# zypper patch
# reboot


Verification

You need to make sure your version number has changed:

$ uname -a
$ uname -r
$ uname -mrs
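If you would rather not compare the version strings by eye, a small helper can record the kernel release before patching and compare it after the reboot. This is a minimal sketch; the function names and the file you store the value in are made up for illustration. Note also that on some distributions the release string may stay the same across a security update, so treat "unchanged" as a prompt to check the installed kernel package version rather than as proof that the patch is missing.

```shell
# save_kernel_release FILE: record the running kernel release in FILE.
save_kernel_release() {
    uname -r > "$1"
}

# kernel_release_changed FILE [CURRENT]: succeed if the release stored in
# FILE differs from CURRENT (CURRENT defaults to the running kernel).
kernel_release_changed() {
    before=$(cat "$1")
    current=${2:-$(uname -r)}
    [ "$before" != "$current" ]
}
```

Run save_kernel_release "$HOME/kernel-before.txt" before patching. After the reboot, run kernel_release_changed "$HOME/kernel-before.txt" && echo "kernel release changed".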


Determine if your system is vulnerable

For RHEL/CentOS Linux, use the following script:

$ wget https://access.redhat.com/sites/default/files/rh-cve-2016-5195_2.sh
$ bash rh-cve-2016-5195_2.sh



For all other distros, try the PoC (proof-of-concept exploit code)

Grab the PoC:

$ wget https://raw.githubusercontent.com/dirtycow/dirtycow.github.io/master/dirtyc0w.c


Run it as follows. First, as root, create a test file:

$ sudo -s
# echo this is not a test > foo



Then compile and run the PoC as a normal user:

$ gcc -lpthread dirtyc0w.c -o dirtyc0w
### If you get an error while compiling the code, try: ###
$ gcc -pthread dirtyc0w.c -o dirtyc0w
$ ./dirtyc0w foo m00000000000000000
mmap 56123000
madvise 0
procselfmem 1800000000
$ cat foo

If foo now contains m00000000000000000 instead of "this is not a test", the kernel is vulnerable.


Enabling X11 Access Control

Enabling X11 Access Control (Fixing xhost +)


Introduction


The number one high-risk system vulnerability noted by the recent ISS audit of BNL was the use of "xhost +", that is, an open X display. Using "xhost +" allows anyone to watch your keystrokes, capture windows and insert command strings into your windows. This is particularly bad when you have root access to a machine. There is no legitimate reason to run "xhost +". Most people use ssh to connect to machines other than their desktop, and ssh tunnels X11 traffic, eliminating any need for "xhost +". To turn on X11 forwarding with ssh, call it like this:

ssh -X host.domain

This can be turned on by default by adding the following to $HOME/.ssh/config:

Host *.bnl.gov
ForwardX11 yes

Make sure of the following things:


• You should not set your DISPLAY variable; ssh will do it for you. It will look something like:
    echo $DISPLAY
    localhost:12.0
• X11 forwarding must be allowed by the SSH server. Check /etc/ssh/sshd_config for a line saying "X11Forwarding yes".
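Both checks in the list above can be scripted. The sketch below is a rough aid, not a full parser: the function names are made up, the sshd_config check only looks at the first uncommented, whitespace-separated X11Forwarding line and ignores Match blocks and included files, and the DISPLAY check simply assumes a forwarded display looks like the localhost:12.0 example above.

```shell
# x11_forwarding_enabled FILE: succeed if the first uncommented
# X11Forwarding line in FILE says "yes" (case-insensitive).
x11_forwarding_enabled() {
    awk 'tolower($1) == "x11forwarding" { print tolower($2); exit }' "$1" |
        grep -qx yes
}

# display_is_forwarded VALUE: succeed if VALUE looks like an ssh-forwarded
# display (localhost:N...), as opposed to something like desktop:0.
display_is_forwarded() {
    case "$1" in
        localhost:[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the server you would call x11_forwarding_enabled /etc/ssh/sshd_config, and inside an ssh session display_is_forwarded "$DISPLAY".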

Windows Machines Running eXceed Version 6.2 or better

On a Windows machine running eXceed, go into the "Security" part of "Xconfig", select "Enabled (no host access)" in the "Host Access Control List" part of the window and click "OK". If eXceed is running, you will lose all open windows when the X server gets reset.

Windows Machines Running older eXceed Versions

On a Windows machine running version 6.1 or older of eXceed, the option listed above just shuts down the X server. There are two options: upgrade to a newer version (current is 7.0) or use the "File" option. If you select the "File" option on the Xconfig security page, select the "Edit" button on that line and add 127.0.0.1 to the end of the file. Save the file and click "OK". If eXceed is running, you will lose all open windows when the X server gets reset.

eXceed and No ssh

If you do not use ssh to make your X connections under eXceed (you really should), then you have to use the "File" method of security as outlined in the version 6.1 and older section and add the names of all the machines from which you will be opening X applications to the xhost.txt file. Since this method only provides security at the host level, anyone on the machines you let in can watch your X sessions.


UNIX and Linux Machines

On Linux/UNIX machines, the "xhost +" command can be issued in many places, so you will have to remember where you did it or find the location in order to turn it off (I believe that all recent versions of the Linux X server have "xhost -" as the default). If you cannot find where the "xhost +" command is issued, adding a call to "xhost -" somewhere will turn it off.
Some of the most common places to find the "xhost +" command are the X11 startup files. These files are:

$HOME/.Xclients
$HOME/.Xclients.gnome
$HOME/.Xclients.kde
$HOME/.xinitrc
$HOME/.xsession
/etc/X11/xinit/xinitrc
/usr/X11R6/bin/startx
/usr/X11R6/lib/X11/xdm/Xsession


Also, running "man xinit" will give you more information on the startup files that are executed when X11 starts up.
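Searching those startup files by hand gets tedious, so here is a small helper that does it. It is a sketch with a made-up function name: it reports uncommented lines containing a bare "xhost +" and silently skips files that do not exist, but it will not catch calls hidden behind variables or scripts sourced from elsewhere.

```shell
# find_xhost_plus FILE...: print "file:line:text" for every uncommented
# "xhost +" call in the given files; missing or unreadable files are skipped.
find_xhost_plus() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        # /dev/null as a second file makes grep always print the file name
        grep -nE '^[^#]*xhost[[:space:]]+\+([[:space:]]|$)' "$f" /dev/null || true
    done
}
```

For example: find_xhost_plus "$HOME/.Xclients" "$HOME/.xinitrc" "$HOME/.xsession" /etc/X11/xinit/xinitrc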


If you want to test whether you have fixed the "xhost +" problem on your systems, log into another UNIX computer, bypass the encrypted ssh X11 channel by pointing the DISPLAY environment variable back at display 0 of your desktop, and then try starting up an xclock. For example, type the following commands (csh/tcsh syntax; in sh-style shells use "export DISPLAY=yourdesktop.phy.bnl.gov:0" instead of setenv):

   ssh youraccount@yourfavoritunixserver.phy.bnl.gov
   setenv DISPLAY yourdesktop.phy.bnl.gov:0
   xclock

If an xclock pops up on your screen, you still have not properly enabled X11 access control. You should contact your computer liaison for further assistance.

Xterminals

To enable access control (set xhost -) on Tektronix Xterminals, bring up the "Setup" menu (F3 key). In the "Configuration Summaries" pull-down menu, select "X Environment". On the X Environment page, toggle "Enable Access Control" to "Yes". Return to the Main Menu and then select "Save Settings to NVRAM". The terminal will now reject all X connections except those coming from the machine you connect to via XDM and those coming through tunnels to your XDM host created when you ssh to another machine. If you run "xhost +" on the XDM host, you will disable access control again, so make sure that none of the X setup files do this (see the UNIX discussion above).

The following is an e-mail from Ofer Rind, who tells us how to enable X11 authentication on NCD Xterminals. Thanks, Ofer, for your post.

------------
Disabling Xhost+ on an Xterminal

(NB: This was tried on both NCD and Tektronix Xterminals and seemed to
work; however, your mileage may vary.  The description is for an NCD.)

Press Alt-F3 to pull up the Xterminal control bar.  Select "Change Setup Parameters" from the "Setup" menu.  When the setup parameters window pops up, select "Access Control."  This will expand the menu, revealing an option called "Enable Access Control."  Turn this on by pressing the adjacent square.  Then, at the bottom of the setup window, press the "Apply" button to effect the change.  This sometimes takes several seconds, so be patient.  When the arrow cursor returns, close the setup window and return to your previously scheduled program.  X access control should now (hopefully) be enabled.  NOTE that this access control can be superseded by a user who logs in on the Xterm and sets "xhost +".

-------------

ECC Memory - ECC RAM




ECC Memory



ECC stands for "Error Correcting Code" and is a method used to detect and correct errors introduced during storage or transmission of data. Certain kinds of RAM chips inside a computer implement this technique to correct data errors and are known as ECC Memory.
ECC Memory chips are predominantly used in servers rather than in client computers. The number of memory errors grows with the amount of RAM in a computer and with how long it has been running. Since servers typically contain several gigabytes of RAM and are in operation 24 hours a day, the likelihood of errors cropping up in their memory chips is comparatively high, and hence they require ECC Memory.

Memory errors are of two types, namely hard and soft. Hard errors are caused by fabrication defects in the memory chip and cannot be corrected once they start appearing. Soft errors, on the other hand, are caused predominantly by electrical disturbances.
Memory errors that are not corrected immediately can eventually crash a computer. This again has more relevance to a server than to a client computer in an office or home environment. When a client crashes, it normally does not affect other computers even when it is connected to a network, but when a server crashes, every client that depends on it is affected. Hence ECC memory is treated as mandatory for servers but optional for clients unless they are used for mission-critical applications.
ECC Memory chips mostly use Hamming Code or Triple Modular Redundancy as the method of error detection and correction. These are Forward Error Correction (FEC) codes: they correct errors on their own instead of going back and asking the data source to resend the original data. These codes can correct single-bit errors occurring in data. Multi-bit errors are very rare and hence do not pose much of a threat to memory systems.
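To show how a Hamming code corrects a single-bit error without asking for the data again, here is the textbook Hamming(7,4) code written with shell arithmetic. It is a teaching sketch only: real ECC memory does this in hardware on much wider words, and the function names and the bit layout (position n stored in bit n-1 of an integer) are choices made for this example.

```shell
# hamming74_encode DATA: DATA is 0..15 (bits d1 d2 d3 d4, d1 most significant).
# Codeword positions 1..7 hold p1 p2 d1 p4 d2 d3 d4.
hamming74_encode() {
    d1=$(( ($1 >> 3) & 1 )); d2=$(( ($1 >> 2) & 1 ))
    d3=$(( ($1 >> 1) & 1 )); d4=$(( $1 & 1 ))
    p1=$(( d1 ^ d2 ^ d4 ))      # parity over positions 1,3,5,7
    p2=$(( d1 ^ d3 ^ d4 ))      # parity over positions 2,3,6,7
    p4=$(( d2 ^ d3 ^ d4 ))      # parity over positions 4,5,6,7
    echo $(( p1 | (p2 << 1) | (d1 << 2) | (p4 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6) ))
}

# hamming74_decode CODEWORD: fix at most one flipped bit, print the data 0..15.
hamming74_decode() {
    c=$1
    b1=$(( c & 1 ));        b2=$(( (c >> 1) & 1 )); b3=$(( (c >> 2) & 1 ))
    b4=$(( (c >> 3) & 1 )); b5=$(( (c >> 4) & 1 )); b6=$(( (c >> 5) & 1 ))
    b7=$(( (c >> 6) & 1 ))
    s1=$(( b1 ^ b3 ^ b5 ^ b7 ))
    s2=$(( b2 ^ b3 ^ b6 ^ b7 ))
    s4=$(( b4 ^ b5 ^ b6 ^ b7 ))
    syndrome=$(( s1 + 2 * s2 + 4 * s4 ))   # 0 = no error, else the bad position
    if [ "$syndrome" -ne 0 ]; then
        c=$(( c ^ (1 << (syndrome - 1)) )) # flip the faulty bit back
    fi
    echo $(( (((c >> 2) & 1) << 3) | (((c >> 4) & 1) << 2) | (((c >> 5) & 1) << 1) | ((c >> 6) & 1) ))
}
```

For example, hamming74_encode 11 prints 102. Flipping any single bit of that codeword (say 102 becomes 118) and passing it to hamming74_decode still prints 11. Two flipped bits in the same codeword would be miscorrected, which is why this simple code only covers single-bit errors.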