Upgrading my Medion akoya E1232T with an SSD.

For a while now, I’ve been using this little clamshell as my private traveling machine, I was dragging it along as far as Japan and in general, it never let me down. Granted, the battery lifetime isn’t great, the shrunk keyboard isn’t for writing a thesis and the 2 core baytrail Celeron isn’t the best-performing mobile CPU. But it’s tiny housing makes it fit into my A4-sized bag, it has a touch screen and with the 4G of main memory it even runs Visual Studio 2015 community at a decent speed. It also has an Ethernet port, a HDMI port, an SD-Card reader and an USB3 Host. And all this without adapters, dongles, port replicators etc. It’s even got 2.4 and 5GHZ WIFI and Miracast.  

The main drawback the machine has is its 500GB HGST spinning disk. But this was about to change.

So I found a 128 GB Sandisk SSD (Z400), 2.5’’ SATA at reichelt.de for a reasonable price. Its 7mm housing is the same size as the internal HGST drive, so I ordered one.

Now SanDisk offers a number of software packages that make your life easier with the SSD, the most important one is their SSD Dashboard. Inside, it also contains a link to a single use version of a harddisk to SSD Transfer software. So I downloaded the dashboard, hooked up the SSD to an USB to SATA converter and fired up the transfer software to check if this setup would be ok. But before making actual changes, I ran the disk2vhd tool from sysinternals to capture a full disk image of the internal harddisk to an external drive.

Now in order to do the actual transfer, I removed a lot of things from the old harddrive. I changed the OneDrive config to not keep anything local (down 20 GB), removed all local media files (down another 60GB), uninstalled some older versions of software (VS2010) and cleaned up my downloads directory. A very helpful tool for this process is windirstat that I just ran in a portable version. (I actually keep it around in the tools directory of my OneDrive.)

After having shrunk down the content of the C drive, I found that I still had a D drive that the transfer software insisted to move to the SSD. Now on the Medion akoya, that’s actually the recovery drive used to reset the machine to its initial state that it came in which was Windows 8. Now I never planned to go back to that, so I decided to remove the partition to save some precious SSD space. Note that if you do that, the recovery function of the notebook that’s triggered by holding down F11 upon boot will not work anymore. But I decided that I’ll be fine with using the build-in recovery mechanisms of Windows 10. But that’s up to everyone to decide for himself.

So I then fired up the transfer software and a few hours later, I had a SSD with the content of my harddrive. So I disconnected the SSD in its USB-Sata housing and then shut down the machine.

The disassembly process was actually very smoth and simple, essentially it was removing the screws on the bottom and then using my trusty iFixit Spudger to carefully pry open the plastic housing. After that, it was just two more screws to remove the harddisk frame, pulling out carefully the SATA cable and then a few more screws to take the harddisk out of the frame. I then mounted the SSD into the frame, fastened the screws, put the harddisk into its place in the housing, carefully attached the SATA cable, fastened the screws of the harddisk frame, then put on the plastic cover and tightened all the remaining screws. Needless to say I did all this with the machine shut down, the power supply disconnected and paying attention not to damage the LiPo battery since these can get rather nasty when punctured in the wrong spot. (make that: in any spot!).

Booting up the machine initially got me a boot failure (probably since Windows 10 actually doesn’t shut down on “shutdown”, but actually hibernates, but the “shrinking process” left the SSD with a stale hibernation image that Windows correctly refused to restore) but the second boot was all right and from then on everything worked as it should.

Or almost…

Working with the machine a few hours made me notice a strange behavior. A couple of times every hour the machine would “freeze” for a few seconds and almost do nothing. But the mouse cursor was still working (so no blocked interrupts) and even the GUI of some apps was still responsive, but other apps just froze. Even Windows would sometimes gray them out and show “not responding” dialogs.

And then I noticed that during this time, the HDD LED was full on. Not the usual flicker when the disk was working but just lighting up steadily. So I fired up the task manager and looked for processes with unusual activity. There were a few random processes that seemed to be “stuck” on I/O, but there was no clear pattern. So I switched over to the “Performance” tab and took a look at the Disk IO graph. And whenever the system behaved “frozen”, the disk activity percentage graph would be stuck at 100% busy while the throughput graph would show zero throughput. After a couple (10 to 40) seconds the activity would drop and the throughput would go up as if nothing had happened.

After watching this for a couple of days (and even seeing one or two bugchecks (AKA blue screens) during disk activity, I decided to involve Sandisk’s support.

After a couple of obvious starter questions (have you tried using a different SATA connection? No, I only have one in my notebook. Is the BIOS/OS/Drive Firmware up to date? Yes, I checked in your SanDisk Dashboard!) Sandisk recommended that I format the harddisk and reinstall everything. So I actually did what they asked me to do on the idea that maybe with the initial windows 8 install and the insider updates and then the final Windows 10 bits installed there was something “stuck” in the driver versions installed.

After re-installing Windows 10 (which amazingly worked without any major trouble, Windows recognized my already-activated Windows 10 license I got by upgrading the machine from Windows 8, I even did not have to install a single driver by hand since they now seem to all be available in Windows Update!) I started checking for the presence of the bug. And yes, it was still there, on my clean install machine. Here’s a screenshot of how this looks in the task manager: Disk 100% busy, no data transfer. In this case for about 45 seconds.

Neu installiert

So I started looking at the documentation of the Z400 drive at the Sandisk website. To be precise, it’s a SanDisk SD8SBAT128G1122Z 128G

Turns out, it’s actually not a consumer drive, it’s mostly meant for embedded OEM systems like point-of-sales terminals (aka cash registers). And then I dug some more and found a standalone firmware updater for the drive called “ssdupdater-z2201000-or-z2320000-to-z2333000”. Wait! Didn’t the Sandisk dashboard just tell me that there was no firmware update? But the same dashboard told me that the firmware revision of my drive was z2320000. OK, maybe the ssd dashboard does not know about these embedded drives and only knows about consumer drive firmware updates. So I downloaded and ran the standalone firmware updater and voila: The bug disappeared, no further bluescreens and the machine feels about 5 times faster than before.

So, my lessons learned for today: Don’t trust support too much, especially if going through consumer/end-user channels. You might have hardware they don’t even know about. And don’t trust their tools. You might get wrong answers.

To be precise, the Sandisk support was really quick to answer for a consumer query that came to them via a web form. The answers were professional and to the point without any useless chitchat, but if the answer isn’t available to them, they simply can’t help. So it would be great if either Sandisk could enhance their SSD dashboard tool to give correct answers or enhance their support database so that this bug can be found. Because I’m pretty sure it is documented somewhere in the bug list of firmware Z2320000 or the release notes of firmware z2333000.

Hope this helps,

H.

Posted in Computers | Leave a comment

AzureIotHubProxy

Today, I uploaded a project to github that I wrote in the last weeks in order to simplify things with the Azure IoT hub for demos, makers etc.

If you haven’t heard about Azure IoT hub, this is a very nice service you can use to hook up your IoT devices to a central service that you can use to receive data, send commands and, in general, manage your devices.

https://azure.microsoft.com/en-us/documentation/services/iot-hub/ is the official starting point for the documentation, but basically, the Azure IoT hub has a device and a service API. Through the device api, you can basically send messages to the cloud and receive messages from the cloud. The cool thing about this is that the device side only does outbound connections (e.g. this works through firewalls, through NAT devices such as DSL routers and even through IP connections provided by mobile phone providers. Read this again: Back channel to your device works through mobile phone network!

And the best thing: This service incudes a free tier that allows you to register 500 devices and send 8.000 Messages of 0.5k per day. See here https://azure.microsoft.com/en-us/pricing/details/iot-hub/ for details.

But in order to get to all this goodness, you need to manage the IOT hub via its service API. You can do that through the Device Explorer tool (see https://github.com/Azure/azure-iot-sdks/tree/master/tools/DeviceExplorer ) but that’s a manual process that involves creating devices on the hub and then copying the device connection strings manually into the device configuration. Or you can deal with the standard management API which is a bit tricky to use and actually would require you to keep the management keys where ever you would like to manage it.

Wouldn’t it be nice if the devices could actually manage themselves?

So I wrote a little API Proxy service that the device can query to get a connection string. The service just implements four calls.

GET /api/Device get just returns the list of devices configured in a JSON form

GET /api/Device/(id) returns the JSON just for this device

POST /api/Device/(id) creates a new device in the IOT hub and returns a JSON that includes a connection string

DELETE /api/Device/(id) deletes the device in the IOT hub.

In order to secure these, they all require an API key send in the query string.

The implementation I made is really simple and not very secure. But it can be used as a starting point to think about more complex authentication schemes, e.g. one could implement a one-time token mechanism that would only allow a single device registration for each token.

To try out the implementation, I added a swagger interface, so if you go to /swagger/ you can play around with the API yourself. You should disable that for production use.

The service can easily be run in an Azure Web App. And again, there is a free tier that is sufficient to run this service. See here https://azure.microsoft.com/en-us/pricing/details/app-service/ Azure app services also support SSL that you should use in order to protect your API key. (SSL is not supported for custom domains, in the free tier so your website will all end on “azurewebsites.net”)

To get started, clone the project from github https://github.com/holgerkenn/AzureIotHubProxy and then go to https://azure.microsoft.com/free/ to start a free trial on azure in case you don’t have a subscription yet. Through this link, you will also get some free credit to use the paid azure services for a limited time, but since everything presented here also works on the free tiers of the services, you can actually run all this even after the free trial credits expire.

Go to https://azure.microsoft.com/en-us/develop/iot/ to see how to create your first IoT hub, then get its connection string from the Azure Portal and add it to the web.config file in the repository. Then create a web app on Azure as explained here https://azure.microsoft.com/en-us/documentation/services/app-service/web/ and publish the service to this app. In Visual Studio, this is as simple as right-clicking the project, selecting publish and then “Microsoft Azure App Service”. This will then guide you to select or create a new Azure web app for your service. After the publish, your service should be up and running. And since the swagger api is enabled, you will find the trial api on “https://<yourservicename>.azurewebsites.net

Then you can go and compile the test client. Enter the name of your web app in program.cs. When you run it, it will connect to the service, create a device named “1234567” and send a few messages to the IoT hub. If you have device explorer connected, you can receive those messages and send a few back.

And now you should probably change that default API key (“1234”) and republish.

Hope this helps,

H.

Posted in Uncategorized | Tagged , | Leave a comment

My MSDN Blog posts are now here as well

Since I’m waiting for my work machine to install the newest insider build of Windows 10, I decided to polish the old blog a bit. So I decided to add a plugin to my blog that will pull the posts from the MSDN blog into this one. From now on, they will automatically show up in the Microsoft category.

And then I decided to push a few things to my github repository https://github.com/holgerkenn/ as well.

H.

Posted in Microsoft | Leave a comment

Goodbye Facebook

So finally, after being on Facebook for, well as long as Facebook existed, I decided to deactivate my account today. I still remember the time when you actually needed an .edu or otherwise academic e-mail address to register, that must have been in April or May 2004 when I was still at Jacobs International University Bremen.

So after about 10 years, I think it’s been a great time but since I haven’t looked at anything there since last year and also did not miss it too much, I decided to get rid of the dormant account.

In addition I realized that anybody who wants to contact me can find me via any search engine. And as far as my professional life is concerned, linkedin, xing and twitter seem to be more useful.

H.

Posted in Technology | Leave a comment

My azure scripts on github

 

Hi!

I’ve decided to put my azure scripts on Github, that keeps them in one place and I can update whenever I find bugs.

https://github.com/holgerkenn/azurescripts 

I have more scripts in the queue, but I first need to remove credentials, hostnames etc. before I put them on github.

Hope it helps,

H.


Source: msdn

Posted in Microsoft | Leave a comment

Linux and Azure Files: you might need some help here…

 

Hi!

tl;dr: To mount Azure Files from linux, you need cifs support in the kernel, the right mount helper and versions recent enough to supports the SMB2 protocol version.

I just got a ping from a customer who had trouble mounting an Azure Files filesystem from Linux. According to the Azure team blog, this should work: http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/12/introducing-microsoft-azure-file-service.aspx 

So I tried it myself on a Ubuntu 14.04 LTS and found the following:

If I used smbclient, everything worked:

kenn@cubefileclient:~$ smbclient -d 3 //cubefiles.file.core.windows.net/cubefiletest <storage key goes here> -U cubefiles -m SMB2
[lots of debug output deleted here]
Connecting to 168.61.61.18 at port 445
Doing spnego session setup (blob length=0)
server didn’t supply a full spnego negprot
Got challenge flags:
Got NTLMSSP neg_flags=0x628a8015
NTLMSSP: Set final flags:
Got NTLMSSP neg_flags=0x60088215
NTLMSSP Sign/Seal – Initialising with flags:
Got NTLMSSP neg_flags=0x60088215
Domain=[X] OS=[] Server=[]
smb: > dir

  .                                       D        0  Mon Sep  8 14:49:55 2014
  ..                                      D        0  Mon Sep  8 14:49:55 2014
  testdir                             D        0  Mon Sep  8 14:47:08 2014
                83886080 blocks of size 65536. 83886080 blocks available
Total bytes listed: 0
smb: > quit

Don’t be alarmed by all those scary looking messages, I’m running smbclient with –d 3, so there are a lot of debug messages.

Now I tried to mount the filesystem:

kenn@cubefileclient:~$ sudo bash
root@cubefileclient:~# mount -t cifs \cubefiles.file.core.windows.netcubefiletest /mountpoint -o vers=2.1,username=cubefiles,password=<storage key goes here>,dir_mode=0777,file_mode=0777
mount: wrong fs type, bad option, bad superblock on cubefiles.file.core.windows.netcubefiletest,
       missing codepage or helper program, or other error
       (for several filesystems (e.g. nfs, cifs) you might
       need a /sbin/mount.<type> helper program)
       In some cases useful info is found in syslog – try
       dmesg | tail  or so

OK, this did not work.

So let’s check if the cifs filesystem is actually in the kernel.

root@cubefileclient:~# grep cifs /proc/filesystems
nodev   cifs

Yes, looks good.

So is there a mount helper for cifs?

root@cubefileclient:~# ls -la /sbin/mount.cifs
ls: cannot access /sbin/mount.cifs: No such file or directory

That’s it! we’re missing the mount helper!

root@cubefileclient:~# apt-get install cifs-utils

root@cubefileclient:~# mount -t cifs \cubefiles.file.core.windows.netcubefiletest /mountpoint -o vers=2.1,username=cubefiles,password=<storage key goes here>,dir_mode=0777,file_mode=0777

root@cubefileclient:~# mount
[…]
\cubefiles.file.core.windows.netcubefiletest on /mountpoint type cifs (rw)

root@cubefileclient:~# ls /mountpoint/
testdir

So this is great, and I thought this was the bug our customer was hitting. But I was wrong: Even with installing the mount helper nothing worked. Even the smbclient did not work for him.

So I recreated his setup (based on Suse Enterprise 11) and I saw the following:

cubefileclient2:~ # smbclient -d 3 //cubefiles.file.core.windows.net/cubefiletest <storage key goes here> -U cubefiles -m SMB2
[lots of debug output deleted here…]
protocol negotiation failed: NT_STATUS_PIPE_BROKEN

And also the mount failed.

So I decided to look at what’s going on on the wire. I opened up a second ssh window to the VM and ran tcpdump on the second terminal while attempting to connect to Azure Files in the first. ( tcpdump –s 65535 –w tcpdump.pcap port 445  to be precise)

Since the output of tcpdump wasn’t too enlightening, I decided to load the output using Microsoft Network Monitor and look at the packets there. (To load the capture files from tcpdump, make sure they have the extension .pcap) And then it was quite obvious:

In Ubuntu 14.04 LTS:

image

In Suse Enterprise 11:

image

The SMB2 protocol was missing. So I started looking at the version numbers of smbclient, the cifs mount helper and the kernel.

Suse:

cubefileclient2:~ # smbclient -V
Version 3.6.3-0.54.2-3282-SUSE-CODE11-x86_64
cubefileclient2:~ # uname -a
Linux cubefileclient2 3.0.101-0.35-default #1 SMP Wed Jul 9 11:43:04 UTC 2014 (c36987d) x86_64 x86_64 x86_64 GNU/Linux
cubefileclient2:~ # mount.cifs -V
mount.cifs version: 5.1

Ubuntu:
root@cubefileclient:~# smbclient -V
Version 4.1.6-Ubuntu
root@cubefileclient:~# uname -a
Linux cubefileclient 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@cubefileclient:~# mount.cifs -V
mount.cifs version: 6.0

So here’s the solution: The Suse Enterprise 11 images contain a cifs implementation both in the kernel and in smbclient that hasn’t the SMB2 protocol implemented yet. And Azure files requires SMB2 otherwise the protocol negotiation will fail.

One closing remark: Please check the date when this was posted, software versions change all the time and what is described here may not be accurate anymore when you read this. I’m not posting this to point to any specific bugs or to promote one distribution over the other. It’s just a fact of life that one cannot support everything with every single version of an OS or service, this post is intended to give you ideas what to look for and give you some tools to debug low-level system behavior. And of course one could have checked the version numbers first or looked for protocol version negotiation mismatches in the debug output. But when I have no clue what to look for, I found it sometimes helpful to start with the lowest level and work my way up until I find something. 

Hope this helps,
H.


Source: msdn

Posted in Microsoft | Leave a comment

Attacks from Mars! Azure ILB and Linux

Hi!

tl;dr: Azure ILB and Linux IP spoofing protection prevent a connection from a machine to itself via the ILB.

A few days ago, I talked to a customer who had quite some trouble using the Azure Internal Load Balancer with his Linux VMs.

From his tests, he concluded that ILB “is broken”, “is buggy” and “is unstable”. What he observed is the following:

– He created two linux VMs in a virtual network on Azure: Machine A on IP address 10.0.0.11 and Machine B on IP Address 10.0.0.12. And then he set up an internal load balancer for HTTP with the following configuration: Input address 10.0.0.10, output 10.0.0.11 and 10.0.0.12, source, destination and probe ports all set to 80.

Then he opened a SSH connection to Machine A (10.0.0.11) and observed the following behavior:

root@ILB-a:~# curl 10.0.0.10
^C

root@ILB-a:~# curl 10.0.0.10
<html><head><title>B</title></head>
<body> B </body> </html>

So it seems that only every second connection worked. Or to be more precise, whenever the ILB was forwarding the connection to the machine he was working on, the connection failed.

So I recreated this setup and tried for myself, this time looking at the tcpdump output for the case when the connection did not work:

root@ILB-a:~# tcpdump -A  port 80
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:32:05.210953 IP 10.0.0.11.58705 > 10.0.0.10.http: Flags [S], seq 265927267, win 29200, options [mss 1460,sackOK,TS val 1471701 ecr 0,nop,wscale 7], length 0
E..<..@.@.W.

..
.Q.P…c……r…………
..t………
10:32:05.216395 IP 10.0.0.11.58705 > 10.0.0.11.http: Flags [S], seq 265927267, win 29200, options [mss 1418,sackOK,TS val 1471701 ecr 0,nop,wscale 7], length 0
E..<….@…

….Q.P…c……r..:………
..t………
10:32:06.210783 IP 10.0.0.11.58705 > 10.0.0.10.http: Flags [S], seq 265927267, win 29200, options [mss 1460,sackOK,TS val 1471951 ecr 0,nop,wscale 7], length 0
E..<..@.@.W.

..
.Q.P…c……r…………
..u………
10:32:06.212291 IP 10.0.0.11.58705 > 10.0.0.11.http: Flags [S], seq 265927267, win 29200, options [mss 1418,sackOK,TS val 1471951 ecr 0,nop,wscale 7], length 0
E..<….@…

….Q.P…c……r..@………
..u………

It looked like the ILB was forwarding the packets (it’s actually just rewriting the destination IP and port, as you can see the rest of the packet just stays the same.) And then the linux kernel would just drop the packet. Turns out this behavior is actually a clever idea. But why?

Because the packet the linux kernel sees in its interface input queue could actually never get there in the first place! It carries the local IP address both in its source and destination address. So if the network stack would want to send a packet to the local host, it would have never been sent into the network but would have been handled in the network stack already, much like the packet to localhost (127.0.0.1). So any incoming packet with a local IP address as source must therefore be evil, probably a spoofing attack directed at some local service that would accept local connections without further authentication. So dropping this packet is actually clever.

But how can we prove that this is actually the case? Fortunately, there’s a kernel runtime configuration switch which enables logging of such packets. And here’s where the title of this post comes from: the configuration is called log_martians. This can be set globally (echo 1 > /proc/sys/net/ipv4/conf/all/log_martians) or for a specific interface (e.g. echo 1 > /proc/sys/net/ipv4/conf/eth0/log_martians). The kernel then logs these events to syslog and can be seen by running dmesg.

In syslog, these packets show up like this:

Sep 15 11:05:06 ILB-a kernel: [ 8178.696265] IPv4: martian source 10.0.0.11 from 10.0.0.11, on dev eth0
Sep 15 11:05:06 ILB-a kernel: [ 8178.696272] ll header: 00000000: 00 0d 3a 20 2a 32 54 7f ee 8f a6 3c 08 00        ..: *2T….<..

Conclusion: The linux kernel default behavior drops any packet that seems to be coming from a local IP but shows up in the network input queue. And that’s not the ILBs fault. The ILB works just fine as long as you don’t connect to it from a machine that’s also a potential destination for the ILB.

Fortunately, this limitation rarely occurs in real life architectures. As long as the clients of an ILB and the servers load-balanced by the ILB are distinct (as they are in the Galera example in this blog) the ILB just “works”. In case you actually have to “connect back” to the same server, you have to either build a workaround with a timeout and retry in the connection to the ILB or reconfigure the linux kernel to allow such packets in the input queue. How to do that with the kernel configuration parameters, I leave as an exercise to the reader. 😉

Hope this helps,
H.


Source: msdn

Posted in Microsoft | Leave a comment

Running a MySQL Galera cluster on Microsoft Azure

 

A few weeks ago, I was looking into running a MySQL Galera Cluster for a customer with a large Linux IAAS deployment on Azure.

Why that? There’s ClearDB, a Microsoft Partner that offers MySQL on Azure as SaaS (Software as a service), so you can go to https://www.cleardb.com/store/azure and pick your size. Or, if you want to run it on your own, you can pick a Ubuntu Linux Gallery image and type “apt-get install mysql-server” and that’s it, right? Well, not so fast…

ClearDB is a great offering for most customers that need a MySQL backend, but in this case, even the largest ClearDB offer was not sufficient.

So the customer followed the second path down, he created an IAAS VM (actually several VMs which each run an independent database server for different purposes) and configured his services to use these databases via the internal IP addresses of these servers. But there’s one problem with this approach: Occasionally, Azure needs to deploy patches to the host systems running these VMs. And occasionally, the Linux VMs also need patches that require a restart of the database server or a reboot of the machines. Whenever this happened, the customer site would be down for a few minutes. 

To avoid this occasional downtime, I teamed up with Oli Sennhauser, CTO at FromDual and my colleague Christian Geuer-Pollmann to set up a MySQL Galera Cluster on Azure.

Such a cluster consists of three MySQL VMs. Database connections can be handled by all three machines, so the DB (read) load is distributed as well. As long as two machines are up, the database service is available. Galera achieves this by implementing the replication of database write transactions. More information can be found on http://galeracluster.com/ and on https://launchpad.net/galera/ 

So, here’s the tl;dr version of what we did:

– Set up three Ubuntu 14.04 LTS IAAS VMs with fixed internal IP addresses
– Set up an Azure internal load balancer so that database clients have a single IP they connect to
– Installed mysql-server-wsrep-5.6.16-25.5 and galera-25.3.5 plus a few dependencies
– Configured galera on these three machines
– Added a bit of iptables magic, courtesy of FromDual, to the VMs to block access to the MySQL port while a database server is recovering. The internal load balancer then moves the clients to the other servers of the cluster in case one is down.
– And in order to keep this all neat and clean, we used Powershell to automate the Azure setup part.

0. Prerequisites

The fixed internal IP and the internal load balancer make use of features that were only added to the Azure virtual network quite recently. Chances are that if you configured an Azure virtual network a while ago, these function may not be available. So just configure a new virtual network for this.

Currently, some of these features can only be configured via Powershell. So you need a (windows) machine to run powershell on, if you don’t have one handy, just create a small (A1) Windows server machine in the Azure portal and use RDP to connect to it. Then install the Azure Powershell, see here.

And you should do a bit of planning ahead for your new virtual network. It should have sufficient IP addresses to host all your database clients, the three servers of the cluster and the additional IP input address of the load balancer. In this case, we used the 10.0.0.0/8 default setting but placed all the database servers in the 10.11.0.0/16 subnet.

1. Creating the machines and the internal load balancer

As said before, we scripted all this in powershell. And in order to keep the configuration apart from the actual commands, we set a bunch of variables in the header of our script that contain the actual settings. So when you see $servicename in the examples below, that is something we’re setting in this header.

The Load balancer is created by this powershell command:

Add-AzureInternalLoadBalancer -ServiceName $servicename -InternalLoadBalancerName $loadBalancerName –SubnetName $subnetname –StaticVNetIPAddress $loadBalancerIP

When running this command, we found that the service needs to be deployed before running this command. So in order to ensure this, we just created a small toy IAAS VM, then created the loadbalancer and the database VMs and then removed the toy VM again.

To configure a VM to use the internal load balancer, we add an endpoint to the VM configuration:

Add-AzureEndpoint `
            -Name mysql `
            -LocalPort 3306 `
            -PublicPort 3306 `
            -InternalLoadBalancerName $loadBalancerName `
            -Protocol tcp `
            -ProbePort 3306 `
            -ProbeProtocol “tcp” `
            -ProbeIntervalInSeconds 5 `
            -ProbeTimeoutInSeconds 11 `
            -LBSetName mysql

Since we have multiple Linux VMs in the same cloud service, we need to remove the standard SSH endpoint and create an individual SSH endpoint for each machine:

Remove-AzureEndpoint `
            -Name SSH `
            | `
Add-AzureEndpoint `
            -Name SSH `
            -LocalPort 22 `
            -PublicPort $externalSshPortNumber `
            -Protocol tcp `
            |`

And we want to use a static internal IP for each machine since we need to specifiy these IP adresses in the galera configuration:

Set-AzureSubnet -SubnetNames $subnetname `
            | `
Set-AzureStaticVNetIP -IPAddress $machineIpAddress `
            | `

We wrapped all this into a configuration function called GetCustomVM. So here’s the complete script:

 1: #

 2: # Set up three VMs for a Galera Cluster

 3: #

 4:  

 5:  

 6: # Azure Cmdlet Reference

 7: # http://msdn.microsoft.com/library/azure/jj554330.aspx

 8:  

 9: $subscriptionId     = "<your subscription ID here>"

 10: $imageLabel         = "Ubuntu Server 14.04 LTS"       # One from Get-AzureVMImage | select Label

 11: $datacenter         = "West Europe" # change this to your preferred data center, your VNET and storage account have to be set up there as well

 12: $adminuser          = "<your linux user name here>"

 13: $adminpass          = "<a linux password>"

 14: $instanceSize       = "ExtraSmall" # ExtraSmall,Small,Medium,Large,ExtraLarge,A5,A6,A7,A8,A9,Basic_A0,Basic_A1,Basic_A2,Basic_A3,Basic_A4

 15: $storageAccountName = "<the storage account name for the vm harddisk files>"

 16: $vnetname           = "<the name of your vnet>"

 17: $subnetname         = "<the name of the subnet for the database servers>"

 18:  

 19: $loadBalancerName   = "galera-ilb" # should be changed if there are multiple galera clusters  

 20: $loadBalancerIP     = "10.11.0.10"

 21:  

 22: $servicename        = "<your service name>" # all machines will be created in this service

 23: $availabilityset    = "galera-as" # should be changed if there are multiple galera clusters  

 24:  

 25: #

 26: # Calculate a bunch of properties

 27: #

 28: $subscriptionName = (Get-AzureSubscription | `

 29:     select SubscriptionName, SubscriptionId | `

 30:     Where-Object SubscriptionId -eq $subscriptionId | `

 31:     Select-Object SubscriptionName)[0].SubscriptionName

 32:  

 33: Select-AzureSubscription -SubscriptionName $subscriptionName -Current

 34:  

 35: $imageName = (Get-AzureVMImage | Where Label -eq $imageLabel | Sort-Object -Descending PublishedDate)[0].ImageName

 36:    

 37: $storageAccountKey = (Get-AzureStorageKey -StorageAccountName $storageAccountName).Primary

 38:  

 39: $storageContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey

 40:  

 41: #

 42: # Fix the local subscription object

 43: #

 44: Set-AzureSubscription -SubscriptionName $subscriptionName -CurrentStorageAccount $storageAccountName 

 45:  

 46:  

 47: #

 48: # This function encapsulates the configuration generation of a single new Galera VM

 49: #

 50: Function Get-CustomVM

 51: {

 52:     Param (

 53:         [string]$customVmName, 

 54:         [string]$machineIpAddress, 

 55:         [int]$externalSshPortNumber,

 56:         [string] $storageAccountName = $storageContext.StorageAccountName

 57:         )

 58:  

 59:     # 

 60:     # configure the VM object

 61:     #

 62:     $vm = New-AzureVMConfig `

 63:             -Name $customVmName `

 64:             -InstanceSize $instanceSize `

 65:             -ImageName $imageName `

 66:             -AvailabilitySetName $availabilityset `

 67:             -MediaLocation "https://$storageAccountName.blob.core.windows.net/vhds/$customVmName-OSDisk.vhd" `

 68:             -HostCaching "ReadOnly" `

 69:             | `

 70:         Add-AzureProvisioningConfig `

 71:             -Linux `

 72:             -LinuxUser $adminuser `

 73:             -Password $adminpass `

 74:             | `

 75:         Set-AzureSubnet -SubnetNames $subnetname `

 76:             | `

 77:         Set-AzureStaticVNetIP -IPAddress $machineIpAddress `

 78:             | `

 79:         Remove-AzureEndpoint `

 80:             -Name SSH `

 81:             | `

 82:         Add-AzureEndpoint `

 83:             -Name SSH `

 84:             -LocalPort 22 `

 85:             -PublicPort $externalSshPortNumber `

 86:             -Protocol tcp `

 87:             |`

 88:         Add-AzureEndpoint `

 89:             -Name mysql `

 90:             -LocalPort 3306 `

 91:             -PublicPort 3306 `

 92:             -InternalLoadBalancerName $loadBalancerName `

 93:             -Protocol tcp `

 94:             -ProbePort 3306 `

 95:             -ProbeProtocol "tcp" `

 96:             -ProbeIntervalInSeconds 5 `

 97:             -ProbeTimeoutInSeconds 11 `

 98:             -LBSetName mysql

 99:  

 100:     $vm

 101: }

 102:  

 103: #

 104: # 0. Create cloud service before instantiating internal load balancer

 105: #

 106: if ((Get-AzureService | where ServiceName -eq $servicename) -eq $null) {

 107:     Write-Host "Create cloud service"

 108:     New-AzureService -ServiceName $servicename -Location $datacenter

 109: }

 110:  

 111: #

 112: # 1. Create a dummyVM with an external endpoint so that the internal load balancer (which is in preview) is willing to be created

 113: #

 114: $dummyVM = New-AzureVMConfig -Name "placeholder" -InstanceSize ExtraSmall -ImageName $imageName `

 115:     -MediaLocation "https://$storageAccountName.blob.core.windows.net/vhds/dummy-OSDisk.vhd" -HostCaching "ReadWrite" `

 116:     | Add-AzureProvisioningConfig -Linux -LinuxUser $adminuser -Password $adminpass `

 117:     | Set-AzureSubnet -SubnetNames $subnetname `

 118:     | Set-AzureStaticVNetIP -IPAddress "10.0.1.200" 

 119:  

 120: New-AzureVM -ServiceName $servicename -VNetName $vnetname -VMs $dummyVM 

 121:  

 122: #

 123: # 2. Create the internal load balancer (no endpoints yet)

 124: #

 125: Add-AzureInternalLoadBalancer -ServiceName $servicename -InternalLoadBalancerName $loadBalancerName –SubnetName $subnetname –StaticVNetIPAddress $loadBalancerIP

 126: if ((Get-AzureInternalLoadBalancer -ServiceName $servicename) -ne $null) {

 127:     Write-Host "Created load balancer"

 128: }

 129:  

 130: #

 131: # 3. Create the cluster machines and hook them up to the ILB (without mentioning "-Location $datacenter -VNetName $vnetname ", because the $dummyVM pinned these already

 132: #

 133: $vm1 = Get-CustomVM -customVmName "galera-a" -machineIpAddress "10.11.0.11" -externalSshPortNumber 40011

 134: $vm2 = Get-CustomVM -customVmName "galera-b" -machineIpAddress "10.11.0.12" -externalSshPortNumber 40012

 135: $vm3 = Get-CustomVM -customVmName "galera-c" -machineIpAddress "10.11.0.13" -externalSshPortNumber 40013

 136: New-AzureVM -ServiceName $servicename -VMs $vm1,$vm2,$vm3

 137:  

 138: #

 139: # 4. Delete the dummyVM

 140: #

 141: Remove-AzureVM -ServiceName $servicename -Name $dummyVM.RoleName -DeleteVHD

 142:  

 

 

 

 

 

Now the load balancer and the three VMs are created.

2. Install and configure Galera on the three VMs

We took the galera .deb packages from lounchpad.net:

https://launchpad.net/codership-mysql/5.6/5.6.16-25.5/+download/mysql-server-wsrep-5.6.16-25.5-amd64.deb and https://launchpad.net/galera/3.x/25.3.5/+download/galera-25.3.5-amd64.deb 

In these packages, we found a few minor glitches that collided with the Ubuntu 14.04 LTS we installed them on. The first glitch was that mysql-server-wsrep-5.6.16-25.5-amd64.deb has a configured dependency on mysql-client. And Ubuntu sees this satisfied with the mysql-client-5.5 package it uses as default, but this creates a version conflict. So I downloaded the .deb and modified its dependency to point to mysql-client-5.6 by following http://ubuntuincident.wordpress.com/2010/10/27/modify-dependencies-of-a-deb-file/. The second glitch was the fact that the default my.cnf contains the path /var/log/mysql/error.log which does not exist on Ubuntu. This created the strange situation that the server process would not start but just create two mysterious entries in syslog. Running strace on the server process showed the path it was trying to access, and once I created it everything worked fine. Another glitch in the package was that is was missing an upstart script for mysql, instead it had just a classic /etc/init.d shell script which confused upstart.  So I took one from a standard mysql-server-5.6 package and everything worked out well.

The steps to set up Galera were:

$ apt-get install mysql-client-5.6
$ apt-get install libssl0.9.8
$ dpkg -i galera-25.3.5-amd64.deb
$ dpkg –force-depends -i mysql-server-wsrep-5.6.16-25.5-amd64.modified.deb
$ mkdir /var/log/mysql
$ chown mysql /var/log/mysql

and put the standard upstart script from mysql-server-5.6 into the upstart config directory.

The next part was to configure the galera cluster function. As you can see in the script above, we have created three machines with the internal IP addresses 10.11.0.11, 10.11.0.12 and 10.11.0.13. For this, we need to set a few things in the default my.cnf

Binlog_format=row
Default_storage_engine=InnoDB
Innodb_autoinc_lock_mode=2
Query_cache_type=0
Query_cache_size=0
Innodb_flush_log_at_trx_commit=0
Bind_address=0.0.0.0
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=”<your cluster name here>”
Wsrep_sst_method=rsync

These settings are the same in all three machines. On each of the machines, we can now set a human readable node name, eg.g

wsrep_node_name=’Node A’

In the next step, we configured the actual clustering, i.e., we told each machine where to find the replication partners.

On machine 10.11.0.11, we set the following line in my.cnf:

wsrep_cluster_address=”gcomm://”

This allows this database node to come up even if there is no replication partner.

Then we started the server on 10.11.0.11.

Then we set the following line in my.cnf on 10.11.0.12:

wsrep_cluster_address=”gcomm://10.11.0.11,10.11.0.13″

and started the server on 10.11.0.12

Then we set the following line in my.cnf on 10.11.0.13:

wsrep_cluster_address=”gcomm://10.11.0.11,10.11.0.12″

and started the server on 10.11.0.12.

Now we went back to 10.11.0.11 and changed the line to:

wsrep_cluster_address=”gcomm://10.11.0.12,10.11.0.13″

and restarted the server. Now the galera cluster was configured.

Instead of changing the configuration of the initial node twice, one can also directly start the server process and add the configuration setting to the command line, e.g. mysqld_safe wsrep_cluster_address=”gcomm://”. This is a good workaround if for whatever reason the cluster was fully shut down and needs to be brought up manually again.

Since the internal load balancer was already configured before, we can now use the ILB input IP address to connect to the cluster. So the clients use 10.11.0.10:3306 to connect to the cluster. And with each new TCP connection, the load balancer chooses one of the running nodes and connects the client to it.

There is one additional issue that may confuse clients in one specific situation. Imagine that one of the nodes just failed and is about to start up again. In this state, the database server can be accessed but does not yet have data replicated from the other nodes. In this state, although the clients can connect, all database commands will fail. If clients aren’t prepared to handle this situation, this may show up as database errors in applications. But there’s a solution: FromDual has implemented a small shell script that uses the Linux iptables firewall to deny access to the server while it is in this state. The load balancer then finds it cannot access the TCP port and reroutes the request to another running cluster node.

To run the script whenever a replication state change occurs, another line is added to my.cnf:

wsrep_notify_cmd = /usr/local/bin/block_galera_node.sh

The script and the instructions for setting this up can be found here:  http://fromdual.com/galera-cluster-for-mysql-and-hardware-load-balancer/ Don’t be alarmed by the fact that it talks about hardware load balancers, it works the same with the (software-based) Azure internal load balancer.

Hope this helps,

H.


Source: msdn

Posted in Microsoft | Leave a comment

Azure from the Linux command line (part 2)

Hi!

About a month ago, I wrote the first blog post of this series where I have shown how to set up the xplat-CLI (Cross Platform CLI) on Linux and I described how to create IAAS VMs on Azure.

But the approach described had one important drawback: It creates the VMs with a default user and password, but not with an SSH key set up to login.

So let me fix this here.

When you’re familiar with SSH on unix platforms, the usual pattern is to use ssh-keygen to create a key pair, then push the public key into the ~/.ssh/authorized_keys file on the remote host and keep the private key in your ~/.ssh/id_rsa file. When using the same user name on both sides, the command ssh <remotehost> then just works without entering a password. And so does scp, sftp and (in case you have set the rsync_rsh environment variable to ssh in your login script) rsync.  And as you have probably used an empty keyphrase for the secret key, this works nicely from scripts. (And of course I don’t recommend using that empty keyphrase in general, especially not for privileged accounts) 

On Microsoft Azure, we have an internal key deployment mechanism that is used for multiple things, it can deploy keys into Windows and Linux VMs, into PAAS roles and so on. And this mechanism is also used to deploy your ssh public key into your IAAS VMs. But in order to work, it needs the keys in a common universal file format. So just generating the keys using ssh-keygen won’t work. Instead, you can use openssl  to generate the private and public key files in x.509 der format.

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout myPrivateKey.key -out myCert.pem
$ chmod 600 myPrivateKey.key
$ openssl  x509 -outform der -in myCert.pem -out myCert.cer

The first line generates the key pair, as you have probably guessed from the command line parameters, it’s a 2048 bit RSA keypair with a lifetime of 365 days. Again, you can create this key without a passphrase, but that might be a security risk.

Remember the bash script line to create a VM from part one:

$ azure vm create -e -z extrasmall -l “West Europe” $1 $IMAGENAME azureuser “$PASSWORD”

Now let’s modify this to use the newly generated key in addition to the password:

$ azure vm create -e -t myCert.pem -z extrasmall -l “West Europe” $1 $IMAGENAME azureuser “$PASSWORD”

This creates the VM, but this time, azureuser gets a pre-configured authorized_key.

There is one difference when doing a ssh into this VM: you need to specify the key to use as authorization and the remote user name:

$ ssh -i myPrivateKey.key <cloudservicename>.cloudapp.net

And now you’re not asked for a password anymore.

The -i option also works for scp and sftp. For rsync, you can use

$ export RSYNC_RSH=”ssh -i /path/to/myPrivateKey.key”

or use the rsync –rsh “ssh -i /path/to/myPrivateKey.key” command line option to specify the remote shell and identity file to use.

Hope it helps,

H.

 


Source: msdn

Posted in Microsoft | Leave a comment

Azure from the Linux command line (part 1)

Hi!

Since i’ve been gowing up IT-wise with a unix command shell, I tend to do a lot of things with it. Also managing my Azure deployments since there’s the great Azure command line interface or cross platform (“xplat”) CLI.

(If you’re interested in the details, this is all open source, released under an Apache license, and on github:  https://github.com/WindowsAzure/azure-sdk-tools-xplat.)

This blog post documents a few tricks I’ve been using to get up and running fast.

First: You need to connect the xplat cli to your azure subscription. To do that simply run

$ azure download

after installing the cli. If you’re on a remote machine via ssh, this will simply give you an URL to launch in your browser. Make sure you’re already logged into the azure portal, otherwise you will need to login first when going to this URL.

The website will now give you a .publishsettings file for download. The same file is used when setting up a connection between Visual Studio and an Azure subscription.

Now get this file to your linux box (and make sure you keep it safe in transit, this file contains a management certificate key that can manage your subscription!) and import it into xplat cli:

$ azure account import <publishsettingsfile>

And now you’re all set.

Now let’s look around

$ azure help

info:    Executing command help
info:             _    _____   _ ___ ___
info:            /_  |_  / | | | _ __|
info:      _ ___/ _ __/ /| |_| |   / _|___ _ _
info:    (___  /_/ _/___|___/|_|____| _____)
info:       (_______ _ _)         _ ______ _)_ _
info:              (______________ _ )   (___ _ _)
info:
info:    Windows Azure: Microsoft’s Cloud Platform
info:
info:    Tool version 0.7.4
help:
help:    Display help for a given command
help:      help [options] [command]
help:
help:    Open the portal in a browser
help:      portal [options]
help:
help:    Commands:
help:      account        Commands to manage your account information and publish settings
help:      config         Commands to manage your local settings
help:      hdinsight      Commands to manage your HDInsight accounts
help:      mobile         Commands to manage your Mobile Services
help:      network        Commands to manage your Networks
help:      sb             Commands to manage your Service Bus configuration
help:      service        Commands to manage your Cloud Services
help:      site           Commands to manage your Web Sites
help:      sql            Commands to manage your SQL Server accounts
help:      storage        Commands to manage your Storage objects
help:      vm             Commands to manage your Virtual Machines
help:
help:    Options:
help:      -h, –help     output usage information
help:      -v, –version  output the application version

That does not look to bad after all. Just remember azure help <command>,this is your first stop whenever you get stuck.

So let’s set up a linux VM. First let’s check what pre-configured linux images are available.

$ azure vm image list

Now you should see a lot of images. When I just ran this, I got more that 200 lines of output. Image names look like this:

 b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20140226-en-us-30GB

Now we could copy this name to our clipboard and paste it into the next command, but let’s have the shell do that for us, here’s the idea:

#!/bin/bash
IMAGENAME=`azure vm image list |grep -i Ubuntu-13_10-amd64-server |tail -1 | awk ‘{print $2}’`

Get the list of VM images, just select the ones we’re interested in, then select the last (i.e. the most recent one) of that list and just give me the second string which is the image name. Easy, right? Note the back single quotes in the beginning and the end of that line, this is shell syntax for “take the output of that command and store it in that shell environment variable”.

To use the VM, we need to login, so let’s use a password for now:

PASSWORD=”AtotallySECRET!PA55W0RD”
echo Password is $PASSWORD

Next, let’s create the VM:

azure vm create -e -z extrasmall -l “West Europe” $1 $IMAGENAME azureuser “$PASSWORD”

Here’s the output of running this shell script:

$ bash create_ubuntu_vm.sh contosolinux

Password is AtotallySECRET!PA55W0RD
info:    Executing command vm create
+ Looking up image b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20140226-en-us-30GB
+ Looking up cloud service
+ Creating cloud service
+ Retrieving storage accounts
+ Creating VM
info:    vm create command OK

And after about two minutes I can ssh into contosolinux.cloudapp.net as “azureuser” with that super secret password.

Hope it helps,

H.

ps: to get rid of the VM again, I just type azure vm delete -b contosolinux

pps: in case that’s too harsh, azure vm shutdown contosolinux, azure vm start contosolinux and azure vm restart contosolinux work as well. And azure vm list shows you what Azure thinks your VMs are doing right now.

ppps: And in case you were wondering why there was no root password set: just run sudo bash from this initial user account.

 


Source: msdn

Posted in Microsoft | Leave a comment