Friday, 31 July 2009

appreciation

Happy Sysadmin day to me and any other sysadmins who read this.

http://www.sysadminday.com/

Today was spent driving to West Sussex in the early hours to deal with something distinctly non-technical and definitely more hairy, which I put in a big metal box and sent to Europe.

Then I went to work, popped into the DC and installed pfSense to play with, then went and wrote a bash script and cron job.

Go me.

Tuesday, 28 July 2009

Ugh

Writing puppet modules is probably not the best thing to be doing when one is suffering a bout of flu.

ugh. ugh. ugh.

Friday, 24 July 2009

but it's not Windows!!!

I don't get people sometimes. Recently on LQ there's been a spate of newbie Linux users posting large and unnecessary rants about how they can't get their computer to work. Invariably this is a PEBKAC error, but the $lusers don't want to hear it, they just want to know why it doesn't work like Windows!

Clue: It's not Windows.

Change is difficult for some people. I get that. And I think it's brill generally that more people are trying out Linux. But Linux isn't Windows, it doesn't run the same as Windows, and there are some good reasons WHY it doesn't.

For example:

"There has not yet been a single widespread Linux malware threat of the type that Microsoft Windows software currently faces; this is commonly attributed to the malware's lack of root access and fast updates to most Linux vulnerabilities."

I hope this snippet of common sense helps the next $luser who wants to log in and run as root because it's 'easier'.


==

ION, I am approaching the vinegar strokes in terms of getting this bloody server fixed. Everything but the plastic case has been replaced, and I narrowed the problem down to both disk and CPU. Kickstarted the OS, and spent yesterday rebuilding the application (cannot wait to get this into SVN and puppet).

Only one more problem to figure out and I'm home free.

Wednesday, 15 July 2009

sweet tool

Been using this as a sweet little tool which places stress on Linux systems. What's sweet about it is that you can stress IO, memory, CPU cycles, and/or disk.

It's also good for testing monitoring/alert software such as cacti/munin/etc and of course testing scalability.

http://freshmeat.net/projects/stress

I've been using it overnight to test the machine that went over on me last week. It seems my fears were not unfounded, as it's now had SATA disk and RAM replacements and a freshly kickstarted OS and it's still crashing, but thanks to stress we've managed to narrow it down to disk. Surprisingly the machine ran quite sweetly all night with quite a high load avg.
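For anyone wanting to try the same sort of overnight soak, here is a minimal sketch of how such a run could be put together. The worker counts, the 256M allocation size and the 8h timeout are made-up values for illustration, not the ones used on this machine, and build_stress_cmd is just a hypothetical helper name. The flags themselves (--cpu, --io, --vm, --vm-bytes, --hdd, --timeout) are from the stress man page.

```shell
#!/bin/sh
# Sketch only: the numbers below are illustrative assumptions.

# Hypothetical helper: assemble a stress command line from worker counts.
#   $1 CPU workers, $2 IO (sync) workers, $3 memory workers,
#   $4 bytes per memory worker, $5 disk write workers, $6 timeout
build_stress_cmd() {
    echo "stress --cpu $1 --io $2 --vm $3 --vm-bytes $4 --hdd $5 --timeout $6"
}

cmd=$(build_stress_cmd 4 2 2 256M 1 8h)
echo "$cmd"

# Only actually hammer the box when explicitly asked to (RUN_STRESS=1)
# and when stress is installed.
if [ "${RUN_STRESS:-0}" = "1" ] && command -v stress >/dev/null 2>&1; then
    $cmd
fi
```

Run it as is to see the command it would use, or with RUN_STRESS=1 to kick off the real thing. Dropping individual workers to 0 is not how stress expects to be told "none", so to test only one subsystem it is simpler to leave the other flags out of the command entirely.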

Today I've been playing with puppet, which $bossman and I installed quite recently. I'm chuffed to have got puppet to deploy my heartbeat configuration and postfix for my MTAs in our dev environment. But I still have to figure out how to change the iptables rules to let heartbeat through (this has been configured manually, I just need puppet to do it) and then I'll be (re)deploying the configuration into production. The MTAs will be our first 2 machines running off puppet. I'm actually quite excited about this prospect!

Tomorrow I am officially $oldfart. And I'll be spending the day cutting cables (literally) in the datacentre, which were put up by 2 very sleep-depped sysadmins and are therefore a bit useless.

Friday, 10 July 2009

going Chrome

$bossman and I must be living under a rock. We only heard about Google's new OS last night.

Pretty exciting stuff though. I've got real high hopes for Chrome OS as a direct rival to MS, despite it only being available on netbooks and the like in the first instance. Hopefully it will do more to put Open Source on the map for the wider public.

Thursday, 9 July 2009

you never stop learning

I just had to write about today's antics - quite sad I know, but I had one of those days that just started off horrible, but then had a lovely breakthrough around midday which made sitting on my arse waiting for a non-existent engineer to turn up all that much easier.

So, DL140s. Interesting piece of kit which, by all accounts, likes to go down more times a week than Jordan. Luckily, I am blessed with only a few in our cage, and what we do have are inherited DL140 G3s. I must confess, I largely followed the rule of 'if it ain't broke, don't fix it' and only gave the configuration a cursory glance before getting on with more Interesting Things.

Bad Me. Of course, the server decides to fall over. Of course, this means I have to wait $millions of years for the $supplier engineer to turn up and change the dodgy SATA controller.

But it still doesn't want to play with me. It's resolutely *refusing* to do anything post-POST. It's almost like a sulking lover, not giving me an inch, and silently blinking its baleful cursor.

So in goes the rescue disk... Interesting note for anyone who's wondering why they can't 'chroot /mnt/sysimage' and get a '/bin/sh exec format error': if you use an OS installation disc 1 as a rescue disk, it has to be the same architecture as the installed operating system, i.e. i386 vs x86_64. (I am the queen of assumption when it comes to this kind of stuff and will throw any old thing in, assuming it's bootable!)
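A quick way to spot that mismatch before chrooting is to compare what the rescue environment reports with what the installed system's shell was built for. This is only a rough sketch: it assumes the installed system is mounted at /mnt/sysimage, that the file utility is available on the rescue disk, and that only i386 and x86_64 matter. The two helper function names are made up for the example.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and only i386/x86_64 are handled.

# Map `uname -m` output (i386, i586, i686, x86_64...) to an architecture family.
arch_from_uname() {
    case "$1" in
        x86_64) echo x86_64 ;;
        i?86)   echo i386 ;;
        *)      echo unknown ;;
    esac
}

# Map the description printed by `file` for an ELF binary to the same families.
arch_from_file_output() {
    case "$1" in
        *"ELF 64-bit"*x86-64*)        echo x86_64 ;;
        *"ELF 32-bit"*"Intel 80386"*) echo i386 ;;
        *)                            echo unknown ;;
    esac
}

# Assumption: the rescue disk mounted the installed system here.
SYSIMAGE=${SYSIMAGE:-/mnt/sysimage}

if command -v file >/dev/null 2>&1 && [ -e "$SYSIMAGE/bin/sh" ]; then
    rescue=$(arch_from_uname "$(uname -m)")
    # -L follows the symlink, since /bin/sh is usually a link to the real shell
    installed=$(arch_from_file_output "$(file -L "$SYSIMAGE/bin/sh")")
    if [ "$rescue" != "$installed" ]; then
        echo "Rescue disk is $rescue but installed system is $installed: chroot may fail"
    else
        echo "Architectures match ($rescue)"
    fi
fi
```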

So I chrooted into the root filesystem and everything looks gravy. Brilliant, so I'm starting to think perhaps I won't be reinstalling after all. Ah, but fdisk is reporting that there are no valid partitions on md0...

md0? Have I gone mad? wtf? Quick call to $bossman...

Me: Why are we using software RAID on a production machine with a hardware RAID controller?
Him: Whut!?!wtf? etc etc
Me: I thought as much...

So it looks like this install is older than I thought. Though it's on a fairly recent OS/kernel, it managed to predate both myself and $bossman, and neither of us thought to do more than scrape the surface and look at the processes relevant to our application.

I haven't used software RAID in about 4 years, and even then it was on suitably shaky software at an old music/mobile place where I was learning the ropes.

Given that the RAID level specified was RAID 1 and I could still access the filesystem, this meant that we had somehow failed over onto the mirror. Lovely, so the problem was in the booting.

Linux software RAID 1 doesn't seem to mirror the boot sector, so there was no grub to boot off.
Interestingly, this kind of problem only happens if you lose the primary drive. If you lose the secondary drive it will still boot. Of course, whoever set this up should have copied it over, but I guess they were busy or something ;)

A quick and dirty man (hurr hurr) showed me how to copy it over:

Run 'grub' from the command line, then at its prompt:

device (hd0) /dev/sda
root (hd0,0)
setup (hd0)
device (hd1) /dev/sdb
root (hd1,0)
setup (hd1)


Pretty nifty, and neither $bossman nor I had ever heard of it before, but then we don't normally use software RAID. I'm sure someone out there will find this useful.

Rebooted, and the system comes up lovely. Fixed all the partition tables, but mdstat was still showing a degraded array, so I rebuilt that and the machine has been happily whirring away ever since.

Of course I can't help myself checking the Crackberry every few hours for an alert in case it has died again... I'm going to change the disks just to be sure, but it really was an excellent learning experience.
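For the curious, a degraded mirror shows up in /proc/mdstat as an underscore in the status brackets, e.g. [U_] instead of [UU]. Below is a small sketch of how that check could be scripted. The function name degraded_arrays is made up, and the /dev/sdXN in the suggested command is a placeholder, since the exact member partition depends on the box. It only prints a suggested mdadm command rather than running anything.

```shell
#!/bin/sh
# Sketch only: degraded_arrays is a hypothetical helper, not part of mdadm.

# Print the name of every md array whose status brackets contain an
# underscore (a missing member), reading mdstat-formatted text from $1.
degraded_arrays() {
    awk '/^md[0-9]+ :/      { name = $1 }
         /\[[U_]*_[U_]*\]/  { print name }' "${1:-/proc/mdstat}"
}

if [ -r /proc/mdstat ]; then
    for md in $(degraded_arrays); do
        echo "$md is degraded; re-add the missing member with something like:"
        # /dev/sdXN is a placeholder for the partition that dropped out
        echo "  mdadm --manage /dev/$md --add /dev/sdXN"
    done
fi
```

Once a member has been re-added, the rebuild progress appears in /proc/mdstat as well, so 'cat /proc/mdstat' is enough to keep an eye on it.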

Aha!

Welcome to my blog!

I've been considering this for a while now, you know.. cute.. girl.. geek.. blogging.. success? Who knows.

But in the meantime, enjoy! This blog will be centered on my life as a female sysadmin, and all the interesting things that happen in my chosen career.