Wednesday, September 15, 2010

Dumping a Postgresql database remotely with SSH

I ran into a rare problem recently with a large Postgresql database that was filling up the local disks of a server. The database was large, over 100 GB and about 300 million records. There was a lot of churn and it had not been vacuumed in a long time. When I manually ran a vacuum on it, there was not enough working disk space to complete the operation, creating a bind.

What I decided to do instead of using vacuum was to dump it to a remote backup location, then drop the database and restore it from the remote dump. I used SSH to run the remote commands.

Dump a remote Postgresql database to the local machine
ssh user@remote-database-server 'pg_dump database-name -t table-name' > table-name.sql

Restore a remote Postgresql database dump to the local database server
ssh user@backup-machine 'cat table-name.sql' | psql -d database-name

Note that the dump command is run from the backup machine and the restore command is run from the database server.

Also note the single quotes around certain parts of the command.

Thursday, August 19, 2010

The Google Chrome experiment

When Google first announced their Chrome browser, packed with a revamped JavaScript engine (V8) and support for offline web apps, I thought I would give it a spin. The first couple of releases were for Windows and Linux -- no Mac version. Those early versions were a little clunky and appeared to offer no better performance than other popular browsers. So I moved on.

When the Mac version became available, it was a much more polished browser. Another theoretical selling point was that each tab ran as a separate process so one crashed tab would not crash the whole browser. I decided to give Chrome a serious work out on my Mac at home.

Things started out well enough and performance was good. I perused the help and learned some of the short cuts. After about three good weeks, something went wrong. I don't know if it was an update, a growing cache, or what, but it started slowing down. Then, it started having problems loading pages from web sites that worked fine in other browsers. It is possible that it even caused wireless network issues, though that is just speculation at the moment. I need to do some more research to see if the problems were related to Chrome.

For now, I am sticking with Firefox as my main browser on the Mac. I'll come back and try Chrome out after the next major release.

Thursday, July 1, 2010

Finding IPs connected to your web server

On Mac OS X
note: this also shows outgoing connections from web browsers

Get all IPs connected to your web server:
netstat -nat | sed -n -e '/ESTABLISHED/p' | awk '{print $5}' | sed 's/\./ /g' | awk '{print $1"."$2"."$3"."$4}' | sort

Get all unique IPs connected to your web server:
netstat -nat | sed -n -e '/ESTABLISHED/p' | awk '{print $5}' | sed 's/\./ /g' | awk '{print $1"."$2"."$3"."$4}' | sort | uniq -c | sort -n

On Linux
Get all IPs connected to your web server:
netstat -ntu | sed -e 's/::ffff://g' | awk '{print $5}' | cut -d : -f1 | sort -n

Get all unique IPs connected to your web server:
netstat -ntu | sed -e 's/::ffff://g' | awk '{print $5}' | cut -d : -f1 | sort | uniq -c | sort -n

Sunday, May 16, 2010


The Last in Line

Monday, March 22, 2010

The mysterious Data Center Technical Specialist certification

On March 4, I received an email from Novell Technical Training that I had received a Novell certification for "Data Center Technical Specialist". This came as a surprise to me because I had not applied for this certification, taken any tests for this certification, not had I even heardof this certification.

Due to a cross marketing agreement with the Linux Professional Institute, I had applied for and received the Novell Certified Linux Administrator certification a few weeks prior. This seemed legitimate to me, as I had extensive experience with SUSE Linux and my Linux skills are still sharp. I continue to perform Linux server administration as part of my daily work.

However, I am not quite sure what the Data Center Technical Specialist is supposed to represent. Confused, I wrote to Novell Training asking what the certification meant. I received this equally mysterious reply:
Thank you for contacting Novell Training Services. You have received the certification as part of some changes we have made recently to our partner requirements. As part if these changes, some of the exams/certifications you have now count toward the new certification.

I searched the official Novell Certification web site, and this certification does not appear anywhere. I suspect, but can't confirm that it may be part of the Solution Provider program. As such, it is probably more of a value to Novell sales than to an individual technician. I remain somewhat baffled.

Wednesday, March 3, 2010

Fuzzy string matching in PostgreSQL

A recent project required me to use fuzzy string matching, or sound alike matching, in an application that searched a list of names. It turns out there is a contrib module for the PostgreSQL database called fuzzystrmatch that provides several different matching algorithms.

The task at hand involved rewriting a legacy application, originally in PICK, in Ruby on Rails. The PICK application used a soundex search to find names of people that sounded like the search string.

Three algorithms are available as PostgreSQL functions (after installation of the fuzzystrmatch module). They are soundex(), levenshtein(), and metaphone().

Both soundex and metaphone convert a string into character codes. Soundex uses 4 characters and metaphone uses a configurable number of characters. Levenshtein directly compares two strings and returns an integer indicating how well the two strings match.

After some trial and error, I found that metaphone produced better results than soundex. I didn't test the Levenshtein function.

To improve the results, I added a classic substring search using ILIKE. The combination of ILIKE and metaphone gave me a broad, but reasonably accurate fuzzy string search.

Friday, February 26, 2010

Linux: fuser to find processes on TCP ports

Note: This is for Linux only. The Mac (BSD) version of fuser does not handle TCP/UDP ports.

Once or twice a year, I run into a problem where a process is using a TCP port and I need to find out which one. I am documenting it here for the next time so I don't have to look it up in man pages.

To see all processes, run fuser as root or with sudo.

To list all processes connected to TCP port 22:

fuser -n tcp 22

Thursday, January 21, 2010

Rails 2.x scaffolding field types

Dude, where's my CRUD?

One of the powerful features of Rails 1.x was the ability to generate CReate, Update, and Delete (CRUD) admin screens automatically using the scaffolding script built into Rails.

The original scaffolding read the database models and created basic, but usable screens to let you add and edit database records. When the 2.x release of Rails came out, scaffolding lost that power. Now, you have to manually specify each table field and type on the command line when running scaffold. If you don't, the generated screens will be empty.

Worse, a basic reference to all valid field types was missing. Here are all the valid types I have been able to dig up:

text (long text, up to 64k, often used for text areas)
decimal (for financial data)

A mapping of the scaffolding types to data types in corresponding databases can be found on Overooped.

Here is an example of using 2.x scaffolding with data types, run from the Rails application root directory:

ruby script/generate scaffold Modelname name:string title:string employed_on:date remarks:text

Here is an example of using rails 3.x scaffolding with data types, run from the Rails application root directory:

ruby script/rails generate scaffold Modelname name:string title:string employed_on:date remarks:text