blogit ergo sum

Digital Signal Processing MOOC

2013-12-27T20:52:00.000+11:00

I just finished Paolo Prandoni's and Martin Vetterli's Digital Signal Processing course on Coursera.

Unlike the other MOOCs I have taken, I did not spend much time on this course. I have worked on various types of signal processing in my career, and as I progressed through this course I found most of the material was familiar to me.

Digital Signal Processing is a diverse collection of subjects, defined in Wikipedia as the mathematical manipulation of an information signal to modify or improve it in some way. It includes the following laundry list: audio and speech signal processing, sonar and radar signal processing, sensor array processing, spectral estimation, statistical signal processing, digital image processing, signal processing for communications, control of systems, biomedical signal processing, seismic data processing, etc.!

The course syllabus was

Discrete time signals
Vector spaces
Fourier Analysis
Linear Filters
Interpolation and Sampling
Stochastic Signal Processing and Quantization
Image Processing
Digital Communication Systems

The lectures were based on a very thorough online book that was so easy to read that I did not watch many of the lectures. The ones I did watch seemed a good introduction to the field.

Assessment was by quiz. I would have preferred programming assignments.

My quiz results suggested that I understood all the material in the course except the design of ADSL.

Discrete Optimization Course Finished

2013-08-26T20:58:00.001+10:00

I recently finished Pascal Van Hentenryck's Discrete Optimization online course.

The course differed from other tertiary courses and MOOCs I had taken in the past in several ways.

It started with an introductory video instead of the dry introductions I was used to in math and computing courses.
The format of the course was to solve several instances of one NP-hard problem per week.
Little guidance was given. Students were left to choose from the methods given in lectures.
There were no quizzes!

Discrete optimization, as the name suggests, is about optimizing things that take on discrete (whole number) and usually positive values. Much of the real world is like that, e.g. items in a knapsack, cities on a route or vehicles in a delivery schedule, so it is widely applicable. For example, the introductory video told us that Australia spends 10-20% of GDP on logistics, and logistics is made up of discrete things like numbers of packages and vehicles, so discrete optimization is useful.

The Course

The basic ideas of the course were to:

Learn the main techniques of discrete optimization.

Basics: e.g. randomization, greedy search, dynamic programming, branch and bound.
Constraint programming
Local search
Mixed integer programming

Solve a series of problems using any of the techniques learned, alone or in combination, or using other methods.

Knapsack
Graph Coloring
Traveling Salesman
Warehouse Location
Vehicle Routing
Puzzle Challenge (optional)

In the words of the Course Syllabus

The course has an open format. At the start of the course all of the assignments and lectures are available and each student is free to design their own plan of study and proceed at their own pace. The assessments in the course consist of five programming assignments and one extra credit assignment. In the programming assignments, students experience the challenges of real world optimization problems such as selecting the most profitable locations of retail stores (warehouse location) and the design of package delivery routes (vehicle routing).

On one hand this sounded practical; after all, I usually want to learn how to solve real world problems in computer courses. On the other hand, it did not make it clear how to start.

Lacking a plan, I watched all the lectures and then started solving the problems. This was probably about the worst way to do the course. Later on I noticed there was a study guide that gave a good way of doing the course. Oh well.

I watched about 40 lectures that contained some deep insights and lots of practical advice about discrete optimization.

Introduction, branch and bound, dynamic programming.
Constraint programming
Local search
Linear programming and mixed integer programming.
Advanced topics, column generation, disjunctive global constraints, limited discrepancy search and large neighborhood search.

Then I started on the problem sets.

The problem sets were in increasing order of difficulty, and each set contained instances of increasing size and therefore difficulty. Size mattered because all the problems were NP-hard, which means, roughly, that no algorithm is known that solves them exactly in time polynomial in n, where n is the size of the problem (e.g. the number of cities in the travelling salesman problem). The known exact algorithms typically take time that grows like a^n for some a > 1.

Solving the Problems

My problem solving went like this.

Knapsack

Used branch and bound. Spent a lot of time getting tight bounds using a technique that was something like a genetic algorithm and developed a special data structure for it (a fixed size sorted deque). Spent way too much time on this.
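The post does not include the knapsack code, and the bounds described above came from a GA-like technique rather than the one below. For comparison, here is a minimal sketch of depth-first branch and bound using the standard linear-relaxation bound: fill the remaining capacity greedily by value density and take a fraction of the first item that does not fit. The name `knapsack_bb` is hypothetical, and the sketch assumes positive weights and a modest number of items, since it recurses once per item.

```python
def knapsack_bb(values, weights, capacity):
    """Depth-first branch and bound for the 0/1 knapsack problem.

    Returns (best_value, taken), where taken[i] == 1 if item i is packed.
    Assumes all weights are positive.
    """
    n = len(values)
    # Visit items in decreasing value density; the bound below relies on it.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def bound(k, room, value):
        # Linear relaxation: greedily add items order[k:], then a fraction of
        # the first item that does not fit. Never less than the best value
        # reachable from this node.
        for i in order[k:]:
            if weights[i] <= room:
                room -= weights[i]
                value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    best_value, best_taken = 0, [0] * n
    taken = [0] * n

    def search(k, room, value):
        nonlocal best_value, best_taken
        if value > best_value:                  # new incumbent
            best_value, best_taken = value, taken[:]
        if k == n or bound(k, room, value) <= best_value:
            return                              # prune: cannot beat incumbent
        i = order[k]
        if weights[i] <= room:                  # branch 1: take item i
            taken[i] = 1
            search(k + 1, room - weights[i], value + values[i])
            taken[i] = 0
        search(k + 1, room, value)              # branch 2: leave item i out

    search(0, capacity, 0)
    return best_value, best_taken


print(knapsack_bb([8, 10, 15, 4], [4, 5, 8, 3], 11))  # prints (19, [0, 0, 1, 1])
```

The tighter the bound, the earlier branches are pruned, which is why so much effort goes into it.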

Graph Coloring

Used local search. Spent time developing efficient moves and more time developing metaheuristics. Learnt why my original metaheuristic that I took so long to develop is not in wide use. Spent too much time on this.
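The post does not show the moves or metaheuristics used. As an illustration only, here is a minimal sketch of one well-known local search for graph coloring, min-conflicts: fix the number of colors k, start from a random coloring, and repeatedly recolor a conflicting vertex with a least-conflicting color. The function name and parameters are hypothetical; to minimize the number of colors you would rerun it with smaller k until it fails.

```python
import random

def min_conflicts_coloring(edges, n, k, max_steps=10_000, seed=0):
    """Local search for a k-coloring of a graph with vertices 0..n-1.

    Returns a list of colors, or None if no conflict-free coloring is
    found within max_steps.
    """
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    colors = [rng.randrange(k) for _ in range(n)]

    def conflicts(v, c):
        # Number of neighbors of v that have color c.
        return sum(1 for u in neighbors[v] if colors[u] == c)

    for _ in range(max_steps):
        bad = [v for v in range(n) if conflicts(v, colors[v]) > 0]
        if not bad:
            return colors
        v = rng.choice(bad)
        # Move: recolor v with a least-conflicting color, ties broken randomly.
        scores = [conflicts(v, c) for c in range(k)]
        best = min(scores)
        colors[v] = rng.choice([c for c in range(k) if scores[c] == best])
    return None
```

This sketch recomputes conflict counts from scratch on every step; making moves efficient usually means maintaining those counts incrementally as vertices change color.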

Traveling Salesman

Used local search. Spent time developing a visualization and moves. Spent most of my time tuning well-known metaheuristics and experimenting with search strategies. Learnt why my original search strategies are not in wide use.
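A common starting point for TSP local search is the 2-opt move: remove two edges of the tour and reconnect the two resulting paths the other way, which amounts to reversing a segment. The sketch below applies improving 2-opt moves until none remain. It is not the author's code; `two_opt` and `tour_length` are hypothetical names, and points are assumed to be (x, y) tuples with Euclidean distances.

```python
import math

def tour_length(points, tour):
    # Closed tour: includes the edge from the last city back to the first.
    return sum(math.dist(points[tour[i - 1]], points[tour[i]])
               for i in range(len(tour)))

def two_opt(points, tour):
    """Apply improving 2-opt moves until no single move shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length from replacing edges (a,b),(c,d) with (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Reversing the segment between the two edges reconnects them.
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(two_opt(square, [0, 2, 1, 3]))  # prints [0, 1, 2, 3]
```

Each pass checks O(n^2) moves, which is fine for small instances but is one reason naive local search struggles at large N; practical solvers restrict moves to nearby cities.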

Warehouse Location

Used a MIP formulation. Spent most of my time getting SCIP working and formulating the MIP problems.
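For reference, here is one standard way to write warehouse location as a MIP, assuming the capacitated variant where each customer is served by exactly one warehouse. The symbols are assumptions, not taken from the course: $y_w = 1$ if warehouse $w$ is opened, at setup cost $f_w$ with capacity $s_w$; $x_{wc} = 1$ if customer $c$, with demand $d_c$, is served by $w$ at travel cost $t_{wc}$.

$$
\begin{aligned}
\min\; & \sum_{w} f_w\, y_w + \sum_{w}\sum_{c} t_{wc}\, x_{wc} \\
\text{s.t.}\; & \sum_{w} x_{wc} = 1 && \forall c \\
& x_{wc} \le y_w && \forall w, c \\
& \sum_{c} d_c\, x_{wc} \le s_w\, y_w && \forall w \\
& x_{wc},\ y_w \in \{0, 1\}
\end{aligned}
$$

For integer solutions the constraint $x_{wc} \le y_w$ is implied by the capacity constraint when demands are positive, but including it tightens the LP relaxation, which usually helps a branch-and-bound solver such as SCIP.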

Vehicle Routing

Had run out of time when I started this. Assigned customers to vehicle trips by sorting them by angle around the warehouse, jiggled the trips back to feasibility and ran my TSP solution on each vehicle trip.
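Sorting by angle around the warehouse is essentially the classic sweep heuristic. A minimal sketch of that step, assuming 2-D coordinates, a single depot (the warehouse) and no customer demand larger than the vehicle capacity. It does not limit the number of vehicles, so a repair step would still be needed, followed by a TSP solve on each trip. `sweep_routes` is a hypothetical name.

```python
import math

def sweep_routes(depot, customers, demands, capacity):
    """Sweep heuristic: sort customers by polar angle around the depot and
    cut the sorted sequence into trips that respect vehicle capacity."""
    def angle(i):
        x, y = customers[i]
        return math.atan2(y - depot[1], x - depot[0])

    trips, trip, load = [], [], 0
    for i in sorted(range(len(customers)), key=angle):
        if trip and load + demands[i] > capacity:
            # Vehicle full: close this trip and start a new one.
            trips.append(trip)
            trip, load = [], 0
        trip.append(i)
        load += demands[i]
    if trip:
        trips.append(trip)
    return trips


print(sweep_routes((0, 0), [(1, 0), (0, 1), (-1, 0), (0, -1)], [1, 1, 1, 1], 2))
# prints [[3, 0], [1, 2]]
```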

Puzzle Challenge

Had hours left when I started this, so I used whatever method came into my head first.

For example, for the N queens problem I could remember MIP, local search and constraint programming solutions. Constraint propagation seemed the most natural way to do it, so I wrote a simple Python script for a constraint-based solution. This script was too slow to get a high score (a valid solution for high N), so I made a few improvements.

Randomized the order in which rows were tried for each new queen, as filling the columns by trying rows in increasing order is very slow. I didn't have time to figure out what a good order was, but random seemed unlikely to be bad, and it wasn't.
Coded the inner loop, propagate(), in Numba, which is the quickest way I know to make Python code run at close to C speeds.

With those improvements my script could solve for about 900 queens but no more, because it was recursive and had reached Python's recursion depth limit (1000 by default). I didn't have time to convert it from recursive to iterative, so I stopped there.
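The limit comes from Python keeping one stack frame per placed queen. One way around it is to keep the backtracking state on an explicit list instead of the call stack. This is a sketch under assumptions, not the author's script: it places one queen per column, tries rows in random order, and uses simple forward checking (every later column must still have a free row) as its propagation. `solve_queens` is a hypothetical name, and its O(N^2) forward check per placement is there to show the structure, not to be fast.

```python
import random

def solve_queens(n, seed=0):
    """Iterative backtracking for N queens (n >= 1), one queen per column.

    Returns rows, where rows[c] is the row of the queen in column c,
    or None if there is no solution.
    """
    rng = random.Random(seed)
    rows = []
    used_rows, diag1, diag2 = set(), set(), set()

    def free(r, c):
        return r not in used_rows and r + c not in diag1 and r - c not in diag2

    def place(r):
        c = len(rows)
        rows.append(r)
        used_rows.add(r)
        diag1.add(r + c)
        diag2.add(r - c)

    def unplace():
        r = rows.pop()
        c = len(rows)
        used_rows.discard(r)
        diag1.discard(r + c)
        diag2.discard(r - c)

    def candidates(c):
        order = list(range(n))
        rng.shuffle(order)                     # random row order
        return [r for r in order if free(r, c)]

    def future_ok(c):
        # Forward checking: every later column still has a free row.
        return all(any(free(r, f) for r in range(n)) for f in range(c + 1, n))

    # stack[c] holds the untried candidate rows for column c.
    stack = [candidates(0)]
    while stack:
        if not stack[-1]:
            stack.pop()                        # column exhausted: backtrack
            if rows:
                unplace()
            continue
        place(stack[-1].pop())
        if len(rows) == n:
            return rows
        if future_ok(len(rows) - 1):
            stack.append(candidates(len(rows)))
        else:
            unplace()                          # dead end: try the next row
    return None
```

Because the search state lives in ordinary lists, the depth of the search is limited by memory rather than by `sys.getrecursionlimit()`.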

Conclusion

This seemed like a good way to learn. I went about it in an inefficient way, but I learned first hand why the best methods don't work like the ones I invented myself. I also got to see

what was important for each method, e.g. tight bounds for branch and bound, efficient moves for local search
what scales to large N, e.g. not my TSP local search
how much time is involved in learning the packages that can be used for discrete optimization, e.g. SCIP

You can read other students' comments on the course here.

Twitter Bot

2012-08-12T14:44:00.002+10:00

I recently wrote a Twitter bot to offer sympathy to people who have suffered paper cuts. You can see it here.

The Twitter bot link in the last paragraph has all the code. There was nothing particularly complex in it. The hardest part was classifying tweets as being from people who suffered paper cuts or not. This was done with a very simple naive Bayes classifier.
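The bot's own code is at the link above. To show the idea in a few lines, here is a minimal sketch of a multinomial naive Bayes classifier over word unigrams and bigrams with add-one (Laplace) smoothing. The class and function names, the whitespace tokenization, and the choice of n-grams up to length 2 are assumptions, not details taken from the bot. It also assumes both classes appear in the training data.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    # Lower-cased whitespace tokens plus all n-grams up to length n_max.
    words = text.lower().split()
    grams = list(words)
    for n in range(2, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()
        self.gram_counts = {True: Counter(), False: Counter()}
        self.vocab = set()

    def train(self, text, label):
        grams = ngrams(text)
        self.class_counts[label] += 1
        self.gram_counts[label].update(grams)
        self.vocab.update(grams)

    def log_score(self, text, label):
        # log P(label) + sum of log P(gram | label), add-one smoothed.
        total = sum(self.class_counts.values())
        counts = self.gram_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / total)
        for g in ngrams(text):
            score += math.log((counts[g] + 1) / denom)
        return score

    def classify(self, text):
        # True means "from a person with a paper cut".
        return self.log_score(text, True) > self.log_score(text, False)
```

Working in log probabilities avoids floating-point underflow when a tweet contains many n-grams.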

The code from the previous link was trained on 3,225 tweets containing "paper cut" and gave the following 10-fold cross-validation results. The columns are the classifier's predictions of whether a tweet is from a person with a paper cut, and the rows are the answers given by a human (me).

===== = ===== = =====
      | False |  True
----- + ----- + -----
False |  1978 |    59
----- + ----- + -----
 True |   308 |   880
===== = ===== = =====
Total = 3225

===== = ===== = =====
      | False |  True
----- + ----- + -----
False |   61% |    2%
----- + ----- + -----
 True |   10% |   27%
===== = ===== = =====
Precision = 0.937, Recall = 0.741, F1 = 0.827
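The summary figures follow directly from the counts table, taking "True" as the positive class: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A quick check in Python, where the helper name `scores` is hypothetical:

```python
def scores(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts from the cross-validation table: rows are human labels,
# columns are classifier predictions.
tp, fp, fn, tn = 880, 59, 308, 1978
precision, recall, f1 = scores(tp, fp, fn)
print(f"Precision = {precision:.3f}, Recall = {recall:.3f}, F1 = {f1:.3f}")
# prints Precision = 0.937, Recall = 0.741, F1 = 0.827
```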

This precision value of 0.937 means that 0.063, or about 1/16, of the tweets that the Twitter bot classifies as being about paper cuts, and replies to, are not about paper cuts.

There is some additional filtering to reduce the number of inappropriate replies. Filtering increases the precision to 0.95, so about 1 in 20 of the bot's replies go to tweets that are not about paper cuts. The false positives after filtering are here. Examining them gives a feel for the cases where the simple Bayesian n-gram classifier breaks down.

The most influential n-grams are given here where you can also find a link to the full list of n-grams.

You can get some idea of how well the classification works in the real world from the Twitter responses to it.

UPDATE 17 September 2012. Twitter warned the OwwPapercut account for sending automated replies which seemed odd given @StealthMountain. OwwwPapercut has run in non-replying mode since then.

NLP Class Completed

2012-06-03T20:42:00.000+10:00

The Stanford NLP class finished about a week ago and I got my certificate of accomplishment over the week-end.

NLP Class "Attrition"

Professors Dan Jurafsky and Chris Manning also sent out some course statistics that I saved here and graphed on the right using the Python code in the previous link.

From about the third week of the course onwards I felt I did not have time to watch all the lectures and complete the problem sets and programming exercises, so I wondered what it was like for other people like me with jobs and families.

The graph on the right shows the number of students completing problem sets and programming exercises each week of the course. I have called this "attrition" which cannot be completely right as the number of students completing programming exercises increased from week 6 to week 7.

I have not studied university course attrition rates, but I suspect they would differ from NLP-class because of incentives like costs and grade penalties, which would lead students to leave at the start of the course. The somewhat smooth and mostly monotonic curves in the above graph seem more natural for a free class like NLP-class, where the main benefit is learning.

So why were people leaving the course all the time? I can only answer for me. In order to learn the skills I wanted, I felt I had to complete the assignments and the assignments were long and tended to get longer through the course. Programming assignment 6 took me the longest and cut into some family time. That seems to be reflected in the graph.

Stanford NLP Class continued

2012-03-25T20:29:00.001+11:00

In one of the early classes, one of the lecturers showed some Unix tools for performing low-level tasks such as tokenizing and word counting. You can see these in the course notes.

I prefer not to use such tools because I already use more tools than I want to and it is possible to do all the things taught in lectures with one of the tools I use a lot, Python.

Here is some Python code that does what I remember the lecturer demonstrating.

Tokenizing on non-alphabetic boundaries

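import re

# Runs of one or more non-alphabetic characters separate tokens
_RE_TOKEN = re.compile('[^a-zA-Z]+')

def tokenize(text):
    # Drop the empty strings split() returns at leading/trailing separators
    return [x for x in _RE_TOKEN.split(text) if x]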

Finding unique words
The lecturers call unique words word types.

tokens = tokenize(text)
types = set(tokens)
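The lecture's Unix demonstration also counted word frequencies, with a pipeline along the lines of `tr -sc 'A-Za-z' '\n' < file | sort | uniq -c | sort -rn`. A rough Python equivalent using `collections.Counter`, with the same tokenizer repeated so the snippet stands alone, is sketched below; `word_counts` is a hypothetical name.

```python
import re
from collections import Counter

_RE_TOKEN = re.compile('[^a-zA-Z]+')

def tokenize(text):
    return [x for x in _RE_TOKEN.split(text) if x]

def word_counts(text):
    # (word, count) pairs, most frequent first, like "uniq -c | sort -rn"
    return Counter(tokenize(text)).most_common()


print(word_counts("the cat and the hat, the end")[0])  # prints ('the', 3)
```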

Finding unique words, ignoring case variants

normalized_types = set(tokenize(text.lower()))

Finding word types that differ only in case

# variants[x] = set of all case variants of x in types
variants = {}
for w in types:
    variants.setdefault(w.lower(), set()).add(w)

# lower-cased words that occur as more than one case variant
multi_case = {k: v for k, v in variants.items() if len(v) > 1}

# number of types that are case variants of other types
num_case_only = sum(len(v) - 1 for v in multi_case.values())

# this had better be true
assert num_case_only == len(types) - len(normalized_types)

That's it. Basic text manipulations done without having to leave one of the small number of programming environments I spend a lot of time in.

Stanford NLP Class

2012-03-10T14:09:00.001+11:00

These are my notes on Stanford's online NLP class.

The first few lectures said that a lot of the hard work in NLP, notably in tokenizers, is done with regular expressions. This was not entirely surprising as a good fraction of the string processing I have done in my professional career has been done with regular expressions.

Programming exercises can be done in Python or Java. I chose Python as I have found it well suited to simple string manipulation programs in the past.

The first programming exercise is to extract phone numbers and email addresses from web pages. A training set of Stanford computer science faculty home pages was supplied along with some starter code to show the required formatting. The starter code helpfully computed lists of true positives, false positives and false negatives.

My experience with problems like these is to

1. Get the test samples to pass, by
    loosening matches and adding more detection to find all the addresses and phone numbers
    tightening matches to avoid false positives
2. Take care to make decisions that are likely to generalize well to as yet unseen samples.

The second step takes some judgement, as it is not clear what will generalize well. For example, in the samples " DOT " was used to mask "." in email addresses. It seemed wise to match on all cases of " DOT ", but then I found that one faculty member used " DOM " as the alias for ".". The question was then whether to generalize from " DOT " and " DOM " to " DO<any character> " or to treat " DOM " as a one-off, as it had been observed only once.
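A minimal sketch of the two choices, assuming the masked addresses look like `jdoe@cs DOT stanford DOT edu` (the exact format in the course data may differ). One rule accepts only the variants actually seen; the other generalizes to any character after "DO". All names here are hypothetical.

```python
import re

NARROW = re.compile(r' (?:DOT|DOM) ')   # only the variants seen in training
BROAD = re.compile(r' DO. ')            # generalize to " DO<any character> "

EMAIL = re.compile(r'\b[\w.-]+@[\w-]+(?:\.[\w-]+)+\b')

def find_emails(text, dot_rule):
    # Replace masked dots with "." and then match ordinary addresses.
    return EMAIL.findall(dot_rule.sub('.', text))


print(find_emails("ann@cs DOX stanford DOT edu", NARROW))  # prints []
print(find_emails("ann@cs DOX stanford DOT edu", BROAD))   # prints ['ann@cs.stanford.edu']
```

The broad rule finds an address masked in a way never seen in training, at the cost of turning any " DO? " into a dot; whether that trade is worth it depends on the unseen data.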

My Web Browsing Turned into a Newspaper

2011-01-15T10:09:00.004+11:00

My life is an open book.
What the machine learning people I follow on Twitter are reading.
And a custom paper.

The Effect of Test Set Selection on Classification Accuracy

2010-10-18T21:42:00.009+11:00

I was looking at some prediction results for the UCI Michalski and Chilausky soybean data set and wondered how they depended on test set selection. Some reported classification accuracy as high as 93.1% on a 25% training set and 97.1% on 290 training and 340 test instances.

A few weeks ago I had been asked to find the best classifier for the soybean data set based on prediction accuracy on a test set of 20% of the data. The remaining 80% could be used for training. That gave 306!/(245!x61!) = 1.3 x 10^65 possible splits of the 306 data points into training and test sets. Could some of these splits lead to better results than others for the classifiers I was about to use?
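The number of possible splits is the binomial coefficient C(306, 61). Python's `math.comb` (Python 3.8 or later) computes it exactly, which is a handy check on the factorial arithmetic:

```python
import math

n_total, n_test = 306, 61                  # 20% of 306, rounded down
splits = math.comb(n_total, n_test)        # 306! / (245! * 61!)
assert splits == math.factorial(306) // (math.factorial(245) * math.factorial(61))
print(f"{splits:.2e}")
```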

The WEKA data mining package was used for classification. WEKA has many classifiers that can be run on a data set and their performance compared.

WEKA also has a programming interface, so I used it to write some Jython tools to explore the performance of a range of classifiers.

One of these tools was run on the soybean data to find the training/test splits with best and worst classification accuracy. The results were

Classifier	Best Accuracy	Worst Accuracy
Naive Bayes	100%	70.5%
Bayes Net	100%	75.4%
J48 (C4.5)	95%	69%
JRip (RIPPER)	98.4%	70.5%
KStar	96.7%	65.6%
Random Forest	95%	62.3%
SMO (support vector machine)	96.7%	82%
MLP (neural network)	100%	77%

Fig1. Best and worst accuracies for selected WEKA classifiers run on different training/test splits

That was quite a range of test set accuracies for different training/test splits. My simple genetic algorithm may not have found the extremes of the distributions so the actual range may have been higher.

When I ran the test set selection script a second time (Fig 2), it also found a 100% SMO accuracy. The second test was set up to find a single training/test set split that gave the best results for all classifiers at once. It also had slightly different pre-processing: the 4 duplicate instances were removed and the troublesome single 2-4-5-t sample was left in. Therefore I expected it to give worse results than the pre-processing used for the results in Fig 1.

Classifier	Correct (out of 60)	Percent Correct
Naive Bayes	57	95 %
Bayes Net	59	98.3 %
J48	58	96.7 %
JRip	60	100 %
KStar	60	100 %
Random Forest	59	98.3 %
SMO	60	100 %
MLP	60	100 %

Fig2. Best accuracies for selected WEKA classifiers all run on the same training/test split

Both the above results were for the default settings of each of the WEKA classifiers. The WEKA classifiers all have parameters that can be tuned and it is possible to select subsets of attributes so they can give better and much worse results than the defaults. However the default parameters are usually close to the best so they may be good indicators of the best possible accuracies.

It appears that the training/test split of a data set can change classification accuracy by more than 30 percentage points, and this was observed on a well-known and widely used classification data set.

Watching Percipo

2010-09-07T22:21:00.000+10:00

Wrote a Blog Post

2010-08-13T21:43:00.003+10:00

I have not posted here recently, but I wrote a blog post for PaperCut last week.

Blogger supports logical symbols

2010-03-20T09:50:00.005+11:00

e.g. ¬(A ∨ B) ⇒ ¬A ∧ ¬B

All symbols: ¬, ∧, ∨, ⇒, ⇔

C++ Continues to Surprise

2010-01-04T22:49:00.005+11:00

Someone was asking questions about const_cast<>() a few days ago. I was not quite sure how it would work because I try to use as little of the C++ language as possible, and it is possible to get by in C++ without const_cast<>(). To find out exactly how it worked I tried it out with a test case. The following code gave the same output with g++ on Vista and OS X.

#include <iostream>

int main() {
    int i = 3;
    const int* ptr = &i;
    *const_cast<int*>(ptr) = 11;  // write through ptr with const cast away
    if (&i == ptr && i != *ptr) {
        std::cout << "Cannot happen: &i=" << &i << " == ptr=" << ptr
                  << " but i=" << i << " != *ptr=" << *ptr << std::endl;
    }
    return 0;
}

The output in both cases was Cannot happen: &i=0x22fe6c == ptr=0x22fe6c but i=3 != *ptr=22

How can a single memory address hold two different values?

The disassembly was

push   %ebp
mov    %esp,%ebp
sub    $0x18,%esp

int i = 3;
movl   $0x3,0xfffffffc(%ebp)      (i in bp-4)

const int* ptr = &i;
lea    0xfffffffc(%ebp),%eax      (&i in eax)
mov    %eax,0xfffffff8(%ebp)      (ptr in bp-8)

*const_cast<int*>(ptr) = 11;
mov    0xfffffff8(%ebp),%eax      (ptr in eax)
movl   $0xb,(%eax)                (*ptr set to 11)

if (&i == ptr && i != *ptr)
lea    0xfffffffc(%ebp),%eax
cmp    0xfffffff8(%ebp),%eax
jne    0x403214
mov    0xfffffff8(%ebp),%eax
mov    (%eax),%eax
cmp    0xfffffffc(%ebp),%eax
je     0x403214

The disassembly matches the C++ code. i is stored at bp-4 and ptr is stored at bp-8 so the C++ code should work. The observed behaviour does not match the disassembly.

This cannot be right. I guess I found a bug in g++.

My Time in Sweden

2009-10-27T06:26:00.000+11:00

I lived in Sweden from 1988 to 1991. Here is a map showing where I lived, on Stortorget in Gamla Stan in Stockholm.


I lived in the red building in the two left photos below, which were taken from Stortorget. The photo on the right is of the same building taken from Kåkbrinken, the alley to the left of the red building.


The photo on the left below is the main street in Gamla Stan, the photo in the centre is of the Grand Hotel as seen from the shore of Gamla Stan, and the photo on the right is Karolinska Hospital where I worked.


After I left Stockholm I moved to Umeå which is shown on the left below. When I lived there I used to visit Vaasa in Finland shown on the right.

When I lived in Sweden I took vacations in Norway, including Lofoten on the left and Tromsø in the centre of the row of photos below. I also took the Hurtigruten.

Photo Credits

Machine Learning While I Work

2009-10-13T13:55:00.009+11:00

I am setting up Postfix so I have spare time as I try things out. This post is about the things I am reading or watching in the background.

Taskforce on Context-Aware Computing
I went to a lecture called Open Mobile Miner (OMM): A System for Real Time Mobile Data Analysis. There is a video here, a description of OMM here and lecture slides here (pdf).

Shonali Krishnaswamy's group are making software that does some analysis of data on a smart phone before uploading it, thereby reducing the phone's power consumption by reducing communications. Their examples include ECG output, traffic congestion metrics and taxi location data. The data in their examples is scalar and sampled at 0.5 Hz or less so it is hard to see why a simple store-and-forward scheme would not achieve much the same thing. I guess I need to read their publications more deeply.

Statistical Learning as the Ultimate Agile Development Tool by Peter Norvig is an overview of modern practical machine learning. In summary: focus on the data, not the code.

Learning Theory by Mark Reid was an introduction to some theoretical aspects of machine learning presented in a summer school in Canberra in January 2009.

Now some videos of how machine learning can be applied to models of the face.

Changes of facial features along the dominance, trustworthiness and competence dimensions in a computer model developed by Oosterhof & Todorov (2008).

Now it is time to start watching a video on distributed computing.

Swarm: Distributed Computation in the Cloud from Ian Clarke on Vimeo.

My First Upside Down Post

2009-09-18T06:40:00.000+10:00

˙ʇǝsdıɥɔ pǝsɐq ɯɹɐ s’ǝןɐɔsǝǝɹɟ 'uoƃɐɹpdɐus ɯɯoɔןɐnb 'ɯoʇɐ ןǝʇuı ˙sɹɐʍ ɹossǝɔoɹd ˙sʞooqʇǝu puɐ sǝuoɥd ʇɹɐɯs ɟo ǝɔuǝƃɹǝʌuoɔ ǝןqıssod ˙ǝƃuɐɥɔ ǝʌıɹp ןןıʍ ǝɔuǝɹǝɟɟıp ǝɔıɹd 000'1$ ˙sʌ 052$ ǝɥʇ ˙pǝʇıns-ןןǝʍ os ʇou sǝop pɹoʍ ʇɟosoɹɔıɯ ǝןıɥʍ pnoןɔ ǝɥʇ ɯoɹɟ ןןǝʍ sʞɹoʍ ʎpɐǝɹןɐ ǝɹɐʍʇɟos ɹǝɥʇo puɐ uozɐɯɐ 'ɯoɔ˙ǝɔɹoɟsǝןɐs 'ǝןƃooƃ ˙ǝɹɐʍʇɟos ʇɟosoɹɔıɯ ǝɥʇ doʇdɐן ǝɔɐןdǝɹ ʎɐɯ pnoןɔ + ʞooqʇǝu os ƃuıʇndɯoɔ pnoןɔ oʇ pǝʇıns-ןןǝʍ ǝɹɐ sʞooqʇǝu ˙ʇɟosoɹɔıɯ puɐ ןǝʇuı oʇ ʇsoɔ ʇɐǝɹƃ ʇɐ sǝןɐs doʇdɐן 0001$sn ǝzıןɐqıuuɐɔ ʎɐɯ sʞooqʇǝu 052$sn ɯɹǝʇ ɹǝƃuoן ǝɥʇ uı ʇnq ǝıd ɔd ǝɥʇ ƃuıʍoɹƃ ǝɹɐ sʞooqʇǝu ʎןʇuǝɹɹnɔ ¿uıɐɥɔ ǝnןɐʌ doʇdɐן puɐ ɔd ǝɥʇ oʇ op sʞooqʇǝu ןןıʍ ʇɐɥʍ ˙sǝןɐs ƃuıʇsıxǝ ɟo uoıʇɐzıןɐqıuuɐɔ pnoןɔ ǝɥʇ oʇ ʇuǝɯǝʌoɯ ɟo sǝɔuǝnbǝsuoɔ ǝɯos sɹoʇıʇǝdɯoɔ ɹıǝɥʇ ɹǝʌo ǝƃɐʇuɐʌpɐ ƃuıɔıɹd ɐ ǝʌǝıɥɔɐ ןןıʍ sıɥʇ ǝpıʌoɹd uɐɔ oɥʍ sɹǝıɹɹɐɔ uoıʇɐɔıunɯɯoɔǝןǝʇ ǝɥʇ ˙suoıʇɔǝuuoɔ ʞɹoʍʇǝu ǝןqɐʇɹod puɐ ǝןqɐıןǝɹ 'ʇsɐɟ sǝɹınbǝɹ ƃuıʇndɯoɔ pnoןɔ ʎɥdɹnɯ ˙ɹɯ sʎɐs „'ɹǝʇʇǝq ʇnq ƃuıɥʇou uǝǝq s,ʇı 'dɯnɥ ƃuıuɹɐǝן ǝɥʇ ɹǝʌo ʇoƃ ǝʍ ǝɔuo ˙sǝƃɐʇuɐʌpɐsıp ןɐǝɹ ʎuɐ ɥʇıʍ dn ƃuıɯoɔ pǝssǝɹd-pɹɐɥ ǝq p,ı ˙ǝɹoɯ ʎuɐ suoıʇɐןןɐʇsuı ǝɹɐʍpɹɐɥ ןɐɔısʎɥd ɹoɟ ƃuıʇıɐʍ sʎɐp puɐ sɹnoɥ puǝds ʇ,uop ǝʍ„ ˙sɹǝʇndɯoɔ uʍo sʇı ɟo ǝuou ɥʇıʍ ʎuɐdɯoɔ ʇuǝɯdoןǝʌǝp qǝʍ ɐ ǝɯoɔǝq sɐɥ ʇı - ǝɹɐʍpɹɐɥ ןɐɔısʎɥd ɹoɟ ʇno ƃuıʞɹoɟ pǝddoʇs sɐɥ ʇı suɐǝɯ ɥɔıɥʍ 'ǝɔıʌɹǝs ןɐnʇɹıʌ ǝɥʇ ɹoɟ ɹnoɥ ɹǝd sʇuǝɔ 08 oʇ 01 sʎɐd ǝƃuɐɹo ʎɔınɾ ˙ǝɯoɔǝq sɐɥ ƃuıʇndɯoɔ pnoןɔ ɯɐǝɹʇsuıɐɯ ʍoɥ sǝʇɐɹʇsnןןı (9002 qǝɟ 02 ǝƃɐ ǝɥʇ) ɯɐǝɹʇsuıɐɯ ǝɥʇ spuǝɔsɐ ƃuıʇndɯoɔ pnoןɔ ”˙ƃuıʍoɹƃ ןןıʇs sı ןɐɹǝuǝƃ uı ƃ3“ ˙sʇdǝɔuoɔ pɹɐʍɹoɟ ʇɐ ןɐdıɔuıɹd 'ssnɐɹʇs ןןıʍ pıɐs ”'sʇods ʇɥƃıɹq ǝɹɐ ǝsǝɥʇ“˙˙˙ ˙pǝʌɹǝsqo ǝʌɐɥ sʇsʎןɐuɐ 'ǝɔuɐɥɔ ƃuıʇɥƃıɟ ɐ ǝʌɐɥ sɯǝpoɯ ƃ3 puɐ sdıɥɔ ɥʇooʇǝnןq puɐ ıɟ-ıʍ 'sdƃ ǝʞɐɯ oɥʍ sɹopuǝʌ puɐ — ɥʇʍoɹƃ ʇɐɥʇ ɟo ɥʇƃuǝɹʇs ǝɥʇ uo ʎɹɐʌ sʇsʎןɐuɐ — ɹɐǝʎ sıɥʇ ʍoɹƃ oʇ pǝʇɔǝɾoɹd ǝɹɐ sǝןɐs ǝuoɥdʇɹɐɯs ˙pɐǝɥɐ ɹɐǝʎ ǝɥʇ uı sʇods ʇɥƃıɹq ƃuıʞǝǝs puɐ sɥʇƃuǝɹʇs ǝɹoɔ oʇ ƃuıʞooן ǝɹɐ sɹopuǝʌ dıɥɔ ʇsoɯ 'ǝןıɥʍuɐǝɯ˙˙˙ ˙sʇǝʞɹɐɯ ʍǝu uǝdo oʇ ʎɐןd ǝuoɥd ʇɹɐɯs ɐ ƃuıɹɐdǝɹd ǝq oʇ pǝɹoɯnɹ sı ˙ɔuı ןןǝp ɹǝʞɐɯ ɔd ˙ǝuoɥdı s’˙ɔuı ǝןddɐ ʎq pǝʌɹǝs ʇǝʞɹɐɯ ǝɥʇ 
ɟo ǝɯos qɐɹƃ oʇ ƃuıdoɥ 'ʍoןs sǝʇɐɹ sǝןɐs ɔd sɐ spıɯ pǝʇǝƃɹɐʇ sɐɥ — ɹǝʞɐɯ dıɥɔ ʇsǝƃɹɐן s’pןɹoʍ ǝɥʇ — ˙dɹoɔ ןǝʇuı sɐ 'uoıʇıʇǝdɯoɔ ǝʌɐɥ ןןıʍ ɯɯoɔןɐnb˙˙˙spןǝɥpuɐɥ puɐ sdoʇdɐן uǝǝʍʇǝq dɐƃ ǝɔıɹd ǝɥʇ ǝƃpıɹq sʞooqʇǝu sɐ ǝƃɹns oʇ pǝʇɔǝɾoɹd ʎɹoƃǝʇɐɔ ɹǝʇʇɐן ǝɥʇ ɥʇıʍ 'sǝɔıʌǝp ʇǝuɹǝʇuı ǝןıqoɯ puɐ sʞooqʇǝu 'sʞooqǝʇou ɹoɟ ʇǝsdıɥɔ uoƃɐɹpdɐus sʇı uo sısɐɥdɯǝ pǝɔɐןd sɐɥ ɯɯoɔןɐnb 'ǝןıɥʍuɐǝɯ sdıɥɔ :ʇsɐɔǝɹoɟ ssǝןǝɹıʍ 9002 ssǝןǝɹıʍ ɹɔɹ sǝuıɥɔɐɯ dx sʍopuıʍ ɹo nʇunqn ʎןןɐnsn ǝɹɐ ʇɐɥʍ ƃuoɯɐ ǝɥɔıu ɐ puıɟ ןןıʍ ɯɹoɟʇɐןd ǝןıqoɯ ǝɔɹnos-uǝdo s’ǝןƃooƃ ʇɐɥʇ ƃuıʇʇǝq sı - ʎɐpoʇ ʇǝʞɹɐɯ ǝɥʇ uo sʞooqʇǝu ɟo ʎʇıɹoɾɐɯ ʇsɐʌ ǝɥʇ uı punoɟ ɹossǝɔoɹd 072u ɯoʇɐ ןǝʇuı ǝɥʇ ɹoɟ ǝןqısuodsǝɹ - ɹǝʞɐɯdıɥɔ ǝɥʇ ʇɐɥʇ sʇsǝƃƃns oɥʍ '”ǝɔɹnos ǝןqɐıןǝɹ“ s’ʇɐǝqǝɹnʇuǝʌ oʇ ƃuıpɹoɔɔɐ s’ʇɐɥʇ ˙ǝɹɐʍpɹɐɥ ʇǝsdıɥɔ ǝןqɐʇıns ɥʇıʍ sɹǝɹnʇɔɐɟnuɐɯ ʇɹoddns oʇ ƃuıɹɐdǝɹd sı puɐ '0102 ʇnoɥƃnoɹɥʇ puɐ 9002 ǝʇɐן uı sʞooqʇǝu pǝsɐq-pıoɹpuɐ ɟo ʎɹɹnןɟ ɐ ƃuıʇɔǝdxǝ sı ןǝʇuı sʞooqʇǝu pǝsɐq-pıoɹpuɐ ɟo ǝsıɹ ɹoɟ ƃuıʎpɐǝɹ ןǝʇuı ʞooqʇǝu pǝsɐq-pıoɹpuɐ ɟo ǝsıɹ ɹoɟ ƃuıʎpɐǝɹ ןǝʇuı ʎoʎ %6˙12– puɐ bob %7˙12– pǝuıןɔǝp sʇuǝɯdıɥs ʇıun ɹossǝɔoɹd ɔd ǝpıʍpןɹoʍ 'ɯoʇɐ ʇnoɥʇıʍ ˙ǝuıןɔǝp ɔıʇɐɯɐɹp pıoʌɐ ʇǝʞɹɐɯ ǝɥʇ dןǝɥ oʇ ɥƃnouǝ ʇou ʇnq ǝɔuɐɯɹoɟɹǝd ʇǝʞɹɐɯ ןןɐɹǝʌo ǝɥʇ uı ǝɔuǝɹǝɟɟıp ǝןqɐʇou ɐ ǝʞɐɯ oʇ pǝnuıʇuoɔ (,,sʞooqʇǝu,, sןןɐɔ ןǝʇuı ɥɔıɥʍ) sɔd ʞooqǝʇou-ıuıɯ ɹoɟ ɹossǝɔoɹd ɯoʇɐ s,ןǝʇuı ˙˙˙ ؛(ʎoʎ) ɹɐǝʎ ɹǝʌo ɹɐǝʎ %4˙11– puɐ (bob) ɹǝʇɹɐnb ɹǝʌo ɹǝʇɹɐnb %0˙71– pǝuıןɔǝp sʇuǝɯdıɥs ʇıun ɹossǝɔoɹd ɔd ǝpıʍpןɹoʍ '80b4 uı 80b4 uı sʞooqʇǝu oʇ sdoʇdɐן sʍopuıʍ ɯoɹɟ ǝʌoɯ sıɥʇ ʇɹoddns oʇ sǝıɹoʇs ǝɯos ǝɹɐ ǝɹǝɥ ˙ʎʇıןıqoɯ puɐ ʎɹʇsǝɔuɐ ǝuoɥd ɹıǝɥʇ ɯoɹɟ sǝɯoɔ ʇɐɥʇ uoıʇdɯnsuoɔ ɹǝʍod ʍoן ɟo sǝƃɐʇuɐʌpɐ ןɐuoıʇıppɐ ǝɥʇ ǝʌɐɥ ʎǝɥʇ ˙sƃuıɹǝɟɟo pnoןɔ ɹǝɥʇo puɐ uoıʇɐzıןɐnʇɹıʌ 'sɐɐs 'sǝɔıʌɹǝs qǝʍ ɹoɟ sʇuǝıןɔ ǝʇɐnbǝpɐ ǝʞɐɯ ǝuoɥd ʇɹɐɯs ǝןqɐdɐɔ ʎɹǝʌ puɐ sʞooqʇǝu ƃ3 ˙sǝıƃǝʇɐɹʇs ƃuıʇndɯoɔ doʇʞsǝp pǝnsɹnd ʇou ǝʌɐɥ puɐ sǝɔıʌɹǝs ɟo sǝdʎʇ ǝsǝɥʇ uo ʎןǝɹıʇuǝ sǝssǝuısnq ɹıǝɥʇ pǝsɐq ǝʌɐɥ oɥʍ ɯoɔ˙ǝɔɹoɟsǝןɐs puɐ ǝɹɐʍɯʌ 'ǝןƃooƃ sɐ ɥɔns sǝıuɐdɯoɔ ʎq uǝʌıɹp uǝǝq sɐɥ ʇuǝɯdoןǝʌǝp ɹıǝɥʇ ˙sɹɐǝʎ 01 ʇsɐן ǝɥʇ uı ǝʌıʇɔǝɟɟǝ ʎןɥƃıɥ ǝɯoɔǝq 
ǝʌɐɥ ǝsǝɥʇ ɟo ʇsoɯ ˙ suoıʇɐɔıןddɐ qǝʍ puɐ sɐɐs 'uoıʇɐzıןɐnʇɹıʌ ƃuıpnןɔuı sǝɔıʌɹǝs pnoןɔ ɟo sǝdʎʇ ʎuɐɯ ǝɹɐ ǝɹǝɥʇ ˙uoıʇɐzıuɐƃɹo uɐ ɥƃnoɹɥʇ pǝʇɐɔıןdǝɹ ǝq oʇ pǝǝu ʇou sǝop ǝƃɐɹoʇs ʞsıp ʎʇıןıqɐıןǝɹ-ɥƃıɥ sɐ ɥɔns ǝɹnʇɔnɹʇsɐɹɟuı ǝʌısuǝdxǝ ʇɐɥʇ ʇıɟǝuǝq ןɐuoıʇıppɐ ǝɥʇ sı ǝɹǝɥʇ ˙ʇuǝıɔıɟɟǝ ǝɹoɯ ɥɔnɯ sı ɹǝʌɹǝs ןɐɹʇuǝɔ ɐ uo ǝɹɐʍʇɟos ǝɥʇ ƃuıuunɹ ǝɹoɟǝɹǝɥʇ ˙%01 uɐɥʇ ssǝן ɥɔnɯ 'ʍoן ʎɹǝʌ sı ǝɹɐʍʇɟos sıɥʇ ɟo ǝƃɐsn ǝɔɹnosǝɹ ɹǝʇndɯoɔ ǝƃɐɹǝʌɐ ǝɥʇ ˙ǝʌısuodsǝɹ ǝq oʇ ǝɔɐɟɹǝʇuı ɹǝsn ǝɥʇ ʇuɐʍ noʎ puɐ suoıʇɐʇndɯoɔ ǝsuǝʇuı ǝɥʇ sǝop ʇı uǝɥʍ ʎɐp ɹǝd sǝʇnuıɯ ʍǝɟ ǝɥʇ 'ǝƃɐsn ʞɐǝd ʇɹoddns oʇ ƃuıʎɐd ǝɹɐ noʎ 'ʇı ʇɹoddns oʇ ǝɹɐʍpɹɐɥ ɔd ǝʌısuǝdxǝ ʎnq noʎ uǝɥʍ ˙ǝɯıʇ ǝɥʇ ɟo ʇsoɯ ƃuıɥʇou sǝop ʇı ˙sǝןɔʎɔ ʎʇnp ʍoן ʎɹǝʌ sɐɥ 'sɔıɥdɐɹƃ ʎʇıןɐnb ɥƃıɥ ǝʞıן ǝɹɐʍʇɟos ǝʌısuǝʇuı ʎןןɐuoıʇɐʇndɯoɔ uǝʌǝ 'ǝɹɐʍʇɟos ʇuǝıןɔ ʇsoɯ ˙sɹǝʇndɯoɔ ןɐuosɹǝd s,ǝןdoǝd ʎuɐɯ uo ʇı ƃuıop uɐɥʇ ɹǝısɐǝ sı uoıʇɐɔoן ןɐɔısʎɥd ǝuo uı ƃuıuunɹ ǝɹɐʍʇɟos ƃuıpɐɹƃdn puɐ ƃuıuıɐʇuıɐɯ ǝsnɐɔǝq uoıʇɐzıןɐɹʇuǝɔ ɥʇıʍ ʎןןɐɔıʇɐɯɐɹp sǝsɐǝɹɔǝp (oɔʇ) dıɥsɹǝuʍo ɟo ʇsoɔ ןɐʇoʇ ˙sʇuǝıןɔ ɹǝןןɐɯs ǝʌɐɥ puɐ ƃuıʇndɯoɔ ǝzıןɐɹʇuǝɔ-ǝɹ oʇ sı ʇı ǝsıɹdɹǝʇuǝ uı puǝɹʇ ɹoɾɐɯ ʇuǝɹɹnɔ ɐ

What I need from a 3G Netbook

2009-09-14T07:01:00.000+10:00

I could use one now for

working on the train
working at cafes while waiting for the kids
working in the country while visiting friends and family

My PC and laptop seem like overkill for researching on the web, emailing, writing reports and building a few models in a spreadsheet. They also use a lot of electricity and take up space.

To be an effective replacement for the PC and laptop, a netbook would need to

Be reasonably priced. $200 would be nice.
Have a reasonably priced connection plan.
Have access to cheap or free software

Web browser
Word processor
Drawing tools
Spreadsheet

Have a reliable connection with good coverage.
Have good battery life. 6 hours would be nice.

Basic set of applications

Gmail with tasks
Google Calendar
Google Docs
Git for source code management
YUML
Ubuntu or Windows XP with cygwin
Gnu tools
VNC or WRD
ssh

That would get me started. It would be nice to have Eclipse and a local word processor, but running these over a remote shell would be more than adequate. I worked that way for years, with all my heavy tools on VMware instances, and it worked well. The VMware instances were hosted in a data center and backed up regularly.

Electronic Medical Records Bonanza?

2009-09-10T05:58:00.002+10:00

Big Bucks in Health IT!, quoting from http://www.healthcareitnews.com/news/global-market-hospital-it-systems-pegged-35b-2015 , says

SAN JOSE, CA – The global hospital information systems market will climb past $35 billion by 2015, according to a new forecast by Global Industry Analysts. The United States represents the largest market in the world. The U.S. hospital information system market is experiencing an increase in acceptance of customized technology such as laboratory information systems and radiology information systems, the report notes. The market is also a promising ground for electronic medical record systems.

The Asia-Pacific region (excluding Japan) represents the fastest growing hospital information systems market, exhibiting a compounded annual growth rate of 11.5 percent over the next few years, according to analysts. Despite being a smaller market in terms of revenue, the Asia-Pacific promises excellent growth opportunities for hospital information systems, they said.

The global vendors profiled in the report include McKesson, Cerner, Allscripts-Misys Healthcare Solutions, Eclipsys, Computer Programs and Systems, Siemens Medical Solutions USA, QuadraMed, Medical Information Technology, Healthland, GE Healthcare, iSOFT Group, Agfa-Gevaert, Brunie-Software, IBA Health and Integrated Medical Systems.

The full release is here: Global Hospital Information Systems Market to Cross $35 Billion by 2015, According to New Report by Global Industry Analysts, Inc. Increasing awareness among medical service patrons on the benefits of using Information Technology in the healthcare sector, coupled with growing demand for affordable-yet quality healthcare services is forcing hospitals and other medical centers to adopt IT in their daily operations. Subsequently, Healthcare IT systems such as the Hospital Information Systems witnessed a great demand in the healthcare services sector. Adoption of HIS in hospitals is increasingly being encouraged and promoted by the Governments world over. http://www.prweb.com/releases/2009/02/prweb2021984.htm

bit.ly custom URLs

2009-09-09T10:21:00.000+10:00

Where does http://bit.ly/peterwilliams direct to?

Is it the same web page as http://linkd.in/PeterWilliams ?

Open Government Made Simple

2009-08-16T10:56:00.006+10:00

There has been a lot of talk about Open Government recently, including this from Peter Williams

"Australian governments should adopt international standards of open publishing as far as possible. Material released for public information by Australian governments should be released under a creative commons licence." or in simple terms "make public data open and free".

That is useful, clear and straightforward.

So how to do it?

My experience in organising company data over the last 10 years is that the teams I have worked in have tried many content management systems (CMSs) and none of them were satisfying to use (though some were interesting to implement.) Inevitably the document taxonomies that made sense to the site administrators did not work for most of the users and the users soon gave up trying to find things through the CMS.

Then one day someone in the company I was working at purchased a Google Search Appliance (GSA) and indexed most of our intranet with it. After that everybody could find all the documents they knew existed on the intranet and discovered useful ones they did not know existed.

To be fair, things were not quite that simple. Most companies need reliable storage, decent version tracking, access control and many other things that CMSs provide. However people need to be able to find documents much more than they need these other things. Very few people need version tracked, access controlled documents that they cannot find in the first place.

So why don't governments just make their data visible to internet search engines, store it somewhere secure with some simple versioning system now, and do the fancy stuff later? Why are they investing in CMSs like SharePoint?

The reason we did not do this in the companies I worked in was that many of the features in the CMSs we used were useful and the people who implemented the systems decided they needed all these features. Finding documents was just one of several check-boxes on their requirements documents. They were acting as implementers and experts, not users. The systems they ended up with made perfect sense to everyone except the users.

The interesting thing about this was that the implementers were users in most cases. They were aware of the limitations of CMSs, but they had to follow either the direction of their users, who had not used CMSs enough to understand how badly they would work in practice, or the direction of their managers and key stakeholders, who had heard that CMSs were good. The person who got the GSA was an IT guy who just went out and tried it without surveying users or bringing in CMS vendors to talk to his key stakeholders.

For a different perspective on Open Government, read some Tim O'Reilly.

When Words Fail

2009-07-16T10:37:00.006+10:00

A while back I worked at a company that made software+hardware products in a maturing market. The company found it needed to deliver higher quality products with more features and was struggling to do so from an old codebase. It had become clear to the management team that late-stage serious defects were the major cause of schedule/quality issues, but they had not been able to fix this problem.

The management team had a lot of ideas about what the causes were and how to fix them. They had discussed "technical debt", "silo-ing" and other causes. However, in the end they settled on two key priorities: taking extreme care with code changes and sticking with established QA processes to minimise the number of introduced bugs.

Eventually the project was given to me to manage. One of the (many) things the development team had done well was to document each bug and cross-reference bug fixes against the source code. I analysed about 100 recently fixed serious software bugs, looked up their fixes in the SCM and then looked up the date at which the code changes causing the bug were checked in. This showed that most of the bugs being found had been introduced months before they were discovered. It was clear that the late-stage defects were dominated by latent bugs being unmasked by changes, not by bugs introduced by changes.

Some changes to the development process were needed. The development group was responsible for creating code without introducing bugs and the QA group was responsible for finding the bugs the development team missed. However, the QA process was unsuited to discovering latent bugs quickly because it had a long cycle based on testing user scenarios. Therefore I got small teams of developers and QAs to work closely together to find, fix and verify bugs, and I took some developers away from other work to develop a system to find and fix (and eventually prevent the introduction of more) latent bugs. This work is described here. With these changes in place, code stability improved rapidly and late-stage serious bugs essentially ceased to be found.

That was a fairly straightforward technical solution to a fairly straightforward technical problem. So why had the very capable management team who had known the underlying causes (technical debt and silo-ing) not been able to fix the problem for so long?

Change is known to be difficult in organisations and there is an industry built around dealing with this. However, our immediate problem was not an inability to persuade people to change. In fact, consultation and review had been distracting people from doing the experimentation required to find out what the underlying causes of the problem were and how to fix them. The more people talked about the problem, the further they got from the solution (hence this post's title).

The situation reminded me of Uncle Bob Martin's Agile Smagile:

As I said before, going meta is a good thing. However, going meta requires experimental evidence. Unfortunately the industry has latched on to the word "Agile" and has begun to use it as a prefix that means "good". This is very unfortunate, and discerning software professionals should be very wary of any new concept that bears the "agile" prefix. The concept has been taken meta, but there is no experimental evidence that demonstrates that "agile", by itself, is good.

The deeply ingrained practices in the organisation I worked in had grown out of ideas that had worked well in the past. They had been good enough to cover a wide range of development scenarios for a long while and were clearly based on experimental evidence from past development. However, somewhere along the way people had stopped experimenting and modifying the rules, and started just following the rules. This is what Uncle Bob called "going meta". The problem for our organisation was that the rules it had arrived at when it stopped experimenting were not universally true; they held only for the type of development it was doing when it stopped changing them.

The changes I made to detect and fix latent bugs (high-coverage automated system testing, static analysis with Klocwork and refactoring with unit tests) were adopted across the development organisation and became part of the standard development process, at least for the time I was there. That was good, but I wondered whether those practices would in turn become a fixed part of the development process simply because they had worked at some time in the past. And I wondered whether they would prevent the company from addressing problems that arose in the future, just as the practices that had worked well in the past had come to do.

Minimal Non-C++ Programmer Bamboozling C++ Question

2009-06-22T13:03:00.002+10:00

I recently read a stream of blog posts about why developers don't like C++ for general purpose programming. This post typifies much of the criticism of C++'s complexity. It includes an interview question about creating a C++ class that behaves like a class in a high level language such as Java. The author says that he uses this to weed out job applicants who haven't used C++ for real work.

It strikes me that tripping up developers with C++'s many oddnesses is much easier than that. Here is a simple question that I believe will confuse many non-C++ programmers:

What is the output of this program?

#include <string>
#include <iostream>
using namespace std;
class Parent {
public:
    string func() { return "parent"; }
    virtual string vfunc() { return "parent+virtual"; }
};
class Child : public Parent {
    string func() { return "child"; }
    virtual string vfunc() { return "child+virtual"; }
};
string test1(Parent parent) {
    return parent.func() + " - " + parent.vfunc();
}
string test2(Parent& parent) {
    return parent.func() + " - " + parent.vfunc();
}
int main() {
    Child child;
    cout << "test1: " << test1(child) << endl;
    cout << "test2: " << test2(child) << endl;
}

I have seen C++ interviewers ask questions like this but only show test1 then ask what is on the stack when test1 is invoked.

Movement to the Cloud

2009-06-02T10:14:00.005+10:00

A current major trend in enterprise IT is to re-centralize computing and have smaller clients. Total cost of ownership (TCO) decreases dramatically with centralization because maintaining and upgrading software running in one physical location is easier than doing it on many people's personal computers.

Most client software, even computationally intensive software like high quality graphics, has very low duty cycles. It does nothing most of the time. When you buy expensive PC hardware to support it, you are paying to support peak usage, the few minutes per day when it does the intense computations and you want the user interface to be responsive. The average computer resource usage of this software is very low, much less than 10%. Therefore running the software on a central server is much more efficient. There is the additional benefit that expensive infrastructure such as high-reliability disk storage does not need to be replicated throughout an organization.

There are many types of cloud services, including virtualization, SaaS and web applications. Most of these have become highly effective in the last 10 years. Their development has been driven by companies such as Google, VMware and Salesforce.com, which have based their businesses entirely on these types of services and have not pursued desktop computing strategies.

3G netbooks and very capable smart phones make adequate clients for Web Services, SaaS, Virtualization and other cloud offerings. They have the additional advantages of mobility and of the low power consumption that comes from their phone ancestry.

Here are some stories to support this:

Move from Windows Laptops to Netbooks in 4Q08: In 4Q08, worldwide PC processor unit shipments declined –17.0% quarter over quarter (QoQ) and –11.4% year over year (YoY); ... Intel's Atom processor for mini-notebook PCs (which Intel calls "Netbooks") continued to make a notable difference in the overall market performance but not enough to help the market avoid dramatic decline. Without Atom, worldwide PC processor unit shipments declined –21.7% QoQ and –21.6% YoY
Intel readying for rise of Android-based netbooks: Intel is expecting a flurry of Android-based netbooks in late 2009 and throughout 2010, and is preparing to support manufacturers with suitable chipset hardware. That’s according to VentureBeat’s “reliable source”, who suggests that the chipmaker - responsible for the Intel Atom N270 processor found in the vast majority of netbooks on the market today - is betting that Google’s open-source mobile platform will find a niche among what are usually Ubuntu or Windows XP machines
RCR Wireless 2009 Wireless Forecast: Chips Meanwhile, Qualcomm has placed emphasis on its Snapdragon chipset for notebooks, netbooks and mobile Internet devices, with the latter category projected to surge as netbooks bridge the price gap between laptops and handhelds...Qualcomm will have competition, as Intel Corp. — the world’s largest chip maker — has targeted MIDs as PC sales rates slow, hoping to grab some of the market served by Apple Inc.’s iPhone. PC maker Dell Inc. is rumored to be preparing a smart phone play to open new markets. ...Meanwhile, most chip vendors are looking to core strengths and seeking bright spots in the year ahead. Smartphone sales are projected to grow this year — analysts vary on the strength of that growth — and vendors who make GPS, Wi-Fi and Bluetooth chips and 3G modems have a fighting chance, analysts have observed. ...“These are bright spots,” said Will Strauss, principal at Forward Concepts. “3G in general is still growing.”
Cloud computing ascends the mainstream (The Age, 20 Feb 2009) illustrates how mainstream cloud computing has become: Juicy Orange pays 10 to 80 cents per hour for the virtual service, which means it has stopped forking out for physical hardware - it has become a web development company with none of its own computers. "We don't spend hours and days waiting for physical hardware installations any more. I'd be hard-pressed coming up with any real disadvantages. Once we got over the learning hump, it's been nothing but better," says Mr. Murphy

Cloud computing requires fast, reliable and portable network connections. The telecommunication carriers who can provide these will achieve a pricing advantage over their competitors.

Some Consequences of Movement to the Cloud

Cannibalization of existing sales. What will netbooks do to the PC and laptop value chain? Currently netbooks are growing the PC pie, but in the longer term US$250 netbooks may cannibalize US$1,000 laptop sales at great cost to Intel and Microsoft. Netbooks are well-suited to cloud computing, so netbook + cloud may replace laptop + Microsoft software. Software from Google, salesforce.com, Amazon and others already works well from the cloud, while Microsoft Word is not so well-suited. The $250 vs. $1,000 price difference will drive change.
Possible convergence of smart phones and netbooks.
Processor wars: Intel Atom, Qualcomm Snapdragon and Freescale’s ARM-based chipsets.

Trying Out Wolfram Alpha

2009-05-19T10:13:00.003+10:00

Wolfram Alpha, the new computational knowledge engine from Wolfram Research, has been getting a good run in the press this week, so I decided to try it out.

It gave pretty good answers for most of the following queries. Try them out yourself.

Software Engineers were on the Right Track

2009-05-05T11:46:00.000+10:00

It is often sad when software engineers are shown to be correct. One of my first posts was on monetizing social search:

This innovative use of social networking could have a short-term payoff. It turns out to be one of the few successes to date in making money from social networking. When I ran a straw poll on making money from social networking with some software engineers, wannabe entrepreneurs and friends, they all came up with essentially the same idea: mining the users' personal details and finding some way to make them pay to keep this information confidential.

My panel of experts shared the cynicism of the sysadmins who say:

NEVER anthropomorphize lusers.

Intelius's recent acquisition of Spock appears to prove that my panel of experts was on the right path.

Michael Arrington warned last week:

But one company may have bitten and are close to buying the company. Sources are saying that the infamous Intelius (founded by the equally infamous Naveen Jain), a people search engine that charges users to access data, may be buying Spock soon. If these rumors are accurate, God help Spock. Not only is Intelius embroiled in all kinds of legal and ethical disputes, but they also have a shaky history when it comes to acquisitions.

Then Ajit Jaokar summed up:

So, let me see if I get this right

a) Spock trawls the web looking for our data

b) It creates a profile about us in their site without approval

c) It encourages us to enrich that information

d) It charges us to access our own information

e) And ultimately .. it sells that same information to a background check company ..

For instance, my 'harnessed' profile (i.e. I did not create it) says .. Ajit Jaokar is an Indian-born British author and Web 2.0 specialist. He is the founder and CEO of the publishing company Futuretext. He is also the... and the rest you have to pay for :) (Don't bother .. that information is freely available on my blog .. and lots more .. so you can hopefully make your own judgements about me .. )

Can I delete my own information in Spock?.. Now it gets MUCH more interesting .. Spock says on deletion of information ..

If you'd like to remove yourself from Spock, please read the following information and click the link below.

Before requesting removal, please make sure the original source of the information Spock found for you has been removed or made private (MySpace, blog, Friendster, etc). This will prevent you from being re-indexed on the site. Please note that you can only request removal for your Spock search result.

When filling out your information please make sure to include your name, e-mail, a link to your Spock Search Result (i.e. http://www.spock.com/Tiger-Woods), and the reason why you'd like to be removed. The Spock Support Team will review your claim and get back to you within 24-48 hours.

So, I have to ensure that the Original sources of information that they got the profiles from should also be made private(i.e. my blog, my facebook profile etc etc) ..(else they will 'harness' me again!)

The Future of ICT

2009-04-26T21:28:00.003+10:00

It is worth thinking about the future from time to time. It helps us craft investment strategies and career paths that match the major trends in the world. So what is the future of ICT?

My guesses are:

Simplification.
Movement to the cloud.
Fixed/mobile convergence.
Integration of simple cloud services. (1+2)

Modern ICT systems are insanely complex while the most productive computer users I know all use simple tools.

Most of the things we do with computers are much simpler than the popular packages are capable of. For example, editing some text does not require a full-blown desktop publishing program like MS Word, yet MS Word is the most popular text editor in the world. Likewise, keeping track of some customers and inventory does not require a gigantic package like SAP, yet SAP is the biggest selling ERP software package in the world.

The costs of learning these immensely complex packages are considerable in terms of time lost. There is probably a much higher cost in working as slaves to these packages, which distracts people from finding the best solutions to an enterprise's problems. A current trend in corporate ICT is to use "best of breed" packages with the minimum possible customization because the payback from customizing is much less than the cost. (BTW, this does not seem to be true for ERP.) This means that enterprises are paying the cost of not solving their ICT problems as well as they could. This cost has to be a significant fraction of their ICT budgets.

This article explains why "best of breed" software packages sell well. It boils down to the promise of lower total cost of ownership (TCO) through using a single vendor for all services and a mega-brand that makes buyers feel safe.

SAP ERP systems effectively implemented can have huge cost benefits. Integration is the key in this process. "Generally, a company's level of data integration is highest when the company uses one vendor to supply all of its modules." An out-of-box software package has some level of integration but it depends on the expertise of the company to install the system and how the package allows the users to integrate the different modules.

Movement to the Cloud

Centralized computing is much more efficient than desktop-centric computing. TCO decreases dramatically with centralization because maintaining and upgrading software running in one physical location is far easier than maintaining it on many people's personal computers.
Most client software, even computationally intensive software like high quality graphics, has very low duty cycles. It does nothing most of the time. When you buy expensive PC hardware to support it, you are paying to support peak usage, the few minutes per day when it does the tricky computations and you want the user interface to be responsive. The average computer resource usage of this software is very low, much less than 10%. Therefore running the software on a central server can be more than 10x more efficient.
Expensive infrastructure such as high-reliability disk storage does not need to be replicated throughout an organization. Virtualization, SaaS, etc. only became effective in the last few years, so many software and hardware vendors built their (then efficient) businesses around powerful client PCs running software locally.

Fixed Mobile Convergence

When simple applications and cloud computing become dominant, terminals will need to be far less powerful. Smart phones and 3G netbooks are already very capable and are becoming more so. They also use little power and are portable.

The next level of usability is to have one device for fixed and mobile work. That device should be able to work with WiFi and 3G networks and move seamlessly between them. The technology for this is maturing.

From Wikipedia:

A clear trend is emerging in the form of fixed and mobile telephony convergence (FMC). The aim is to provide both services with a single phone, which could switch between networks ad hoc. Several industry standardisation activities have been completed in this area such as the Voice call continuity (VCC) specifications defined by the 3GPP. Typically, these services rely on Dual Mode Handsets, where the customers' mobile terminal can support both the wide-area (cellular) access and the local-area technology (for VoIP). However, an alternative approach achieves FMC over 3G mobile networks - eliminating the requirement for Dual Mode. This approach, broadly termed cellular FMC, is in trials by telecoms operators including BT.

An alternative approach to achieve similar benefits is that of femtocells.

Integration of simple cloud services

When cloud computing is widespread and simple cloud services are widely available, integration companies will be able to assemble tools to meet the needs of businesses. This should be a vast business since it competes with the mega-apps from Microsoft, Oracle, SAP, Siebel, etc. and the mega-glue from Tibco and others.

If the recent history of software development is a guide, nimble companies will start to build effective suites and grow rapidly to form a foundation for this industry, then they will be followed by specialist companies who will take care to make their software inter-operable. This will evolve into a software ecosystem and sales channels will emerge. With fixed mobile convergence in the mix, application stores may be used for sales, removing the need for sales and marketing teams in the startup companies that start this new business category.

At that point, the setups of hyper-productive software users will be easy to replicate in the cloud. User applications will be available as simple services on simple devices like 3G netbooks. Enterprise applications will run in the cloud with simple interfaces. Business outsourcing will be simple because the software will run in the cloud with well-defined APIs.

The Consequences

These changes will result in a dramatic increase in productivity that will boost economies world-wide.
Software will be simple so ICT staff will not be slaves to the machines of gigantic software packages.
This will free up ICT staff's time to add business value which will increase productivity even more.

Photo Credits
threesixtyfive | day 244 by Sybren A. Stüvel.
What is ERP anyway? (MS&T ERP Center, 01/29/2009) by MS&T Center for ERP.
Skype Crashing on iPhone Fix by theleetgeeks.
Modern Times by jampa.