Global vs Local Optimization

One of my recent assignments included writing an Excel report with Apache POI. This report has nine worksheets that cover different aspects of one business concept. In other words, each sheet gives another view of the business case. After finishing my work on the report, I noticed I had made a huge mistake. I didn’t optimize the local algorithms, which was a good decision. But I also didn’t optimize the overall approach to the problem, and that turned out to be bad.

Because I pretty much rushed into the assignment, I didn’t take the time to analyse the whole problem beforehand. Time, as usual, was short and deadlines had to be met. So I decided to learn as I went and implemented the first of the nine worksheets as I saw fit. I wrote a DAO, loaded all needed objects and extracted the information for this sheet. That worked fine, so I took on the next sheet. I used my DAO, loaded all needed objects and extracted the information for the sheet. That also worked out pretty well.

I think it was at the fifth or sixth sheet that I noticed a problem: I always used my DAO to load all needed objects. ALL needed objects. Every time. As I said before, the whole report is about one business case, so nearly all of the objects are needed in every sheet. As if this weren’t bad enough, I combined these objects with each other to get the information needed. That means I looped over the repeatedly loaded objects. This is a failure in the global algorithm. I could optimize the local algorithm, for example by using specialized SQL/HQL statements. But that wouldn’t change the fact that the whole approach is wrong: I should load all objects that are used in the nine sheets once, extract the needed information and convert it into a form that all nine sheets can use.
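To make the difference concrete, here is a minimal sketch in plain Java. It is not the code from the actual report: `Item`, `CountingDao` and the two example "sheets" are hypothetical stand-ins, and the DAO only counts its calls instead of firing HQL statements. It contrasts reloading everything per sheet with loading once and sharing the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ReportSketch {

    /** Hypothetical business object; stands in for whatever the real DAO returns. */
    static final class Item {
        final String category;
        final int amount;

        Item(String category, int amount) {
            this.category = category;
            this.amount = amount;
        }
    }

    /** Hypothetical DAO that counts how often the (expensive) load is triggered. */
    static final class CountingDao {
        int loadCount = 0;

        List<Item> loadAll() {
            loadCount++; // in the real report this would be many HQL statements
            List<Item> items = new ArrayList<>();
            items.add(new Item("hardware", 40));
            items.add(new Item("software", 25));
            items.add(new Item("hardware", 10));
            return items;
        }
    }

    /** Two example "sheets": each one is just a different view of the same objects. */
    static List<Function<List<Item>, String>> sheets() {
        List<Function<List<Item>, String>> sheets = new ArrayList<>();
        sheets.add(items -> "count=" + items.size());
        sheets.add(items -> "total=" + items.stream().mapToInt(i -> i.amount).sum());
        return sheets;
    }

    /** Local view: every sheet asks the DAO for all objects again. */
    static List<String> buildReloadingPerSheet(CountingDao dao) {
        List<String> out = new ArrayList<>();
        for (Function<List<Item>, String> sheet : sheets()) {
            out.add(sheet.apply(dao.loadAll()));
        }
        return out;
    }

    /** Global view: load once, then hand the same objects to every sheet. */
    static List<String> buildLoadingOnce(CountingDao dao) {
        List<Item> items = dao.loadAll();
        List<String> out = new ArrayList<>();
        for (Function<List<Item>, String> sheet : sheets()) {
            out.add(sheet.apply(items));
        }
        return out;
    }

    public static void main(String[] args) {
        CountingDao perSheet = new CountingDao();
        CountingDao once = new CountingDao();
        System.out.println(buildReloadingPerSheet(perSheet) + " loads=" + perSheet.loadCount);
        System.out.println(buildLoadingOnce(once) + " loads=" + once.loadCount);
    }
}
```

Both variants produce the same sheet contents. Only the number of loads differs, and in the first variant it grows with the number of sheets. No amount of tuning inside `loadAll()` changes that.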

My missing analysis caused the report to fire massive amounts of HQL statements at the database, many of them repeatedly. This takes nearly two minutes, followed by another 20 or 30 seconds of copying data. I haven’t refactored this yet, but I guess the report will run in less than a minute once it is optimized.

It pretty much took me by surprise that I made such an obvious error. The only explanation I can find is that I let the deadline intimidate me, so I didn’t take the time to think the task through. I don’t want to say that deadlines are to be ignored. But if I have to choose again between a met deadline and clean code, I’ll choose the latter.