2010-12-30

Do not do this, please: this instanceof

Since I do like refactoring, I decided to document the code patterns that I stumble upon.
Consider the following code (Java). First snippet shows
It is not easy for me to explain why this is not a good solution, because it is so obvious to me. It's like explaining why a joke is funny to someone who does not get it. But since I have met such code in more than one project, written by different people, there must be something worth explaining.
First, look at a better version of this code:
This rewrite eliminates the runtime check, removes the base class's dependency on its subclasses, and removes the temptation for junior developers who modify this code after you to insert more instanceof checks into the same method. And after all, in Java subclassing is the language's idiomatic way to do such things.
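To make the pattern concrete, here is a rough sketch of the "before" and "after" shapes. The class and method names (AnimalBefore, DogBefore, makeNoise and so on) are my assumptions for illustration only, not the code from the projects mentioned above.

```java
// Hypothetical classes for illustration; all names are assumptions.

// Before: the base class inspects its own runtime type.
abstract class AnimalBefore {
    String makeNoise() {
        if (this instanceof DogBefore) {
            return bark();
        }
        return "";
    }

    String bark() {
        return "Woof";
    }
}

class DogBefore extends AnimalBefore {}

class CatBefore extends AnimalBefore {}

// After: the subclass overrides the method, so no runtime check is needed
// and the base class no longer knows about its subclasses.
abstract class Animal {
    String makeNoise() {
        return "";
    }

    String bark() {
        return "Woof";
    }
}

class Dog extends Animal {
    @Override
    String makeNoise() {
        return bark();
    }
}

class Cat extends Animal {}

class InstanceofDemo {
    public static void main(String[] args) {
        System.out.println(new DogBefore().makeNoise()); // prints Woof
        System.out.println(new Dog().makeNoise());       // prints Woof
    }
}
```

Both versions behave the same from the outside; the difference is only in who decides what a dog does.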
I have written what I wanted about this case, but it is hard for me to stop here, so I continue.
If I were doing a refactoring I would stop after this rewrite. But looking at the methods I would sense that something else needs improvement: the call to "bark" suggests that it belongs not to an abstract animal but to a dog. So if we indeed have only two subclasses, Cat and Dog, I would further rewrite the code as:
I would advocate using composition over subclassing, all else being equal. So if after these changes there is no real code left in Animal, I would convert it to an interface:
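A minimal sketch of where these two steps might end up, again with hypothetical names (the outer InterfaceDemo class exists only to keep the example in one file): "bark" moves into Dog, and Animal shrinks to an interface.

```java
class InterfaceDemo {
    // Hypothetical interface: nothing but the contract remains in Animal.
    interface Animal {
        String makeNoise();
    }

    static class Dog implements Animal {
        // bark() now lives in the only class that can actually bark.
        String bark() {
            return "Woof";
        }

        @Override
        public String makeNoise() {
            return bark();
        }
    }

    static class Cat implements Animal {
        @Override
        public String makeNoise() {
            return "Meow";
        }
    }

    public static void main(String[] args) {
        Animal[] animals = { new Dog(), new Cat() };
        for (Animal animal : animals) {
            System.out.println(animal.makeNoise());
        }
    }
}
```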
They tell me "ship it" and here I stop.

2010-09-27

Index optimization: the code.

Below is an implementation of the index optimization algorithm in Clojure. Perhaps it is not an example of good Clojure style, but it leaves less ambiguity than the plain-words description from my previous post. The GitHub gist is below; I hope that it is self-explanatory:

2010-09-17

Optimizing set of SQL indexes.

In my current project we are generating lots of code, including DDL for a MySQL database. To make the generator code cleaner and the DB happier I have added an optimization procedure that takes a set of indexes covering all necessary query cases and outputs only those indexes that are really needed. For example, there is no need to add two indexes with the same specification; if you have a unique and a non-unique index on the same fields then the unique index is sufficient, and so on. One more observation is that if you have an index on fields [a, b, c, d] then you do not need separate indexes [a, b, c], [a, b] and [a]. There are also 3 kinds of indexes that are important here:

  1. Plain, normal index
  2. Unique index (is also a plain index)
  3. Primary key index (is also a unique index)

So the algorithm that optimizes indexes uses

  1. Some partial order on index specifications, that can be implemented as predicate that says "this index is covered by other"/"this index adds new query optimizations compared to other"
  2. Procedure that finds "supremum" of given index set
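
The two ingredients above can be sketched in Java roughly as follows. This is not the code from our generator: the class names, the exact "covers" rule (same fields and at least as strong a kind, or a plain index on a strict prefix) and the tie-breaking between identical indexes are my assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexOptimizer {
    // Kinds ordered from weakest to strongest, matching the list above.
    enum Kind { PLAIN, UNIQUE, PRIMARY }

    static final class Index {
        final Kind kind;
        final List<String> fields;

        Index(Kind kind, String... fields) {
            this.kind = kind;
            this.fields = Arrays.asList(fields);
        }

        @Override
        public String toString() {
            return kind + fields.toString();
        }
    }

    // Assumed partial order: "a" covers "b" when both have the same fields and
    // "a" is at least as strong, or when "b" is a plain index on a strict prefix of "a".
    static boolean covers(Index a, Index b) {
        if (a.fields.equals(b.fields)) {
            return a.kind.compareTo(b.kind) >= 0;
        }
        return b.kind == Kind.PLAIN
            && b.fields.size() < a.fields.size()
            && a.fields.subList(0, b.fields.size()).equals(b.fields);
    }

    // Keeps only the maximal elements; of two identical indexes the first one wins.
    static List<Index> optimize(List<Index> indexes) {
        List<Index> result = new ArrayList<Index>();
        for (int i = 0; i < indexes.size(); i++) {
            Index candidate = indexes.get(i);
            boolean redundant = false;
            for (int j = 0; j < indexes.size() && !redundant; j++) {
                if (i == j) continue;
                Index other = indexes.get(j);
                redundant = covers(other, candidate) && (!covers(candidate, other) || j < i);
            }
            if (!redundant) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Index> input = Arrays.asList(
            new Index(Kind.PLAIN, "a"),
            new Index(Kind.PLAIN, "a", "b"),
            new Index(Kind.UNIQUE, "a", "b"),
            new Index(Kind.PLAIN, "a", "b", "c", "d"),
            new Index(Kind.PLAIN, "a", "b", "c", "d"),
            new Index(Kind.PRIMARY, "id"));
        // prints [UNIQUE[a, b], PLAIN[a, b, c, d], PRIMARY[id]]
        System.out.println(optimize(input));
    }
}
```

Note that the unique index on [a, b] survives even though [a, b, c, d] exists: a longer plain index can serve the same queries but cannot enforce the uniqueness constraint.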

There is one further optimization that I have not implemented because the improvement from it would be minimal for our project. This optimization would re-order fields to try to reduce the number of indexes. For example, if there are already indexes [a, b, c, d] and [a, d], then one may leave just [a, d, b, c] instead.

Note that if you use indexes not only for query filtering but also for ordering results, then reordering fields in indexes may not be appropriate. In this case one should also consider the "ascending"/"descending" property of indexes (and not mix them).

2010-09-07

Showing dependency cycle in Eclipse.

Here is a tiny plug-in that gathers the complete build path dependency tree of a Java project in Eclipse. Eclipse only tells you that there is some build path cycle but does not tell which projects are involved. If your workspace is crowded this can be a problem. The plug-in finds cycles in project dependencies automatically.

2010-08-26

JDK7 - little better

My gripe about closing streams is addressed in the latest (not yet released) JDK7. This version allows one to write the following (see http://mail.openjdk.java.net/pipermail/coin-dev/2009-February/000011.html):
try (BufferedReader br = new BufferedReader(new FileReader(path))) {
    br.readLine();
}
And it actually compiles :) My complaint still stands, however: instead of allowing me to write the abstractions that I need, Java requires me to wait for the next release and pray that it will have the programming features I need. Why do I still complain? Because JDK7 still does not have closures (not yet, I hope).

2010-08-22

XML query with Clojure

I tried to write a script that would let me look for particular values in a set of XML files. I found out that Clojure supports parsing, querying and editing of XML out of the box (thanks to clojure-contrib). The script follows (it's for Clojure 1.1):
(ns main.xquery
  (:require
    clojure.xml       
    [clojure.zip :as zz]
    [clojure.contrib.zip-filter :as zf])
  (:use
    clojure.contrib.zip-filter.xml))
 
(defn dir [dirname]
  (seq (map #(.getCanonicalFile %) (.listFiles (java.io.File. dirname)))))
 
(defn xml-file? [file]
  (and (.. file (getName) (endsWith ".xml")) (.isFile file)))

(defn query [input-file query]
  (let [xml (zz/xml-zip (clojure.xml/parse input-file))]
    (apply xml-> xml query)))

(let [dirname "/home/petr/workspace/CloTry/"]
  (filter #(not (empty? %))
    (map #(query % [zf/children :item (attr= :type "green") (attr :name)])
      (filter xml-file? (dir dirname)))))
This script scans all XML files in the given directory and returns a list of matched "name" attribute values. The interesting work is in the "query" function, which is called with an XML file and an XPath-like query expression:
[zf/children [:item (attr= :type "green")] (attr :name)]
The query expression can be read as: take the children of the root element, retain the ones whose attribute "type" has the value "green", and take the value of the "name" attribute of these elements. Note the usage of clojure.zip functions, which are a generic way of traversing or editing hierarchical data structures, XML being one of them. The xml-zip function takes the parsed representation of an XML file and returns its representation that can be used with the generic functions. There was one problem: the function "children" is defined in both the clojure.zip and clojure.contrib.zip-filter namespaces. Both functions have the same semantics, but the former gives an NPE for some reason, so I explicitly referred to the one from the clojure.contrib.zip-filter namespace. Given that an XML file has this content:
<doc>
 <section name="everything"/>
 <item name="first" type="black"/>
 <item name="second" type="green"/>
 <item name="third" type="yellow"/>
 <item name="fourth" type="green"/>
</doc>
The script outputs:
(("second" "fourth"))
Update: If you are interested in knowing which file produced each result, the above solution is OK. If you only need to know the elements that matched, then this would be neater:
(let [dirname "/home/petr/workspace/CloTry/"]
  (apply concat
    (map #(query % [zf/children :item (attr= :type "green") (attr :name)])
      (filter xml-file? (dir dirname)))))
Update 2: Or even shorter :)
(let [dirname "/home/petr/workspace/CloTry/"]
    (mapcat #(query % [zf/children :item (attr= :type "green") (attr :name)])
      (filter xml-file? (dir dirname))))

2010-07-29

Stored procedures. I love to hate them.

Why should it be so damn involved?
declare done int default 0;
declare continue handler for not found 
    set done = 1;            
open carrots;
repeat
    fetch carrots into weight, location;
    do_something(weight, location);
until done end repeat;
close carrots;
Instead of something like:
for weight, location in carrots do
    do_something(weight, location)
SQL matches the functional approach so nicely, but stored procedures are a horrible procedural mess. It feels like programming in Fortran or assembly.

2010-05-25

Java sub lists

This feature has been there for a while but I had not noticed it. It turns out that java.util.List#subList allows you not only to read but also to manipulate the original list, just like list slice manipulation in Python but with clumsier syntax:
import java.util.LinkedList;
import java.util.List;

List<String> l = new LinkedList<>();
l.add("a");
l.add("b");
l.add("c");
List<String> s = l.subList(1, 2); // a view of [b], backed by l
s.add("*");
System.out.println(l); // prints [a, b, *, c]
s.remove(0);
System.out.println(l); // prints [a, *, c]
List<String> r = l.subList(2, 3); // a view of [c], backed by l
r.addAll(s);
System.out.println(l); // prints [a, *, c, *]

2010-04-23

Again about jobs and workspace resources

There is one more caveat about resource manipulation in Eclipse's workspace. Scheduling a job with a resource scheduling rule helps to avoid unwanted interference between different tasks, say when one job wants to modify some resource while another wants to read a consistent state or modify the same resource. This approach has two problems. First, there is no robust way to ensure the order of jobs, and second, there is a chance of lockups when using jobs for resource manipulation.

It turned out that this does not play well if your job is scheduled as part of a bigger transaction. Say Eclipse started some rename refactoring, and your job is invoked as part of the "save resources" procedure and wants to modify things so that resources are consistent with the saved editor state. And then you get a deadlock. The refactoring has locked your resource for modification, your job wants to lock it for modification too, and there's no way to tell that your job is a "nested" transaction. Oops.

Scheduling one job after another is tricky too. Even Eclipse's own code waits for build completion before launch by polling the job manager to ensure that there are no active jobs that are tagged as "build job".

I found that in most cases (if your requirements are simple) a better approach to modifying resources in Eclipse's workspace is to consistently use the IResource (IFile, IFolder, IProject) methods throughout your code instead of the plain java.io package. This ensures that Eclipse is aware of your changes, so you do not need to refresh resources after every modification.

2010-02-18

I want hyper continuations.

A situation: I am debugging a problem in a program and have paused it at some breakpoint. At this moment, by examining the state of the program, I find out that the problem can be fixed by another developer and want to reassign the bug to him. Now: how do I describe the state of the program? Describe reproduction steps? Add a stack trace from the logs? Call him over to my desk so he can debug further?
Those are the real options. The unreal but wanted one is: I serialize the state of the whole program and attach it to the bug report, then the other developer loads that state into the program and can examine everything he needs.
That's what I would call a hyper continuation: you send it to another part of the world and do not care about the details.

2010-02-17

Stream close template.

I wanted to DRY my Java code by introducing common code for patterns like this:
InputStream input = new FileInputStream(file);
try {
  doSomethingWithStream(input);
} finally {
  if (null != input) input.close();
}
So I had two approaches to try out, both built on two functions: one for opening the stream, the other for doing the actual work. The first solution was a template utility function that accepts 2 function-like objects; the second one used a template class.

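Code for the first approach looks like the following:
import java.io.Closeable;
import java.io.IOException;

// Opens the resource; may fail with an IOException.
interface Ctor<A> {
  A get() throws IOException;
}
// Does the actual work with the opened resource.
interface F<A> {
  void put(A arg) throws IOException;
}

class Utils {
  static <C extends Closeable> void withCloseable(Ctor<C> streamConstructor, F<C> block) throws IOException {
    C c = null;
    try {
      c = streamConstructor.get();
      block.put(c);
    } finally {
      if (c != null)
        c.close();
    }
  }
}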
Use case might look like this:
Utils.withCloseable(
  new Ctor<InputStream>() {
    public InputStream get() throws IOException {
      return new FileInputStream(file);
    }
  },
  new F<InputStream>() {
    public void put(InputStream in) throws IOException {
      doSomethingWithStream(in); // Whatever
    }
  }
);
The second approach uses a template class and expects the user to extend the class to provide the necessary methods:
abstract class WithCloseable<C extends Closeable> {
  protected abstract C open() throws IOException;
  protected abstract void runWith(C c) throws IOException;

  public void exec() throws IOException {
    C c = null;
    try {
      c = open();
      runWith(c);
    } finally {
      if (c != null)
        c.close();
    }
  }
}


Alas, while both approaches make closing streams more regular, they also make the code look involved and bloated. The situation only worsened when I tried to modify the code so that it returns a value from the function that works with the stream, or tried to nest several templates to work with more than one stream (for example, one for input and another for output).
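
For comparison, here is a sketch of the value-returning variant as it could look in a Java that has lambda expressions (they arrived in Java 8, well after this post was written). The names Closeables, Opener and Body are made up for this illustration; this is one possible shape under those assumptions, not a recommendation over try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;

class Closeables {
    // Hypothetical functional interfaces; both are allowed to throw IOException.
    interface Opener<C> {
        C open() throws IOException;
    }

    interface Body<C, T> {
        T apply(C resource) throws IOException;
    }

    // Opens the resource, runs the body, returns its result and always closes.
    static <C extends Closeable, T> T withCloseable(Opener<C> opener, Body<C, T> body)
            throws IOException {
        C c = null;
        try {
            c = opener.open();
            return body.apply(c);
        } finally {
            if (c != null) {
                c.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 42 };
        int first = withCloseable(() -> new ByteArrayInputStream(data), in -> in.read());
        System.out.println(first); // prints 42
    }
}
```

The call site shrinks to one line, but nesting two resources would still mean nesting two calls, which is exactly the bloat described above.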


It is frustrating to watch every time how Java resists being more concise. I am looking forward to using Clojure or Scala in projects at my job. I have no doubt that closures, if they ever appear in Java, will be yet another half-solution made with compatibility as the sole requirement (the first one was generics).

2010-02-06

How to refresh resources in Eclipse's workspace

Since we added our own file system implementation to EFS, there is a need to refresh various resources in the workspace to give the user better feedback.

Naive usage of IResource.refreshLocal(int, IProgressMonitor) caused cryptic exceptions about conflicting rules. This happens, for example, when you start refreshing a file and, while it is being processed, request a refresh of its folder. It took some time and several re-opened bugs in the project I am working on to figure out the proper way to do refreshes. The fact that IResource also implements ISchedulingRule adds to the confusion. The first problem-free implementation used Workspace.getRefreshManager().refresh(resource). The refresh manager maintains a queue of refresh requests that run sequentially, so they do not conflict. Unfortunately Workspace is not part of the public API, which led me to look for other solutions. The "monitor" parameter of IResource.refreshLocal() led me to inspect the Job class, which is used for long-running background tasks. It turned out that Job has the method Job.setRule(ISchedulingRule). So I created a job and set the resource being refreshed as its scheduling rule, but this still caused refresh conflicts. The code that finally works looks like this:
public static void refresh(final IResource resource) {
        final Job job = new Job("Refreshing " + resource) {
            @Override
            protected IStatus run(final IProgressMonitor monitor) {
                try {
                    resource.refreshLocal(IResource.DEPTH_INFINITE, monitor);
                    return new Status(IStatus.OK, "my.shiny.plugin", "Refreshed " + resource);
                } catch (final CoreException e) {
                    return new Status(IStatus.ERROR, "my.shiny.plugin", "Error refreshing "
                            + resource, e);
                }
            }
        };
        job.setRule(ResourcesPlugin.getWorkspace().getRuleFactory().refreshRule(resource));
        job.schedule();
    }
The last tar pit you might step into is the naming of the rule factory methods. "createRule" returns the rule for a resource creation operation (although it worked in my case). The method that returns the rule for refresh is "refreshRule".

Fetching a web page in Clojure

To automate part of my daily reporting routine I experimented with Clojure.
Here's a prototype script that fetches a web page and extracts its HTML title. Its limitation stems from the fact that it parses the HTML into a DOM, so the page should at least be well-formed (blogger.com's page is not).
Dependencies are Apache HttpClient, Apache Commons IO and the XML libraries from the JDK.

The part I especially like is
(map #(.getNodeValue (.item nodes %)) (range (.getLength nodes)))

It turns the implicit collection of DOM nodes, exposed only through the two methods "item(index)" and "getLength", into a convenient list.

The script is below:

(ns getReport
 (:import
 (java.net URL)
 (java.io ByteArrayInputStream InputStream)
 (org.apache.http.client ResponseHandler HttpClient)
 (org.apache.http.client.methods HttpGet)
 (org.apache.http.impl.client BasicResponseHandler DefaultHttpClient)
 (java.util.regex Pattern Matcher)
 (javax.xml.parsers DocumentBuilderFactory)
 (org.w3c.dom Document)
 (org.apache.commons.io IOUtils)
 (javax.xml.xpath XPathFactory XPathConstants)))

(defn getResource [url]
 (let [client (DefaultHttpClient.)
   request (HttpGet. url)
   body (.execute client request (BasicResponseHandler.))]   
  (.. client getConnectionManager shutdown)
  body))

(defn makeDocFactory []
 (let [factory (DocumentBuilderFactory/newInstance)]
  (doto factory
   (.setValidating false)
   (.setExpandEntityReferences false)
   (.setXIncludeAware false)
   (.setSchema nil))
   (let [dBuilder (.newDocumentBuilder factory)]
   ; This entity resolver disables fetching DTD (w3c will be happy)
   (.setEntityResolver dBuilder
    (proxy [org.xml.sax.EntityResolver] []
     (resolveEntity [publicId systemId]          
      (new org.xml.sax.InputSource (new java.io.StringReader "")))))  
   dBuilder)))

(defn parseDom [html]
 (let [builder (makeDocFactory)]
  (.parse builder (ByteArrayInputStream. (.getBytes html)))))

(defn queryDoc [dom query]
 (let [factory (XPathFactory/newInstance)
   xpath (.newXPath factory)
   expr (.compile xpath query)
   nodes (.evaluate expr dom XPathConstants/NODESET)]
  (map #(.getNodeValue (.item nodes %)) (range (.getLength nodes)))))

(defn parseReport [htmlText]
 (queryDoc (parseDom htmlText) "/html/head/title/text()"))

(println (parseReport (getResource "http://twitter.com/")))

On security

My VPS recently got banned for spam, which surprised me since none of my software there sends email. So my first thoughts were that this is a...