DRY your code

2015-04-25

On security

My VPS recently got banned for spam, which surprised me since none of my software there sends email. My first thoughts were that this was either a mistake (e.g. due to IP address spoofing) or that some vulnerability in that VPS had been exploited. Since I could neither verify nor affect the first, and the second is more dangerous, I started an investigation.
So for now I assume that someone broke into the VPS. It could be a 'traditional' breach with gaining a login, or a transient one that leaves no traces on disk.
Inspected my services: top; netstat; lsof -i; df; docker ps -a
Checked the logs: /var/log/syslog*, /var/log/auth*
There are continuous attempts to brute-force ssh passwords, but someone who had already got in would not need to do that. Miss.
Checked for extra users or installed email servers: none.
Syslog contained multiple messages like

TCP: TCP: Possible SYN flooding on port 8080. Sending cookies. Check SNMP counters.
nf_conntrack: table full, dropping packet
net_ratelimit: 20 callbacks suppressed

then

ziproxy invoked oom-killer

So now I have a clue. This looks like a DoS, but port 8080 should not be visible from outside... It turns out that syslog shows the port of the Docker container, not the host's.
One of the slightly surprising things about Docker is that it uses LXC, which is "chroot on steroids". So any process running "in Docker" is actually a normal process running on the host's kernel, and there is no "container" process. That said, the ziproxy that runs in Docker was accessed from outside, which caused it to send emails. Now, ziproxy is a web proxy. Was there some overflow that injected outside code, or was the proxy used as is?
I found no known vulnerabilities for ziproxy, so it was probably normal operation. Since the complaints were about SMTP, it should have been the CONNECT method. Looking at the ziproxy sources I found that CONNECT is indeed supported. To check this I tried:

$ telnet localhost 8888
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
CONNECT my-vps-domain.net:80 HTTP/1.1
HTTP/1.0 200 Connection established
GET /favicon.ico
301 Moved Permanently
nginx/1.2.1
Connection closed by foreign host.

(I typed the telnet command and the CONNECT and GET lines; the rest is output.)

Inspecting syslog I found that the first "Possible SYN flooding" was logged exactly at the moment the spam message was sent, and there were no similar events before that.

So the most likely situation was that I accidentally exposed ziproxy (it was intended to be visible from localhost only) and someone used the CONNECT method to relay email through it. Far simpler than I was afraid of.
To verify this I started ziproxy again, and "Possible SYN flooding" showed up again.

The actual problem with my setup was that I misunderstood the defaults of Docker's "-P" and "-p" switches. "-P" publishes all exposed ports, without needing any parameters. "-p" publishes only the given ports, but on all interfaces unless a bind address is given, while I needed only loopback (that is, the "-p 127.0.0.1:hostPort:containerPort" form).

I used "nmap" to find open ports on the VPS, and that helped to fix the Docker settings.
I also installed fail2ban. Its default settings are reasonable for throttling DoS and brute-force attempts.
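
For a single port, the nmap check boils down to a plain TCP connect with a timeout. Here is a minimal sketch of that in Java; the class and method names are made up for illustration, and it is in no way a replacement for nmap. Run from another machine against the VPS address, it would show whether a port that was meant for loopback only is reachable from outside.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    // A refused connection, a timeout or an unresolvable host all count as "not reachable".
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, PortProbe.isReachable("my-vps-domain.net", 8888, 2000) should return false from outside once the proxy is bound to loopback only (the host name and port here are just the ones from the telnet session above).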

2012-08-10

Recipe of the day: gradle with eclipse config.

I use Eclipse STS (Spring Tool Suite) with Spring support. To help with editing the build.gradle file, two plugins should be installed:

Groovy eclipse plugin http://groovy.codehaus.org/Eclipse+Plugin
SpringSource Tool Suite Gradle Integration (included in STS)

To generate the Eclipse project file from the build configuration, add the following to build.gradle:

apply plugin: 'eclipse'
eclipse {
  project {
    natures 'com.springsource.sts.gradle.core.nature',
            'org.eclipse.jdt.groovy.core.groovyNature'
  }
}

2012-05-15

I remove about 40 lines of code a day

Two days of work:

422  ++++++++---------
248  +++++-----
84   ++--
141  ++----
85   ++--
36   +-
480  ++++----------------
9    +-
58   ---
453  +++++++++---------
126  +++--
71   ++-
44   +-
72   +--
79   ++--
109  +++--
111  ++---
12   +-
16   -
20   +-
30   +-
32   --
16   +-
40   +-
21   +-
1    -
15   +-
16   +-
16   -
12   +-
39   +-
3    -
11   -
52   ---

AND several bugs fixed in code that was changed.

2011-12-16

It is a destiny

With every project that I take over from previous developers I have to answer the same question: "Was the developer who wrote this brain-damaged, just sloppy, or does this horrifying spaghetti mess reflect some business logic that is actually needed?"

The worst part is that there is never an easy answer to this. So it always ends up with tedious ongoing refactoring and meticulous analysis.

2011-09-28

Parentheses in Lisp: let's count them.

Parentheses are the first thing any newcomer notices about a Lisp program. They put off many people, who decide not to learn the language just by looking at the syntax. However, if they think that there are too many parentheses, do they actually count them? Let's look at a fragment of Java code:

public static void closeSocket(final Socket socket) {
    try {
        socket.close();
    } catch (IOException e) {
        LOG.error("Error closing socket " + socket, e);
    }
}

Total: 14 braces and parentheses.

Now, below is how equivalent Lisp code could look (let it be in Clojure as an example):

(defn closeSocket [^Socket socket]
  (try
    (.close socket)
    (catch IOException e
      (LOG/error (str "Error closing socket " socket) e))))

Total: 14 brackets and parentheses.
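
The counts are easy to check mechanically. The sketch below is my own illustration: the class and constant names are hypothetical, and the Clojure fragment is taken in its balanced form with a plain ^Socket type hint. It counts every (, ), {, }, [ and ] in both fragments.

```java
final class BracketCount {
    // Counts every (, ), {, }, [ and ] in the given source text.
    static long countBrackets(String source) {
        return source.chars().filter(c -> "(){}[]".indexOf(c) >= 0).count();
    }

    static final String JAVA_SNIPPET = String.join("\n",
        "public static void closeSocket(final Socket socket) {",
        "    try {",
        "        socket.close();",
        "    } catch (IOException e) {",
        "        LOG.error(\"Error closing socket \" + socket, e);",
        "    }",
        "}");

    static final String CLOJURE_SNIPPET = String.join("\n",
        "(defn closeSocket [^Socket socket]",
        "  (try",
        "    (.close socket)",
        "    (catch IOException e",
        "      (LOG/error (str \"Error closing socket \" socket) e))))");

    public static void main(String[] args) {
        System.out.println("Java:    " + countBrackets(JAVA_SNIPPET));
        System.out.println("Clojure: " + countBrackets(CLOJURE_SNIPPET));
    }
}
```

Both calls report 14.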

Now I would say that those who reject a language judging by its syntax do not even look at the syntax closely.

2011-09-22

More on concise code

"How many lines of code have you deleted in one day?"
My answer is "~2150 of perfectly working code and not a single feature broke".

And one more essay by Michael Feathers: The Carrying-Cost of Code: Taking Lean Seriously

2011-09-16

Why programming language design is hard

It is harder to write reusable code because it is used in more than one context. In particular this is the case with unit-testable code, because you have to write code for use both in tests and in production. But a programming language is probably the extreme: sooner or later every feature you have will be used in every possible combination.

Quote from Neal Gafter

There's more to my previous post. The quote is actually not about unnecessary code but about unnecessary features, but it is still pertinent (http://www.infoq.com/articles/neal-gafter-on-java):

And when you add something that someone doesn't benefit from, it's actually a negative for them. Even though they don't have to use it, or look at it, or care about it, it makes the system more complicated for them.

2011-09-04

Concise code

This subject has been bothering me for a long time, so I decided to collect my arguments in a single place, here.
I insist that good code is concise code (but not necessarily the other way around, to be precise). The objections to this that I have heard most often so far are:
1. Code is read more times than it is written. Denser code is harder to read.
2. Code is for human beings, and that is why it should be clear. Denser code is harder to read.
3. Copy-paste is not bad, since requirements may change in the future and the copied parts would then naturally diverge. Extra code in the project is just extra code: it sits there and costs nothing.

And all of them ARE WRONG.
Yes, code is read more often than it is written. And that is why short code matters. There are numerous observations that the human brain can manipulate only a limited number of symbols at a time. If you present the whole program in a way that the brain can grasp at once, you get the bonus of understanding what is going on. This is instead of working as a meat-based decompiler: trying to understand pieces of the system, then trying to put the pieces together, then trying to understand how the system works as a whole. By writing a program that fits in your brain you turn the computer into a friend that has an interface compatible with you. It is surprising how many software systems there are that not even one developer understands. Developers working on them just get used to navigating symbol browsers in their IDE without even trying to understand the parts of the code that are not directly related to the task at hand.
Yes, code is read more often than it is written. To an extent. New code almost always needs refactoring as the requirements and more appropriate designs become apparent. The less code there is, the better the chances that rewriting it is feasible, and the better the chances that the design will be sound in the future. In that very future a rewrite is often impossible. So keeping code compact is one of the ways to better software. I have seen developers who were reluctant to rewrite code that was first written yesterday, even though everyone understood that a rewrite would significantly decrease complexity. You see? Even yesterday's code might already be too old for a rewrite.
Every medium or large project I participated in contained many hundreds or even thousands of lines of unnecessary code. Worse, its redundancy is often not apparent, since only some parts of it have textual similarity. You need time to understand that this large chunk of code is completely unnecessary.
And anyone who digs into this code will have to expend effort to understand these parts. One has to understand code before changing it, right? Instead of immediately seeing that this and that part of the system are essentially the same thing with a minor tweak in the middle that could be passed as a parameter, I have to work as an overpaid diff program that scans two large chunks of code for that single dot that is in a different place. And after finally finding it, I am still left wondering whether it is a bug introduced by sloppy maintenance or a deliberate change in functionality. This is part of the increased support cost due to unnecessary code. Unnecessary code does not just sit there on a hard drive bothering no one. Instead it constantly sucks the team's time and energy to support it.
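
To make the "overpaid diff program" concrete, here is a deliberately tiny, made-up Java sketch (all names are hypothetical). The first two methods are a copy-pasted pair that differ in a single character; the third is the same logic written once, with the differing bit passed in as a parameter.

```java
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

final class DuplicationSketch {
    // Copy-pasted pair: identical except for the comparison in the middle.
    static int sumOfPositives(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v > 0) {
                sum += v;
            }
        }
        return sum;
    }

    static int sumOfNegatives(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v < 0) {
                sum += v;
            }
        }
        return sum;
    }

    // The same logic once, with the "single dot" passed as a parameter.
    static int sumWhere(int[] values, IntPredicate keep) {
        return IntStream.of(values).filter(keep).sum();
    }
}
```

In real code the duplicated chunks are dozens of lines long rather than eight, which is exactly why the single difference is so hard to spot.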

They say that when requirements change those copy-pasted pieces can be changed separately. The problem is that you never know how those requirements will change. And there is a big chance, say at least 50%, that the requirements will change in a way that requires changing both of those pieces. And the larger the project, the higher the chances that the required changes will be cross-cutting. You ain't gonna need it.

Yes, in less dense code it is easier to understand one isolated line. But is there a point in understanding the line "i++;" or "}"? Is there a point in understanding separate lines of code if you still do not understand what is going on? You need to understand what the program is doing, if not as a whole then at least a substantial piece of it. Yes, denser code probably needs several times more mental effort per line, but is that a problem when you need proportionally fewer lines to do the same thing? And remember that you will have better chances to understand and maintain the code.

I have one more observation, related to good design. To write concise code you have to understand deeply what your program should do. And this is the only way to do it. People who unconsciously equate bloated code with more approachable code often imagine some big system written as a 1-megabyte regexp as an example of brief code. No, you cannot do that. At most you could cram what you have into a few percent fewer lines, indeed impairing readability. To have better code one has to understand it.

I have been working with Java for several years, and there is one thing related to it. Some tools (programming languages) just do not have adequate means for structuring code. Trying to implement in Java the approaches that reduce code significantly in other languages often leads to more bloated code. Java just resists being concise. So it is not just about design.

One last piece, somewhat related to Java, is the use of design patterns. Like any good idea that gets into popular culture (popular programming culture in this case), it becomes a parody of itself. I find this gem by Sarah A. Sheard to be the most memorable text on the subject. People forget that every pattern has not only benefits but also an area of applicability and drawbacks, so one has to ponder whether the code would be better if a pattern were applied. They think instead that the more patterns are used the better. They forget that patterns are not something devised by demigods and given to us mortals to be used unquestioningly. Patterns are extracted from real systems. And one's system might need some pattern that is not in the GoF bible. And the best way to extract relevant patterns is refactoring. Seeing how often patterns are overused is upsetting, since most of them increase the amount of code. How often have you seen a factory used where a plain constructor would suffice? Sigh.
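
A made-up illustration of that last complaint (Greeter and GreeterFactory are hypothetical names, not taken from any real project): the factory below adds a type and a level of indirection, yet does nothing that the constructor does not already do.

```java
final class FactorySketch {
    static final class Greeter {
        private final String name;

        Greeter(String name) {
            this.name = name;
        }

        String greet() {
            return "Hello, " + name;
        }
    }

    // Pure ceremony: no caching, no choice of subtype, no configuration.
    static final class GreeterFactory {
        Greeter createGreeter(String name) {
            return new Greeter(name);
        }
    }
}
```

Here new FactorySketch.GreeterFactory().createGreeter("Bob") and new FactorySketch.Greeter("Bob") give objects that behave identically, so the factory can simply be deleted.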

2011-07-10

Logging and bug reports

A useful bug report has all three parts: how to locate or reproduce the problem, what is actually there, and what should be there.
I would extend this to all error reporting in software. Consider, for example, logging. I have many times seen and written code where log messages are written in an ad-hoc way just to help with debugging. Then such a system goes into production and something terrible happens. And then you discover that an error message was written, but there is a tiny missing piece of information, say the identifier of the troublesome object, that would have helped you diagnose the problem. The developer would not notice this earlier because he could just start a debugger with a breakpoint and look into the actual state to find all the interesting values. But in production one usually cannot afford this.
So here is the rule I have now: if you write an error log message, put yourself in the position of someone who needs to diagnose or resolve this problem in a production environment. This normally means that the error message is a brief bug report:

The error message says where it happened, so that a developer can find the code in question and a sysadmin can find configuration problems. And both can find the data that caused the problem, if there is such a thing.
The error message says what is wrong.
The error message says what is expected, if that is not obvious from the above. For example, if some value is inappropriate, say also what the allowed values are. Not just "Salary value out of range", but "New salary value 1000 is out of range. Allowed range is [1000000, 5000000).".
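
A minimal Java sketch of a check that produces such a message. The class, the method and the employee identifier are assumptions made up for the example; only the range and the wording of the second half come from the text above.

```java
final class SalaryValidator {
    static final long MIN_SALARY = 1_000_000; // inclusive
    static final long MAX_SALARY = 5_000_000; // exclusive

    // Rejects an out-of-range salary with a message that says where (which employee),
    // what is wrong (the offending value) and what is expected (the allowed range).
    static void checkNewSalary(String employeeId, long newSalary) {
        if (newSalary < MIN_SALARY || newSalary >= MAX_SALARY) {
            throw new IllegalArgumentException(String.format(
                "Cannot update salary for employee %s: new salary value %d is out of range. "
                    + "Allowed range is [%d, %d).",
                employeeId, newSalary, MIN_SALARY, MAX_SALARY));
        }
    }
}
```

Whoever catches and logs this exception gets all three parts of the bug report without needing a debugger.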

PS This classic text is somewhat on topic.

2010-12-30

Do not do this, please: this instanceof

Since I do like refactoring, I decided to document the code patterns that I stumble upon.
Consider the following code (Java). First snippet shows
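
The original snippet is not preserved here, so what follows is only a hypothetical sketch of the kind of code in question. Animal, Dog, Cat and bark are the names used in the text; the speak method, the return values and the wrapping class are my assumptions.

```java
final class InstanceofBefore {
    static abstract class Animal {
        // The base class inspects its own runtime type to decide what to do.
        String speak() {
            if (this instanceof Dog) {
                return bark();
            }
            return "Meow";
        }

        String bark() {
            return "Woof";
        }
    }

    static class Dog extends Animal {}

    static class Cat extends Animal {}
}
```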
It is hard for me to explain why this is not a good solution, because it is so obvious to me. It's like explaining why a joke is funny to someone who does not get it. But since I have met such code in more than one project, written by different people, there must be something worth explaining.
First, look at a better version of this code:
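
Again a hypothetical sketch rather than the original snippet, using the same assumed names (a speak method on Animal, with subclasses Dog and Cat): the type check is replaced by an ordinary override.

```java
final class InstanceofAfter {
    static abstract class Animal {
        // Default behaviour; a subclass that differs overrides it.
        String speak() {
            return "Meow";
        }

        String bark() {
            return "Woof";
        }
    }

    static class Dog extends Animal {
        @Override
        String speak() {
            return bark();
        }
    }

    static class Cat extends Animal {}
}
```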
This rewrite eliminates the runtime check, removes the dependency of the base class on its subclasses, and removes the temptation for junior developers who will modify this code after you to insert more instanceof checks into the same method. And after all, subclassing is Java's idiomatic way to do such things.
I have written what I wanted about this case, but it is hard for me to stop here, so I continue.
If I were doing the refactoring, I would stop after this rewrite. But looking at the methods I would sense that something else needs improvement: the call to "bark" suggests that it belongs not to an abstract animal but to a dog. So if we indeed have only two subclasses, Cat and Dog, I would further rewrite the code as:
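
A hypothetical sketch of that step, with the same assumed speak method: bark moves down into Dog, and Animal keeps only the abstract declaration.

```java
final class BarkInDog {
    static abstract class Animal {
        abstract String speak();
    }

    static class Dog extends Animal {
        @Override
        String speak() {
            return bark();
        }

        // Barking is a dog's business, so it lives here now.
        String bark() {
            return "Woof";
        }
    }

    static class Cat extends Animal {
        @Override
        String speak() {
            return "Meow";
        }
    }
}
```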
I would advocate composition over subclassing, everything else being equal. So if after these changes there is no real code left in Animal, I would convert it to an interface:
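
The last step, again as a hypothetical sketch with assumed method names: Animal holds no code any more, so it becomes an interface.

```java
final class AnimalAsInterface {
    interface Animal {
        String speak();
    }

    static final class Dog implements Animal {
        @Override
        public String speak() {
            return bark();
        }

        String bark() {
            return "Woof";
        }
    }

    static final class Cat implements Animal {
        @Override
        public String speak() {
            return "Meow";
        }
    }
}
```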
They tell me "ship it" and here I stop.