I wanted a script that would let me search a set of XML files for particular values. It turns out that Clojure supports parsing, querying and editing XML out of the box (thanks to clojure-contrib). The script follows (it's for Clojure 1.1):
(ns main.xquery
  (:require
    clojure.xml
    [clojure.zip :as zz]
    [clojure.contrib.zip-filter :as zf])
  (:use
    clojure.contrib.zip-filter.xml))

;; Canonical File objects for every entry in the directory.
(defn dir [dirname]
  (seq (map #(.getCanonicalFile %) (.listFiles (java.io.File. dirname)))))

;; True for regular files whose name ends in ".xml".
(defn xml-file? [file]
  (and (.. file (getName) (endsWith ".xml")) (.isFile file)))

;; Parse input-file, wrap it in a zipper and run the zip-filter steps on it.
(defn query [input-file preds]
  (let [xml (zz/xml-zip (clojure.xml/parse input-file))]
    (apply xml-> xml preds)))

;; One list of matched "name" values per file; files without a match are dropped.
(let [dirname "/home/petr/workspace/CloTry/"]
  (remove empty?
          (map #(query % [zf/children :item (attr= :type "green") (attr :name)])
               (filter xml-file? (dir dirname)))))
This script scans all XML files in the given directory and returns a list of the matched "name" attribute values. The interesting work happens in the "query" function, which is called with an XML file and an XPath-like query expression:
[zf/children :item (attr= :type "green") (attr :name)]
The query expression can be read as: take the children of the root element, keep the "item" elements whose "type" attribute has the value "green", and take the value of the "name" attribute of each of them.
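To make that reading concrete, here is a minimal sketch of the same selection written by hand with only clojure.xml and clojure.zip, without zip-filter. The names sample, parse-str and green-names are made up for this illustration, and the XML is parsed from an inlined string rather than from a file, so treat it as an approximation of what the query does, not as how xml-> is implemented.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

;; Hypothetical sample document, inlined so the sketch is self-contained.
(def sample
  (str "<doc>"
       "<section name=\"everything\"/>"
       "<item name=\"first\" type=\"black\"/>"
       "<item name=\"second\" type=\"green\"/>"
       "<item name=\"third\" type=\"yellow\"/>"
       "<item name=\"fourth\" type=\"green\"/>"
       "</doc>"))

;; Hypothetical helper: parse XML held in a string instead of a file.
(defn parse-str [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Children of the root -> keep :item elements with type "green" -> take :name.
(defn green-names [root]
  (for [child (zip/children (zip/xml-zip root))
        :when (and (= :item (:tag child))
                   (= "green" (:type (:attrs child))))]
    (:name (:attrs child))))

(green-names (parse-str sample))
;; => ("second" "fourth")
```

The zip-filter version is shorter and composes better; the hand-written one just shows which three steps the expression stands for.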
Note the use of clojure.zip functions, which are a generic way of traversing or editing hierarchical data structures, XML being one of them. The xml-zip function takes the parsed representation of an XML file and returns a zipper over it that can be used with these generic functions.
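As a rough illustration of that generic traversal and editing, the sketch below navigates and edits a small tree with clojure.zip only. The doc map is a hand-built stand-in with the same shape clojure.xml/parse produces (:tag, :attrs, :content); the names doc, second-item and repainted are assumptions made for this example.

```clojure
(require '[clojure.zip :as zip])

;; Hand-built stand-in for parsed XML: maps with :tag, :attrs and :content.
(def doc
  {:tag :doc :attrs nil
   :content [{:tag :item :attrs {:name "first" :type "black"} :content nil}
             {:tag :item :attrs {:name "second" :type "green"} :content nil}]})

;; Traversal: go down to the first child, then right to its sibling.
(def second-item
  (-> (zip/xml-zip doc) zip/down zip/right))

(:name (:attrs (zip/node second-item)))
;; => "second"

;; Editing: change the node at the current location, then rebuild the tree.
(def repainted
  (-> second-item
      (zip/edit assoc-in [:attrs :type] "red")
      zip/root))

(map #(:type (:attrs %)) (:content repainted))
;; => ("black" "red")
```

The edit returns a new tree; doc itself is unchanged, which is what makes zippers convenient for "modify one spot" work on immutable data.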
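There was one problem: a function named "children" is defined in both the clojure.zip and clojure.contrib.zip-filter namespaces. They look interchangeable, but they are not: clojure.zip/children returns the child nodes themselves, while the zip-filter version returns child locations, which is what xml-> expects at each step. The former gave me an NPE (most likely for that reason), so I explicitly referred to the one from the clojure.contrib.zip-filter namespace.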
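One thing that can be checked with clojure.zip alone is what its children function hands back. The sketch below compares it with locations collected by walking down and right; root, child-nodes and child-locs are names invented for this example, and the walk is only a stand-in for what a location-returning "children" has to produce, not the zip-filter source.

```clojure
(require '[clojure.zip :as zip])

(def root
  (zip/xml-zip {:tag :doc :attrs nil
                :content [{:tag :item :attrs {:name "first"} :content nil}
                          {:tag :item :attrs {:name "second"} :content nil}]}))

;; clojure.zip/children: plain nodes (here, maps) with no path information.
(def child-nodes (zip/children root))

;; Walking with down/right instead yields locations, i.e. node plus path.
(def child-locs
  (take-while identity (iterate zip/right (zip/down root))))

(map? (first child-nodes))                 ;; => true
(= child-nodes (map zip/node child-locs))  ;; => true
```

A plain node cannot be navigated further with zipper functions, so a later step that expects a location has nothing valid to work on.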
Given that an XML file has this content:
<doc>
  <section name="everything"/>
  <item name="first" type="black"/>
  <item name="second" type="green"/>
  <item name="third" type="yellow"/>
  <item name="fourth" type="green"/>
</doc>
The script outputs:
(("second" "fourth"))
Update: If you want the results kept apart per file, the solution above is fine, since it returns one list for each file that matched. If you just need the matched values, this is neater:
(let [dirname "/home/petr/workspace/CloTry/"]
  (apply concat
         (map #(query % [zf/children :item (attr= :type "green") (attr :name)])
              (filter xml-file? (dir dirname)))))
Update 2: Or even shorter :)
(let [dirname "/home/petr/workspace/CloTry/"]
  (mapcat #(query % [zf/children :item (attr= :type "green") (attr :name)])
          (filter xml-file? (dir dirname))))
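For comparison, here is a small sketch of how the grouped and flattened variants relate, using made-up in-memory data instead of real files. The names per-file-results and lookup are hypothetical; lookup simply stands in for a call to query on one file.

```clojure
;; Hypothetical per-file results: the second "file" has no matches.
(def per-file-results [["second" "fourth"] [] ["sixth"]])

;; Stand-in for (query file [...]): returns the matches for file number i.
(defn lookup [i] (nth per-file-results i))

(remove empty? (map lookup [0 1 2]))
;; => (["second" "fourth"] ["sixth"])

(apply concat (map lookup [0 1 2]))
;; => ("second" "fourth" "sixth")

(mapcat lookup [0 1 2])
;; => ("second" "fourth" "sixth")
```

With the flattened variants there is no need to filter out empty results first, because concatenating an empty sequence contributes nothing.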