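When handling the hierarchical structure of XML with SAX, one common approach is to use a flag to record that you have entered a particular tag. For example, a flag like the one below lets you tell whether or not you are currently inside a `hoge` tag.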
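```python
from xml.sax import ContentHandler


class SimpleHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.inHoge = False  # True while we are inside a <hoge> element

    def startElement(self, name, attrs):
        if name == "hoge":
            self.inHoge = True

    def endElement(self, name):
        if name == "hoge":
            self.inHoge = False
```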
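To see the flag in use, here is a minimal runnable sketch that also overrides `characters()` so that only text inside `<hoge>` is kept. The class name `HogeTextHandler`, the `collected` list, and the sample document are hypothetical. Note that a single boolean flag assumes `<hoge>` elements are never nested inside one another: an inner `</hoge>` would reset the flag while the outer element is still open.

```python
from xml.sax import ContentHandler, parseString


class HogeTextHandler(ContentHandler):
    """Flag-based handler (hypothetical name): keeps only text found inside <hoge>."""

    def __init__(self):
        super().__init__()
        self.inHoge = False
        self.collected = []  # text chunks seen while inHoge is True

    def startElement(self, name, attrs):
        if name == "hoge":
            self.inHoge = True

    def endElement(self, name):
        if name == "hoge":
            self.inHoge = False

    def characters(self, content):
        # SAX may deliver text in several chunks, so append and join later.
        if self.inHoge:
            self.collected.append(content)


handler = HogeTextHandler()
parseString(b"<root>a<hoge>b</hoge>c<hoge>d</hoge></root>", handler)
print("".join(handler.collected))  # bd
```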
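That said, I wondered whether this could be done a little more elegantly, so I tested whether the ContentHandler can be changed dynamically while parsing. If it can, then instead of setting a flag you can simply swap in a different handler.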
```python
import sys
from xml.sax import ContentHandler, make_parser


class FirstHandler(ContentHandler):
    def __init__(self, parser, handler):
        super().__init__()
        self.parser = parser
        self.handler = handler  # handler to switch to once </change> is seen

    def startElement(self, name, attrs):
        self.printString("FirstHandler", "Start of " + name)

    def endElement(self, name):
        self.printString("FirstHandler", "End of " + name)
        if name == "change":
            # Replace the parser's ContentHandler mid-parse;
            # all later events are delivered to self.handler.
            self.parser.setContentHandler(self.handler)

    def printString(self, title, text):
        print("[" + title + "] : " + text)


class SecondHandler(ContentHandler):
    def startElement(self, name, attrs):
        self.printString("SecondHandler", name + " starts!")

    def endElement(self, name):
        self.printString("SecondHandler", name + " ends!")

    def printString(self, title, text):
        print("[" + title + "]-> " + text)


if __name__ == "__main__":
    parser = make_parser()
    secondHandler = SecondHandler()
    firstHandler = FirstHandler(parser, secondHandler)
    parser.setContentHandler(firstHandler)
    with open(sys.argv[1], "rb") as xmlFile:
        parser.parse(xmlFile)
```
Below is the result of running this program on the following XML file.
```xml
<?xml version="1.0"?>
<os>
  <windows company="microsoft" price="not free">
  </windows>
  <unix>
  </unix>
  <change/>
  <linux price="free">
    <debian/>
    <fedora/>
    <redhat/>
  </linux>
  <macos company="apple">
  </macos>
</os>
```
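Output. Every event after the `change` element ends, including the closing `os` tag, goes to `SecondHandler`: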
```
[FirstHandler] : Start of os
[FirstHandler] : Start of windows
[FirstHandler] : End of windows
[FirstHandler] : Start of unix
[FirstHandler] : End of unix
[FirstHandler] : Start of change
[FirstHandler] : End of change
[SecondHandler]-> linux starts!
[SecondHandler]-> debian starts!
[SecondHandler]-> debian ends!
[SecondHandler]-> fedora starts!
[SecondHandler]-> fedora ends!
[SecondHandler]-> redhat starts!
[SecondHandler]-> redhat ends!
[SecondHandler]-> linux ends!
[SecondHandler]-> macos starts!
[SecondHandler]-> macos ends!
[SecondHandler]-> os ends!
```
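The switch can also be checked without a file on disk. The sketch below assumes Python 3 and the default expat-based parser returned by `make_parser()`; it parses an in-memory document through `io.BytesIO` and records which handler saw each start tag. `RecordingHandler` and its `trigger` parameter are hypothetical names used only for illustration.

```python
import io
from xml.sax import ContentHandler, make_parser


class RecordingHandler(ContentHandler):
    """Hypothetical handler: logs start tags under a label and,
    if given a next handler, switches to it after the trigger element ends."""

    def __init__(self, label, log, parser=None, next_handler=None, trigger="change"):
        super().__init__()
        self.label = label
        self.log = log
        self.parser = parser
        self.next_handler = next_handler
        self.trigger = trigger

    def startElement(self, name, attrs):
        self.log.append((self.label, name))

    def endElement(self, name):
        # Same idea as FirstHandler: swap the parser's handler mid-parse.
        if name == self.trigger and self.next_handler is not None:
            self.parser.setContentHandler(self.next_handler)


log = []
parser = make_parser()
second = RecordingHandler("second", log)
first = RecordingHandler("first", log, parser=parser, next_handler=second)
parser.setContentHandler(first)
parser.parse(io.BytesIO(b"<os><windows/><change/><linux/></os>"))
print(log)
# [('first', 'os'), ('first', 'windows'), ('first', 'change'), ('second', 'linux')]
```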
Addendum:
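Oh, all it takes is giving each handler a reference to the parser and to the handler that should come next. That was easy.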
```python
import sys
from xml.sax import ContentHandler, make_parser


class FirstHandler(ContentHandler):
    def setParser(self, parser):
        self.parser = parser

    def setNextHandler(self, handler):
        self.nextHandler = handler

    def startElement(self, name, attrs):
        self.printString("FirstHandler", "Start of " + name)

    def endElement(self, name):
        self.printString("FirstHandler", "End of " + name)
        if name == "change":
            self.parser.setContentHandler(self.nextHandler)

    def printString(self, title, text):
        print("[" + title + "] : " + text)


class SecondHandler(ContentHandler):
    def setParser(self, parser):
        self.parser = parser

    def setNextHandler(self, handler):
        self.nextHandler = handler

    def startElement(self, name, attrs):
        self.printString("SecondHandler", name + " starts!")

    def endElement(self, name):
        self.printString("SecondHandler", name + " ends!")
        if name == "change":
            self.parser.setContentHandler(self.nextHandler)

    def printString(self, title, text):
        print("[" + title + "]-> " + text)


if __name__ == "__main__":
    parser = make_parser()
    firstHandler = FirstHandler()
    secondHandler = SecondHandler()
    # Each handler hands control to the other whenever a <change> element ends.
    firstHandler.setParser(parser)
    firstHandler.setNextHandler(secondHandler)
    secondHandler.setParser(parser)
    secondHandler.setNextHandler(firstHandler)
    parser.setContentHandler(firstHandler)
    with open(sys.argv[1], "rb") as xmlFile:
        parser.parse(xmlFile)
```
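With this, the two handlers pass control back and forth every time a `change` element ends. Here is an example run, on an input where `<change/>` follows each of the `windows`, `unix`, and `linux` elements: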
```
[FirstHandler] : Start of os
[FirstHandler] : Start of windows
[FirstHandler] : End of windows
[FirstHandler] : Start of change
[FirstHandler] : End of change
[SecondHandler]-> unix starts!
[SecondHandler]-> unix ends!
[SecondHandler]-> change starts!
[SecondHandler]-> change ends!
[FirstHandler] : Start of linux
[FirstHandler] : Start of debian
[FirstHandler] : End of debian
[FirstHandler] : Start of fedora
[FirstHandler] : End of fedora
[FirstHandler] : Start of redhat
[FirstHandler] : End of redhat
[FirstHandler] : End of linux
[FirstHandler] : Start of change
[FirstHandler] : End of change
[SecondHandler]-> macos starts!
[SecondHandler]-> macos ends!
[SecondHandler]-> os ends!
```
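A variation on the same idea, not the method used in this post: instead of calling `setContentHandler()` on the parser, keep one handler registered and have it forward events to a stack of delegates. Entering an element listed in a routing table pushes that element's delegate, and leaving it pops the delegate again, so the delegates need no reference to the parser and nested elements are handled naturally. `DispatchingHandler`, `Logger`, and the `routes` dictionary are hypothetical names; this is a sketch under those assumptions, not a library API.

```python
from xml.sax import ContentHandler, parseString


class DispatchingHandler(ContentHandler):
    """Hypothetical dispatcher: forwards start/end/characters events to the
    delegate on top of a stack instead of swapping the parser's handler."""

    def __init__(self, default, routes):
        super().__init__()
        self.stack = [(None, default)]  # (element name that pushed it, delegate)
        self.routes = routes            # element name -> delegate handler

    def startElement(self, name, attrs):
        if name in self.routes:
            self.stack.append((name, self.routes[name]))
        self.stack[-1][1].startElement(name, attrs)

    def endElement(self, name):
        self.stack[-1][1].endElement(name)
        # Pop only when closing the element that pushed the current delegate.
        if self.stack[-1][0] == name:
            self.stack.pop()

    def characters(self, content):
        self.stack[-1][1].characters(content)


class Logger(ContentHandler):
    """Hypothetical delegate that records (label, element name) on each start tag."""

    def __init__(self, label, log):
        super().__init__()
        self.label = label
        self.log = log

    def startElement(self, name, attrs):
        self.log.append((self.label, name))


log = []
dispatcher = DispatchingHandler(Logger("outer", log), {"linux": Logger("linux", log)})
parseString(b"<os><windows/><linux><debian/></linux><macos/></os>", dispatcher)
print(log)
# [('outer', 'os'), ('outer', 'windows'), ('linux', 'linux'), ('linux', 'debian'), ('outer', 'macos')]
```

Compared with swapping the parser's handler, the switching rule lives in one place, at the cost of an extra method call per event.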