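Java, XML, and Whitespace (6): Text Node Normalization (JDOM Edition)

Last time, we looked at how whitespace is handled under the "text normalization" defined in the Nux JavaDoc API.

Then I happened to look at the JDOM JavaDoc and found a similar "text normalization" defined there... and with names that are easier to get used to, at that (^ ^;) So this time we'll look at the text normalization defined in JDOM.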

^{org.jdom.output}Format.TextMode

In JDOM, text normalization is defined by the ^{org.jdom.output}Format.TextMode class (JDOM JavaDoc API). There are four modes:

PRESERVE
TRIM_FULL_WHITE
TRIM
NORMALIZE

Below, we'll go through how each mode works, along with the result of applying it to the following sample XML document (the same one as last time):

<root>
  <text>
    If at first an idea does not sound absurd,
    then there is no hope for it.
        - ALBERT EINSTEIN
  </text>
</root>
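To make "whitespace-only text" concrete before going through the modes, here is a minimal sketch that parses the sample with the JDK's built-in DOM parser (not JDOM) and counts the whitespace-only text nodes directly under <root>. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: uses the JDK DOM parser, not JDOM.
final class SampleWhitespaceSketch {

    // The sample document from this article.
    static final String SAMPLE =
        "<root>\n"
        + "  <text>\n"
        + "    If at first an idea does not sound absurd,\n"
        + "    then there is no hope for it.\n"
        + "        - ALBERT EINSTEIN\n"
        + "  </text>\n"
        + "</root>";

    // Counts the text nodes directly under <root> that hold nothing but whitespace.
    static int countWhitespaceOnlyChildren() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // String#trim() strips every char <= U+0020, which covers XML whitespace.
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample", e);
        }
    }
}
```

The line break and indentation before <text>, and the line break after </text>, each show up as a text node of their own. These are the nodes that the modes below either keep or drop.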

Text processing

★PRESERVE (keep as is)★

All content is printed in the format it was created, no whitespace or line separators are added or removed.

Does nothing. This is the same as having xml:space="preserve" in effect. In most cases, this is what applications do by default.

<root>
  <text>
    If at first an idea does not sound absurd,
    then there is no hope for it.
        - ALBERT EINSTEIN
  </text>
</root>

= ^{nux.xom.pool}XOMUtil.Normalizer.PRESERVE

★TRIM_FULL_WHITE (drop whitespace-only text)★

Content between tags consisting of all whitespace is not printed. If the content contains even one non-whitespace character, it is printed verbatim, whitespace and all.

Removes text that consists only of whitespace and leaves all other text exactly as it is.

<root><text>
    If at first an idea does not sound absurd,
    then there is no hope for it.
        - ALBERT EINSTEIN
  </text></root>

= ^{nux.xom.pool}XOMUtil.Normalizer.STRIP
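The rule can be sketched in plain Java roughly as follows. This is not JDOM's actual implementation: the class and method names are made up for illustration, and each text node is modeled as a plain String.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the TRIM_FULL_WHITE rule; not JDOM's own code.
final class TrimFullWhiteSketch {

    // XML 1.0 whitespace: space, tab, carriage return, line feed.
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // True when the text is empty or consists only of XML whitespace.
    static boolean isAllWhitespace(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isXmlWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Drops whitespace-only text nodes and keeps every other node verbatim.
    static List<String> trimFullWhite(List<String> textNodes) {
        List<String> kept = new ArrayList<>();
        for (String text : textNodes) {
            if (!isAllWhitespace(text)) {
                kept.add(text);
            }
        }
        return kept;
    }
}
```

Text that contains even one non-whitespace character passes through untouched, which is why the quotation keeps its indentation and line breaks in the output above.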

★TRIM (trim the ends)★

Same as TrimAllWhite, plus leading/trailing whitespace are trimmed.

In addition to dropping whitespace-only text, removes the leading and trailing whitespace of the remaining text.

<root><text>If at first an idea does not sound absurd,
    then there is no hope for it.
        - ALBERT EINSTEIN</text></root>

= ^{nux.xom.pool}XOMUtil.Normalizer.TRIM
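As a rough plain-Java sketch of this rule (again, not JDOM's own code; the class and method names are hypothetical), trimming only touches the two ends of the text and leaves interior whitespace alone:

```java
// Plain-Java sketch of the TRIM rule; not JDOM's own code.
final class TrimSketch {

    // XML 1.0 whitespace: space, tab, carriage return, line feed.
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Strips leading and trailing XML whitespace; interior whitespace is kept.
    // Whitespace-only input comes back as the empty string.
    static String trim(String text) {
        int start = 0;
        int end = text.length();
        while (start < end && isXmlWhitespace(text.charAt(start))) {
            start++;
        }
        while (end > start && isXmlWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(start, end);
    }
}
```

Because the interior is untouched, the line breaks and indentation inside the quotation survive, as in the output above.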

★NORMALIZE (collapse whitespace)★

Same as TextTrim, plus addition interior whitespace is compressed to a single space.

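Replaces each run of whitespace with a single space (' ') and removes leading and trailing whitespace with the String#trim() method. I think this is most often used for XML property files whose values are single words, or for XHTML.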

<root><text>If at first an idea does not sound absurd, then there is no hope for it. - ALBERT EINSTEIN</text></root>

= ^{nux.xom.pool}XOMUtil.Normalizer.COLLAPSE
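The collapse-then-trim behavior described above can be sketched in one line of plain Java. This is only an illustration of the rule, not how JDOM implements it, and the class and method names are made up.

```java
// Plain-Java sketch of the NORMALIZE rule; not JDOM's own code.
final class NormalizeSketch {

    // Replaces every run of XML whitespace (space, tab, CR, LF) with one space,
    // then removes the single leading/trailing space that may remain.
    static String normalize(String text) {
        return text.replaceAll("[ \\t\\r\\n]+", " ").trim();
    }
}
```

Applied to the text of the sample's <text> element, this produces the single-line quotation shown in the output above.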

倭マン's BLOG

