• tdom encoding

    From saito@saitology9@gmail.com to comp.lang.tcl on Mon Dec 16 19:01:02 2024
    From Newsgroup: comp.lang.tcl

    I am trying to see why tdom is failing on this json snippet.

    package req tdom
    set x {{"name":"Jeremi"}}
    dom parse -json $x

    error "JSON syntax error" at position 15
    "{"name":"Jeremi <--Error-- "}"


    If it doesn't get removed by the newsgroup editors, there is a weird
    character at the very end of x. It looks almost like "[]" but it is
    not. When you edit it, it acts as if it has multiple characters in it.


    Another problem is that the tdom man page talks about a command
    "dom setResultEncoding ?encodingName?" but trying it results in an
    unknown command error.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From greg@gregor.ebbing@gmx.de to comp.lang.tcl on Tue Dec 17 03:13:14 2024
    From Newsgroup: comp.lang.tcl

    On 17.12.24 at 01:01, saito wrote:
    > I am trying to see why tdom is failing on this json snippet.
    >
    > package req tdom
    > set x {{"name":"Jeremi"}}
    > dom parse -json $x
    >
    > error "JSON syntax error" at position 15
    > "{"name":"Jeremi <--Error-- "}"
    >
    > If it doesn't get removed by the newsgroup editors, there is a weird
    > character at the very end of x.  It looks almost like "[]" but it is
    > not.  When you edit it, it acts as if it has multiple characters in it.
    >
    > Another problem is that the tdom man page talks about a command
    > "dom setResultEncoding ?encodingName?" but trying it results in an
    > unknown command error.

    Hello,

    The unknown character is code 7 (BEL, the ASCII bell control character).
    A raw control character is probably not allowed inside a JSON string.
    Use the escape sequence \u0007 instead.

    Gregor


    package req tdom

    proc chr c {
      if {[string length $c] > 1 } {
        error "chr: arg should be a single char"
      }
      set v 0
      scan $c %c v
      return $v
    }

    # Check character types and provide additional information
    proc charInfo char {
      if {[string is control $char]} {
        return "control character"
      } elseif {[string is space $char]} {
        return "space character"
      } elseif {[string is digit $char]} {
        return "digit character"
      } elseif {[string is lower $char]} {
        return "lowercase alphabetic character"
      } elseif {[string is upper $char]} {
        return "uppercase alphabetic character"
      } elseif {[string is punct $char]} {
        return "punctuation character"
      } elseif {[string is graph $char]} {
        return "graphical character"
      } elseif {[string is print $char]} {
        return "printable character"
      } else {
        return "unknown character type"
      }
    }

    proc infochar {x} {
      puts $x
      set i 0
      while {$i<[string length $x]} {
        set c [string index $x $i]
        puts "$i is $c [charInfo $c] [chr $c] "
        incr i
      }
    }

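    # The raw BEL (U+0007) sits just before the closing quote. It is built
    # here with backslash substitution because the literal character may
    # not survive posting.
    set x "{\"name\":\"Jeremi\u0007\"}"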
    infochar $x
    catch {dom parse -json $x} mess
    puts "mess: $mess"

    set x {{"name":"Jeremi\u0007"}}
    set doc [dom parse -json $x]
    puts [$doc asXML]

  • From Rich@rich@example.invalid to comp.lang.tcl on Tue Dec 17 04:20:54 2024
    From Newsgroup: comp.lang.tcl

    saito <saitology9@gmail.com> wrote:
    > I am trying to see why tdom is failing on this json snippet.
    >
    > package req tdom
    > set x {{"name":"Jeremi^G"}}
    > dom parse -json $x
    >
    > error "JSON syntax error" at position 15
    > "{"name":"Jeremi^G <--Error-- "}"

    Assuming the ^G that did come through properly represents the
    character, then greg is right: it is an ASCII bell character, and per
    the JSON spec [1] raw control characters are not allowed to be part of
    a JSON string.

    Which is why tDOM is reporting the error at the ^G.
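
    The same check can be done from Tcl without looking at the terminal
    output. This is only a minimal sketch: findControlChars is a
    hypothetical helper name, and it simply reports every character below
    U+0020 together with its zero-based index, which is what the JSON
    grammar forbids unescaped inside a string.

```tcl
# Hypothetical helper: list the index and code of every raw control
# character (code point below 0x20) found in a string.
proc findControlChars {s} {
    set hits {}
    set pos 0
    foreach ch [split $s ""] {
        scan $ch %c code
        if {$code < 0x20} {
            lappend hits $pos [format 0x%02x $code]
        }
        incr pos
    }
    return $hits
}

# The snippet from the original post, with the BEL built by backslash
# substitution so that it survives copy and paste.
set x "{\"name\":\"Jeremi\u0007\"}"
puts [findControlChars $x]   ;# prints: 15 0x07
```

    The reported index 15 matches the position given in the tDOM error
    message.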

    Are you on Linux? If so, the hexdump, objdump, or xxd commands (xxd is
    the easiest to use) will show you exactly what raw byte values exist in
    the file.
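
    If the data is already in a Tcl variable, a rough equivalent of such a
    dump can be sketched in Tcl itself. hexBytes is a hypothetical helper,
    and the sketch assumes the data is UTF-8 on the wire; adjust the
    encoding name if the API uses something else.

```tcl
# Hypothetical helper: return the UTF-8 bytes of a string as
# space-separated two-digit hex values.
proc hexBytes {s} {
    binary scan [encoding convertto utf-8 $s] H* hex
    # Put a space after every pair of hex digits, then drop the last one.
    return [string trim [regsub -all {..} $hex {& }]]
}

puts [hexBytes "Jeremi\u0007\""]   ;# prints: 4a 65 72 65 6d 69 07 22
```

    The 07 just before the closing quote (22) is the BEL byte.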


    [1] https://www.json.org/json-en.html
  • From saito@saitology9@gmail.com to comp.lang.tcl on Mon Dec 16 23:51:11 2024
    From Newsgroup: comp.lang.tcl

    On 12/16/2024 9:13 PM, greg wrote:

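    > Hello,
    >
    > The unknown character is code 7 (BEL, the ASCII bell control character).
    > A raw control character is probably not allowed inside a JSON string.
    > Use the escape sequence \u0007 instead.
    >
    > Gregor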


    Thank you and Rich for the wonderful info and the code.

    The json data is what I receive from an api. I first thought it had to
    do with encoding issues. It happens frequently so maybe I will ask
    them to be more careful with their json data generation.

  • From Rich@rich@example.invalid to comp.lang.tcl on Tue Dec 17 04:59:22 2024
    From Newsgroup: comp.lang.tcl

    saito <saitology9@gmail.com> wrote:
    > On 12/16/2024 9:13 PM, greg wrote:
    >
    >> Hello,
    >>
    >> The unknown character is code 7 (BEL, the ASCII bell control character).
    >> A raw control character is probably not allowed inside a JSON string.
    >> Use the escape sequence \u0007 instead.
    >>
    >> Gregor
    >
    > Thank you and Rich for the wonderful info and the code.
    >
    > The json data is what I receive from an api. I first thought it had
    > to do with encoding issues. It happens frequently so maybe I will
    > ask them to be more careful with their json data generation.

    If you are getting it from an API then you've found a bug if the API
    is /really/ sending raw control characters as part of a JSON string.
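
    Until the API is fixed, one possible client-side workaround is to
    escape the raw control characters before handing the text to the
    parser. The following is only a sketch, not a tDOM feature:
    escapeControlsInStrings is a hypothetical helper that tracks whether
    it is inside a JSON string and rewrites any character below U+0020
    found there as a \uXXXX escape, leaving whitespace between tokens
    untouched.

```tcl
# Hypothetical helper: escape raw control characters that occur inside
# JSON string literals, so a strict parser will accept the text.
proc escapeControlsInStrings {json} {
    set out ""
    set inString 0   ;# currently inside a "..." literal?
    set escaped 0    ;# was the previous character an unescaped backslash?
    foreach ch [split $json ""] {
        if {$inString} {
            if {$escaped} {
                set escaped 0
            } elseif {$ch eq "\\"} {
                set escaped 1
            } elseif {$ch eq "\""} {
                set inString 0
            } else {
                scan $ch %c code
                if {$code < 0x20} {
                    # Replace the raw character with its JSON escape.
                    append out [format {\u%04x} $code]
                    continue
                }
            }
        } elseif {$ch eq "\""} {
            set inString 1
        }
        append out $ch
    }
    return $out
}

set raw "{\"name\":\"Jeremi\u0007\"}"
set fixed [escapeControlsInStrings $raw]
puts $fixed   ;# prints: {"name":"Jeremi\u0007"}

# With tDOM loaded, the escaped text should then parse:
#   set doc [dom parse -json $fixed]
```

    Note that the BEL is preserved in the parsed value; if it is garbage
    rather than data, stripping it may be the better choice.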
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Wed Dec 18 15:04:07 2024
    From Newsgroup: comp.lang.tcl


    saito <saitology9@gmail.com> writes:
    > I am trying to see why tdom is failing on this json snippet.
    >
    > package req tdom
    > set x {{"name":"Jeremi"}}
    > dom parse -json $x
    >
    > error "JSON syntax error" at position 15
    > "{"name":"Jeremi <--Error-- "}"

    Rich already rightly pointed out that control characters are not
    allowed literally in JSON strings. As tDOM rightly complains, your
    input is not JSON.

    [snip]
    > Another problem is that the tdom man page talks about a command
    > "dom setResultEncoding ?encodingName?" but trying it results in an
    > unknown command error.

    You obviously use a (very) old tDOM version. The dom method
    setResultEncoding is a relic from the time when tDOM still supported
    Tcl 8.0 (and the functionality was only needed / useful if built/used
    with Tcl 8.0).

    The documentation and implementation of this method were removed with
    tDOM 0.9.1 (more than six years ago). Most recent version is 0.9.5.
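
    To see which tDOM version a script actually loads, something like the
    following can be used. It is a sketch under the assumption that the
    0.9.1 cutoff stated above is accurate; hasSetResultEncoding is a
    hypothetical helper, not part of tDOM.

```tcl
# Hypothetical helper: per the statement above, setResultEncoding was
# removed with tDOM 0.9.1, so only older versions should still have it.
proc hasSetResultEncoding {version} {
    expr {[package vcompare $version 0.9.1] < 0}
}

# package require returns the version that was actually loaded.
if {[catch {package require tdom} version]} {
    puts "tdom could not be loaded: $version"
} else {
    puts "tDOM $version, setResultEncoding expected: [hasSetResultEncoding $version]"
}
```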

    rolf
  • From saito@saitology9@gmail.com to comp.lang.tcl on Wed Dec 18 14:57:11 2024
    From Newsgroup: comp.lang.tcl

    On 12/18/2024 9:04 AM, Rolf Ade wrote:

    > You obviously use a (very) old tDOM version. The dom method
    > setResultEncoding is a relic from the time when tDOM still supported
    > Tcl 8.0 (and the functionality was only needed / useful if built/used
    > with Tcl 8.0).
    >
    > The documentation and implementation of this method were removed with
    > tDOM 0.9.1 (more than six years ago). Most recent version is 0.9.5.


    Thanks for the info. I am using version 0.9.5, which I downloaded from
    its official site some time ago. It comes with no documentation so I
    did an internet search. That piece of info is obviously from an
    outdated web page, which I kind of guessed.
  • From Harald Oehlmann@wortkarg3@yahoo.com to comp.lang.tcl on Wed Dec 18 21:49:14 2024
    From Newsgroup: comp.lang.tcl

    On 18.12.2024 at 20:57, saito wrote:
    > Thanks for the info. I am using version 0.9.5, which I downloaded from
    > its official site some time ago.  It comes with no documentation so I
    > did an internet search.  That piece of info is obviously from an
    > outdated web page, which I kind of guessed.

    http://tdom.org/index.html/doc/trunk/doc/index.html
  • From saito@saitology9@gmail.com to comp.lang.tcl on Wed Dec 18 17:29:54 2024
    From Newsgroup: comp.lang.tcl

    On 12/18/2024 3:49 PM, Harald Oehlmann wrote:
    > On 18.12.2024 at 20:57, saito wrote:
    >> Thanks for the info. I am using version 0.9.5, which I downloaded from
    >> its official site some time ago.  It comes with no documentation so I
    >> did an internet search.  That piece of info is obviously from an
    >> outdated web page, which I kind of guessed.
    >
    > http://tdom.org/index.html/doc/trunk/doc/index.html

    Thanks, good to know.
  • From Alan Grunwald@nospam.nurdglaw@gmail.com to comp.lang.tcl on Thu Dec 19 16:36:20 2024
    From Newsgroup: comp.lang.tcl

    On 17/12/2024 02:13, greg wrote:

    <snip>

    > proc chr c {
    >   if {[string length $c] > 1 } {
    >     error "chr: arg should be a single char"
    >   }
    >   set v 0
    >   scan $c %c v
    >   return $v
    > }
    >
    > # Check character types and provide additional information
    > proc charInfo char {
    >   if {[string is control $char]} {
    >     return "control character"
    >   } elseif {[string is space $char]} {
    >     return "space character"
    >   } elseif {[string is digit $char]} {
    >     return "digit character"
    >   } elseif {[string is lower $char]} {
    >     return "lowercase alphabetic character"
    >   } elseif {[string is upper $char]} {
    >     return "uppercase alphabetic character"
    >   } elseif {[string is punct $char]} {
    >     return "punctuation character"
    >   } elseif {[string is graph $char]} {
    >     return "graphical character"
    >   } elseif {[string is print $char]} {
    >     return "printable character"
    >   } else {
    >     return "unknown character type"
    >   }
    > }

    <snip>

    Many thanks from me too for the above procs, which have made their way
    (with acknowledgement) into my personal library of utility routines.

    Alan
