Ar Tonelico III

From Learning Languages Through Video Games
Revision as of 03:02, 6 April 2011

Lots and lots of text and lots of obscure kanji! First I'll finish the game, then I can re-watch the cosmosphere events from the extra menu. I will probably do the cosmospheres (they're ridiculously funny), perhaps some talk events, and more if there is interest; let's see.

Translations

• Japanese to English

(or should I translate to German??)

Tips

I used the Japanese IME "canna" under Ubuntu, compiled from source, and changed the kana-kanji dictionaries so that kanji+furigana is written upon entering and converting Japanese text.

Script

Now I managed to dump the script. Much better than having to write it out by hand :) I also ripped the voice clips and BGM, the character poses and the textures, and I can view a few models (not the character models though)... The dump can be found here: [1] or [2] or [3]. Now updated with speaker information! For anyone who's interested, here is what I found out...

AT3 ebd script files

• the script consists of files named EVENT_MESSAGE_SW[2digit-NUMBER]_[3digit-NUMBER].ebm (called DIAG from now on) and EVENT_SW[2digit-NUMBER]_[3digit-NUMBER].ebm (called CTRL from now on)
• each DIAG corresponds to a CTRL file with the same NUMBERs
• DIAG contains the main dialogue lines, while CTRL is probably system-related
• DIAG files are also usually only a few hundred bytes long
• DIAG has a header of 4 bytes, then comes the main part
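
The naming scheme above makes it easy to find the partner of a given file. A minimal sketch using Lua string patterns; the function names and the sample file name are made up for illustration, only the EVENT_MESSAGE_SW/EVENT_SW pattern comes from the notes above:

```lua
-- Map a DIAG file name to the name of its CTRL partner, following the
-- naming scheme described above. Returns nil if the name does not match.
function ctrl_name_for(diag_name)
  local a, b = diag_name:match("^EVENT_MESSAGE_SW(%d%d)_(%d%d%d)%.ebm$")
  if not a then return nil end
  return "EVENT_SW" .. a .. "_" .. b .. ".ebm"
end

-- The reverse direction: CTRL file name to DIAG file name.
function diag_name_for(ctrl_name)
  local a, b = ctrl_name:match("^EVENT_SW(%d%d)_(%d%d%d)%.ebm$")
  if not a then return nil end
  return "EVENT_MESSAGE_SW" .. a .. "_" .. b .. ".ebm"
end

-- hypothetical file name, only to show the call
print(ctrl_name_for("EVENT_MESSAGE_SW01_002.ebm"))
```

Both patterns are anchored with ^ and $, so a DIAG name is never mistaken for a CTRL name or the other way round.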

• the first 26 bytes of CTRL are as follows (decimal):
[#1] 000 000 000 000 000 000 000 005 000 000 000 110 097 109 101 000 005 000 000 000 144 224 150 190 000 [#2] [#3] 000 000 [#4] 000 [#5] [#6] [#7] [#8] [#8] 000 [#9] 000
where the #n are:
-- #1 takes many different values, 001 is very frequent (~50%)
-- #2 takes many different values
-- #3 mostly 000, a few times 001 or 002, 4 times 003, 3 times 004
-- #4 mostly small bytes <=021; 021 and 00x frequently occur together in adjacent files; takes 044 in two instances
-- #5 either 000, 016, 049, or 064
-- #6 always either 113, 116, 117, 119, or 127
-- #7 bytes <= 025, either 00x or 021x with x<=5 except a handful of times
-- #8 almost always 000, except in 10 and 7 files respectively
-- #9 either 000, 001, 002, 003, 004, 005, 017, 019, 021, or 025, with the lower values much more common
• the last byte of CTRL always seems to be <bh:7f>, the last 26 bytes only being somewhat similar
• in general, CTRL displays a high ratio of <bh:00>
• CTRL contains no UTF8 chars
• the main part of CTRL, apart from the many 0's, contains only ASCII chars, most of which are Latin characters and punctuation, with a few special chars such as <bh:f4>, <bh:dc> (Ü)
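
Observations like the share of <bh:00> bytes or the presence of non-ASCII bytes can be checked with a small byte counter. This is only a sketch of how such statistics might be gathered, not the method actually used for the notes above; byte_stats is a made-up name, and the sample input is simply the constant bytes 2 to 26 from the header listing:

```lua
-- Summarize a byte string: share of zero bytes, plus every byte value
-- >= 0x80 (outside 7-bit ASCII) together with its number of occurrences.
function byte_stats(data)
  local zeros, high = 0, {}
  for i = 1, #data do
    local b = data:byte(i)
    if b == 0 then zeros = zeros + 1 end
    if b >= 128 then high[b] = (high[b] or 0) + 1 end
  end
  local ratio = (#data > 0) and (zeros / #data) or 0
  return ratio, high
end

-- Sample: the 25 constant bytes that follow #1 in the CTRL header listing.
local sample = string.char(0,0,0,0,0,0,0,5,0,0,0,110,97,109,101,0,
                           5,0,0,0,144,224,150,190,0)
local ratio, high = byte_stats(sample)
print(string.format("zero bytes: %.0f%%", ratio * 100))
for b, n in pairs(high) do
  print(string.format("<bh:%02x> x%d", b, n))
end
-- the bytes 110 097 109 101 are plain ASCII and spell "name"
print(sample:sub(12, 15))
```

For a real file, read it in binary mode and pass the whole contents to byte_stats.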

• the main part of DIAG is in the following format; after the 3-byte header comes:
[SEPARATOR][UTF8-sequence][SEPARATOR][UTF8-sequence] ... [UTF8-sequence][SEPARATOR]
• as the text is Japanese, [UTF8-sequence] is usually a multiple of 3-byte blocks, each block being the multi-byte encoding of one Japanese character; it terminates on a zero byte
• the main text may contain a ※削除※ ("deleted") line; [LEADING] is then <bh:ff>
• [SEPARATOR] always consists of 36 bytes (not counting the <bh:00> byte that terminates the UTF8 sequence), each byte smaller than <bd:192>, with the only exception that it may also contain <bh:ff>
• [SEPARATOR]: most bytes are constant, except the following meaningful bytes
• the 25th byte: it is a [LEADING] number, counting the dialogue lines
• a [LEADING] byte of <bh:ff> means the line is outside the "normal" dialogue flow, i.e. a system message ("You got item..") or "Party member xyz joined." or "……。" or "…!?" etc.
• the 13th byte: this indicates the [SPEAKER]. [SPEAKER] is <bh:ff> when there is no speaker
• the first byte indicates the [MODE]
<bh:00> - talk with speech bubbles at the characters' 3D models
<bh:01> - talk with 2D character portraits
<bh:02> - item get
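
The three meaningful separator bytes can be picked out directly by position. A small sketch, assuming the separator is available as a 36-byte Lua string; describe_separator and MODE_NAMES are names invented here, and the mode descriptions are just the three values listed above:

```lua
-- Names for the [MODE] byte, taken from the list above.
local MODE_NAMES = {
  [0] = "speech bubbles at 3D models",
  [1] = "2D character portraits",
  [2] = "item get",
}

-- sep: one 36-byte [SEPARATOR] as a Lua string.
function describe_separator(sep)
  assert(#sep == 36, "expected a 36-byte separator")
  local mode, speaker, leading = sep:byte(1), sep:byte(13), sep:byte(25)
  return {
    mode    = MODE_NAMES[mode] or ("unknown mode " .. mode),
    speaker = (speaker ~= 255) and speaker or nil, -- <bh:ff>: no speaker
    line_no = (leading ~= 255) and leading or nil, -- <bh:ff>: outside the normal flow
  }
end

-- made-up separator: mode 1, speaker 7, dialogue line 3, all other bytes zero
local sep = "\1" .. string.rep("\0", 11) .. "\7" .. string.rep("\0", 11)
         .. "\3" .. string.rep("\0", 11)
local info = describe_separator(sep)
print(info.mode, info.speaker, info.line_no)
```

Which speaker a given [SPEAKER] number stands for is not covered by the notes, so the sketch returns the raw number.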
TO SUMMARIZE
• dialogue in an EVENT_MESSAGE (DIAG) file: [3 byte header][36-byte separator][UTF8 byte sequence, terminating on <bh:00>], repeat
• the 13th byte of [SEPARATOR] is the speaker, the 25th byte of [SEPARATOR] marks "normal" spoken text
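
The summarized layout can be walked record by record. The following is a sketch under the assumptions stated in the notes (36-byte separators, zero-terminated text), not a verified parser. parse_diag is a made-up name, and because the notes give the header as 3 bytes in one place and 4 bytes in another, the header length is passed in as a parameter:

```lua
-- data: the whole file contents as a string
-- header_len: number of header bytes to skip (3 or 4 according to the notes)
-- Returns a list of records {mode, speaker, leading, text}.
function parse_diag(data, header_len)
  local SEP_LEN = 36
  local pos, lines = header_len + 1, {}
  -- a record needs a full separator plus at least the terminating zero byte
  while pos + SEP_LEN <= #data do
    local sep = data:sub(pos, pos + SEP_LEN - 1)
    local text_start = pos + SEP_LEN
    -- the text runs up to (not including) the next zero byte
    local zero = data:find("\0", text_start, true)
    if not zero then break end -- no terminator left: trailing bytes, stop
    lines[#lines + 1] = {
      mode    = sep:byte(1),
      speaker = sep:byte(13),
      leading = sep:byte(25),
      text    = data:sub(text_start, zero - 1),
    }
    pos = zero + 1
  end
  return lines
end
```

To try it on a file, read the contents with io.open(name, "rb") and file:read("*a"), then call parse_diag(contents, 3). Unlike the generic UTF8 scanner, this depends on the separator length being exactly right, so a wrong header length shows up immediately as garbage text.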

And here is an improved Lua script I wrote that looks for valid UTF8 sequences in a file. It works much better and doesn't need specific information on separators etc.:

--PARSE BYTES FOR VALID UTF8 string terminated sequences (except ASCII, ie the first bit non-zero)
--process_separator specifies how the bytes between two UTF8 sequences should be rendered, don't forget the newline!
function process_separator(sep)
 -- return "" --use just this line if you only want the UTF8 data
 if #sep>25 then
  return "(" .. sep[1] .. "・" .. sep[13] .. "・" .. sep[25] .. ")"
 else
  --short gap: re-emit the raw bytes (each entry of sep is a hex string such as "e3" or "5")
  local tag = ""
  for _,v in ipairs(sep) do
   tag = tag .. string.char(tonumber(v,16))
  end
  return tag
 end
end

--writes the bits of dec (0..255) into byte[1..8], most significant bit first
--true is interpreted as 1, nil or false as 0
function dec_to_8bit(dec,byte) --byte must be a table whose entries 1..8 are all false or nil
 local exp = 128
 for i=1,8 do
  if dec >= exp then
   byte[i] = true
   dec = dec-exp
  end
  exp = exp*.5
 end
end

function get_utf8(filename, outname)
 local infile = io.open(filename,"rb")
 if infile then
  print("Searching for valid UTF8 in file: " .. filename .. "...")
  local out = assert(io.open(outname,"a+")) --"a+" appends to the end of the file; change to "w+" to overwrite previous data
  out:write("#FILE: " .. filename .. "\n")
  local occ    = 0 --just count how many valid chars we found
  local cur_pos
  local len    = 0
  local len2
  local utf8   = ""
  local betw   = {} --what's between the utf8 sequences
  local betw2
  local insert_line_break = false
  local insert_sep        = false
  local file_len = infile:seek("end")
  infile:seek("set")
  repeat
   local dec  = string.byte(infile:read(1))
   cur_pos = infile:seek("cur")
   local byte = {}
   dec_to_8bit(dec,byte)
   if len >= 1 then
    if not byte[1] or byte[2] then
     --every byte of a UTF8 multibyte char except the first MUST start with the bits 10!
     utf8 = ""
     insert_sep = true
     table.insert(betw,string.format("%x",betw2))
     --return to the byte right after the lead byte where we wrongly assumed UTF8 started...
     cur_pos = cur_pos-(len2-len)-1
     infile:seek("set",cur_pos)
     len  = 0
    else
     --valid utf8 found, dumping...
     utf8 = utf8 .. string.char(dec)
     len = len - 1
     if len==1 then
      --utf8 sequence end: one complete multibyte char found
      occ = occ + 1
      len = 0
      if insert_sep and #betw>0 then
       out:write(process_separator(betw) .. utf8)
       betw = {}
       insert_sep = false
      else
       out:write(utf8)
      end
      insert_line_break = true
      utf8 = ""
     end
    end
   else
    if byte[1] and byte[2] then --we are not interested in ASCII chars... otherwise allow b2=="0"
     -- now determine byte length of glyph
     len = 2
     repeat
      len = len+1
     until not byte[len]
     len = len-1
     if len > 4 then
      --modern UTF8 (RFC 3629) only allows for 4byte chars at most
      table.insert(betw,string.format("%x",dec))
      len = 0
      insert_sep = true
     else
      utf8 = utf8 .. string.char(dec)
      len2 = len
      betw2= dec
     end
    else
     if (dec == 0) and insert_line_break then
      --zero terminated :)
      out:write("\n")
     else
      table.insert(betw,string.format("%x",dec))
     end
     insert_sep = true
    end
    insert_line_break = false
   end
  until cur_pos >= file_len
  out:write("\n")
  infile:close()
  out:close()
  print("Found " .. occ .. " valid UTF8 chars, except ASCII.\nWritten to "  .. outname .. ".\nDone.")
 end
 return occ
end

get_utf8(arg[1],arg[2])
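
The script works out the sequence length by expanding every byte into a table of bits. The same information can be read off with plain comparisons. Below is a compact sketch of that idea, limited to the 4-byte maximum of modern UTF8; utf8_seq_len and utf8_codepoint are names invented here and not part of the script above:

```lua
-- Total length of the UTF8 sequence introduced by lead byte b, or nil if
-- b cannot start a sequence (continuation byte or invalid value).
function utf8_seq_len(b)
  if b < 0x80 then return 1        -- 0xxxxxxx: ASCII
  elseif b < 0xC0 then return nil  -- 10xxxxxx: continuation byte
  elseif b < 0xE0 then return 2    -- 110xxxxx
  elseif b < 0xF0 then return 3    -- 1110xxxx
  elseif b < 0xF8 then return 4    -- 11110xxx
  end
  return nil                       -- 0xF8..0xFF never start a sequence
end

-- Number of distinct payload values in the lead byte, per sequence length.
local LEAD_RANGE = { [2] = 32, [3] = 16, [4] = 8 }

-- Decode the sequence starting at index i of string s.
-- Returns code point and byte length, or nil if the sequence is malformed.
-- Overlong encodings and surrogates are not rejected in this sketch.
function utf8_codepoint(s, i)
  local b = s:byte(i)
  local n = b and utf8_seq_len(b)
  if not n then return nil end
  if n == 1 then return b, 1 end
  local cp = b % LEAD_RANGE[n]      -- payload bits of the lead byte
  for k = 1, n - 1 do
    local c = s:byte(i + k)
    if not c or c < 0x80 or c >= 0xC0 then return nil end
    cp = cp * 64 + (c - 0x80)       -- append 6 payload bits
  end
  return cp, n
end

-- "あ" is stored as the three bytes e3 81 82
print(string.format("U+%04X", utf8_codepoint("\227\129\130", 1)))
```

Only arithmetic is used, so the sketch does not depend on a bit library and runs on old and new Lua versions alike.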

Oh, and btw, the AT3 script contains 1026643 characters : )