Difference between revisions of "Ar Tonelico III"

From Learning Languages Through Video Games
(clean up and clarify, previous edit was by me)
 
(5 intermediate revisions by 3 users not shown)
Line 1: Line 1:
Lots and lots of text and lots of obscure kanji! First I'll finish the game, then I can re-watch the cosmosphere events from the extra menu. I will probably do the cosmospheres (they're ridiculously funny) and perhaps some talk events; if there is interest, we'll see about more.
+
Lots and lots of text and lots of obscure kanji!
 
== Translations ==
 
* [[/jp-en|Japanese to English]]
+
* [[/ja-en|Japanese to English]]
 
(or should I translate to German??)
 
  
Line 7: Line 7:
 
I used the Japanese IME "canna" under ubuntu, compiled from source, and changed kana-kanji dictionaries, so that kanji+furigana is written upon entering and converting Japanese te
 
 
==Script==
 
Now I managed to dump the script. Much better than having to write by hand:) And I also ripped the voice clips and bgm, character poses, the textures, I can view a few models (not the character models though)... The dump can be found here:
+
Yeah, the script in Japanese with a complete English translation [http://www.2shared.com/file/6v-zJ7J4/joined.html][https://rapidshare.com/files/457743039/joined](3.2MB) :)
[http://www.2shared.com/file/0DfnYWrm/dumptar.html] or
+
Oh, and btw, the AT3 script contains 1026643 characters : )
[http://www.megaupload.com/?d=GZIBBGD0] or
 
[http://www.mediafire.com/download.php?t22jmi6xwn5slh3]
 
Now updated with speaker information!
 
For anyone who's interested, what I found out...
 
<pre>AT3 ebd script files
 
  
• consists of EVENT_MESSAGE_SW[2digit-NUMBER]_[3digit-NUMBER].ebm (called DIAG from now on) and EVENT_SW[2digit-NUMBER]_[3digit-NUMBER].ebm (called CTRL from now on)
+
[[Category:Game pages]]
• each DIAG corresponds to a CTRL file with the same NUMBERs
+
[[Category:PS3 games]]
• DIAG contains the main dialogue lines, while CTRL is probably system-related
 
• DIAG files are also usually only a few hundred bytes long
 
• DIAG has a header of 4 bytes, then comes the main part
 
 
 
• the first 26 bytes of CTRL are as follows (decimal):
 
[#1] 000 000 000 000 000 000 000 005 000 000 000 110 097 109 101 000 005 000 000 000 144 224 150 190 000 [#2] [#3] 000 000 [#4] 000 [#5] [#6] [#7] [#8] [#8] 000 [#9] 000
 
where the [#n] are

-- #1 takes many different values, 001 is very frequent (~50%)
 
-- #2 takes many different values
 
-- #3 mostly 000, a few times 001, 002,  4 times 003, 3 times 004
 
-- #4 mostly small bytes <=021, 021 and 00x frequently occur in adjacent files together, takes 044 in two instances
 
-- #5 either 000, 016, 049, or 064
 
-- #6 always either 113, 116, 117, 119, or 127
 
-- #7 bytes <= 025, either 00x or 021x with x<=5 except a handful of times
 
-- #8 almost always 000, except 10 and 7 files respectively
 
-- #9 either 000, 001, 002, 003, 004, 005, 017, 019, 021, or 025, with the lower values much more common
 
• the byte of CTRL always seems to be <bh:7f>, the last 26 bytes only being somewhat similar

• in general, CTRL displays a high ratio of <bh:00>

• CTRL contains no UTF8 chars

• the main part of CTRL, apart from the many 0's, contains only ASCII chars, most of which are Latin letters and punctuation, with a few special chars such as <bh:f4>, <bh:dc> (Ü)
 
 
 
• the main part of DIAG is in the following format, after the 3-byte header comes:
 
[SEPARATOR]  [UTF8-sequence][SEPARATOR][UTF8-sequence] ... [UTF8-sequence][SEPARATOR]
 
• as the text is Japanese, [UTF8-sequence] is usually a multiple of 3-byte blocks, each block being the multi-byte encoding of one Japanese character; it terminates on a zero-byte

• the main text may contain a ※削除※ ("deleted") line; [LEADING] is then <bh:ff>

• [SEPARATOR] always consists of 36 bytes (not counting the <bh:00> byte that terminates the UTF8 sequence), each byte smaller than <bd:192>, with the only exception that it may also contain <bh:ff>
 
• [SEPARATOR]: most bytes are constant, except the following meaningful bytes
 
• the 25th byte: it is a [LEADING] number, counting the dialogue lines
 
• a [LEADING] byte of <bh:ff> means this line is outside the "normal" dialogue flow, i.e. a system message ("You got item..", "Party member xyz joined.") or "……。" or "…!?" etc.
 
• the 13th byte: this indicates the [SPEAKER]. [SPEAKER] is <bh:ff> when there is no speaker
 
• the first byte indicates the [MODE]
 
<bh:00> - talk with speech bubbles at character's 3D models
 
<bh:01> - talk with 2D character portraits
 
<bh:02> - item get
 
TO SUMMARIZE
 
• dialogue in EVENT-MESSAGE file: [3 byte header][36-byte separator][UTF8 byte sequence, terminating on <bh:00>], repeat
 
• 13th byte of [SEPARATOR] is the speaker, 25th byte of [SEPARATOR] marks "normal" spoken text</pre>
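
The record layout summarized above can be walked with a short loop. The following is only a minimal sketch under stated assumptions, not the tool that produced the dump: `parse_diag`, its `header_len` parameter and the field names in the returned tables are hypothetical. The header length is left as a parameter because the notes give both 3 and 4 bytes, and the field positions (1st, 13th and 25th separator byte) are taken from the notes as written.

```lua
-- Sketch only: offsets follow the notes above, all names are made up here.
local SEP_LEN = 36            -- separator length stated in the notes
local ZERO = string.char(0)   -- terminator of each UTF8 sequence

-- data: whole file contents as a string
-- header_len: bytes to skip at the start of the file (3 or 4, see above)
local function parse_diag(data, header_len)
  local records = {}
  local pos = header_len + 1              -- Lua strings are 1-indexed
  while pos + SEP_LEN <= #data do
    local sep = data:sub(pos, pos + SEP_LEN - 1)
    local text_start = pos + SEP_LEN
    -- plain (non-pattern) search for the zero byte that ends the text
    local text_end = data:find(ZERO, text_start, true)
    if not text_end then break end        -- truncated record: stop
    records[#records + 1] = {
      mode    = sep:byte(1),              -- [MODE]
      speaker = sep:byte(13),             -- [SPEAKER], 255 = no speaker
      leading = sep:byte(25),             -- [LEADING], 255 = outside normal flow
      text    = data:sub(text_start, text_end - 1),
    }
    pos = text_end + 1                    -- next separator follows the terminator
  end
  return records
end
```

Called as `parse_diag(io.open(path, "rb"):read("*a"), 3)`, this would return one table per dialogue line. Whether real files parse cleanly this way has not been checked here; the UTF8 scanner below makes no such layout assumptions.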
 
 
 
And here is an improved lua script I wrote that looks for valid UTF8 sequences in a file, works much better and doesn't need specific information on separators &c.:
 
<pre>--PARSE BYTES FOR VALID UTF8 string terminated sequences (except ASCII, ie the first bit non-zero)
 
--specify how the bytes between UTF8 sequences should be interpreted, don't forget the newline!
 
function process_separator(sep)
 
-- return "" --just use this if you only want the UTF8 data
 
if #sep>25 then
 
  return "(" .. sep[1] .. "・" .. sep[13] .. "・" .. sep[25] .. ")"
 
else
 
  local tag = ""
  for _,v in ipairs(sep) do
  --each entry is a 1- or 2-digit hex string; turn it back into the raw byte
  tag = tag .. string.char(tonumber(v,16))
  end
  return tag
 
end
 
end
 
 
 
--true is interpreted as 1, nil or false as 0
 
function dec_to_8bit(dec,byte) --byte must be a table whose entries 1..8 are all falsy (false or nil)
 
local exp = 128
 
for i=1,8 do
 
  if dec >= exp then
 
  byte[i] = true
 
  dec = dec-exp
 
  end
 
  exp = exp*.5
 
end
 
end
 
 
 
function get_utf8(filename, outname)
 
local infile = io.open(filename,"rb")
 
if infile then
 
  print("Searching for valid UTF8 in file: " .. filename .. "...")
 
  local out = io.open(outname,"a+") --"a+" appends to the end of the file; use "w+" to delete previous data instead
 
  out:write("#FILE: " .. filename .. "\n")
 
  local occ    = 0 --just count how many valid chars we found
 
  local cur_pos
 
  local len    = 0
 
  local len2
 
  local utf8  = ""
 
  local betw  = {} --what's between the utf8 sequences
 
  local betw2
 
  local insert_line_break = false
 
  local insert_sep        = false
 
  local file_len = infile:seek("end")
 
  infile:seek("set")
 
  repeat
 
  local dec  = string.byte(infile:read(1))
 
  cur_pos = infile:seek("cur")
 
  local byte = {}
 
  dec_to_8bit(dec,byte)
 
  if len >= 1 then
 
    if not byte[1] or byte[2] then
 
    --every byte of a UTF8 multibyte char except the first MUST start with bits 10!
 
    utf8 = ""
 
    insert_sep = true
 
    table.insert(betw,string.format("%x",betw2))
 
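    --return to just after the byte we wrongly assumed started a UTF8 sequence:
    --undo the (len2-len) continuation bytes already consumed plus the bad byte

    cur_pos = cur_pos-(len2-len)-1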
 
    infile:seek("set",cur_pos)
 
    len  = 0
 
    else
 
    --valid utf8 found, dumping...
 
    utf8 = utf8 .. string.char(dec)
 
    len = len - 1
 
    occ = occ + 1
 
    if len==1 then
 
      --utf8 sequence end
 
      len = 0
 
      if insert_sep and #betw>0 then
 
      out:write(process_separator(betw) .. utf8)
 
      betw = {}
 
      insert_sep = false
 
      else
 
      out:write(utf8)
 
      end
 
      insert_line_break = true
 
      utf8 = ""
 
    end
 
    end
 
  else
 
    if byte[1] and byte[2] then --we are not interested in ASCII chars... otherwise allow b2=="0"
 
    -- now determine byte length of glyph
 
    len = 2
 
    repeat
 
      len = len+1
 
    until not byte[len]
 
    len = len-1
 
    if len > 6 then
 
      --the original UTF8 design allows 6-byte chars at most (RFC 3629 later capped it at 4)
 
      table.insert(betw,string.format("%x",dec))
 
      len = 0
 
      insert_sep = true
 
    else
 
      utf8 = utf8 .. string.char(dec)
 
      len2 = len
 
      betw2= dec
 
    end
 
    else
 
    if (dec == 0) and insert_line_break then
 
      --zero terminated :)
 
      out:write("\n")
 
    else
 
      table.insert(betw,string.format("%x",dec))
 
    end
 
    insert_sep = true
 
    end
 
    insert_line_break = false
 
  end
 
  until cur_pos >= file_len
 
  out:write("\n")
 
  infile:close()
 
  out:close()
 
  print("Found " .. occ .. " valid UTF8 chars, except ASCII.\nWritten to "  .. outname .. ".\nDone.")
 
end
 
return occ
 
end
 
 
 
get_utf8(arg[1],arg[2])</pre>
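
The core test in the script above is deciding, from the leading bits of a byte, how long a UTF8 sequence is and whether the following bytes are continuation bytes. The sketch below shows that check with plain numeric comparisons instead of a bit table, plus a character counter of the kind that could produce a total like the one quoted for the script. The function names (`utf8_seq_len`, `is_continuation`, `count_utf8_chars`) are made up for this illustration. It caps sequences at 4 bytes as in modern UTF8 and does not reject overlong encodings or surrogates.

```lua
-- Length in bytes of the UTF8 sequence introduced by lead byte b,
-- or nil if b cannot start a sequence.
local function utf8_seq_len(b)
  if b < 0x80 then return 1        -- 0xxxxxxx: ASCII
  elseif b < 0xC0 then return nil  -- 10xxxxxx: continuation byte, not a lead
  elseif b < 0xE0 then return 2    -- 110xxxxx
  elseif b < 0xF0 then return 3    -- 1110xxxx (kana and most kanji)
  elseif b < 0xF8 then return 4    -- 11110xxx
  else return nil end              -- 0xF8..0xFF: not used in modern UTF8
end

-- True if b has the form 10xxxxxx.
local function is_continuation(b)
  return b >= 0x80 and b < 0xC0
end

-- Number of characters in s, or nil if s is not well-formed
-- at the lead/continuation level.
local function count_utf8_chars(s)
  local i, n = 1, 0
  while i <= #s do
    local len = utf8_seq_len(s:byte(i))
    if not len or i + len - 1 > #s then return nil end
    for j = i + 1, i + len - 1 do
      if not is_continuation(s:byte(j)) then return nil end
    end
    n = n + 1
    i = i + len
  end
  return n
end
```

Unlike the scanner above, this counts ASCII characters too and gives up on the first malformed byte rather than resynchronizing, so it is only suitable for text that has already been extracted.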
 
 
 
Oh, and btw, the AT3 script contains 1026643 characters : )
 

Latest revision as of 19:43, 16 April 2011

Lots and lots of text and lots of obscure kanji!

Translations

(or should I translate to German??)

Tips

I used the Japanese IME "canna" under ubuntu, compiled from source, and changed kana-kanji dictionaries, so that kanji+furigana is written upon entering and converting Japanese te

Script

Yeah, the script in Japanese with a complete English translation [1][2](3.2MB) :) Oh, and btw, the AT3 script contains 1026643 characters : )