Editing Ar Tonelico III

From Learning Languages Through Video Games
==Script==
 
Yeah, the script in Japanese with a complete English translation [http://www.2shared.com/file/6v-zJ7J4/joined.html][https://rapidshare.com/files/457743039/joined](3.2MB) :)  
 
For anyone who's interested, here is what I found out:
<pre>AT3 ebd script files

• the script consists of EVENT_MESSAGE_SW[2digit-NUMBER]_[3digit-NUMBER].ebm files (called DIAG from now on) and EVENT_SW[2digit-NUMBER]_[3digit-NUMBER].ebm files (called CTRL from now on)
• each DIAG corresponds to a CTRL file with the same NUMBERs
• DIAG contains the main dialogue lines, while CTRL is probably system-related
• DIAG files are also usually only a few hundred bytes long
• DIAG has a header of 4 bytes, then comes the main part

• the first 26 bytes of CTRL are as follows (decimal):
[#1] 000 000 000 000 000 000 000 005 000 000 000 110 097 109 101 000 005 000 000 000 144 224 150 190 000 [#2] [#3] 000 000 [#4] 000 [#5] [#6] [#7] [#8] [#8] 000 [#9] 000
where the [#n] are
-- #1 takes many different values, 001 is very frequent (~50%)
-- #2 takes many different values
-- #3 mostly 000, a few times 001, 002, 4 times 003, 3 times 004
-- #4 mostly small bytes <=021, 021 and 00x frequently occur in adjacent files together, takes 044 in two instances
-- #5 either 000, 016, 049, or 064
-- #6 always either 113, 116, 117, 119, or 127
-- #7 bytes <= 025, either 00x or 02x with x<=5 except a handful of times
-- #8 almost always 000, except in 10 and 7 files respectively
-- #9 either 000, 001, 002, 003, 004, 005, 017, 019, 021, 025 with the lower bytes much more common
• the last byte of CTRL always seems to be <bh:7f>, the last 26 bytes being only somewhat similar
• in general, CTRL displays a high ratio of <bh:00>
• CTRL contains no UTF8 chars
• the main part of CTRL, apart from the many 0's, contains only ASCII chars, most of which are LATIN characters and punctuation, with a few special chars such as <bh:f4>, <bh:dc> (Ü)

• the main part of DIAG is in the following format, after the 3-byte header comes:
[SEPARATOR]  [UTF8-sequence][SEPARATOR][UTF8-sequence] ... [UTF8-sequence][SEPARATOR]
• as the text is Japanese, [UTF8-sequence] is usually a multiple of 3-byte blocks, each block being the multi-byte encoding of one Japanese character; it terminates on a zero-byte
• the main text may contain a ※削除※ ("deleted") line, [LEADING] is then <bh:ff>
• [SEPARATOR] always consists of 36 bytes, each byte smaller than <bd:192>, with the only exception that it may also contain <bh:ff>. This does not count the <bh:00> byte that terminates the UTF8 sequence.
• [SEPARATOR]: most bytes are constant, except the following meaningful bytes
• the 25th byte: it is a [LEADING] number, counting the dialogue lines
• a [LEADING] byte of <bh:ff> means this line is outside the "normal" dialogue flow, ie a system message ("You got item..") or "Party member xyz joined." or "……。"  or "…!?" &c.
• the 13th byte: this indicates the [SPEAKER]. [SPEAKER] is <bh:ff> when there is no speaker
• the first byte indicates the [MODE]
<bh:00> - talk with speech bubbles at character's 3D models
<bh:01> - talk with 2D character portraits
<bh:02> - item get
TO SUMMARIZE
• dialogue in EVENT_MESSAGE file: [3 byte header][36-byte separator][UTF8 byte sequence, terminating on <bh:00>], repeat
• 13th byte of [SEPARATOR] is speaker, 25th byte of [SEPARATOR] marks "normal" spoken text</pre>
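
The layout summarized above can be sketched as a small Lua reader. This is only an illustration under stated assumptions, not the game's actual loader: the function name parse_diag, the record field names, and the header_len parameter are all made up here. The header length is left to the caller because the notes give it as 4 bytes in one place and 3 bytes in another, and the sketch assumes every record is exactly a 36-byte separator followed by zero-terminated text.

```lua
-- Sketch only: split the contents of a DIAG file into records, assuming the
-- layout [header][36-byte separator][UTF8 text][0x00], repeated.
-- parse_diag, header_len and the field names are hypothetical.
local SEP_LEN = 36

-- [MODE] values as listed in the notes
local MODE_NAMES = {
  [0] = "speech bubble at 3D model",
  [1] = "2D character portrait",
  [2] = "item get",
}

function parse_diag(data, header_len)
  local records = {}
  local pos = header_len + 1              -- Lua strings are 1-indexed
  -- a record needs a full separator plus at least the terminating zero byte
  while pos + SEP_LEN <= #data do
    local sep = { string.byte(data, pos, pos + SEP_LEN - 1) }
    local text_start = pos + SEP_LEN
    -- plain (non-pattern) search for the zero byte that ends the text
    local text_end = string.find(data, "\0", text_start, true)
    if not text_end then break end        -- truncated record: stop here
    records[#records + 1] = {
      mode      = sep[1],                 -- 1st byte: [MODE]
      mode_name = MODE_NAMES[sep[1]] or "unknown",
      speaker   = sep[13],                -- 13th byte: [SPEAKER], 255 = none
      leading   = sep[25],                -- 25th byte: [LEADING], 255 = system line
      text      = string.sub(data, text_start, text_end - 1),
    }
    pos = text_end + 1
  end
  return records
end
```

To try it, read a whole file with io.open(name, "rb"):read("*a") and call parse_diag on the result with whichever header length matches the files; if the first record looks shifted by one byte, the other header length is probably the right one.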

And here is an improved Lua script I wrote that looks for valid UTF8 sequences in a file. It works much better and doesn't need specific information on separators &c.:
<pre>--PARSE BYTES FOR VALID UTF8 sequences (except ASCII, ie the first bit non-zero)
--specify how what's between the UTF8 sequences should be interpreted, don't forget the newline!
function process_separator(sep)
  -- return "" --just use this if you only want the UTF8 data
  if #sep > 25 then
    --long separator: print bytes 1, 13 and 25 (stored as hex strings)
    return "(" .. sep[1] .. "・" .. sep[13] .. "・" .. sep[25] .. ")"
  else
    --short separator: turn the hex strings back into raw bytes
    local tag = ""
    for _, v in ipairs(sep) do
      tag = tag .. string.char(tonumber(v, 16))
    end
    return tag
  end
end

--true is interpreted as 1, nil or false as 0
function dec_to_8bit(dec, byte) --byte must be a table whose entries 1..8 are all false or nil
  local exp = 128
  for i = 1, 8 do
    if dec >= exp then
      byte[i] = true
      dec = dec - exp
    end
    exp = exp * .5
  end
end

function get_utf8(filename, outname)
  local occ = 0 --just count how many valid chars we found
  local infile = io.open(filename, "rb")
  if infile then
    print("Searching for valid UTF8 in file: " .. filename .. "...")
    --"a+" appends to the end of the file; use "w+" instead to delete previous data
    local out = assert(io.open(outname, "a+"))
    out:write("#FILE: " .. filename .. "\n")
    local cur_pos = 0
    local len = 0 --bytes left in the current sequence, counting the lead byte
    local len2 --full length of the current sequence
    local utf8 = ""
    local betw = {} --what's between the utf8 sequences
    local betw2 --lead byte of the current sequence
    local insert_line_break = false
    local insert_sep = false
    local file_len = infile:seek("end")
    infile:seek("set")
    while cur_pos < file_len do --also skips empty files
      local dec = string.byte(infile:read(1))
      cur_pos = infile:seek("cur")
      local byte = {}
      dec_to_8bit(dec, byte)
      if len >= 1 then
        if not byte[1] or byte[2] then
          --every byte of a UTF8 multibyte char except the first MUST start with 10!
          utf8 = ""
          insert_sep = true
          table.insert(betw, string.format("%x", betw2))
          --return to the byte right after where we wrongly assumed UTF8 started...
          cur_pos = cur_pos - (len2 - len) - 1
          infile:seek("set", cur_pos)
          len = 0
        else
          --valid continuation byte found, dumping...
          utf8 = utf8 .. string.char(dec)
          len = len - 1
          if len == 1 then
            --utf8 sequence end, one complete char
            len = 0
            occ = occ + 1
            if insert_sep and #betw > 0 then
              out:write(process_separator(betw) .. utf8)
              betw = {}
              insert_sep = false
            else
              out:write(utf8)
            end
            insert_line_break = true
            utf8 = ""
          end
        end
      else
        if byte[1] and byte[2] then --we are not interested in ASCII chars... otherwise allow b2=="0"
          -- now determine byte length of glyph from the number of leading 1 bits
          len = 2
          repeat
            len = len + 1
          until not byte[len]
          len = len - 1
          if len > 4 then
            --UTF8 only allows for 4-byte chars at most (RFC 3629)
            table.insert(betw, string.format("%x", dec))
            len = 0
            insert_sep = true
          else
            utf8 = utf8 .. string.char(dec)
            len2 = len
            betw2 = dec
          end
        else
          if (dec == 0) and insert_line_break then
            --zero terminated :)
            out:write("\n")
          else
            table.insert(betw, string.format("%x", dec))
          end
          insert_sep = true
        end
        insert_line_break = false
      end
    end
    out:write("\n")
    infile:close()
    out:close()
    print("Found " .. occ .. " valid UTF8 chars, except ASCII.\nWritten to " .. outname .. ".\nDone.")
  end
  return occ
end

get_utf8(arg[1], arg[2])</pre>
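
The script decides what a byte is by looking at its leading bits. As a side note, the same classification can be sketched with plain numeric comparisons instead of a bit table. The function name utf8_seq_len is made up for this illustration and is not part of the script above. It only checks structure: it does not reject overlong encodings or code points beyond the Unicode range.

```lua
-- Sketch only: classify a byte value (0..255) by its role in UTF8.
-- Returns the sequence length for a lead byte (1 for ASCII),
-- 0 for a continuation byte, and nil for a byte that cannot appear as a lead.
-- utf8_seq_len is a hypothetical helper name.
function utf8_seq_len(b)
  if b < 0x80 then
    return 1      -- 0xxxxxxx: ASCII, a sequence of its own
  elseif b < 0xC0 then
    return 0      -- 10xxxxxx: continuation byte
  elseif b < 0xE0 then
    return 2      -- 110xxxxx: lead byte of a 2-byte char
  elseif b < 0xF0 then
    return 3      -- 1110xxxx: lead byte of a 3-byte char (most Japanese text)
  elseif b < 0xF8 then
    return 4      -- 11110xxx: lead byte of a 4-byte char
  else
    return nil    -- 11111xxx: not valid in UTF8
  end
end

-- "あ" is encoded as the three bytes E3 81 82: one lead byte, two continuations
local lead, c1, c2 = string.byte("\227\129\130", 1, 3)
print(utf8_seq_len(lead), utf8_seq_len(c1), utf8_seq_len(c2))
```

A <bh:ff> byte, which the separators may contain, falls into the last branch, so it can never be mistaken for the start of a character.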
 
Oh, and btw, the AT3 script contains 1026643 characters : )
 
[[Category:Game pages]]
 
[[Category:PS3 games]]
 