|
|
Line 8: |
Line 8: |
| ==Script== | | ==Script== |
| Yeah, the script in Japanese with a complete English translation [http://www.2shared.com/file/6v-zJ7J4/joined.html][https://rapidshare.com/files/457743039/joined](3.2MB) :) | | Yeah, the script in Japanese with a complete English translation [http://www.2shared.com/file/6v-zJ7J4/joined.html][https://rapidshare.com/files/457743039/joined](3.2MB) :) |
− |
| |
− | For anyone who's interested, here is what I found out:
| |
− | <pre>AT3 ebd script files
| |
− |
| |
− | • consists of EVENT_MESSAGE_SW[2digit-NUMBER]_[3digit-NUMBER].ebm (called DIAG from now on) and EVENT_SW[2digit-NUMBER]_[3digit-NUMBER].ebm (called CTRL from now on)
| |
− | • each DIAG corresponds to a CTRL file with the same NUMBERs
| |
− | • DIAG contains the main dialogue lines, while CTRL is probably system-related
| |
− | • DIAG files are also usually only a few hundred bytes long
| |
− | • DIAG has a header of 4 bytes, then comes the main part
| |
− |
| |
− | • the first 26 bytes of CTRL are as follows (decimal):
| |
− | [#1] 000 000 000 000 000 000 000 005 000 000 000 110 097 109 101 000 005 000 000 000 144 224 150 190 000 [#2] [#3] 000 000 [#4] 000 [#5] [#6] [#7] [#8] [#8] 000 [#9] 000
| |
− | where the [#n] are
| |
− | -- #1 takes many different values, 001 is very frequent (~50%)
| |
− | -- #2 takes many different values
| |
− | -- #3 mostly 000, a few times 001, 002, 4 times 003, 3 times 004
| |
− | -- #4 mostly small bytes <=021, 021 and 00x frequently occur in adjacent files together, takes 044 in two instances
| |
− | -- #5 either 000, 016, 049, or 064
| |
− | -- #6 always either 113, 116, 117, 119, or 127
| |
− | -- #7 bytes <= 025, either 00x or 02x with x<=5 except a handful of times
| |
− | -- #8 almost always 000, except 10 and 7 files respectively
| |
− | -- #9 either 000, 001, 002, 003, 004, 005, 017, 019, 021, or 025, with the lower bytes much more common
| |
− | • the last byte of CTRL always seems to be <bh:7f>, the last 26 bytes only being somewhat similar
| |
− | • in general, CTRL displays a high ratio of <bh:00>
| |
− | • CTRL contains no UTF8 chars
| |
− | • the main part of CTRL, apart from the many 0's, contains only single-byte chars, most of which are Latin characters and punctuation, with a few special chars such as <bh:f4>, <bh:dc> (Ü)
| |
− |
| |
− | • the main part of DIAG is in the following format, after the 3-byte header comes:
| |
− | [SEPARATOR] [UTF8-sequence][SEPARATOR][UTF8-sequence] ... [UTF8-sequence][SEPARATOR]
| |
− | • as the text is Japanese, [UTF8-sequence] is usually a multiple of 3-byte blocks, each block representing a multi-byte for one Japanese character; it terminates on a zero-byte
| |
− | • the main text may contain a ※削除※ ("deleted") line, [LEADING] is then <bh:ff>
| |
− | • [SEPARATOR] always consists of 36 bytes, each byte smaller than <bd:192>, with the only exception that it may also contain <bh:ff>. The 36 bytes do not include the <bh:00> byte that terminates a UTF8 sequence.
| |
− | • [SEPARATOR]: most bytes are constant, except the following meaningful bytes
| |
− | • the 25th byte: it is a [LEADING] number, counting the dialogue lines
| |
− | • a [LEADING] byte of <bh:ff> means this line is outside the "normal" dialogue flow, i.e. a system message ("You got item..") or "Party member xyz joined." or "……。" or "…!?" etc.
| |
− | • the 13th byte: this indicates the [SPEAKER]. [SPEAKER] is <bh:ff> when there is no speaker
| |
− | • the first byte indicates the [MODE]
| |
− | <bh:00> - talk with speech bubbles at character's 3D models
| |
− | <bh:01> - talk with 2D character portraits
| |
− | <bh:02> - item get
| |
− | TO SUMMARIZE
| |
− | • dialogue in EVENT-MESSAGE file: [3 byte header][36-byte separator][UTF8 byte sequence, terminating on <bh:00>], repeat
| |
− | • the 13th byte of [SEPARATOR] is the speaker, the 25th byte of [SEPARATOR] marks "normal" spoken text</pre>
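The layout summarized above can be sketched in a few lines of Lua. This is only a rough illustration under the assumptions in these notes, not a verified parser: `parse_diag` and its `header_len` parameter are hypothetical names, and the header length is passed in because the notes mention both a 3-byte and a 4-byte header.

```lua
-- Hypothetical sketch: split the contents of a DIAG file into records.
-- Assumed layout (from the notes): header, then repeated
-- [36-byte separator][UTF8 text][zero byte]; a trailing separator may follow.
SEP_LEN = 36

function parse_diag(data, header_len)
  local records = {}
  local pos = header_len + 1              -- Lua strings are 1-indexed
  while pos + SEP_LEN - 1 <= #data do
    local sep = data:sub(pos, pos + SEP_LEN - 1)
    pos = pos + SEP_LEN
    -- the text runs up to (not including) the next zero byte
    local stop = data:find("\0", pos, true) or (#data + 1)
    if stop > pos then                    -- skip a separator with no text after it
      records[#records + 1] = {
        mode    = sep:byte(1),            -- 0 = speech bubbles, 1 = portraits, 2 = item get
        speaker = sep:byte(13),           -- 255 when there is no speaker
        leading = sep:byte(25),           -- dialogue line counter, 255 = outside normal flow
        text    = data:sub(pos, stop - 1),
      }
    end
    pos = stop + 1
  end
  return records
end
```

Reading a whole file with `io.open(name, "rb"):read("*a")` and passing the result to `parse_diag(data, 3)` should then give one table per dialogue line, provided the 36-byte separator length really holds for that file.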
| |
− |
| |
− | And here is an improved Lua script I wrote that looks for valid UTF8 sequences in a file. It works much better and doesn't need specific information on separators etc.:
| |
− | <pre>--PARSE BYTES FOR VALID UTF8 string terminated sequences (except ASCII, ie the first bit non-zero)
| |
− | --specify how the bytes between UTF8 sequences should be interpreted, don't forget the newline!
| |
− | function process_separator(sep)
| |
− | -- return "" --just use this if you only want the UTF8 data
| |
− | if #sep>25 then
| |
− | return "(" .. sep[1] .. "・" .. sep[13] .. "・" .. sep[25] .. ")"
| |
− | else
| |
− | local tag = ""
| |
− | local to_number = {A=10, B=11, C=12, D=13, E=14, F=15, a=10, b=11, c=12, d=13, e=14, f=15}
| |
− | to_number["0"] = 0
| |
− | to_number["1"] = 1
| |
− | to_number["2"] = 2
| |
− | to_number["3"] = 3
| |
− | to_number["4"] = 4
| |
− | to_number["5"] = 5
| |
− | to_number["6"] = 6
| |
− | to_number["7"] = 7
| |
− | to_number["8"] = 8
| |
− | to_number["9"] = 9
| |
− | for _,v in ipairs(sep) do
| |
− | local h1=to_number[string.sub(v,1,1)]
| |
− | local h2=to_number[string.sub(v,2,2)]
| |
− | if h2 then
| |
− | tag = tag .. string.char(16*h1+h2)
| |
− | else
| |
− | tag = tag .. string.char(h1)
| |
− | end
| |
− | end
| |
− | return tag
| |
− | end
| |
− | end
| |
− |
| |
− | --true is interpreted as 1, nil or false as 0
| |
− | function dec_to_8bit(dec,byte) --byte must be a table whose entries 1..8 are all falsy (false or nil), e.g. a fresh {}
| |
− | local exp = 128
| |
− | for i=1,8 do
| |
− | if dec >= exp then
| |
− | byte[i] = true
| |
− | dec = dec-exp
| |
− | end
| |
− | exp = exp*.5
| |
− | end
| |
− | end
| |
− |
| |
− | function get_utf8(filename, outname)
| |
− | local infile = io.open(filename,"rb")
| |
− | if infile then
| |
− | print("Searching for valid UTF8 in file: " .. filename .. "...")
| |
− | local out = io.open(outname,"a+") --"a+" appends to the end of the file; change it to "w+" to overwrite previous data
| |
− | out:write("#FILE: " .. filename .. "\n")
| |
− | local occ = 0 --just count how many valid chars we found
| |
− | local cur_pos
| |
− | local len = 0
| |
− | local len2
| |
− | local utf8 = ""
| |
− | local betw = {} --what's between the utf8 sequences
| |
− | local betw2
| |
− | local insert_line_break = false
| |
− | local insert_sep = false
| |
− | local file_len = infile:seek("end")
| |
− | infile:seek("set")
| |
− | repeat
| |
− | local dec = string.byte(infile:read(1))
| |
− | cur_pos = infile:seek("cur")
| |
− | local byte = {}
| |
− | dec_to_8bit(dec,byte)
| |
− | if len >= 1 then
| |
− | if not byte[1] or byte[2] then
| |
− | --every byte of a UTF8 multibyte char MUST start with the bits 10, except the first byte!
| |
− | utf8 = ""
| |
− | insert_sep = true
| |
− | table.insert(betw,string.format("%x",betw2))
| |
− | --return to where we wrongly assumed UTF8 started...
| |
− | cur_pos = cur_pos-(len2-len)-1 --step back over the continuation bytes read so far, to just after the assumed lead byte
| |
− | infile:seek("set",cur_pos)
| |
− | len = 0
| |
− | else
| |
− | --valid utf8 found, dumping...
| |
− | utf8 = utf8 .. string.char(dec)
| |
− | len = len - 1
| |
− | occ = occ + 1
| |
− | if len==1 then
| |
− | --utf8 sequence end
| |
− | len = 0
| |
− | if insert_sep and #betw>0 then
| |
− | out:write(process_separator(betw) .. utf8)
| |
− | betw = {}
| |
− | insert_sep = false
| |
− | else
| |
− | out:write(utf8)
| |
− | end
| |
− | insert_line_break = true
| |
− | utf8 = ""
| |
− | end
| |
− | end
| |
− | else
| |
− | if byte[1] and byte[2] then --we are not interested in ASCII chars... otherwise allow b2=="0"
| |
− | -- now determine byte length of glyph
| |
− | len = 2
| |
− | repeat
| |
− | len = len+1
| |
− | until not byte[len]
| |
− | len = len-1
| |
− | if len > 6 then
| |
− | --UTF8 only allows for 6byte chars at most
| |
− | table.insert(betw,string.format("%x",dec))
| |
− | len = 0
| |
− | insert_sep = true
| |
− | else
| |
− | utf8 = utf8 .. string.char(dec)
| |
− | len2 = len
| |
− | betw2= dec
| |
− | end
| |
− | else
| |
− | if (dec == 0) and insert_line_break then
| |
− | --zero terminated :)
| |
− | out:write("\n")
| |
− | else
| |
− | table.insert(betw,string.format("%x",dec))
| |
− | end
| |
− | insert_sep = true
| |
− | end
| |
− | insert_line_break = false
| |
− | end
| |
− | until cur_pos >= file_len
| |
− | out:write("\n")
| |
− | infile:close()
| |
− | out:close()
| |
− | print("Found " .. occ .. " valid UTF8 chars, except ASCII.\nWritten to " .. outname .. ".\nDone.")
| |
− | end
| |
− | return occ
| |
− | end
| |
− |
| |
− | get_utf8(arg[1],arg[2])</pre>
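The script above finds the length of a sequence by expanding each byte into a table of bits and counting the leading 1s. As a rough alternative sketch, the same count can be done with plain arithmetic on the byte value. `utf8_seq_len` and `is_continuation` are hypothetical helper names, not part of the script.

```lua
-- Hypothetical helpers: classify a byte value (0..255) the way the script does.

-- Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xBF.
function is_continuation(b)
  return b >= 0x80 and b < 0xC0
end

-- Number of leading 1 bits of a lead byte, which is the sequence length
-- it announces. Returns nil for ASCII and for continuation bytes.
function utf8_seq_len(b)
  if b < 0xC0 then return nil end
  local len, mask = 0, 0x80
  while mask >= 1 and b >= mask do
    b = b - mask        -- strip the current top bit
    mask = mask / 2     -- move on to the next lower bit
    len = len + 1
  end
  return len
end
```

As in the script, the caller still has to reject lengths it considers impossible. The script accepts up to 6 bytes, as the original UTF-8 design did, while the current definition (RFC 3629) allows at most 4, so a stricter check may reduce false positives.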
| |
− |
| |
| Oh, and btw, the AT3 script contains 1026643 characters : ) | | Oh, and btw, the AT3 script contains 1026643 characters : ) |
| | | |
| [[Category:Game pages]] | | [[Category:Game pages]] |
| [[Category:PS3 games]] | | [[Category:PS3 games]] |