Translation project / Re: [1.34] Translation Release Thread
« on: June 24, 2015, 10:47:48 AM »
What exactly are you doing to translate the .exe? Hex-editing the file itself for every attempted translation and checking if 'things still work'?
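Unfortunately, something called the halting problem makes hand-editing x86 machine code a rather arduous process: in general, there's no way to work out just by reading the code what a program will do with a given stretch of bytes.
I know a bit about this, so I'll share it in the hope that more people can help with translating: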
First, you'll have to identify the strings used. As a developer, I could be evil and build my strings character by character, use an RNG to pick between several variables containing the same text, or use some other weird scheme. But it's far more logical to make each unit as big as possible, so that there are as few of them to manage as possible, and to reuse variables. So the "it was super effective" message is most likely a single variable somewhere in the program's memory, which means its bytes are stored somewhere in the executable. The task is then to find it.
One thing you can do to help your hacker is to draw up, for every string to be translated, a list of the byte sequences it could be stored as, plus a list of replacements. (Due to the way the Japanese writing system works, some characters can be produced in more than one way, e.g. through radicals, so the same visible text may have several byte sequences.) With such a list you can find the strings in any hex editor by searching for the bytes. Microsoft documents the byte sequences for Shift_JIS (code page 932), and Wikipedia covers UTF-8. For example, the full-width yen sign ￥ (U+FFE5) is 0x818F in Shift_JIS, while the ordinary yen sign ¥ (U+00A5) is 0xC2A5 in UTF-8 and 0x00A5 in UTF-16 (which shows up in a hex editor as the bytes A5 00, since Windows stores UTF-16 little-endian). The executable may mix several encodings, and we don't know in advance which one any particular string uses. You can of course ask the developer for the details, but there's a chance they don't know either: libraries and compilers can change this, and many developers neither need nor want to know the encoding details and let libraries 'deal with it'.
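To make the search step concrete, here is a minimal Python sketch (not a tool anyone in this thread has actually used) that encodes a string under a few candidate encodings and lists every offset where each resulting byte pattern occurs in a file. The encoding list (cp932, utf-8, utf-16-le) is an assumption based on the encodings mentioned above, and the function names are made up for illustration.

```python
# Sketch: build the byte patterns a string would have under a few likely
# encodings, then report every offset where each pattern occurs in a file.
# The encoding list is an assumption; extend it as needed.
from pathlib import Path

# cp932 = Microsoft's Shift_JIS; utf-16-le = Windows "wide" strings on x86
CANDIDATE_ENCODINGS = ["cp932", "utf-8", "utf-16-le"]


def byte_patterns(text: str) -> dict[str, bytes]:
    """Return {encoding: encoded bytes} for every encoding that can represent text."""
    patterns = {}
    for enc in CANDIDATE_ENCODINGS:
        try:
            patterns[enc] = text.encode(enc)
        except UnicodeEncodeError:
            pass  # this encoding cannot represent the string at all
    return patterns


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset where needle occurs in data (overlapping matches included)."""
    offsets, start = [], 0
    while (pos := data.find(needle, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets


def search_file(path: str, text: str) -> dict[str, list[int]]:
    """Map each candidate encoding to the offsets where text's bytes appear in the file."""
    data = Path(path).read_bytes()
    return {enc: find_all(data, pat) for enc, pat in byte_patterns(text).items()}
```

A call like `search_file("game.exe", "こうかはばつぐんだ")` (hypothetical file name and string) returns a dict from encoding to offsets; an empty list means that encoding's pattern doesn't occur anywhere in the file.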
Finding the strings is one thing; changing them is another. When searching for short strings, you may find multiple matches, and trial and error is then needed to tell which match(es) are the strings to be translated and which are actually other parts of the program. A wrong guess will likely crash the game, or at least the translated string won't show up.
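One way to keep that trial and error manageable is to patch only one candidate offset per copy of the executable, so that each match can be tested in isolation. The Python sketch below assumes you already have the candidate offsets and the replacement bytes; the helper name and the output file naming scheme are hypothetical.

```python
# Sketch: for trial-and-error, write one patched copy of the executable per
# candidate offset, so each match can be tested on its own.
# File naming ("game.exe" -> "game.patch_0x1a2b.exe") is hypothetical.
from pathlib import Path


def patched_copies(exe_path: str, offsets: list[int], new_bytes: bytes) -> list[Path]:
    """Write one copy of exe_path per offset, with new_bytes overwritten at that offset only."""
    src = Path(exe_path)
    original = src.read_bytes()
    written = []
    for off in offsets:
        if off + len(new_bytes) > len(original):
            raise ValueError(f"patch at {off:#x} runs past the end of the file")
        data = bytearray(original)
        # Same-length overwrite: the file size and all other offsets stay unchanged.
        data[off:off + len(new_bytes)] = new_bytes
        out = src.with_name(f"{src.stem}.patch_{off:#x}{src.suffix}")
        out.write_bytes(data)
        written.append(out)
    return written
```

The original file is never modified, so a copy that crashes the game simply points to a match that wasn't the text you were after.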
Machine code can do anything with variables, even reuse them as machine code, so nothing you try is guaranteed to work. But most likely the string is simply read and passed on for rendering, which means you can replace it with any other character sequence of the same length in bytes. So your translation can be shorter than the Japanese original (by filling out the rest of the string with padding such as control characters), but never longer, at least not without a substantial increase in difficulty (reverse engineering). How many ASCII characters you can fit per Japanese character depends on the encoding and the character: 1 or 2 in Shift_JIS (half-width katakana take 1 byte, kanji and full-width kana 2), 3 or 4 in UTF-8 (most Japanese characters take 3 bytes, rare ones 4), and 1 in UTF-16 (where ASCII characters take 2 bytes, the same as most Japanese characters). That is only a char-by-char estimate; in the end, only the total byte length of the variable matters.
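The same-length rule can be sketched in Python as follows: a hypothetical `fit_translation` helper encodes the translation in the original string's encoding, refuses it if it needs more bytes than the original, and pads the remainder. Padding with NUL bytes assumes a null-terminated string; that, the helper's name, and the example string are assumptions for illustration, not known details of this game.

```python
# Sketch: encode a translation in the same encoding as the original string and
# pad it to exactly the original byte length. Anything longer is refused,
# since growing the string would overwrite whatever data follows it.
# NUL padding assumes a null-terminated string (an assumption); spaces may be
# safer for strings whose length is stored separately.
def fit_translation(original: str, translation: str, encoding: str, pad: str = "\0") -> bytes:
    budget = len(original.encode(encoding))  # bytes available in the executable
    new = translation.encode(encoding)
    if len(new) > budget:
        raise ValueError(f"{len(new)} bytes needed, only {budget} available in {encoding}")
    pad_unit = pad.encode(encoding)  # e.g. 1 byte in cp932/utf-8, 2 in utf-16-le
    gap = budget - len(new)
    if gap % len(pad_unit):
        raise ValueError("remaining space is not a whole number of padding characters")
    return new + pad_unit * (gap // len(pad_unit))
```

With the made-up original こうかはばつぐんだ (9 characters), "Super effective!" (16 ASCII characters) fits in Shift_JIS (18-byte budget) and in UTF-8 (27 bytes), but not in UTF-16, where it would need 32 bytes against a budget of 18.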
Actually, asking the developer to move these texts out of the codebase and into resource files with a patch might even be the more convenient option here.