

Converting hex to Chinese.


    #21
    Re: Converting hex to Chinese.

    Originally posted by Curious.George View Post
    I'm just making general statements about how one internationalizes a program....
    I understand what you were saying now. But even that wouldn't be a problem, for the most part, with the setup.exe program at least. For example, let's say they have a pop-up window that displays garbled text. They call the Windows MessageBoxA() API to display messages. I thought you were talking about the physical code, not how it's physically displayed on the screen. The APIs for text boxes change their height and width automatically, I do believe. But worst case, I just call another API that allows me to set the height / width. This is where a code-cave would come in handy.
    Originally posted by Curious.George View Post
    Your example isn't pedantic, enough...
    My example was meant to show one of the problems with modifying the actual executable. By adding bytes, we change everything: jump statements all need to be changed, etc. This is why I think a code-cave would have been the best approach.
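To make the "adding bytes changes everything" point concrete, here is a minimal sketch of how an x86 near jump (opcode E9) stores its target as a displacement relative to the next instruction. The addresses are made up for illustration and are not taken from setup.exe.

```python
import struct

JMP_REL32_LEN = 5  # opcode E9 plus a 4-byte signed displacement


def encode_jmp_rel32(jmp_addr: int, target_addr: int) -> bytes:
    """Encode `jmp target_addr`, located at jmp_addr, as E9 <rel32>."""
    # The displacement is measured from the end of the jump instruction.
    rel = target_addr - (jmp_addr + JMP_REL32_LEN)
    return b"\xE9" + struct.pack("<i", rel)


# Hypothetical addresses, for illustration only.
jmp_addr = 0x401000
target = 0x401020

before = encode_jmp_rel32(jmp_addr, target)

# Insert 3 bytes somewhere between the jump and its target:
# the target moves down, so the stored displacement is now stale
# and the jump has to be re-encoded.
inserted = 3
after = encode_jmp_rel32(jmp_addr, target + inserted)

print(before.hex(), after.hex())
```

Every relative jump or call that crosses the insertion point needs the same fix-up, which is why a code-cave (jump out to spare space, do the work, jump back) leaves the existing layout alone.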

    Originally posted by Curious.George View Post
    When you internationalize a program, you have to consider the requirements of each representation..
    The setup.exe appears to be the only problem. The idea was, once I understood the code, I'd just recreate it, and instead of using Wise, I'd use Inno, which handles all the dirty work.

    Originally posted by Curious.George View Post
    That's hard to know from external observation..
    I don't know how you could know from external observation what's what. I mean, you have to actually look at the code and see what it does. That's why I use a disassembler / debugger. It converts the machine code back into assembly. And although it's not as pretty as it would be if the program had been written directly in assembly, there are tools and plugins that help us along. I was using x64dbg. I used to use Ollydbg, but x64dbg has come a long way, can debug 64-bit code and 32-bit code, and has an extensive plugin library that anyone can contribute to.

    Some programs (like Polderbits) use a lot of "tricks" to make modifying the program very hard. There are ways to detect a debugger, and Polderbits has code like that. There are plugins to hide the debugger, but the Polderbits developer was smart and used old school tricks that you don't see much anymore, and also used those functions in his actual code. So Polderbits is much, much harder to modify than this setup.exe program.

    Originally posted by Curious.George View Post
    For example, many programs have easter eggs buried within...
    That's the whole point of using the disassembler / debugger, and why this has taken me so long. Because I had to "step" through the code and understand it. They stripped the function symbols, so I don't see calls to something like setup_crc32table. Instead, I see calls to sub_435404. I step into sub_435404 (which is a function created by the programmers, or in this case, by the Wise Setup Studio program). And then I watch the assembly code and figure out what it does. I watch the registers, I watch what functions are being called, I write it down in gedit (notepad for Linux), I keep track, and when I finally understand the function and what it does, I label it, so instead of it saying sub_435404, it now says setup_crc32table, just as an example.

    If there are any easter eggs, I'd see them before overwriting code. I'm not just gonna randomly pick a piece of code and overwrite it. I would make sure it's actually unneeded. There are multiple ways to do this with x64dbg.

    When I was trying to analyze the first bytes the program read from itself (I suspected some sort of header), I manipulated the data. It would read 4 bytes, for example, and then examine the last byte to see what it was. If it wasn't a 0, it'd jump to a function. That function would then read some stuff from the program and then call some functions that create a unique serial type number. This number is then compared to some code in the program, and if it's not the same, it displays a message on the screen saying something along the lines of "The demo version of this program can only be used to create setup files that are run on the computer they were created on."

    That code has no easter eggs. For the code that generates the serial number, I can search through the entire program, and all the DLLs it's loaded, including the Windows ones, everything, and actually look for references to that sub-function, and I only see that one, so we know it's only used if the Wise Setup Studio program used to create the exe was a demo version, not a registered version.

    Originally posted by Curious.George View Post
    Try replacing strings with numeric strings: "S01", "S02", "S03", ...
    We can actually just search for and find all references to that string nowadays. A lot has changed since the days of WinICE or whatever it was called. Things have become a lot easier. I use a plugin which attempts to tell me what compiler was used. In this case, it was the Microsoft Visual C++ Compiler, which uses the CDECL calling convention. I didn't need to identify the compiler to determine that the CDECL calling convention was being used. I could see it in the code. Before a Windows API was called, stuff was being pushed onto the stack. According to Wikipedia:
    Code:
    cdecl
    
    The cdecl (which stands for C declaration) is a calling convention that originates from the C programming language and is used by many C compilers for the x86 architecture. In cdecl, subroutine arguments are passed on the stack. Integer values and memory addresses are returned in the EAX register, floating point values in the ST0 x87 register. Registers EAX, ECX, and EDX are caller-saved, and the rest are callee-saved. The x87 floating point registers ST0 to ST7 must be empty (popped or freed) when calling a new function, and ST1 to ST7 must be empty on exiting a function. ST0 must also be empty when not used for returning a value.
    
    In the context of the C programming language, function arguments are pushed on the stack in the reverse order. In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions only required a 4-byte alignment.)
    I have some nice plugins that know about most of the Windows API functions and can show me, in the comments section, what parameters are being passed to the API. I can also right click on the API call and have it search Google, MSDN, etc., for that API function and then I can read how it works.

    For example, when I see a call to the user32.MessageBoxA Windows API, one of my plugins will look at the push statements, show which parameters are being passed to the function, and tell me what they are, so I don't have to remember that the MessageBox API takes four parameters:
    Code:
    int WINAPI MessageBox(
     _In_opt_ HWND  hWnd,
     _In_opt_ LPCTSTR lpText,
     _In_opt_ LPCTSTR lpCaption,
     _In_   UINT  uType
    );
    Sometimes though, it's nice to know exactly what lpText and lpCaption or uType is. So I have another plugin which actually searches what I have it configured to search (MSDN right now) for those functions, and it'll pull up the MSDN function call library, with the function in question, and I can read and study it, see how it works.

    Originally posted by Curious.George View Post
    Keep in mind, when replacing multibyte characters, that you are replacing CHARACTERS, not BYTES. So, if you replace a 2 byte character, make sure you replace it with 2 ASCII characters. If, instead, you replace it with just one, then the second byte of the 2-byte character will be interpreted as the first byte of a new character. Depending on its actual value, this might be some bizarre/unexpected ASCII character or the first of a bogus 2-byte character (the second byte of which is actually the first byte of some other character)
    No, because I'm using a debugger, I'm replacing bytes that (with the language stuff) represent characters. The code-page determines how those bytes are handled. If I could convert the code page from 1033 to the Chinese one, then the two-byte character representations would have been displayed properly, as Chinese characters, instead of garbage. That was the original goal. Once I accomplished that, I was going to work on converting or writing a new setup.exe using InnoSetup, but I just don't have the energy anymore. So I abandoned the project. If the program's code page was set to Chinese, then it would have interpreted the text as a double-byte character set (DBCS), which it didn't. That's why I get garbage. The actual text in the program is a double-byte character set, but the Windows APIs don't know that because the code page is set to 1033 (US), so they treat each byte as a character. With InnoSetup, I'd have used Unicode for one, which I think is now the standard.
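The bytes-versus-characters point in this exchange can be sketched in a few lines. This assumes GBK as the double-byte encoding and uses a made-up two-character string; neither detail comes from setup.exe itself.

```python
ENC = "gbk"  # assumed double-byte encoding, for illustration

original = "安装".encode(ENC)  # two characters, four bytes

# Overwrite only the FIRST byte of the first character with ASCII "A".
# The orphaned second byte now gets read as a lead byte and swallows
# the first byte of the next character.
one_byte = b"A" + original[1:]
misaligned = one_byte.decode(ENC, errors="replace")

# Overwrite BOTH bytes of the first character with two ASCII bytes.
# Character boundaries stay where they were.
two_bytes = b"AB" + original[2:]
aligned = two_bytes.decode(ENC, errors="replace")

print(repr(misaligned))
print(repr(aligned))
```

Either way the edit happens at the byte level in the debugger; what matters is that the replacement covers whole characters as the decoder will see them.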

    Originally posted by Curious.George View Post
    Sorry to hear that.
    It sucks, but it is what it is, and I try to keep a positive mind. I always know it can be much worse. Still alive, still got my limbs, and although I cannot remember things one day, other days, I can remember things real good. I couldn't remember code-caves; if I had come across the term, I would have been like, wtf is that! I even asked earlier on in one of the posts in this thread if I could append characters in the program or if that'd throw it all off. Today, and yesterday, to me, it was like, of course not! We need to write a code-cave if we're trying to convert the exe's Chinese to English, or better yet, just watch what the EXE does, translate the Chinese to English, and recreate the EXE using InnoSetup for personal use.
    -- Law of Expanding Memory: Applications Will Also Expand Until RAM Is Full



      #22
      Re: Converting hex to Chinese.

      Originally posted by Per Hansson View Post
      If it picks English obviously your operating system is telling the program that your system language is English.
      Otherwise you'd get Chinese!

      The screenshot I showed above of how to change the Language for non-Unicode programs in Windows 7 is pretty much the same in Windows 10 too.
      They have just split Region & Language into separate entities.
      The path for 7 and 10 is: Control Panel > Region (& Language) > Administrative > Change System Locale
      I've done that then. It's the program that's telling the OS that the language is English. Using a resource editor, I can see where it only has the 1033, nothing else. If it was a multi-language program, wouldn't I see more resources for different languages? Like 1033 and 2052 for each resource? I don't. There's only 1033.
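As a side note on the numbers: 1033 and 2052 are Windows language identifiers rather than code pages. A small sketch of how they split into primary language and sublanguage, with the default ANSI code page that normally goes with each (1252 for US English, 936 for Simplified Chinese); the helper name is made up for this example.

```python
from typing import Tuple


def split_langid(langid: int) -> Tuple[int, int]:
    """Return (primary_language, sublanguage) for a 16-bit Windows LANGID."""
    primary = langid & 0x3FF      # low 10 bits
    sub = (langid >> 10) & 0x3F   # high 6 bits
    return primary, sub


# Default ANSI code pages for the two language IDs discussed here.
ANSI_CODE_PAGE = {1033: 1252, 2052: 936}

for langid in (1033, 2052):
    primary, sub = split_langid(langid)
    print(f"{langid} = 0x{langid:04X}: primary=0x{primary:02X}, "
          f"sub=0x{sub:02X}, ANSI code page {ANSI_CODE_PAGE[langid]}")
```

So a resource tagged only with 1033 is declaring itself US English, whatever bytes the strings actually contain.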

      If you right click on the file description, it shows the name as Chinese, but I don't think there's a code-page for File Descriptor. I think that's handled by the OS. I think it looks for those starting and end characters for the DBCS and if it's found, it handles the conversion automatically or something? I dunno.

      But changing the language like you suggested is something I did and it still showed garbage, because the program specifically sets the code-page to 1033, US. This is the setup.exe program, the Wise installer one, which has long since been retired, and didn't support Unicode or anything fancy like that. I used Wise once to create a setup.exe file, back in high school, when I was first learning how to program, but I don't think even the last version was multi-language.

      The actual program itself is though, which was created with Microsoft Visual C. The setup.exe program is calling the ANSI functions, which should never be called anymore, unless the program is designed to run on a really old OS.

      Even though during setup, the newest version of the program displays gibberish, the older versions display Chinese, and the code page for them is not set to 1033. This is why I think changing the language settings in 7 didn't work. I'll give it another shot later though after I get some rest.



        #23
        Re: Converting hex to Chinese.

        Originally posted by Spork Schivago View Post
        I was curious as to whether I could just append my code-cave at the end of the program, or if I would have to actually overwrite code that was non-essential (for example, the sub-routines for the demo mode of the Wise installer program).
        That's hard to know from external observation. Even if a piece of code LOOKS like it isn't referenced, you have no idea if there aren't some set of circumstances that cause it to be referenced. Having replaced it with stuff of your own leaves the program helpless in that event as it was expecting something other than what you'd placed there.

        For example, many programs have easter eggs buried within. The code can run for years and never touch that easter egg reveal code. This could make it a likely candidate to be overlaid. OTOH, if someone deliberately tried to reveal one of those easter eggs, they might be stunned by the program crashing or running off in some unusual direction (because your patch had it doing something "unexpected" for which it was "unprepared")
        I don't know how you could know from external observation what's what. I mean, you have to actually look at the code and see what it does. That's why I use a disassembler / debugger. It converts the machine code back into assembly.
        Yes, but you have to chase down every conditional jump (jump taken path and jump NOT taken) and understand what each path through the code could possibly do.

        There are lots of ways to obfuscate code that easily confound attempts at reverse-engineering. E.g., I could compute some value and add it to a value stored on the stack -- that just happens to be a return address for some active function. At some future time, that invoked function "returns" to someplace that is not "the location after its original invocation".

        If most of the time, the value that I compute is "0", you may not notice what is happening. Or, even that this value could potentially change thereby dynamically affecting program flow.

        I've no idea how "precious" the program you're hacking might be -- how much effort the original authors would go to in order to protect it from alteration, etc. OTOH, I have seen numerous cases of deliberate counterfeiting efforts (as well as attempted license subversion) that have been confounded by these sorts of obfuscated "tricks".

        Some programs (like Polderbits) use a lot of "tricks" to make modifying the program very hard. There are ways to detect a debugger, and Polderbits has code like that. There are plugins to hide the debugger, but the Polderbits developer was smart and used old school tricks that you don't see much anymore, and also used those functions in his actual code. So Polderbits is much, much harder to modify than this setup.exe program.
        In the early 80's, video (arcade) games were ripe for counterfeiting. A "clone" would appear within weeks of the original game's release. Keep in mind these are DEVICES -- BIG devices -- not just software you can download from a website! So, you went to great lengths to make the counterfeiter's job harder (take longer -- there is usually a small market window for new games and if you can push the counterfeiters out of that window, the market collapses on them).

        One lesson learned is NOT to make your mechanisms a distinct "go-nogo" test; if the patched program refuses to run ("Counterfeit detected. HALT!"), then the counterfeiters know to keep working to locate the "test" that is preventing the program from running in its patched form.

        OTOH, if you design the "tests" so they cause sporadic "anomalous behaviors" (i.e., bugs!), the counterfeiter might not notice that behavior. Or, if he does, its sporadic nature makes it hard to track down (it's not a "hard fault").

        BUT, players will very definitely notice and avoid a game that seems to be cheating/screwing them. So, the guy (owner/operator) who deployed the game is deprived of revenue. This makes him less likely to want to buy another "knock-off" game from People's Video Game Factory #4423.

        Try replacing strings with numeric strings: "S01", "S02", "S03", ...
        We can actually just search for and find all references to that string now-a-days.
        No, I meant replace the strings in the executable with trivial, short identifiers (S01, S02, etc.) so they appear on the screen as such. Then, make a cheat sheet to explain, to you, what each "means".

        You'd said:
        All I wanted was to be able to read what the text said, either using an on-line Chinese to English translator, or converting it to English.
        I'm suggesting replace the strings with something you KNOW you can generate. Then, do the translation in your head, "at run time". This doesn't help other people but gets you to a point where you know that a particular parameter is "temperature differential per unit time" instead of "temperature speed" (or whatever).
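A rough sketch of the S01/S02 idea, assuming the strings have already been located and that each placeholder is padded to the original length so nothing after it moves. The function name and the sample string table are made up for illustration.

```python
from typing import Dict, List, Tuple


def replace_with_ids(blob: bytes, strings: List[bytes]) -> Tuple[bytes, Dict[str, bytes]]:
    """Overwrite each string in place with a same-length ID such as b'S01 '."""
    patched = bytearray(blob)
    cheat_sheet: Dict[str, bytes] = {}
    for n, old in enumerate(strings, start=1):
        ident = f"S{n:02d}".encode("ascii")
        if len(ident) > len(old):
            continue  # no room for the ID; leave this string alone
        offset = patched.find(old)
        if offset < 0:
            continue  # string not present in the blob
        # Pad with spaces so the length, and every later offset, is unchanged.
        patched[offset:offset + len(old)] = ident.ljust(len(old), b" ")
        cheat_sheet[ident.decode("ascii")] = old
    return bytes(patched), cheat_sheet


# Made-up stand-in for a string table: NUL-terminated GBK strings.
blob = "安装".encode("gbk") + b"\x00" + "取消".encode("gbk") + b"\x00"
patched, cheat_sheet = replace_with_ids(
    blob, ["安装".encode("gbk"), "取消".encode("gbk")]
)

print(patched)
for ident, old in cheat_sheet.items():
    print(ident, "=", old.decode("gbk"))
```

The cheat sheet keeps the original bytes, so each ID can be looked up or run through a translator later without touching the executable again.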

        I have some nice plugins that know about most of the Windows API functions and can show me what parameters are being passed in the comments section to the API. I can also right click on the API call and have it search google, MSDN, etc, for that API function and then I can read how it works.
        Look into IDA Pro/Hex Rays. Likewise, Radare2 can be of some help.

        Keep in mind, when replacing multibyte characters, that you are replacing CHARACTERS, not BYTES. So, if you replace a 2 byte character, make sure you replace it with 2 ASCII characters. If, instead, you replace it with just one, then the second byte of the 2-byte character will be interpreted as the first byte of a new character. Depending on its actual value, this might be some bizarre/unexpected ASCII character or the first of a bogus 2-byte character (the second byte of which is actually the first byte of some other character)
        No, because I'm using a debugger, I'm replacing bytes that (with the language stuff) represent characters. The code-page determines how those bytes are handled. If I could convert the code page from 1033 to the Chinese one, then the two-byte character representations would have been displayed properly, as Chinese characters, instead of garbage.
        I don't understand. Where did you get the example text (with Chinese ideograms) that you posted up-thread?



          #24
          Re: Converting hex to Chinese.

          Originally posted by Curious.George View Post
          Yes, but you have to chase down every conditional jump (jump taken path and jump NOT taken) and understand what each path through the code could possibly do.

          I don't understand. Where did you get the example text (with Chinese ideograms) that you posted up-thread?
          I think you're missing something. The program I was trying to hack was created with the Wise installer, and I just had to bypass those protections. Wise hasn't been made in a long time. A company bought them out and never did anything with them. We can date it a bit because it's not using the Unicode functions. The obfuscation that Wise uses is not hard to get around. It's basically an executable ZIP program. The actual program it extracts is "encrypted" but I see how it's decrypted.

          Any change to the program invalidates the checksums. I found how the checksum is generated, but lost interest. There are three programs the Wise setup.exe extracts, and I can manually extract them from RAM, fully decrypted, using my debugger.
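A tiny sketch of why any patch trips an integrity check. It assumes a CRC-32 purely for illustration (the thread only mentions setup_crc32table as an example label, so the real algorithm in the Wise stub may differ), and the data is a made-up stand-in for the installer image.

```python
import zlib

# Made-up stand-in for the installer image.
original = bytes(range(256)) * 4

# Flip one bit in one byte, standing in for any in-place patch.
patched = bytearray(original)
patched[100] ^= 0x01

crc_before = zlib.crc32(original)
crc_after = zlib.crc32(bytes(patched))

print(f"{crc_before:08x} -> {crc_after:08x}")
```

Getting a patched file accepted therefore means either recomputing the stored value or neutralising the comparison, which is the part that takes the stepping-through.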

          For the Chinese stuff I posted in a previous post, I used my debugger to figure out where the string was being loaded, and I spent a lot of time to figure out how it was supposed to be encoded. Then I went to a website that understands that encoding, posted the HEX values, and they showed up as proper Chinese characters, rather than garbage. That's where they came from.

          You see? The Wise installer's locale code ID in the program is hardcoded to 1033, but should be set to 2052. The actual hex bytes are a double-byte character set, but because the locale code ID is set to 1033, Windows interprets them as single-byte characters. I used this website:

          https://r12a.github.io/apps/encodings/

          And the hex bytes show properly as gb18030, or gbk. They're the only encodings that make sense when I enter it into a converter.
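What the converter does can be approximated locally. This is a minimal sketch that decodes the same hex bytes under several candidate encodings; the sample bytes are chosen for illustration and are not taken from setup.exe, and cp1252 stands in for the single-byte interpretation that normally goes with language ID 1033.

```python
from typing import Dict, List, Optional


def try_decodings(hex_bytes: str, encodings: List[str]) -> Dict[str, Optional[str]]:
    """Decode the same bytes under each encoding; None means decoding failed."""
    raw = bytes.fromhex(hex_bytes)
    results: Dict[str, Optional[str]] = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Sample bytes for illustration only.
sample = "B0 B2 D7 B0"
results = try_decodings(sample, ["cp1252", "gbk", "gb18030", "utf-8"])

for name, text in results.items():
    print(f"{name:8} {text!r}")
```

A clean decode is not proof of the right encoding, since several encodings can accept the same bytes; judging which output "makes sense" is still a human call.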

          But if I use a resource editor to change the codepage from 1033 to 2052, the checksum doesn't match.

          It's a lot of work to step through the assembly, and I just don't care about it anymore. I install the program, and the filenames that are extracted from the setup.exe aren't English or Chinese, but Windows can load them just fine. The actual program they wrote to program the HMI / PLC controllers displays good enough English that I can figure out what's going on. Windows seems to have no problem loading gibberish for the file names, thankfully.
