Language Going Locale
Okay, I have to admit, I haven't been talking about this that much. Actually, I haven't been talking about it at all. However, this aspect of Custom Wars Tactics has been haunting me since the beginning of the project.
How do we make a game that not only can handle the data of the game, but also handle when people want to play this game in multiple languages?
Our project is no exception here. Language is rarely treated as important when it comes to developing data, and most projects don't even think about language at all until the last part of the project. When they finally do, one of these things usually happens:
- Make a separate data file with that locale as the focus
- Inside the file, add certain sections for locale
- Don't even worry about locale, these guys are playing this in
- Do some wonky webpage solution to the problem
One of the biggest hurdles about language is that it creates a mess wherever it goes. As far as data is concerned, it is almost like trying to harness the power of a hurricane in a bottle. There are a few reasons why language is pretty hard to control:
- It multiplies the verbosity of the data with every language added
- Not all text is encoded in UTF-8
- There aren't a lot of ways to cleanly represent language in data
I mean, language is scary. Not only that, but many think of getting the game done first before even thinking of the language because seriously, as long as the game is playable, people can figure it out. Language is a bonus, just plain and simple.
I've been preparing us to take language seriously, but I needed a way to make sure that we were able to expand into it. Here are a few of the intermediate planned steps that I'm proposing we take to rectify this problem:
- Let's use ISO 639-1 to structure the languages
- Let's use JSON to make sure we can expand cleanly into languages
Let's take a step back and explain that, because I feel that is the crux of the conversation. One of the things that is missing from each and every clone of this game is a focus on language. However, I feel if we are to take the time now and focus on it, we will do a lot of good for the project:
- Our project will have a much easier time being adapted to many languages and places
- Our data will be mainstream, and the de facto standard for this type of data storage
Scrambling to handle language at the end is the road to really bad code. Our game is going to handle it early, and when all the other clones are worried about how to handle the sudden change and creating spaghetti, we'll be able to shift into 3, 4, 5 different languages with relative ease.
Okay, I think you'd remember my push into JSON. This was no accident. I chose JSON because it expands more gracefully than the alternatives (more on those below). A text format that expands easily is exceptional for dealing with different dimensions of data, and oh, is Custom Wars Tactics filled with data that adds layers upon layers of complexity:
- Object Data (Units, Properties, Inventions, Terrain)
- Commander Data
- Language Data
- Map Data
- Weather Data
- Scenario Data
- Campaign Data
- AI Data
- Server Data
- Save/Replay Data
Like, for a game based on a bunch of menu systems glued together, the amount of complexity stuffed into a simple project for Advance Wars is nothing short of a mess. The game is literally the definition of 5-dimensional chess, and it isn't ashamed to admit it. In order to handle it, a top-down view of the data is something I'm always mindful of, because if all these pieces aren't juggled at once, then the entire data structure can get lost in the mix.
Not only that, but the text data format chosen matters too. Having a stiff structure like XML isn't going to cut it, because expansion is going to be very hard. Having an expansive structure like Lua isn't going to be good either, because we'll have to work hard to keep it structured.
In order to handle the layers upon layers of complexity, I needed to perform an experiment on the data itself. As you see, JSON is a key/value system that is exceptional in making data easy to read and easy to parse.
- https://github.com/ctomni231/cwtactics/blob/cwt-game/gamedata/co.json
- https://github.com/ctomni231/cwtactics/blob/cwt-game/gamedata/credits.json
- https://github.com/ctomni231/cwtactics/blob/cwt-game/gamedata/weathers.json
- https://github.com/ctomni231/cwtactics/blob/cwt-game/gamedata/tiles.json
However, the more one leans into the serialized object nature of JSON, the harder it is to keep track of the code in the file. I wanted to perform an experiment on our code, and see if there was a way for us to represent data without having to go the full serialization route. The goal behind this decision was that if we can represent most of the data in this way, we can reserve the serialization that JSON is good at for the remaining complex items.
The premise of this is pretty simple. Represent the data by arrays:
"map": [ [0, 0, 1], [2, 3, 4] ]
If we can shelve complexity away using the arrays, then it leaves a lot less work for JSON to tidy up the rest of it, especially when it comes down to language. The Map File was the experiment starting point, but not necessarily the target.
The entire premise of making an object_options.json file was to make sure that arrays can be properly conformed to certain types and objects.
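To make the array idea a bit more concrete, here is a minimal Python sketch of how positional arrays could be conformed to named, typed fields. The real layout of object_options.json isn't shown here, so the UNIT_FIELDS schema and the row_to_record helper are purely hypothetical names for illustration.

```python
import json

# Hypothetical schema: the real object_options.json layout is not shown here,
# so these field names and types are assumptions for illustration only.
UNIT_FIELDS = [("id", int), ("type", str), ("x", int), ("y", int)]


def row_to_record(row, fields):
    """Turn one positional array into a dict, checking length and types."""
    if len(row) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(row)}")
    record = {}
    for value, (name, expected_type) in zip(row, fields):
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
        record[name] = value
    return record


# The file stays compact (arrays only); the names live in the schema.
raw = json.loads('{"units": [[0, "RECN", 4, 8], [1, "INFT", 6, 3]]}')
units = [row_to_record(row, UNIT_FIELDS) for row in raw["units"]]
print(units[0])  # {'id': 0, 'type': 'RECN', 'x': 4, 'y': 8}
```

The point of the sketch is only that the names and types can live outside the data file, so the file itself can stay as terse as the map example.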
However, if you look at a map file, there is going to be more than just one bit of data. We also have descriptions, authors, and other bits of text that are going to have to be represented as well. But the complexity just rises from there, because we are going to have a Scenario File and a Campaign File that are sure to have dialogue trees that also have to fit nicely within this construct.
The way I see it, Custom Wars Tactics has a few ways to deal with language data using ISO 639-1 as an assist. Each has pros and cons, and I certainly am banking on one method. But I digress, let's go ahead and talk about what those methods are:
Method #1 is to split by file. You can either separate the entire file:
- co.json
- co_en.json
- co_de.json
- co_zh.json
Or make separate language files for each situation as described here:
- https://github.com/ctomni231/cwtactics/blob/cwt-game/gamedata/language.json
- https://github.com/ctomni231/cwtactics/blob/cwt-game/gamedata/language_de.json
Pros:
- Really easy to implement
- Really easy to understand
Cons:
- Horrible for storage
- Really repetitive
Look, I get it. This is the easy way out. Data-wise, all we'd have to do is make copies of the files and change the languages. However, this method is just a little too easy. Map files would have to be packaged in ZIPs, and the game won't be compact. I think a data file has to be able to represent all the data within the file, and that includes locales. Doing it this way is basically saying that language is an afterthought, and I don't think that should be the overall direction of the project.
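For reference, the separate-file approach could be resolved with something like the Python sketch below. It assumes the naming convention seen in the file names above ("<base>.json" as the default, "<base>_<code>.json" as a translation with a two-letter ISO 639-1 code); split_locale and candidates are hypothetical helper names, and the regex only checks the shape of the code, not whether it is a registered language.

```python
import re

# Assumed naming convention, taken from the file names above:
# "<base>.json" is the default file and "<base>_<code>.json" is a translation,
# where <code> is a two-letter ISO 639-1 code such as "en", "de", or "zh".
LOCALE_FILE = re.compile(r"^(?P<base>.+)_(?P<lang>[a-z]{2})\.json$")


def split_locale(filename):
    """Return (base, lang) for a file name; lang is None for the default file."""
    match = LOCALE_FILE.match(filename)
    if match:
        return match.group("base"), match.group("lang")
    if filename.endswith(".json"):
        return filename[: -len(".json")], None
    raise ValueError(f"not a JSON data file: {filename}")


def candidates(base, lang):
    """File names to try in order: the requested language, then the default."""
    return [f"{base}_{lang}.json", f"{base}.json"]


print(split_locale("language_de.json"))  # ('language', 'de')
print(candidates("co", "zh"))            # ['co_zh.json', 'co.json']
```

One wrinkle with this scheme: any base name that happens to end in an underscore plus two lowercase letters would be misread as a translation, which is one more reason it feels fragile.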
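Method #2 is to keep everything in one file and attach the translations directly to the item they describe. Basically, it'll look a little something like this: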
"units": [
  {
    "id": 0,
    "type": "RECN",
    "x": 4,
    "y": 8,
    "loaded": -1,
    "owner": 0,
    "hp": 100,
    "ammo": 0,
    "fuel": 99,
    "status": 0,
    "co": ["MAX", "NELL"],
    "stats": [
      ["en", "The recon unit is..."],
      ["de", "Die Aufklärungseinheit ist..."]
    ]
  },
  {...}
]
Pros:
- Fairly compact
- Really easy to understand and read
Cons:
- Verbosity can get heavy
- Easy to forget translations when searching a file
This method is clean, and pretty. But I can also see it being an absolute nightmare for dialogue trees, since every single line of that would have to be split into languages, and the moment you have more than 2 languages it'll start to get messy and overly cumbersome to read. Pretty much, it'll be good if we have limited amounts of data, but it'll start to become really messy the more data we have to use.
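Here is a rough Python sketch of how the inline ["code", "text"] pairs from the example above could be read, plus a small check for the "easy to forget translations" problem. The default language of "en" and the helper names pick_text and missing_languages are assumptions on my part, not something the project has settled on.

```python
DEFAULT_LANG = "en"  # assumption: no default language has been decided yet


def pick_text(pairs, lang, default=DEFAULT_LANG):
    """Pick a translation from [[code, text], ...], falling back to the default."""
    table = dict(pairs)
    if lang in table:
        return table[lang]
    return table.get(default)  # None if even the default is missing


def missing_languages(pairs, wanted):
    """List the wanted language codes that have no entry yet."""
    present = {code for code, _ in pairs}
    return [code for code in wanted if code not in present]


stats = [["en", "The recon unit is..."], ["de", "Die Aufklärungseinheit ist..."]]
print(pick_text(stats, "de"))
print(pick_text(stats, "zh"))  # no Chinese entry, so the English text is used
print(missing_languages(stats, ["en", "de", "zh"]))  # ['zh']
```

Running a check like missing_languages over every text field would at least make forgotten translations visible, though it does nothing about the verbosity.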
Method #3 is to keep everything in one file, but group the data under a language key. This is a totally unrealistic example, but I'm just showing the basic concept of how this'll work. Basically, it'll look something like this:
"units": {
  "en": [
    [0, "RECN", 4, 8, -1, 0, "The recon unit is..."],
    [1, "INFT", 6, 3, -1, 0, "The infantry unit is..."]
  ],
  "de": [
    [0, "RECN", 4, 8, -1, 0, "Die Aufklärungseinheit ist..."],
    [1, "INFT", 6, 3, -1, 0, "Die Infanterie Einheit ist..."]
  ],
  "<KEY>": [...]
}
Pros:
- Easier to copy/paste
- Expansion is the primary focus
Cons:
- Harder to realize a default language
- Tends to repeat more data than method #2
I feel this method has the most potential. It is not only clean and pretty, but it also allows expansion in a very neat and compact way. The weakness here is that it repeats data a bit more than Method #2, but it also is the method that makes expansion easiest to understand overall and straddles the line between Method #1 and Method #2.
Hopefully, this gives a bit of a history lesson on why these methods exist. There is also the option of going into full serialization mode, but I feel that solution would be so verbose in the end that it'd create files that are just astronomically large as a consequence. In the grand scheme of things, I do believe we should really strive to handle everything in one file. It is just a matter of whether we handle it using method #2 or method #3.
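As a sketch of how the language-keyed layout could be consumed, the Python below picks the block for a language and falls back to a default when it is missing. Since this layout repeats the non-text fields per language, it also includes a check that the copies haven't drifted apart. The "en" default, the helper names rows_for and drifted_rows, and the assumption that the text is always the last field in a row are all mine.

```python
import json

DEFAULT_LANG = "en"  # assumption: no default language has been decided yet

data = json.loads("""
{
  "units": {
    "en": [[0, "RECN", 4, 8, -1, 0, "The recon unit is..."]],
    "de": [[0, "RECN", 4, 8, -1, 0, "Die Aufklärungseinheit ist..."]]
  }
}
""")


def rows_for(section, lang, default=DEFAULT_LANG):
    """Return the rows for a language, or the default language's rows."""
    if lang in section:
        return section[lang]
    return section[default]


def drifted_rows(section, default=DEFAULT_LANG):
    """Find rows whose non-text fields differ from the default language's copy.

    Only rows present in both languages are compared.
    """
    problems = []
    base = section[default]
    for lang, rows in section.items():
        if lang == default:
            continue
        for i, (row, ref) in enumerate(zip(rows, base)):
            if row[:-1] != ref[:-1]:  # assumes the text is the last field
                problems.append((lang, i))
    return problems


print(rows_for(data["units"], "de")[0][-1])
print(rows_for(data["units"], "zh")[0][-1])  # no "zh" block, falls back to "en"
print(drifted_rows(data["units"]))           # []
```

A fallback like this is one possible answer to the "harder to realize a default language" con: the default simply becomes whichever key the loader is told to fall back to.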