Generating words for your conlang

One of the most labour-intensive stages of creating a conlang (constructed language) is generating the masses of words required, while making sure you have enough distinct words to use the full range of options that you designed into the language. Thankfully this is a trivial task for a computer program, and being a developer myself I’ve written a simple script that can be used to generate an “instant dictionary”.

Now obviously you’re not constrained by the output; if you see a word that doesn’t feel right for the assigned meaning, then of course you’re free to swap it for something else. Use your auto-generated dictionary as a jumping-off point to get you going, rather than a shackle for your creativity!

The technical stuff

Perl script in OSX Terminal

Nice as it would be to offer you a standalone program, I’m not qualified to do that. Instead, I’ll show you how to use a popular programming language called Perl to run my code. Perl was designed to be good at handling strings of characters, so it’s the perfect language for conlanging. Don’t worry, you won’t have to write any code per se, just the file that describes your conlang.

If you’re using a Mac, you can skip the rest of this paragraph, as Perl is provided as part of the OSX operating system (ditto with Linux, but if you’re a Linux geek you probably already know about Perl!). If you’re a Windows user, you can get a free installation bundle from ActiveState; I originally learnt Perl when I was still on Windows. If you go this route, I suggest selecting the 5.14.x package, as I haven’t had the opportunity to try my code on version 5.16.x yet.

You’ll also need a program that can create and edit plain text files. In Windows, you could use Notepad or Wordpad, but I recommend the free text editor Notepad++ as it will be better at handling the files I’ve written in OSX (Windows and Unix use different non-printing characters to denote line endings). For Mac users, the built-in TextEdit program will do fine as long as you make sure to save new files as plain text, not .rtf, or you might like to try the free text editor TextWrangler.
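If you're curious what the line-ending difference actually looks like, here is a minimal sketch. It is written in Python purely for illustration (it is not part of the Perl script), and the function name to_unix_newlines is made up for this example.

```python
def to_unix_newlines(text):
    """Turn Windows (CR LF) and old Mac (CR) line endings into Unix LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# A two-line sound list as a Windows editor would save it.
windows_text = "p t k\r\nm n l\r\n"
print(repr(to_unix_newlines(windows_text)))
```

An editor such as Notepad++ does this sort of conversion for you, which is why it copes better with files written on a Mac.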

Both Notepad++ and TextWrangler have syntax highlighting, meaning they colour-code the text according to its meaning within the format/programming language being used. This doesn’t make much difference with text files, but if you start poking around in the Perl script, it’s very helpful for distinguishing blocks of code from comments.

Please note that I can’t offer technical support on installing and using these common tools; there are masses of resources online, so I recommend you resort to Google if you have any problems.

You’ll need to run the script in a terminal window; rather than try and explain that here, I’ve linked to some handy online tutorials:

Finally you’ll need to download the zip file containing my Perl script and some sample input files.

Preparing your input

My dictionary script takes two input files. One is a simple list of English words that you’d like to “translate” into your conlang. I’ve provided two sample lists, based on various “universal word root” lists that I’ve encountered over the years. One is fairly short (only 200-odd words, mostly verbs, nouns and adjectives), whereas the other is over a thousand items long and covers a wider range of concepts. It’s a good idea to edit them to suit your conlang’s culture; for example, I omitted words like “husband”, “wife” and “marriage” from my Vinlandic dictionary as skraylings don’t live together after mating.

The other input file you need is one that defines the sound patterns in your language; it’s the one named ‘example.conf’ in the download. This requires a bit more explanation!

The lines starting # are comments, meaning that they are ignored by the Perl script, so you can use them to remind yourself what each part of this “sound configuration” file is doing.

First we define our pattern of sounds within a word – see last week’s post on how to design a phonology for your conlang. Each capital letter in the pattern represents a list of sounds that we’ll define later in the file. The letters can be anything you want as long as they’re unique to each sound group; I generally use C and V to represent the full range of consonants and vowels respectively, and other letters to represent subsets of sounds.

Here, I’ve decided that the pattern is a consonant, a vowel, a “final” consonant, and a short vowel. Note that there has to be a space between each letter.

# Pattern of phonemes in root


Next we have to define each sound group. For each one, you need a line Name=X, where X corresponds to one of the letters in your pattern, then immediately below that you list the sounds, again separated by a space.

In this very simple example, all sounds have equal frequency in the language. For a more naturalistic result you might want to increase the frequency of some sounds and make others rarer. This is simple to do; just include multiple examples of the common sounds in your list, and include rare ones only once. (This will also potentially introduce homophones into your conlang, which is a good thing if you’re aiming for naturalism.)
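To see why repeating a sound makes it more common, here is a small sketch in Python (an illustration only, not the Perl script itself). It assumes that each sound is picked uniformly at random from its list, which is what the description above implies; the vowel list is invented for the example.

```python
import random
from collections import Counter

# Hypothetical sound group: "a" is listed three times, "e" twice,
# and the rarer vowels only once.
vowels = ["a", "a", "a", "e", "e", "i", "o", "u"]

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = Counter(rng.choice(vowels) for _ in range(8000))

# Show how often each vowel came up, most common first.
for sound, count in draws.most_common():
    print(f"{sound}: {count / 8000:.1%}")
```

With a uniform pick, a sound listed three times out of eight should turn up in roughly three words out of every eight, so the list itself acts as the frequency table.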

Another refinement you can make is to allow an element to be optional. To do this, include a hyphen as one of your possible “sounds”; the script will strip it out when it prints the dictionary. As with other sounds, you can vary the probability by altering the frequency of hyphens in your list.
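The hyphen trick can be sketched like this, again in Python and again only as an illustration under the assumption of a uniform random pick. The onset list and the fixed vowel "a" are made up for the example.

```python
import random

# Hypothetical group: two of the eight entries are "-", so the slot
# should be left empty in roughly a quarter of the roots.
onsets = ["p", "t", "k", "m", "n", "s", "-", "-"]

rng = random.Random(7)  # fixed seed so the run is repeatable

# Pick an onset, add a vowel, then strip the "-" placeholder.
roots = [(rng.choice(onsets) + "a").replace("-", "") for _ in range(2000)]

bare = sum(1 for root in roots if root == "a")
print(f"{bare / len(roots):.0%} of roots have no onset")
```

Adding or removing hyphens in the list shifts that proportion up or down, in the same way as repeating any other sound.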

Note also that the definitions don’t have to be in the same order as the elements in your pattern, and that you only need to define each group once.

# Consonants

p t k b d g
m n l r s z
h w y -

p t k b d g
m n l r s z

# Vowels

a e i o u
aa ei ii ai ao ou

a e i
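For readers who like to see the moving parts, here is a rough Python sketch of how a file in this style could be read and turned into words. It is not the Perl script: the group letters C, F, V and S, the comment labels and the function names are all assumptions chosen to match the description above. The pattern is passed in directly as a string rather than read from the file.

```python
import random

# Hypothetical sound file: the letters C, F, V and S are assumptions,
# chosen only to illustrate the "Name=X" layout described above.
SAMPLE_CONF = """\
# Consonants
Name=C
p t k b d g
m n l r s z
h w y -

# Final consonants
Name=F
p t k b d g
m n l r s z

# Vowels
Name=V
a e i o u
aa ei ii ai ao ou

# Short vowels
Name=S
a e i
"""


def parse_groups(text):
    """Collect the sounds listed under each Name=X line."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.startswith("Name="):
            current = line[len("Name="):].strip()
            groups[current] = []
        elif current is not None:
            # A group may run over several lines of sounds.
            groups[current].extend(line.split())
    return groups


def make_word(pattern, groups, rng):
    """Pick one sound per letter of the pattern and join them up."""
    sounds = [rng.choice(groups[letter]) for letter in pattern.split()]
    return "".join(sounds).replace("-", "")  # "-" marks an omitted sound


groups = parse_groups(SAMPLE_CONF)
rng = random.Random()
words = [make_word("C V F S", groups, rng) for _ in range(5)]
print(words)
```

Because the groups are looked up by letter, the order in which they are defined doesn't matter, and each one only needs defining once.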

Create your own file; if you name it after your conlang (e.g. vinlandic.conf), that name will be used in the header of the output file. Make sure you keep all the files in the same directory as the script.

Running the script

Now comes the moment of truth!

In your terminal window, navigate to the folder that contains the downloaded files, and type:

perl -s example.conf -d short_wordlist.txt

Obviously you can substitute the name of your sound file and use the longer word list if you prefer. The script will run, and within moments you should have your output file. Then in Windows Explorer (or Finder in OSX), look in the conlang directory for an HTML file and double-click it to open it in a browser window. Voilà! Instant dictionary :)

Screenshot of sample output using the provided input files

If you would prefer a simple text file listing the results, append the flag --txt to your command, thus:

perl -s example.conf -d short_wordlist.txt --txt

Either way, if you’re not happy with the results, you can run the script again. By default the script includes a timestamp in the output file name, so it creates a new file each time.
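As an illustration of the timestamp idea, here is a small Python sketch. The exact file name the Perl script produces isn't shown here, so the naming scheme below (conf name, underscore, date and time) and the function name output_name are assumptions.

```python
from datetime import datetime
from pathlib import Path


def output_name(conf_path, extension="html", now=None):
    """Build a file name from the sound file's name plus a timestamp."""
    now = now or datetime.now()
    stem = Path(conf_path).stem  # "vinlandic.conf" -> "vinlandic"
    return f"{stem}_{now:%Y%m%d-%H%M%S}.{extension}"


print(output_name("vinlandic.conf"))
print(output_name("vinlandic.conf", extension="txt"))
```

Since the time is part of the name, two runs a second or more apart can never overwrite each other's output.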

That’s really all there is to it. I hope you find this script useful; do let me know how you get on!




Loved it!! Or, in your own example language, this is ougo work, very mebi and searu! 😀

Tolkien and Burgess would be proud!

How about putting your code on github, and maybe even CPAN?



TBH, I can’t be bothered with the overhead of putting my script into a public repository – I posted it here because people had asked about it. If it went into github I’d feel obliged to improve and support it, and I just don’t have time for that. Too much like my day job!

Leopoldo H. Mcbride

bennyc50 : Am I the only person in the world that writes perl scripts in a text editor and not in eclipse?


Nope – I work in vim all the time, including at my day job (as you can see from the Terminal screenshot) :)