Dates in Genealogy


Some of you may have noticed that, in lists of events pertaining to people on this list, the dates were not sorted. Instead, they appeared in the order they were entered into my database, which made for a fairly random list, since I rarely found out about things in chronological order. Now, they are finally sorted! As an example, see my great grandmother Hillery (pictured as a child at the top of this post), and notice that her birth is listed before her death (it wasn’t before…). Also, have a look at my 7th great grandfather, John Ross. He has a number of events listed, and you will see they are in order as well.

What’s the big deal, you ask? How hard can it be to sort dates? Sorting dates is actually pretty easy. The problem when dealing with genealogical records is that what you have isn’t necessarily a specific date. You don’t always know an exact date; it might be just a year, and you may not be sure even about that. Or it could be a date range. Or possibly you just know a date (or partial date) that you are sure the event happened after or before, but not how long after or before. This is much more complex than “just a date” and much harder to sort.

The GEDCOM standard for dates defines a dizzying array of possibilities for the date field of a genealogical record. To start with, it could be a full date such as 12 DEC 1887 (full dates are always specified in this format). The day, or both the day and the month, could be missing, leaving a partial date such as DEC 1887 or just 1887.

Now it starts to get complicated. If an event happened sometime within a range of dates, you can say something like BET 1659 AND 1687. If you know someone was alive on a particular date, perhaps because they appear on a census, and you know that they died but you don’t know when, you can use something along the lines of AFT 13 MAR 2003. If you are only fairly sure some event happened on a particular day, or even just in a particular month or year, you can use ABT MAR 1773. There are other possibilities as well.
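To make those forms a little more concrete, here is a rough Python sketch that turns the three qualified examples above into plain English. It is only an illustration: the function name describe is made up for this post, it knows nothing about the other qualifiers GEDCOM allows, and it does not check that the date parts themselves are valid.

```python
def describe(gedcom_date):
    """Describe a GEDCOM date string in plain English (illustrative only)."""
    words = gedcom_date.split()
    if not words:
        return "unknown"
    keyword, rest = words[0], words[1:]
    if keyword == "BET" and "AND" in rest:
        # A range: everything before AND is the start, everything after is the end.
        split_at = rest.index("AND")
        start = " ".join(rest[:split_at])
        end = " ".join(rest[split_at + 1:])
        return "between " + start + " and " + end
    if keyword == "AFT":
        return "after " + " ".join(rest)
    if keyword == "ABT":
        return "about " + " ".join(rest)
    # Anything else is passed through untouched (full or partial dates,
    # and any qualifier this sketch does not handle).
    return gedcom_date


for example in ["12 DEC 1887", "BET 1659 AND 1687", "AFT 13 MAR 2003", "ABT MAR 1773"]:
    print(example, "->", describe(example))
```

Even this toy version needs a separate branch for each kind of date, which hints at why a full parser is more work than it first sounds.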

Why so complicated? When doing this kind of research, what you know is often much more complex than either knowing a date or not. You want to record everything you know, but don’t want to imply anything you don’t know. If you are uncertain, you want to express that as well. So the GEDCOM standard gives you a rich set of tools for doing so.

So how did I sort the dates? (if you want to avoid the geeky stuff, avert your eyes now…) I cheated a bit. Rather than parsing all the possibilities allowed by the GEDCOM standard, I just search the string for the first thing that looks like a date. If you really want to know, the regular expression I am using to do this is:

(?:(\d{1,2})\s+)?(?:(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s+)?(\d{4})

Notice that the day and month are optional in this expression. If the day is missing, I set it to “1”. If the month is missing, I set it to “JAN”. If everything is missing, I set the whole date to the current date. This is why all the items with no dates are always at the end of the list. Once I have the calculated date for each item in the list, I sort the list based on that date. Of course, I display the original date, not the calculated one.
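For the curious, the approach described above can be sketched in a few lines of Python. This is not the code running on this site, just a minimal illustration of the same idea. The names sort_key and sort_events are invented for the example, and the handling of impossible dates (such as 31 FEB) is my own assumption; the description above does not cover that case.

```python
import re
from datetime import date

# The regular expression quoted above, unchanged.
DATE_RE = re.compile(
    r"(?:(\d{1,2})\s+)?"
    r"(?:(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s+)?"
    r"(\d{4})"
)
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}


def sort_key(gedcom_date, today=None):
    """Return a calculated date used only for sorting, never for display."""
    if today is None:
        today = date.today()
    match = DATE_RE.search(gedcom_date or "")
    if match is None:
        return today  # nothing date-like found: sorts to the end
    day, month, year = match.groups()
    try:
        # A missing day becomes 1, a missing month becomes JAN.
        return date(int(year), MONTHS.get(month, 1), int(day or 1))
    except ValueError:
        # Assumption: an impossible date (e.g. 31 FEB) is treated as undated.
        return today


def sort_events(events, today=None):
    """Sort (label, gedcom_date) pairs by the calculated date."""
    return sorted(events, key=lambda event: sort_key(event[1], today))


events = [
    ("death", "AFT 13 MAR 2003"),
    ("undated note", ""),
    ("birth", "ABT MAR 1773"),
    ("marriage", "BET 1659 AND 1687"),
    ("census", "12 DEC 1887"),
]
for label, original in sort_events(events):
    print(label, "|", original)  # the original string is what gets displayed
```

Note that a range like BET 1659 AND 1687 is sorted by its first year only, because the search stops at the first thing that looks like a date.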

The sharp-eyed among you will have noticed that this is far from perfect. For example, it will allow dates where the month is missing but not the day, which should never happen. Also, it doesn’t work with years shorter than 4 digits. But it solves my sorting problem, so Yay! If I ever trace my family back so far that I need a 3-digit year, I’ll revisit it!
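If you want to see those two shortcomings for yourself, the Python sketch below runs the expression against a day-without-month string and a 3-digit year. It also tries a stricter pattern that I might use if I ever do revisit this; that variant (STRICTER) is purely hypothetical and is not what the site uses today.

```python
import re

MONTH = r"(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"

# The expression currently in use, as quoted earlier in the post.
ORIGINAL = re.compile(r"(?:(\d{1,2})\s+)?(?:" + MONTH + r"\s+)?(\d{4})")

# Hypothetical stricter variant: a day is only accepted together with a
# month, and the year may have 3 or 4 digits.
STRICTER = re.compile(r"\b(?:(?:(\d{1,2})\s+)?" + MONTH + r"\s+)?(\d{3,4})\b")


def parts(pattern, text):
    """Return the (day, month, year) groups, or None if nothing matches."""
    match = pattern.search(text)
    return match.groups() if match else None


for text in ["12 1887", "ABT 987", "13 MAR 2003"]:
    print(text, "| original:", parts(ORIGINAL, text), "| stricter:", parts(STRICTER, text))
```

With the original expression, 12 1887 is accepted with a day and no month, and ABT 987 is not matched at all. The stricter variant ignores the stray day and picks up the 3-digit year, while still reading an ordinary date like 13 MAR 2003 the same way.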