# NAME

Genealogy::Occupation - Normalise and translate genealogical occupation strings

# VERSION

Version 0.01

# SYNOPSIS

    use Genealogy::Occupation;

    my $normaliser = Genealogy::Occupation->new();

    my $occupations = $normaliser->normalise(
        occupation => 'Ag Lab',
        sex        => 'M',
    );
    # Returns ['Agricultural Labourer']

    # Or pass an arrayref
    my $more = $normaliser->normalise(
        occupation => ['Ag Lab', 'Ag Lab', 'Retired'],
        sex        => 'M',
    );
    # Returns ['Agricultural Labourer'] - deduplicated and filtered

# DESCRIPTION

Normalises occupation strings found in genealogical records, handling
common abbreviations, malformed entries, locale-specific spellings and
translations into French and German.

Designed to handle poor-quality data from genealogy software imports, where
occupation strings may be abbreviated, inconsistent or use archaic
terminology.

Processing steps, applied in order:

1. Filter out non-occupations (Scholar, Retired, Domestic Duties etc.)
2. Normalise abbreviations and malformed entries to canonical forms
3. Deduplicate consecutive identical or equivalent entries (compared on pre-translation normalised forms)
4. Apply locale-specific spellings via `Lingua::EN::ABC`
5. Translate to French or German if the system locale requires it

# METHODS

## new

### Purpose

Constructs a new normaliser object.

### API Specification

#### Input

    {
        warn_on_error => {
            type     => 'boolean',
            optional => 1,
            default  => 0,
        },
    }

#### Output

    {
        type => 'object',
        isa  => 'Genealogy::Occupation'
    }

### Arguments

- `warn_on_error` - If true, occupations that cannot be translated emit a warning via `carp` instead of silently falling back to English. Optional, defaults to 0.

### Returns

A blessed `Genealogy::Occupation` object.

### Side Effects

None.

### Notes

The system locale is detected once at construction time and cached for the
lifetime of the object.
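The caching behaviour in the note above can be illustrated with a minimal sketch. `Sketch::LocaleCache` is a hypothetical class, not part of this module, and reading the locale from the `LC_ALL`, `LC_MESSAGES` and `LANG` environment variables is an assumption; this document does not say how the module detects the locale.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not part of Genealogy::Occupation.
package Sketch::LocaleCache {
    sub new {
        my ($class) = @_;
        # Assumption: the locale is taken from these environment variables.
        my $raw = $ENV{LC_ALL} || $ENV{LC_MESSAGES} || $ENV{LANG} || 'C';
        # Keep a two-letter language code; anything else (C, POSIX) becomes 'en'.
        my $lang = $raw =~ /^([A-Za-z]{2})(?:[_.@]|$)/ ? lc $1 : 'en';
        # The value is stored once and never re-read.
        return bless { lang => $lang }, $class;
    }

    sub lang { return $_[0]{lang} }
}

$ENV{LC_ALL} = 'fr_FR.UTF-8';
my $cached = Sketch::LocaleCache->new();

$ENV{LC_ALL} = 'de_DE.UTF-8';    # changed after construction
print $cached->lang(), "\n";     # prints "fr"
```

If the real module behaves similarly, a practical consequence is that a program needing output in more than one language would construct one object per locale rather than changing the environment between calls.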
### Example

    my $normaliser = Genealogy::Occupation->new({
        warn_on_error => 1,
    });

## normalise

### Purpose

Normalises one or more occupation strings, applying filtering, abbreviation
expansion, deduplication, locale spelling and translation in order.

### API Specification

#### Input

    {
        occupation => {
            type => ['string', 'arrayref'],
        },
        sex => {
            type     => 'string',
            optional => 1,
            memberof => ['M', 'F'],
        },
    }

#### Output

    {
        type         => 'arrayref',
        element_type => 'string',
    }

### Arguments

- `occupation` - A single occupation string or an arrayref of occupation strings. Required.
- `sex` - The sex of the person, `'M'` or `'F'`. Optional, but needed for correct gendered translations in French and German. If omitted where a gendered translation is needed, `'M'` is assumed.

### Returns

An arrayref of normalised occupation strings. May be empty if all
occupations were filtered out.

### Side Effects

If `warn_on_error` was set at construction and an occupation cannot be
translated, emits a warning via `carp`.

### Notes

Deduplication operates across the full list of occupations passed in a
single call. Passing occupations one at a time in separate calls will not
deduplicate across those calls.
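To show why the note above matters, here is a minimal sketch of the filter, normalise and deduplicate steps only. `sketch_normalise` and both lookup tables are hypothetical stand-ins, not the module's code or data, and the sketch assumes duplicates are dropped anywhere in the list; the description mentions consecutive entries, so the real behaviour may be narrower.

```perl
use strict;
use warnings;

# Hypothetical tables for illustration; the module's real tables are not shown here.
my %NON_OCCUPATION = map { $_ => 1 } ('scholar', 'retired', 'domestic duties');
my %CANONICAL = (
    'ag lab'             => 'Agricultural Labourer',
    'platelayer railway' => 'Railway Platelayer',
);

# Hypothetical helper: filter, normalise, then deduplicate within one call.
sub sketch_normalise {
    my ($occupations) = @_;
    my (@out, %seen);
    for my $raw (@{$occupations}) {
        (my $trimmed = $raw) =~ s/^\s+|\s+$//g;
        my $key = lc $trimmed;
        next if $NON_OCCUPATION{$key};              # step 1: filter
        my $canon = $CANONICAL{$key} // $trimmed;   # step 2: canonical form
        next if $seen{ lc $canon }++;               # step 3: deduplicate
        push @out, $canon;
    }
    return \@out;
}

# One call sees the whole list, so the duplicate is removed.
my $together = sketch_normalise(['Ag Lab', 'Ag Lab', 'Retired']);

# Separate calls each start with an empty %seen, so the duplicate survives.
my @separate = map { @{ sketch_normalise([$_]) } } ('Ag Lab', 'Ag Lab');

print scalar(@{$together}), "\n";   # prints 1
print scalar(@separate), "\n";      # prints 2
```

The `%seen` hash lives only for the duration of one call, which is the reason a caller wanting deduplicated output would pass all of a person's occupations together.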
### Example

    my $result = $normaliser->normalise(
        occupation => ['Ag Lab', 'Ag Lab', 'Retired'],
        sex        => 'M',
    );
    # Returns ['Agricultural Labourer']

    $result = $normaliser->normalise(
        occupation => 'Platelayer Railway',
    );
    # Returns ['Railway Platelayer']

# AUTHOR

Nigel Horne

# BUGS

Please report bugs via the GitHub issue tracker:
[https://github.com/nigelhorne/Genealogy-Occupation/issues](https://github.com/nigelhorne/Genealogy-Occupation/issues)

# TODO

- Expand French and German translation tables
- Add support for additional languages
- Add `normalise_place()` equivalent for occupation place strings

# SEE ALSO

- [Lingua::EN::ABC](https://metacpan.org/pod/Lingua%3A%3AEN%3A%3AABC)
- [Params::Get](https://metacpan.org/pod/Params%3A%3AGet)
- [Params::Validate::Strict](https://metacpan.org/pod/Params%3A%3AValidate%3A%3AStrict)
- [Return::Set](https://metacpan.org/pod/Return%3A%3ASet)

# LICENSE AND COPYRIGHT

Copyright 2026 Nigel Horne.

Usage is subject to GPL2 licence terms.
If you use it, please let me know.