Introduction
General introduction to the data base
by J. Thomas Lindblad
The following offers elaborations on the basic structure of the data base and the ways in which information in the original source was converted to fit the format of the data base. Separate attention is given to the important issue of firm identification.
Basic structure
The primary record or basic unit of observation in the data base is the piece of information on one individual incorporated firm in any one of the eight years selected for processing. The same firm may thus appear at least once and at most eight times in the data base.
Processing began with a comparison between two adjacent years, 1925 and 1926. It then transpired that adding an adjacent year to a completed one generated little new information at considerable cost. This becomes important when considering the financial constraints on the highly labour-intensive process of manual input.[1] By selecting seven benchmark years, in addition to the odd one for comparative purposes, we seek to cover developments during the three final decades of effective Dutch colonial rule in Indonesia. The benchmark years at identical five-year intervals are: 1910, 1915, 1920, 1925, 1930, 1935 and 1940. The data base thus comprises these seven years with 1926 added. This has repercussions for the subsequent statistical analysis. Continuous time series requiring annual data cannot be constructed and will need to be replaced by comparisons between the seven selected moments of observation.
Although the issue of 1926 contains about six per cent fewer firms than that of 1925, 3275 against 3497 entries, most information for most companies remains identical. This prompted a method of input that makes maximum use of the information already entered into the data base. The year 1920 was thus built up from a copy of 1925, with necessary additions and deletions, as was 1930 with respect to 1926. This procedure was duplicated in constructing both the earlier and the later years. By applying this procedure, a considerable saving was realized.[2]
In total, the data base contains 22,471 entries, or an average of 2809 entries per year. The year with the largest number is 1920 with 3736 entries, the year with the smallest number 1935 with 1884 entries. The main trends – increase from 1910 to 1920, decline from 1925 to 1935, a slight recovery in 1940 – are reminiscent of the preliminary findings in early processing, yet more solidly substantiated.
The data base is set up as a spreadsheet with entries of firms in rows and variables in columns. For purposes of overview, the 16 variables are listed here with their fundamental characteristics:
A  Year: ordinal number as given for issue of source
B  ID Number: assigned ordinal number for firm identity
C  Firm Name: qualitative variable as adapted from the source
D  Sector: assigned letter code
E  Branch: assigned letter code
F  Stated Aim: qualitative variable inferred from source
G  Founded: year of incorporation as in source
H  Headquarters: geographical name as in source
I  Location: geographical name as in source
J  Director: personal name as in source
K  Equity: quantitative variable (amount) as in source
L  Currency: assigned letter code
M  Dividend: quantitative variable (percentage) as in source
N  Dividend Year: year as in source
O  Firm Name Supplement: qualitative variable as in source
P  Owners: personal/corporate name as in source
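For users who export the spreadsheet and process it programmatically, the column layout above can be sketched as a simple mapping. This is only an illustration: the function name row_to_record and the choice to represent blank cells as None are assumptions, not part of the data base itself.

```python
# Column letters and variable names as listed above.
COLUMNS = [
    ("A", "Year"), ("B", "ID Number"), ("C", "Firm Name"), ("D", "Sector"),
    ("E", "Branch"), ("F", "Stated Aim"), ("G", "Founded"), ("H", "Headquarters"),
    ("I", "Location"), ("J", "Director"), ("K", "Equity"), ("L", "Currency"),
    ("M", "Dividend"), ("N", "Dividend Year"), ("O", "Firm Name Supplement"),
    ("P", "Owners"),
]


def row_to_record(cells):
    """Turn one spreadsheet row (16 cells, columns A to P) into a dict.

    Hypothetical helper: empty strings are stored as None so that a blank
    cell stays distinguishable from an explicit zero (e.g. a 'nihil' dividend).
    """
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return {
        name: (None if cell == "" else cell)
        for (_, name), cell in zip(COLUMNS, cells)
    }
```

Keeping blanks separate from zeros matters mainly for the dividend variable, where an explicit zero and a missing value carry different meanings.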
From source to data base
The following elaborates, by variable, how the information from the Handboek was selected and, when needed, modified to fit into the format of the data base.
Variable A Year
This is the year of the issue of the Handboek, which does not necessarily coincide with the year to which the given information applies. Since firms submitted standardized forms by October in the year preceding publication, it is highly likely that the information was expected to remain accurate in the following year, with the notable exception of paid-out dividends, which as a rule apply to a preceding year. This variable cannot have a missing value, and any value other than the eight selected years is due to a regrettable error.
Variable B ID Number
The issues of the Handboek contain no cross references to other issues. Therefore, a device had to be constructed in order to be able to identify the same firm in different issues. This is accomplished by a six-digit identification number built up as follows:
- digit 1: a code for the year in which this firm was first encountered at construction of the data base. The coding is: 1910 = 9, 1915 = 1, 1920 = 2, 1925 = 3, 1926 = 4, 1930 = 5, 1935 = 6, 1940 = 7.
- digits 2-5: the page number in the Handboek issue where the firm was first encountered.
- digit 6: the number of the paragraph on the page in the Handboek where the firm was first encountered. As a rule, a page in the Handboek contains at most four or five paragraphs. The number ‘9’ is a fictitious paragraph number used when identification had to be revised.
The appropriate ID Number was assigned to all entries in the data base after establishing firm identity.
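The structure of the ID Number can be illustrated with a short decoding routine. This is a minimal sketch based solely on the description above; the names FirmId and decode_firm_id are hypothetical, and the ID used in the usage note is invented for illustration.

```python
from typing import NamedTuple

# Digit 1 of the ID Number -> Handboek issue in which the firm was first
# encountered (coding as given above).
YEAR_CODES = {9: 1910, 1: 1915, 2: 1920, 3: 1925, 4: 1926, 5: 1930, 6: 1935, 7: 1940}


class FirmId(NamedTuple):
    first_year: int   # issue year of first encounter
    page: int         # page number in that issue (digits 2-5)
    paragraph: int    # paragraph on that page (digit 6)
    revised: bool     # True when digit 6 is the fictitious paragraph number 9


def decode_firm_id(id_number: int) -> FirmId:
    """Split a six-digit ID Number into its three components."""
    text = str(id_number)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected a six-digit ID Number, got {id_number!r}")
    year_code = int(text[0])
    if year_code not in YEAR_CODES:
        raise ValueError(f"unknown year code {year_code} in {id_number!r}")
    paragraph = int(text[5])
    return FirmId(YEAR_CODES[year_code], int(text[1:5]), paragraph, paragraph == 9)
```

For example, the invented ID 301234 would decode to a firm first encountered in the 1925 issue, on page 123, in the fourth paragraph.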
Variable C Firm Name
The source gives the name of each firm; no nameless enterprises are included. Yet the name of the firm is given in a bewildering variety of formats. The full name often includes a label indicating the type of enterprise. The most common labels are ‘Cultivation company’ (Cultuur-Maatschappij) and ‘Trading company’ (Handel-Maatschappij), but several other labels are also in use. Geographical labels are also often added, mostly ‘Netherlands Indies’ (Nederlandsch-Indisch) or otherwise a reference to a specific region or place. The use of such labels tends to obscure the unique part of the name of the firm, which in turn is given in a spelling that may vary over time. As a consequence, the recognizability of the individual firm in the source is seriously impaired, both when offset against other firms and against entries applying to the same firm in different issues.
In the data base, virtually all original firm names in the source have been adapted to a format that puts the unique corporate name up front and, for brevity, leaves out some of the redundant non-specific information. This adaptation of the source information also included a harmonization of the spelling of the unique corporate name. Whenever extensive modification of the firm name proved necessary, the full original formulation of the firm name was retained under separate cover (variable O below).
Variable D Sector
The source obviously does not provide a differentiation of firms by economic sector. However, such a classification is vital in terms of enabling statistical analysis. Therefore, all firm entries in the data base were equipped with a code globally indicating the economic sector where operations were taking place.
The sector code was assigned on the basis of information in both the label in the firm name indicating the type of enterprise (variable C) and the description in the source of intended activities (variable F). Seven sector codes are used:
A agriculture, applied to all firms labeled as ‘Cultivation company’ (Cultuur-Maatschappij), except when non-agricultural activities are explicitly stated.
B finance, in particular banking, but also including other financial institutions.
I manufacturing or industrial processing in the broadest sense, excluding mining.
M mining, including petroleum.
O other economic activities, often unspecified, including public services.
S commercial services in the widest sense.
T trading, applied to all firms labeled as ‘Trading company’ (Handel-Maatschappij) except when other activities than trading are explicitly mentioned.
Occasional inconsistencies are inevitable, partly due to lack of clarity in the description of activities as given in the source, partly because of changes in the system of classification.[3] A separate problem concerns firms with multiple types of economic activity spread over more than one sector. The structure of the data base only allows for one sector designation for each entry. Closer scrutiny of the description of activities was needed in order to identify the likely core business. This inevitably introduces a subjective element of interpretation in the assignment of sector code.
By far the most populous sector is agriculture with 8145 entries (36 per cent), followed in the second rank by trading, 4171 entries (18.5 per cent), and in the third rank by ‘other’, 2020 entries (9 per cent). In a conventional setup of economic sectors, firms with code A would count as the primary sector, firms with codes I and M as the secondary sector, and firms with one of the remaining four codes (B, O, S, T) as the tertiary sector.
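The conventional grouping into primary, secondary and tertiary sectors can be written down as a lookup table. The sketch below is an illustration only; the function name sector_group is hypothetical, and the handling of obsolete codes follows the equivalences given in note [3].

```python
# Conventional sector grouping as described in the text.
SECTOR_GROUP = {
    "A": "primary",
    "I": "secondary", "M": "secondary",
    "B": "tertiary", "O": "tertiary", "S": "tertiary", "T": "tertiary",
}

# Obsolete codes and their current equivalents, as listed in note [3].
OBSOLETE = {"H": "T", "N": "M", "D": "O", "P": "O", "W": "O"}


def sector_group(code: str) -> str:
    """Map a sector code (variable D) to primary, secondary or tertiary."""
    code = code.strip().upper()
    code = OBSOLETE.get(code, code)   # translate obsolete codes first
    return SECTOR_GROUP[code]         # raises KeyError for unknown codes
```

Letting unknown codes raise an error, rather than silently falling into a residual group, makes data-entry slips visible during analysis.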
Variable E Branch
Just as with the economic sector, the source itself does not apply a classification of firms by branch of industry. Again, such a classification is an indispensable tool for any analysis focusing on economic activity. Therefore, a branch code was assigned to virtually all firms outside trading, representing 81 per cent of the total population of the data base (18,271 entries). The extreme prevalence of unspecified commerce in firms defined as ‘Trading firms’ precluded a sensible further differentiation by branch within the sector.
For firms outside trading, the branch of economic activity could generally be inferred from the stated aim of operations (variable F), sometimes in combination with the label in the firm name indicating the type of business. The branch codes for the sectors A, B, I, M, O and S are as follows:[4]
A cof (coffee), cop (copra), gen (agriculture in general), kin (cinchona), olie (vegetable oils), rice (including bibit), rub (rubber), sug (sugar), tea, tob (tobacco), var (various, including citronella, fibres, kapok, fertilizer, palm oil, spices and tapioca).
B bank, fin (non-bank financial activities), verz (insurance).
I mach (machinery, including metals), tex (textiles), var (various, including beverages, chemicals, cigarettes, leather, shipbuilding and soap), wood (including logging).
M min (mining activities, including coal, gold and silver and tin), oil (petroleum).
O bio (cinema and theatre), hotel (hotels, restaurants and bars), ice (ice and mineral water), pharm (pharmacies), print (printing and publishing), publ (public services, for instance electricity), tran (transport in the widest sense, including motor cars, trains, ships and port facilities), var (various, often specialized shops).
S con (construction, including contractors and building materials), real (real estate), var (all sorts of professional services, including administration, advertising, engineering and technical bureaus).
Again, occasional inconsistencies are inevitable, in part due to a lack of clarity in the information given in the source, possibly also because of mistakes of interpretation at the time of conversion of the information to the format of the data base.
The foremost branches of economic activity are rubber cultivation in sector A, 1635 entries (7.3 per cent), construction in sector S, 1359 entries (6 per cent), ‘various’ manufacturing in sector I, 1280 entries (5.7 per cent), and transport in sector O with 842 entries (3.7 per cent).
Variable F Stated Aim
Most firms listed in the Handboek did provide a description of current and intended activities. Still, these descriptions are often unnecessarily wordy and on occasion rather general or diffuse. A probable explanation lies in the conscious strategy to stake out a wide scope of operations with a view to possible future expansion that could then be undertaken without separate approval by shareholders. As a consequence, it is often difficult to unequivocally distill the firm’s core business from this piece of information alone. In the event, this could only be established for 14,360 entries (64 per cent of the total).
The content of this variable only stems from what is actually stated in the source, that is, it does not reflect an interpretation drawing on the firm’s name. In addition, it is an alphanumerical variable rendering the content of a text excerpt, which by definition may prove difficult to squeeze into the format of a spreadsheet structure. In combination with other information, however, this variable played an important role when seeking to assign the most appropriate sector and branch codes for the firm’s activities (variables D and E).
Variable G Founded
This is the year of incorporation as a firm with limited liability for its owners. It is given for almost all firms, 22,066 entries or 98 per cent of the total. The precise date as well as the date of publication in a law gazette such as the Nederlandsch Staatsblad are given in the source but left out in the data base. The year of foundation should obviously be the same for the same firm when figuring in different issues of the Handboek. The year of foundation proved a useful tool in firm identification.
Variable H Headquarters
The city or town, where the firm’s headquarters are located, is almost universally given, specified for 22,382 entries (99.6 per cent of the total). The information is copied directly from the source without any harmonization of spelling over time or classification by region in the Netherlands Indies or by country outside the Dutch colony. Batavia (now Jakarta) was the most popular location for headquarters in the Netherlands Indies, whereas Amsterdam ranked first among locations outside the colony and London took the lead among headquarters outside both colony and mother country.
Dutch-owned firms clearly had a professed preference for headquarters (zetel) in the Netherlands, arguably on account of access to the Dutch capital market and contacts with other Dutch corporations as well as government officials. Smaller Dutch-owned firms often chose to keep headquarters in the colony itself, usually near the location of actual operations. The same also held true for firms with non-Dutch foreign owners. Firms owned or managed by Chinese or indigenous Indonesian residents of the colony by definition always had headquarters in the Netherlands Indies.
The location of headquarters plays a crucial role in firm identification and differentiation of firms by nationality.
Variable I Location
The location of a firm’s activities is explicitly stated in merely 3105 entries, corresponding to 14 per cent of the total. This number includes both general regional labels and individual cities or towns. The information was copied directly from the source, again without harmonization of spelling or any interpretation based on information elsewhere.
This variable has some idiosyncrasies of its own. The most conspicuous difficulty lies in the prevalence of multiple locations of operations. Since the structure of the data base only allows for one single location of operations, it proved necessary to select the one deemed most important to represent the firm’s core business. A blank here means that either the information was not provided in the source or, more commonly, that the firm operated at a host of locations, from which the most prominent one could not be isolated.
Another difficulty refers to a change of locations over time. Such information was included in the entry in the relevant year, but could cast doubt on a correct identification of the firm on the basis of its record over time.
Variable J Director
The name of the firm’s director is given for most firms, about 85 per cent of the total (19,209 entries). This testifies to a high degree of compliance with the publisher’s specific request for names of directors to be provided. On occasion, however, it is not quite clear in the source what the precise function was of the individual cited. In smaller firms, the director or manager is likely to coincide with the owner. This information was not included in early computerization of the Handboek data since the software at the time only allowed for processing of numerical variables. Inclusion of so many personal names of directors invites efforts in the vein of network analysis.
Variable K Equity
The firm’s equity capital is given in the source in almost all entries, 21,983 entries corresponding to 98 per cent of the total. This again testifies to the high willingness among subscribers to comply with the specific requests by the publisher. Equity is generally given in both nominal terms and as actually paid up. As a rule, the citation in the data base refers to paid-up equity capital. However, it needs to be kept in mind that paid-up equity capital does not offer a full representation of the funds at the firm’s disposal. Information on financial reserves is rarely given and was not included in the data base.
All amounts are in thousands of the applicable currency, mostly the Dutch guilder. Firms incorporated in the colony may have stated equity in Netherlands Indies guilders. Non-Dutch foreign firms with headquarters overseas generally used the currency of their home country when stating the amount of equity. Such amounts need to be converted into Dutch guilders prior to further analysis (variable L below).
In the absence of any information whatsoever on employment and output, the amount of equity capital by and large forms the sole yardstick to measure the size dimensions of the enterprise.[5] This financial parameter can therefore play a pivotal role in statistical analysis.
Variable L Currency
The default currency in the source and the data base is the Dutch guilder, the value of which was virtually identical to that of the Netherlands Indies guilder.[6] This variable solely applies to equity as given in the source (variable K). The guilder is used to express the amount of equity in the vast majority of cases, 94 per cent, or 21,044 entries. The only other major currency cited in the source is the pound sterling that was used in 801 entries (3.5 per cent of the total). A host of other foreign currencies between them accounted for only 170 of all entries (less than 1 per cent).
Under the Gold Standard, which applied in the Netherlands until 1936, rates of exchange with foreign currencies remained fixed. A conversion of all equity cited prior to 1936 can therefore be done by simply applying the prevalent rates of exchange. The rates were: £ 1 = ƒ 12.50, US $ 1 = ƒ 2.50, ¥ 1 = ƒ 1.23, 1 Thaël (occasionally used by Chinese firms) = ƒ 0.74, 1 DK (Danish crown) = 1 NK (Norwegian crown) = 1 SEK (Swedish crown) = ƒ 0.67, 1 DM = ƒ 0.59, 1 SFR (Swiss franc) = ƒ 0.48, 1 FFR (French franc) = ƒ 0.10, 1 BFR (Belgian franc) = ƒ 0.07. Such a conversion is essential for comparisons of equity capital across nationalities.
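The conversion can be sketched as a lookup of the fixed rates listed above. This is a minimal illustration, not a prescribed procedure: the currency labels used as dictionary keys are hypothetical (the letter codes actually used in variable L may differ), and the function name equity_in_guilders is invented.

```python
from decimal import Decimal

# Guilders per unit of foreign currency, as listed in the text.
# Keys are hypothetical labels, not necessarily the codes in variable L.
RATES = {
    "GUILDER": Decimal("1"),
    "POUND": Decimal("12.50"),
    "USD": Decimal("2.50"),
    "YEN": Decimal("1.23"),
    "THAEL": Decimal("0.74"),
    "DK": Decimal("0.67"), "NK": Decimal("0.67"), "SEK": Decimal("0.67"),
    "DM": Decimal("0.59"),
    "SFR": Decimal("0.48"),
    "FFR": Decimal("0.10"),
    "BFR": Decimal("0.07"),
}


def equity_in_guilders(amount: Decimal, currency: str, issue_year: int) -> Decimal:
    """Convert equity (in thousands, variable K) to thousands of guilders."""
    if currency == "GUILDER":
        return amount
    if issue_year >= 1936:
        # The fixed rates only hold before 1936; later years need other rates.
        raise ValueError("fixed rates apply only to equity cited prior to 1936")
    return amount * RATES[currency]
```

Under these rates, an equity of 100 (thousand) pounds sterling in the 1925 issue corresponds to 1250 (thousand) guilders.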
Variable M Dividend
The dividend rate is given as the percentage share of nominal equity capital that will be paid out to shareholders in the coming year. As a rule, the dividend rate is determined from financial results in an earlier year that is specified under a separate cover (variable N). A non-zero dividend rate is cited for 3605 firm entries, corresponding to 16 per cent of the total. In 1237 entries (5.5 per cent), the dividend rate is explicitly stated as non-existent (nihil in the original Dutch). No dividend rate at all is specified in the source in all remaining entries, 17,629 entries or 78.5 per cent of the total.
It is unthinkable that corporate firms operating in the Netherlands Indies failed to generate sufficient profits to permit any remuneration on invested capital in such a large number of cases. Apart from disappointing financial results, there are various reasons why the firm may refrain from paying out dividends to its shareholders. There are also plausible reasons why subscribers to the Handboek may not wish to make this kind of sensitive information known to the general public. The requirement to disclose information on dividend payments in annual reports only applies to corporations listed on the stock exchange.
There is no way of knowing whether a zero dividend rate implies disappointing financial results or whether this information has been deliberately left out. By implication, caution should be exercised in calculating average rates over time and across sectors or branches. Outcomes may be distorted by inclusion of explicitly stated zero rates. Therefore the safest procedure is to rely exclusively on non-zero rates.
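The recommended procedure of relying exclusively on non-zero rates can be sketched as follows. The function name mean_nonzero_dividend is hypothetical, and the sketch assumes that blank cells are represented as None and explicit nihil entries as 0.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_nonzero_dividend(rates: Sequence[Optional[float]]) -> Optional[float]:
    """Average dividend rate (variable M) over non-zero rates only.

    Both missing values (None) and explicitly stated zero rates are left out,
    following the cautious procedure described above.
    """
    nonzero = [rate for rate in rates if rate is not None and rate > 0]
    return mean(nonzero) if nonzero else None
```

With the invented input [None, 0, 5.0, 10.0], only the last two values enter the calculation, giving an average of 7.5 per cent. Such an average describes dividend-paying firms only and should not be read as the return on all invested capital.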
Variable N Dividend year
For the sake of completeness, the data base specifies the year to which the cited dividend rate (variable M) actually applies. The reference is usually to one or even two years earlier. This applies to both non-zero dividend rates and rates that were explicitly given as zero, in total 4730 entries (21 per cent). This information suggests that the level of the stated dividend rate was common at the time of the issue of the Handboek in question.
Variable O Firm Name Supplement
This is the full name of the firm as stated in the source, prior to adaptation to ensure consistency and recognizability. It is provided for 18,809 entries or 84 per cent of the total. It is only left out if there is no difference with the firm name as already entered into the data base (variable C). The full original firm name does not readily lend itself to an alphabetical ranking of firms or entries of the same firm, as it usually begins with a label indicating the type of business.
Variable P Owners
The proprietors of the firm may be either private individuals or other companies. They are clearly identified in only a minority of the entries, 3154 entries or 14 per cent of the total. In combination with names of directors (variable J), this information may prove useful in exploring corporate networks.
Firm identity
The CBI data base can be accessed in two ways. All entries are found in both a standard spreadsheet format and in a catalogue with individual firms arranged alphabetically by year. The entries in the catalogue offer a quick summary of relevant information about the individual firm. The unique identification number of the firm (variable B) permits users to observe the same enterprise at different points in time. The procedure of identification was an essential part of the conversion of information from source to data base.
The data base was constructed through a systematic comparison of the information for a completed year with the information given in the source for the year to be added. This method entailed a decision whether a firm in the new issue was indeed identical to one that was already included in the data base. If so, the assigned identification number would be retained. Four criteria were applied: the unique segment of the name of the firm, the year in which it was founded, the intended type of economic activity and, finally, the order of magnitude of equity capital. Of these criteria, the year of incorporation proved to be the most reliable one, whereas firm names and stated aims were more susceptible to variations in spelling or formulation. The amount of equity was likely to change over time, yet not to jump suddenly from one size category to another. Whenever a firm was found not to be identical to one already in the data base, a new identification number was assigned.
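The four criteria can be illustrated with a rough automated check. It should be stressed that this is only a sketch under stated assumptions: identification in the data base rested on manual comparison, whereas the similarity threshold, the use of the sector code as a stand-in for intended activity, and the factor of ten as the limit of a size category are all assumptions introduced here. The names Entry and likely_same_firm are hypothetical.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Entry:
    name: str                 # unique part of the firm name (variable C)
    founded: Optional[int]    # year of incorporation (variable G)
    sector: str               # sector code, a stand-in for the intended activity
    equity: Optional[float]   # equity in thousands (variable K), same currency


def likely_same_firm(a: Entry, b: Entry, name_threshold: float = 0.8) -> bool:
    """Flag two entries as candidates for the same firm (assumed thresholds)."""
    # Year of incorporation: the most reliable criterion, so a mismatch rules out a match.
    if a.founded is not None and b.founded is not None and a.founded != b.founded:
        return False
    # Firm name: tolerate spelling variation by comparing approximate similarity.
    similarity = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    if similarity < name_threshold:
        return False
    # Intended activity: compared here through the sector code only.
    if a.sector and b.sector and a.sector != b.sector:
        return False
    # Equity: may change over time, but should stay within one order of magnitude.
    if a.equity and b.equity and a.equity > 0 and b.equity > 0:
        if abs(math.log10(a.equity) - math.log10(b.equity)) >= 1:
            return False
    return True
```

A check of this kind can at most produce candidate matches for manual review; it cannot replace the judgement applied when the data base was constructed.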
The identification number has as such no intrinsic meaning; it only guides users to the place in the Handboek where this particular firm was first encountered. The foremost advantage of the identification number is as a device in statistical analysis of consecutive observations of the same firm at a maximum of eight separate points of observation. Such an analysis may render insights into the business history of the individual enterprise.
The most straightforward way of spotting the individual firm in the data base remains the firm name and it is for that reason that the full firm name, as given in the source, had to be modified by putting the unique part of the name up front. Even so, some firms ended up with the same modified firm name. This anomaly could only be resolved by adding a fictitious number to the firm name.
The identification imposed upon firms listed separately from one another in the source is not entirely foolproof. There are two types of possible errors. One occurs when two entries get the same identification number although they are in fact different companies, another when different entries pertaining to the same firm have not been assigned the same identification number. Various checks – firm names by identification number, identification numbers by firm name – were applied to keep such errors to a minimum.
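The two checks mentioned, firm names by identification number and identification numbers by firm name, can be sketched as a simple cross-tabulation. The function name cross_check is hypothetical, and the sketch assumes each entry is supplied as a pair of ID Number and adapted firm name.

```python
from collections import defaultdict


def cross_check(entries):
    """Return ID Numbers with several names and names with several ID Numbers.

    `entries` is an iterable of (id_number, firm_name) pairs.
    """
    names_by_id = defaultdict(set)
    ids_by_name = defaultdict(set)
    for id_number, firm_name in entries:
        names_by_id[id_number].add(firm_name)
        ids_by_name[firm_name].add(id_number)
    # One ID Number with several names: possibly different firms merged.
    suspect_ids = {i: n for i, n in names_by_id.items() if len(n) > 1}
    # One name with several ID Numbers: possibly one firm split in two.
    suspect_names = {n: i for n, i in ids_by_name.items() if len(i) > 1}
    return suspect_ids, suspect_names
```

Neither list proves an error: spelling variants explain many cases of the first kind, and genuinely distinct firms with the same name explain cases of the second. The output is a list of entries worth inspecting by hand.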
Notes
[1] The option of full coverage was chosen when producing a microfilm copy of all 53 issues of the Handboek (Leiden University Library: Special Collections. KIT Collection). Even with the microfilm, manual input obviously remains necessary to convert the information into machine-readable format.
[2] The total effort of manual input is estimated to have amounted to about 360 working-days.
[3] Activities of a firm labeled as a ‘Cultivation company’ may be located outside agriculture even if this is not specified. Similarly, firms classified as ‘Trading companies’ may have their foremost activities outside trading. A minuscule number of cases, 28 entries (0.1 per cent), are still designated with codes that became obsolete when revising the system of classification. These codes are H (= T), N (= M) and D, P and W (= O).
[4] The system of classification was revised in the course of constructing the data base, which explains why original Dutch-language branch codes are occasionally in use.
[5] A rudimentary, alternative indication of size would, for an agricultural enterprise, be the number and area of the estates under the firm’s command. Names of the estates are generally given in the firm entry, whereas details on the estates are provided in a separate chapter of the Handboek. Similarly, the scope of activities of trading firms may be suggested by numbers of branch offices in the Netherlands Indies, listed in the back of the Handboek and arranged by city or town.
[6] W.L. Korthals Altes, De betalingsbalans van Nederlandsch-Indië 1822-1939. PhD dissertation, Erasmus University Rotterdam (1986) xvi.