London School of Economics EOPP: Economic Organisation and Public Policy Programme LSE

Data Sampling

Our data come from a village- and household-level survey conducted in Andhra Pradesh (AP), Karnataka (KA), Kerala (KE) and Tamil Nadu (TN). The survey was conducted between September and November 2002.

The administrative unit below the state in India is the district. Each district is divided into blocks, and every block consists of multiple Gram Panchayats (GPs). A GP typically consists of 1-5 revenue villages, and its demarcation is done on a population basis. The Panchayat Act of each Indian state mandates the population criteria to be followed in that state. In Andhra Pradesh and Kerala, a GP is a (revenue) village irrespective of its size. In Tamil Nadu, it is a revenue village with a population of 500 or more. In Karnataka, it is a group of villages with a population between 5,000 and 7,000.

Sampling was done in multiple stages, and consisted of purposive sampling up to the level of blocks and random sampling within these blocks. Our final sample consists of 527 villages belonging to 201 elected GPs. In a random sub-sample of 259 villages, 20 household surveys per village were conducted, giving a sample of 5,180 households. In addition, a household survey was administered to an elected member of the GP in every village (with precedence given to the GP head if he/she lived in that village). This gives us an additional household sample of 544 elected officials. We describe the stages of our sampling below.

  • District sample: For each pair of states, two districts (one per state) that shared a common boundary were selected. One district in KA (Kolar) that shared boundaries with both AP and TN entered the sample twice. The same holds for one district in AP (Chithoor). This gives us nine unique districts: 2 each in AP, KE and TN, and 3 in KA. The district pairs were selected, with one exception, to focus on districts that had belonged to the same administrative unit during colonial rule but had been transferred to different units when the states were reorganized in 1956. These are the districts of Bidar and Medak from the erstwhile state of Hyderabad, now in KA and AP respectively, and Pallakad, Coimbatore, Kasargod, Dakshin Kanada, Dharmapuri and Chithoor, all from the erstwhile Madras state and now in KE, TN, KE, KA, TN and AP respectively. The exception is Kolar district, which we also sampled in KA. Kolar was part of the erstwhile Mysore state, the precursor to modern KA, and thus does not follow the colonial-rule matching process described above. However, its inclusion increases variation when we compare the other three states with KA. Furthermore, Kolar has common borders with both Chithoor in AP and Dharmapuri in TN, which allows for a three-way comparison within the same geographic area. The map provides a graphical description of this matching.

  • Block sample: For each district pair (which shared a common boundary), 3 pairs of blocks were selected (that is, 3 blocks in each of the two districts). If one district was matched with 2 different districts, then 6 blocks were chosen from it (three per match). In KE, one additional block was sampled as a check on our language matching and sampling strategy. This gave us a total of 37 blocks (12 in KA, 9 each in AP and TN, and 7 in KE). For each pair of districts, the three pairs of blocks that were the most linguistically similar, in terms of the mother tongue of individuals living in the block, were chosen. Language is a good proxy in these regions for cultural differences, given the prevalence of caste and linguistic endogamy. Hence, language matching allows us to partially control for "unobservable" socio-cultural differences. Linguistic similarity was computed using 1991 census block-level language data. The historical and administrative similarity of linguistically matched blocks was checked using princely state maps and the Report of the States Reorganization Committee. Details on how the linguistic and historic matching was implemented are in Appendix II.

  • GP sample: In AP, KA and TN we randomly sampled 6 GPs per block. The population per GP in KE is roughly double that in the other three states. For this reason, in KE we instead sampled 3 GPs in every block. This procedure gave a total of 201 GPs.

  • Village sample: In every sampled GP in AP, KA and TN, we sampled all villages if the GP had 3 or fewer villages. If it had more than three villages, we selected the Pradhan's village and randomly selected two other villages. We excluded all villages with fewer than 200 persons from our sampling frame. All hamlets with a population over 200 were considered as independent villages in drawing the sample. In KE, we directly sampled wards instead of villages (as villages in KE tend to be very large); we sampled 6 wards per GP. This gave us a final village sample size of 527 villages. The state-wise breakdown is AP: 69 villages; KA: 182 villages; KE: 126 wards; TN: 129 villages. For sampled villages, any associated hamlets were also included as part of the sample.

  • Household sample: In every block in AP, KA and TN, we randomly selected 3 of our 6 sampled GPs and conducted household interviews in all sampled villages falling in these GPs. In KE, we randomly selected 2 GPs in one block and one GP in the other block. Within sampled GPs, we conducted household interviews in all sampled wards. Overall, this gave us a final sample size of 5,180 households.

  • Choice of households within a village: Twenty households were sampled, of which four were always SC/ST. The survey team leader in every village walked the entire village to map it and identify the total number of households. This count was used to determine what fraction of households in the village were to be surveyed. The start point of the survey was randomly chosen, and after that every Xth household was surveyed such that the entire village was covered (going around the village in a clockwise fashion).

  • Elected official sample: In every village in our sample, an interview was conducted with an elected Panchayat official: if the Pradhan lived in the village, he/she was interviewed; otherwise a ward member was randomly selected. In some cases, the Pradhan was not available at the first visit and a ward member was selected. However, in these cases the investigator usually went back and interviewed the Pradhan. Hence our sample of elected officials is larger than the number of sampled villages and stands at 544. The number of villages in the household sample, by state, was AP: 32; KA: 90; KE: 66; TN: 71.
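
The GP, village and household stages described above can be illustrated with a short Python sketch. This is not the code used for the survey; it is a minimal illustration under stated assumptions. The function names (total_gps, select_villages, systematic_sample) and the example data are hypothetical. The sketch assumes the Pradhan's village is itself in the sampling frame, reads "every Xth household" as a step of roughly (households on the route) / 20 with wrap-around from a random start, and leaves out the SC/ST quota of four households because the text does not say how it was enforced.

```python
import random

# Blocks per state and GPs sampled per block, as stated in the text.
BLOCKS = {"AP": 9, "KA": 12, "TN": 9, "KE": 7}
GPS_PER_BLOCK = {"AP": 6, "KA": 6, "TN": 6, "KE": 3}


def total_gps():
    """Total GPs implied by the block counts and per-block GP draws."""
    return sum(BLOCKS[s] * GPS_PER_BLOCK[s] for s in BLOCKS)


def select_villages(populations, pradhan_village, rng, min_population=200):
    """Village stage for AP, KA and TN.

    populations: dict mapping village name -> population (hypothetical input).
    Villages below min_population are dropped from the frame. If three or
    fewer remain, all are taken; otherwise the Pradhan's village is taken
    together with two other villages drawn at random.
    """
    frame = sorted(v for v, pop in populations.items() if pop >= min_population)
    if len(frame) <= 3:
        return frame
    if pradhan_village not in frame:
        # Assumption: the text does not cover this case, so we refuse to guess.
        raise ValueError("Pradhan's village is not in the sampling frame")
    others = [v for v in frame if v != pradhan_village]
    return [pradhan_village] + rng.sample(others, 2)


def systematic_sample(route, n_target, rng):
    """Household stage: random start, then evenly spaced picks along the route.

    route: households in the order met when walking the village clockwise.
    Integer arithmetic spreads n_target picks over the whole route, so the
    spacing is about len(route) / n_target (the "X" in "every Xth household").
    """
    total = len(route)
    if total <= n_target:
        return list(route)  # small village: survey every household
    start = rng.randrange(total)
    return [route[(start + (i * total) // n_target) % total]
            for i in range(n_target)]


if __name__ == "__main__":
    rng = random.Random(2002)  # fixed seed so the illustration is repeatable
    print("GPs:", total_gps())       # 9*6 + 12*6 + 9*6 + 7*3 = 201
    print("Households:", 259 * 20)   # 259 villages x 20 households = 5180
    gp = {"v1": 900, "v2": 400, "v3": 250, "v4": 1200, "v5": 180, "v6": 600}
    print("Villages:", select_villages(gp, "v4", rng))
    print("Households picked:", systematic_sample(list(range(1, 138)), 20, rng))
```

The counts reproduce the 201 GPs and 5,180 households reported above. Using integer spacing rather than a fixed rounded step is a design choice made here so that the 20 picks always span the full walking route, in line with the requirement that the entire village be covered.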