Survey response coding: Human vs. Machine

Extracting pertinent data from open-ended responses can be time-consuming and costly for companies involved in market research. International research adds the complexity of handling multilingual responses. One method of cutting costs and data processing time is to assign each frequently recurring answer a code from a ‘code-frame’. Because analysis is then carried out on the codes rather than on the original text, the foreign-language verbatims no longer need to be back-translated into English.

The process of coding has typically been performed manually because verbatim responses are often heavy with synonyms and colloquial language, which makes any automated process prone to misinterpretation. As new recurring answers are identified, the code-frame can easily be expanded. For multilingual verbatims, coders should only analyse responses written in their native language. If coders are required to construct code-frames, they should collaborate closely with the researchers conducting the study and with end-users to ensure the frames are tailored to the objectives of the research. Without this collaboration, responses may be categorised too broadly. For instance, ‘low price’ and ‘value for money’ responses may be grouped together when the client sees them as distinct issues and wants them coded separately.
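As an illustration only, a code-frame can be thought of as a simple lookup from code numbers to agreed labels. The sketch below is a minimal Python example; the labels, respondent IDs and the helper names `add_code` and `tally` are hypothetical and do not describe any particular coding tool.

```python
from collections import Counter

# Hypothetical code-frame: numeric code -> label agreed with researchers and client.
code_frame = {
    1: "Low price",
    2: "Value for money",  # kept separate from "Low price" at the client's request
    3: "Good customer service",
}

def add_code(frame, label):
    """Expand the code-frame with a new recurring answer and return its code."""
    new_code = max(frame, default=0) + 1
    frame[new_code] = label
    return new_code

# Codes assigned by a human coder to each verbatim (respondent id -> list of codes).
# The verbatims themselves may be in any language; only the codes are analysed.
coded = {
    "r1": [1],
    "r2": [2, 3],
    "r3": [1, 2],
}

# A new recurring answer appears, so the frame is expanded and used straight away.
delivery = add_code(code_frame, "Fast delivery")
coded["r4"] = [delivery]

def tally(frame, coded_responses):
    """Count how often each code was assigned, reported by label."""
    counts = Counter(code for codes in coded_responses.values() for code in codes)
    return {frame[code]: n for code, n in sorted(counts.items())}

summary = tally(code_frame, coded)
```

Because the tally works on codes alone, the same summary can be produced whatever language the original responses were written in.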

With very large data samples, manual coding can still be a time-consuming task. The coding can be split between several coders to speed up the process, but this introduces consistency problems. To control quality it’s necessary to have an independent linguist run secondary checks on the coded responses to identify inconsistencies. Ideally, the independent linguist is involved from the start of the work and closely oversees all coding activity so that errors or deviations are spotted at an early stage. Studies assessing the conformity and precision of categorised responses show this method to produce the most accurate data extraction results.
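One simple way to picture such a secondary check is to have two coders code the same sample of responses and list every response where their codes differ. The following is a rough Python sketch under that assumption; the function name `compare_coders` and the sample data are hypothetical, and plain percentage agreement is used here only because it is the simplest measure to show.

```python
def compare_coders(coder_a, coder_b):
    """Compare two coders on the responses they both coded.

    Returns the share of shared responses coded identically and the
    ids of responses that a reviewer should look at again.
    """
    shared = sorted(set(coder_a) & set(coder_b))
    # The order in which codes were written down does not matter, so compare as sets.
    disagreements = [
        rid for rid in shared if set(coder_a[rid]) != set(coder_b[rid])
    ]
    agreement = 1 - len(disagreements) / len(shared) if shared else None
    return agreement, disagreements

# Hypothetical overlap sample: respondent id -> codes assigned.
coder_a = {"r1": [1], "r2": [2, 3], "r3": [1]}
coder_b = {"r1": [1], "r2": [3, 2], "r3": [2], "r4": [4]}

agreement, to_review = compare_coders(coder_a, coder_b)
```

In this sample the coders differ only on `r3`, where one chose ‘low price’ and the other ‘value for money’, which is exactly the kind of deviation worth catching early.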

In recent years, text analysis software has become available to make automated data extraction from large-scale survey responses feasible. Companies such as SPSS and SAS provide a suite of software solutions to partially automate aspects of the coding process and combine them with other functionalities used for market research analysis. Some products can also be integrated with machine translation products to analyse responses in certain foreign languages, although they still need to be closely scrutinised by linguists for quality. Other semi-automated products have been developed in-house by some large market-research companies as proprietary solutions.

However, software solutions are not without their disadvantages. They typically involve high set-up costs, both for the software itself and for the time spent training personnel in its use. For many small to medium-sized market research companies the initial investment alone is too great to warrant the purchase of a specific software product for survey coding. Many products also carry ongoing maintenance fees and limits on the total number of survey responses that can be coded.

Time spent preparing the software to analyse a project’s data can also be onerous. The researcher must specify a host of parameters, such as how the system should construct codes and the rules and exceptions that need to be followed. The researcher must also enter the different permutations of each word or phrase into the system to counter inexact matching, which is a common problem. Finally, the researcher needs the training and technical expertise to operate the software effectively, and must use these skills regularly to stay familiar with its correct operation.
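To give a feel for why entering these permutations takes time, here is a small Python sketch of rule-based matching. It is a simplified assumption of how such rules might behave, not a description of any commercial product; the `rules` table and the function `auto_code` are hypothetical.

```python
import re

# Hypothetical rules: code label -> the word/phrase variants a researcher has entered.
rules = {
    "Low price": ["cheap", "cheaper", "low price", "low prices", "inexpensive"],
    "Value for money": ["value for money", "good value", "worth the money"],
}

def auto_code(verbatim, rules):
    """Return every label whose listed variants appear in the verbatim."""
    text = verbatim.lower()
    matched = []
    for label, variants in rules.items():
        # Word boundaries stop a variant matching inside a longer, unrelated word.
        if any(re.search(r"\b" + re.escape(v) + r"\b", text) for v in variants):
            matched.append(label)
    return matched

coded_ok = auto_code("Really good value and cheaper than rivals", rules)
coded_missed = auto_code("It was cheep", rules)  # misspelling not in the rules
```

The first response picks up both codes, but the misspelt second response is left uncoded until someone adds ‘cheep’ as a variant or codes it by hand.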

Automated systems work best with simple data, which is rarely found in market research questionnaires. Problems arise because true open-ended responses that are coded entirely by an automated system will usually contain ambiguous, incorrect and ill-formed data unless the results are reviewed by a human. Unfortunately, no automated coding system can judge the subtle nuances, slang, abbreviations, spelling errors and emoticons that arise in research verbatims.

Coding survey responses means treading a fine line between the subjective and the literal. Automated systems can save time in the long term for firms handling large samples of data. However, they are not good at determining context, so they require trained personnel to operate them and to ‘eyeball’ the responses. Manual coding, although more time-consuming for projects exceeding 10,000 responses, produces more accurate and consistent research results.
