FORMATIVE USABILITY EVALUATION

© 1997 by Walter Maner (unless otherwise noted)
May be reproduced only for non-commercial educational purposes.

Revised March 15, 1997
Based partly on Hix and Hartson, DEVELOPING USER INTERFACES and on
Nielsen, USABILITY ENGINEERING
  1. DEFINITION

    An evaluation of an unfinished user interface, done about three times
    during each iterative design cycle, which aims to expose usability
    problems that exist in the current iteration.
    -
    Contrasts with "summative evaluation," which is done when the
    interface is complete, and with "human factors testing," which is
    done in a more carefully controlled research setting.
  2. QUESTIONS ANSWERABLE DURING FORMATIVE EVALUATION

    1. Are parts of the interface error-prone?
    2. Do some tasks take more time than expected?
    3. Do users find some tasks especially difficult?
    4. Does the interface violate common usability guidelines?
    5. Is there sufficient online help?
    6. What changes would users like to see?
    7. What gripes do users have?
    8. What mistakes do users make?
    9. Where are users likely to get stuck?
    10. Will users need a wizard (intelligent agent) to guide them through
      certain complex tasks?
  3. BENEFITS OF FORMATIVE EVALUATION

    1. May be done very early in the design process, when about 10% of
      the project resources have been expended
    2. May give the first solid measurements of task performance
    3. May help designers gain empathy for persons trying to use the
      software in real situations
    4. May help developers decide when the project can move on to the
      next stage
    5. May increase user interest and eventual acceptance of the final
      product
    6. May uncover problems that were not noticed during iterative
      prototyping
  4. STEP 1

    DESIGN THE EVALUATION.
    1. Set goals.
      1. DIAGNOSIS
        To determine whether any usability problems exist
      2. VERIFICATION
        To determine whether the design meets benchmarks and satisfies
        specified usability requirements
      3. VALIDATION
        To determine whether the design will be usable in practice by
        its intended users
    2. Identify desired inputs and outputs.
      1. Possible INPUTS
        1. Interface prototype
        2. Question list, if doing a structured interview
        3. Usability checklist, if doing heuristic evaluation
        4. Usability benchmark requirements
        5. Various testing scripts derived from task scenarios
      2. Possible OUTPUTS
        1. Individual test reports
        2. Aggregate or tabular data from sets of test reports
        3. Analysis of problems found
        4. Prioritized list of change requests
    3. Choose an evaluation strategy, which could include one or more of
      the following:
      1. Automatic event-level test data collection and statistical
        analysis by specialized testing software
      2. Professional review by a human-computer interaction expert
      3. Heuristic evaluation based on a detailed checklist derived from
        applicable GUI design principles or guidelines
      4. User survey
        1. User preference questionnaire,
          where each Participant rates pre-selected interface features
          on an agree/disagree scale
        2. Structured interview,
          where each Participant is asked a pre-planned series of
          questions after the test session is complete
        3. Focus groups,
          where a trained facilitator leads a small group of
          participants through a pre-planned series of questions or
          issues.
      5. Scenario-based, script-driven testing
        1. Inputs
          • oral and/or written step-by-step, subtask-by-subtask
            instructions for the Participant
            -
            given in the form of a script derived from one of the
            task scenarios
          • written step-by-step, subtask-by-subtask instructions for
            the Evaluator
            -
            given in the form of a script, similar to the
            Participant's script, but including special testing
            instructions and space for recording data
        2. Outputs
          • either (a) the time the Participant took to complete each
            subtask, if testing performance (e.g., on benchmark tasks),
            or (b) the Participant's verbal protocol, if the Participant
            was asked to "think aloud" during the test
          • log of errors that the Participant made on each subtask
            • any impasse that prevents completion of a subtask
            • any wrong turn that delays completion of a subtask
          • hints, if any, given to Participant by Evaluator on each
            subtask
          • final outcome of each subtask
            • abandoned after ___ seconds and ___ errors
            • completed after ___ seconds and ___ errors
            • completed after ___ seconds with no errors but with
              apparent difficulty
            • completed after ___ seconds with no errors and no
              apparent difficulty
          • comments from Participant
          • comments from Evaluator
    4. Choose an evaluator.
      1. For bias reduction, the Evaluator should not be a member of the
        development team.
      2. Evaluators should be receptive and open-minded persons,
        prepared to respectfully receive as much negative feedback as
        participants want to give.
    5. Choose test participants.
      1. Identify potential participants based on a profile of the
        target population.
      2. Divide participants into "usability classes" based on factors
        deemed relevant, such as ...
        1. experience with computers
        2. experience with similar systems
        3. experience with this system
      3. Employ three representative participants from each usability
        class.
      4. For subsequent rounds of testing, retain one participant from
        the previous round and employ two new participants.
    6. If not already done, perform user task analysis and, from this,
      build a hierarchical task model.
    7. If not already done, construct a representative task scenario for
      each KIND of high-level user task. "Obvious" tasks should not be
      excluded.
    8. Using representative task scenarios as a guide, create about a
      half-dozen test scripts.
      1. SIMILARITIES Between Scenarios and Scripts
        1. Both have the same starting place.
        2. Both point toward the same completion state or goal.
        3. Both contain a strongly ordered, integrated sequence of
          subtasks sufficient to achieve that goal.
        4. Both contain mid-level subtasks or subgoals.
      2. DIFFERENCES Between Scenarios and Scripts
        1. Scenarios mention low-level subtasks but scripts do not.
        2. Scenarios make reference to specific elements of the
          interface but scripts do not.
        3. Scenarios describe in detail HOW the subtask was
          accomplished but scripts only state WHAT the proposed
          subtask is.
        4. Scenarios describe what participants have done while scripts
          list what participants will be asked to do.
      3. EXAMPLE
        1. RIGHT for a Script (but too coarse-grained for a task
          scenario):
          Select "sugar" as an ingredient.
        2. RIGHT for a Scenario (but too fine-grained for a script):
          Open the drop-down list inside the box labeled "Ingredients"
          and then click on the "sugar" item.
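
    The scripts and benchmark requirements described in this step are
    structured data, so they can be written down in machine-readable form
    before testing begins. The Python sketch below shows one possible
    representation; every class name, field name, and example value in it
    is an illustrative assumption, not a prescribed format.

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class ScriptStep:
          """One coarse-grained subtask, stated as WHAT to do, not HOW."""
          number: int
          instruction: str      # e.g. 'Select "sugar" as an ingredient.'

      @dataclass
      class TestScript:
          """An ordered sequence of subtasks derived from one task scenario."""
          scenario_name: str
          steps: List[ScriptStep] = field(default_factory=list)

      @dataclass
      class Benchmark:
          """A usability benchmark requirement for one task or subtask."""
          task: str
          measure: str          # e.g. "time to complete (seconds)"
          worst_acceptable: float
          planned_target: float
          best_possible: float

      # Example use, built from the "Ingredients" example above.
      recipe_script = TestScript(
          scenario_name="Enter a recipe",
          steps=[ScriptStep(1, 'Select "sugar" as an ingredient.')],
      )
      time_benchmark = Benchmark(
          task="Select an ingredient",
          measure="time to complete (seconds)",
          worst_acceptable=30, planned_target=15, best_possible=5,
      )
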
  5. STEP 2

    DEVELOP A PROTOCOL FOR THE TEST SESSIONS.
    1. Decide how the script will be used.
      1. The Evaluator may give the entire script (the entire written
        list of subtasks) to the Participant.
        1. Not recommended
        2. May create pressure to finish or allow foreshadowing
          effects.
      2. The Evaluator may give written directions to the Participant
        one subtask at a time, as required.
        1. Recommended
        2. Works best for complex subtasks.
      3. The Evaluator may give oral directions to the Participant one
        subtask at a time, as required.
        1. Recommended
        2. Works best for simple subtasks.
    2. Determine whether the session will be conducted in the laboratory
      or in the field.
      1. Laboratory testing is often preferred for early- and mid-stage
        testing.
      2. Field testing is often preferred for late-stage testing.
    3. Determine whether the testing will be done at an early stage or at
      a later stage.
      1. Possible Hour-long Early-stage Protocol
        1. Set up (or reset) the test environment.
        2. Put Participant at ease, establish cooperative atmosphere.
        3. If not already done, witness the Participant's signing of
          the informed consent form.
        4. Give general instructions to the Participant, then prepare
          to play the role of co-evaluator.
        5. Run several scripts with Participant thinking aloud and the
          Evaluator prompting, allowing discussion, without time
          pressure.
        6. Interview the Participant following a set list of questions.
        7. Discuss the results with the Participant (debrief) and
          answer questions.
      2. Possible Hour-long Mid-stage Protocol
        1. Set up (or reset) the test environment.
        2. Put Participant at ease, establish cooperative atmosphere.
        3. If not already done, witness the Participant's signing of
          the informed consent form.
        4. Give general instructions to the Participant, then prepare
          to play the role of observer.
        5. Run two or three timed scripts.
          • Participant reads the next subtask aloud (or hears it
            read to them by the Evaluator)
          • As soon as the subtask has been read, the timer is
            started.
          • When the task is done or abandoned, record the time spent
            on the task, the number of errors, and the completion status.
        6. Allow free use of the system for 10 minutes, perhaps with
          the Participant thinking aloud.
        7. Run two or three more timed scripts following the same
          protocol as above.
        8. Interview the Participant following a set list of questions.
        9. Discuss the results with the Participant (debrief) and
          answer questions.
      3. NOTES
        1. For consistency and repeatability, an instruction sheet is
          needed.
          • explains that it is the software being tested, not the
            Participant (the interface is the "subject" of testing)
          • explains how long the test will take
          • explains the basic procedure to be followed
          • explains the rights the Participant has, including ...
            • the right to stop the test at any time
            • the right to confidentiality
          • explains what is being measured and why
          • explains why a non-disclosure agreement is necessary (if
            it is)
        2. For ethical and legal reasons, a sign-off sheet is needed.
          • to witness the fact that the Participant has made an
            "informed consent"
          • to witness the fact that the Participant has agreed not
            to disclose specified proprietary information
          • to witness the fact that the Participant has given up
            various rights to the products of the testing process
            (e.g., video footage)
        3. Other paperwork may be required depending on the situation.
          • survey forms
          • pre- and post-tests
          • heuristic evaluation checklists
    4. Determine whether to emphasize qualitative data or quantitative
      data.
    5. Identify tools to be used during testing.
    6. Design the forms to be used during testing.
      1. consent forms
      2. instruction sheets
      3. record forms
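
    The timed portion of the mid-stage protocol above reduces to a small
    loop: read a subtask aloud, start the clock, and record elapsed time,
    error count, and completion status when the subtask ends. A minimal
    Python sketch of that record-keeping follows; the prompts, field
    names, and status labels are assumptions chosen for illustration, and
    the Evaluator keys in the error count and status by hand.

      import time
      from dataclasses import dataclass

      @dataclass
      class SubtaskResult:
          subtask: str
          seconds: float
          errors: int
          status: str   # "completed", "completed with difficulty", "abandoned"

      def run_timed_subtask(subtask: str) -> SubtaskResult:
          """Time one subtask; the Evaluator enters errors and status after."""
          input(f"Read the subtask aloud, then press Enter to start: {subtask}")
          start = time.monotonic()
          input("Press Enter when the subtask is completed or abandoned...")
          elapsed = time.monotonic() - start
          errors = int(input("Number of errors observed: "))
          status = input("Status (completed / with difficulty / abandoned): ")
          return SubtaskResult(subtask, round(elapsed, 1), errors, status.strip())
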
  6. STEP 3

    PILOT-TEST THE SESSION PROTOCOL(S).
    1. Follow the "early stage" protocol described above.
    2. Fix any problems that have been exposed.
      1. problems in the protocol
      2. problems in the script
      3. problems in the forms
  7. STEP 4

    CONDUCT THE ACTUAL EVALUATION SESSIONS.
    1. As a backup, start recording the session on videotape, aiming the
      camera so that it can "see" both the Participant's hands and the
      computer screen.
    2. Follow the session protocol.
    3. Observe the Participant as they go about their tasks.
    4. Log any "critical incident" that occurs and consider following up
      such incidents with one or two open-ended, non-leading questions.
      1. After NEGATIVE Incidents
        1. Are you having a problem?
        2. Are you stuck?
        3. Do you need a hint?
        4. Is that the result you wanted?
        5. Is this more difficult than it should be?
        6. What are your thoughts at this point?
        7. What are you trying to do?
        8. What did you think would happen?
      2. After POSITIVE Incidents
        1. Are you feeling more confident now?
        2. Was there a specific clue that allowed you to solve the
          problem?
        3. What made you think this approach would work?
    5. Take careful notes in real time. Contemporaneous note-taking is
      known to be more efficient than, say, taking notes afterward while
      watching the session on videotape.
    6. Thank and reward the Participant, possibly paying them minimum
      wage + $1.
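
    Because critical incidents are later matched against the notes and
    the videotape, it helps to timestamp each incident the moment it is
    logged. Below is a minimal Python sketch of such a log, assuming a
    plain text file named session_log.txt (a hypothetical name and
    format).

      import datetime

      LOG_FILE = "session_log.txt"   # hypothetical file name

      def log_incident(kind: str, note: str, followup: str = "") -> None:
          """Append one timestamped incident; kind is 'positive' or 'negative'."""
          stamp = datetime.datetime.now().strftime("%H:%M:%S")
          with open(LOG_FILE, "a", encoding="utf-8") as log:
              log.write(f"[{stamp}] {kind.upper()}: {note}\n")
              if followup:
                  log.write(f"           follow-up asked: {followup}\n")

      # Example use during a session:
      # log_incident("negative", "Participant stuck on Ingredients drop-down",
      #              followup="What did you think would happen?")
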
  8. STEP 5

    INTERPRET THE DATA COLLECTED.
    1. Use all available data to identify interface problems.
      1. failure rates on particular tasks
      2. time required on particular tasks
      3. failed benchmarks
      4. unchecked items on usability checklists
    2. Characterize problems.
      1. According to CAUSE (if known)
      2. According to FREQUENCY OF OCCURRENCE
      3. According to SEVERITY
        1. CRITICAL
          • Critical problems include all those that make it
            impossible for the Participant to complete a task.
          • IMPLICATION: The product cannot be distributed with such
            a problem, not even to beta testers.
        2. SERIOUS
          • Serious problems include those that allow the Participant
            to suffer the damaging effects of a mistake that a better
            design might have prevented.
          • IMPLICATION: The product could, if necessary, be
            distributed with such a problem but only to beta testers.
        3. MINOR
          • Minor problems include those that cause the Participant
            to become momentarily distracted, confused or
            disoriented.
          • IMPLICATION: The product could, if necessary, be
            distributed to customers with this problem.
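
    Once problems have been characterized, tallying them by severity
    class is straightforward. The Python sketch below shows one way to do
    it; the severity labels follow the outline above, while the class,
    fields, and example problem descriptions are assumptions made for
    illustration.

      from collections import Counter
      from dataclasses import dataclass

      @dataclass
      class Problem:
          description: str
          severity: str      # "critical", "serious", or "minor"
          occurrences: int   # number of sessions in which it was observed
          cause: str = "unknown"

      def count_by_severity(problems: list) -> Counter:
          """Count how many distinct problems fall in each severity class."""
          return Counter(p.severity for p in problems)

      problems = [
          Problem("Impasse: cannot complete ingredient selection", "critical", 3),
          Problem("Mistaken deletion cannot be undone", "serious", 2),
          Problem("Momentary confusion over toolbar icon", "minor", 5),
      ]
      print(count_by_severity(problems))
      # Counter({'critical': 1, 'serious': 1, 'minor': 1})
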
  9. STEP 6

    PREPARE A REPORT.
    1. Give a summary of findings, subtask by subtask, in a table under
      these headings:
      1. task description
      2. task completion status
        1. task abandoned ___ times after ___ seconds (average) and ___
          errors (average)
        2. task completed successfully ___ times in ___ seconds
          (average) after ___ errors (average)
        3. NOTE: If there is concern about statistical "outliers," the
          median and standard deviation may be reported as well.
    2. If usability performance requirements were developed for specific
      tasks or subtasks, give a summary of relevant findings in a table
      under these headings:
      1. general description of the task or subtask
      2. specific benchmark applied
      3. worst acceptable performance on this benchmark
        (maximum errors or maximum time)
      4. planned target performance on this benchmark
        (acceptable errors or acceptable time)
      5. best possible performance on this benchmark
        (minimum errors or minimum time)
      6. observed performance on this benchmark
        (actual errors or actual time)
    3. If heuristic evaluation was used, list areas where design fails
      applicable guidelines.
    4. List all problems exposed during testing in a table under these
      headings:
      1. description of the problem
      2. severity of the problem
      3. frequency of the problem
      4. possible remedy for the problem
    5. List changes needed to the interface in priority order,
      considering these possibly conflicting rules:
      1. Give a high priority to the more severe problems.
      2. Give a high priority to the more common problems.
      3. Give a high priority to problems that can be fixed cheaply.
      4. Give a high priority to problems that can be fixed quickly.
    6. Recommend changes.
    7. Review session videotapes and prepare a collage of short clips to
      support these change requests.
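
    Two of the report computations above lend themselves to a short
    script: per-subtask summary statistics (mean, median, and standard
    deviation of completion time) and a rough ordering of change
    requests. A minimal Python sketch follows; the scoring formula and
    its weights are assumptions for illustration only, since the four
    prioritization rules can conflict and must ultimately be balanced by
    judgment.

      import statistics
      from dataclasses import dataclass

      def time_summary(seconds: list) -> dict:
          """Mean, median, and standard deviation of completion times."""
          return {
              "mean": statistics.mean(seconds),
              "median": statistics.median(seconds),
              "stdev": statistics.stdev(seconds),   # needs >= 2 observations
          }

      @dataclass
      class ChangeRequest:
          description: str
          severity: int    # 3 = critical, 2 = serious, 1 = minor
          frequency: int   # number of participants affected
          fix_cost: int    # 1 = cheap/quick fix ... 5 = expensive/slow fix

      def priority(cr: ChangeRequest) -> float:
          """Higher score = fix sooner (severe, common, cheap problems first)."""
          return (cr.severity * cr.frequency) / cr.fix_cost

      print(time_summary([42.0, 55.5, 39.0, 120.0]))
      requests = [
          ChangeRequest("Impasse in ingredient selection", 3, 3, 2),
          ChangeRequest("Confusing toolbar icon", 1, 5, 2),
      ]
      for cr in sorted(requests, key=priority, reverse=True):
          print(round(priority(cr), 1), cr.description)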