Revised March 15, 1997
Based partly on Hix and Hartson, DEVELOPING USER INTERFACES, and on
Nielsen, USABILITY ENGINEERING
DEFINITION
An evaluation of an unfinished user interface, done about three times during each iterative design cycle, which aims to expose the usability problems present in the current iteration. Contrast this with "summative evaluation," which is done when the interface is complete, and with "human factors testing," which is done in a more carefully controlled research setting.
QUESTIONS ANSWERABLE DURING FORMATIVE EVALUATION
Are parts of the interface error-prone?
Do some tasks take more time than expected?
Do users find some tasks especially difficult?
Does the interface violate common usability guidelines?
Is there sufficient online help?
What changes would users like to see?
What gripes do users have?
What mistakes do users make?
Where are users likely to get stuck?
Will users need a wizard (intelligent agent) to guide them through certain complex tasks?
BENEFITS OF FORMATIVE EVALUATION
May be done very early in the design process, when about 10% of the project resources have been expended
May give the first solid measurements of task performance
May help designers gain empathy for persons trying to use the software in real situations
May help developers decide when the project can move on to the next stage
May increase user interest and eventual acceptance of the final product
May uncover problems that were not noticed during iterative prototyping
STEP 1 DESIGN THE EVALUATION.
Set goals.
DIAGNOSIS To determine whether any usability problems exist
VERIFICATION To determine whether the design meets benchmarks and satisfies specified usability requirements
VALIDATION To determine whether the design will be usable in practice by its intended users
Identify desired inputs and outputs.
Possible INPUTS
Interface prototype
Question list, if doing a structured interview
Usability checklist, if doing heuristic evaluation
Usability benchmark requirements
Various testing scripts derived from task scenarios
Possible OUTPUTS
Individual test reports
Aggregate or tabular data from sets of test reports
Analysis of problems found
Prioritized list of change requests
Choose an evaluation strategy, which could include one or more of the following:
Automatic event-level test data collection and statistical analysis by specialized testing software
Professional review by a human-computer interaction expert
Heuristic evaluation based on a detailed checklist derived from applicable GUI design principles or guidelines
User survey
User preference questionnaire,
where each Participant rates pre-selected interface features
on an agree-disagree scale
Structured interview,
where each Participant is asked a pre-planned series of
questions after the test session is complete
Focus groups,
where a trained facilitator leads a small group of
participants through a pre-planned series of questions or
issues.
Scenario-based, script-driven testing
Inputs
oral and/or written step-by-step, subtask-by-subtask
instructions for the Participant, given in the form of a
script derived from one of the task scenarios
written step-by-step, subtask-by-subtask instructions for
the Evaluator, given in the form of a script similar to
the Participant's script, but including special testing
instructions and space for recording data
Outputs
either (a) the time the Participant took to complete each
subtask, if testing performance (e.g., on benchmark tasks),
or (b) the Participant's verbal protocol, if the Participant
was asked to "think aloud" during the test
log of errors that the Participant made on each subtask
any impasse that prevents completion of a subtask
any wrong turn that delays completion of a subtask
hints, if any, given to Participant by Evaluator on each
subtask
final outcome of each subtask
abandoned after ___ seconds and ___ errors
completed after ___ seconds and ___ errors
completed after ___ seconds with no errors but with
apparent difficulty
completed after ___ seconds with no errors and no
apparent difficulty
comments from Participant
comments from Evaluator
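
The outputs above are easiest to tabulate later if each subtask is logged as a structured record. Below is a minimal sketch in Python; the class and field names are illustrative assumptions, not part of any particular testing tool.

    from dataclasses import dataclass, field
    from enum import Enum

    class Outcome(Enum):
        # The four final-outcome categories listed above.
        ABANDONED = "abandoned"
        COMPLETED_WITH_ERRORS = "completed with errors"
        COMPLETED_WITH_DIFFICULTY = "completed, no errors, apparent difficulty"
        COMPLETED_CLEANLY = "completed, no errors, no difficulty"

    @dataclass
    class SubtaskRecord:
        subtask: str              # short description from the script
        seconds: float            # time to completion or abandonment
        outcome: Outcome
        errors: list[str] = field(default_factory=list)  # impasses, wrong turns
        hints: list[str] = field(default_factory=list)   # hints from the Evaluator
        participant_comments: str = ""
        evaluator_comments: str = ""
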
Choose an evaluator.
For bias reduction, the Evaluator should not be a member of the development team.
Evaluators should be receptive and open-minded persons, prepared to respectfully receive as much negative feedback as participants want to give.
Choose test participants.
Identify potential participants based on a profile of the target population.
Divide participants into "usability classes" based on factors deemed relevant, such as ...
experience with computers
experience with similar systems
experience with this system
Employ three representative participants from each usability class.
For subsequent rounds of testing, retain one participant from the previous round and employ two new participants.
If not already done, perform user task analysis and, from this, build a hierarchical task model.
If not already done, construct a representative task scenario for each KIND of high-level user task. "Obvious" tasks should not be excluded.
Using representative task scenarios as a guide, create about a half-dozen test scripts.
SIMILARITIES Between Scenarios and Scripts
Both have the same starting place.
Both point toward the same completion state or goal.
Both contain a strongly ordered, integrated sequence of
subtasks sufficient to achieve that goal.
Both contain mid-level subtasks or subgoals.
DIFFERENCES Between Scenarios and Scripts
Scenarios mention low-level subtasks but scripts do not.
Scenarios make reference to specific elements of the
interface but scripts do not.
Scenarios describe in detail HOW the subtask was
accomplished but scripts only state WHAT the proposed
subtask is.
Scenarios describe what participants have done while scripts
list what participants will be asked to do.
EXAMPLE
RIGHT for a Script (but too coarse-grained for a task
scenario):
Select "sugar" as an ingredient.
RIGHT for a Scenario (but too fine-grained for a script):
Open the drop-down list inside the box labeled "Ingredients"
and then click on the "sugar" item.
STEP 2 DEVELOP A PROTOCOL FOR THE TEST SESSIONS.
Decide how the script will be used.
The Evaluator may give the entire script (the entire written list of subtasks) to the Participant.
Not recommended
May create pressure to finish or allow foreshadowing
effects.
The Evaluator may give written directions to the Participant one subtask at a time, as required.
Recommended
Works best for complex subtasks.
The Evaluator may give oral directions to the Participant one subtask at a time, as required.
Recommended
Works best for simple subtasks.
Determine whether the session will be conducted in the laboratory or in the field.
Laboratory testing is often preferred for early- and mid-stage testing.
Field testing is often preferred for late-stage testing.
Determine whether the testing will be done at an early stage or at a later stage.
Possible Hour-long Early-stage Protocol
Set up (or reset) the test environment.
Put Participant at ease, establish cooperative atmosphere.
If not already done, witness the Participant's signing of
the informed consent form.
Give general instructions to the Participant, then prepare
to play the role of co-evaluator.
Run several scripts with Participant thinking aloud and the
Evaluator prompting, allowing discussion, without time
pressure.
Interview the Participant following a set list of questions.
Discuss the results with the Participant (debrief) and
answer questions.
Possible Hour-long Mid-stage Protocol
Set up (or reset) the test environment.
Put Participant at ease, establish cooperative atmosphere.
If not already done, witness the Participant's signing of
the informed consent form.
Give general instructions to the Participant, then prepare
to play the role of observer.
Run two or three timed scripts.
Participant reads the next subtask aloud (or hears it
read to them by the Evaluator)
As soon as the subtask has been read, the timer is
started.
When the task is done or abandoned, record the time spent,
the number of errors, and the completion status (a timing
sketch follows this protocol).
Allow free use of the system for 10 minutes, perhaps with
the Participant thinking aloud.
Run two or three more timed scripts following the same
protocol as above.
Interview the Participant following a set list of questions.
Discuss the results with the Participant (debrief) and
answer questions.
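
The timed portion of this protocol lends itself to simple tool support. The fragment below is a sketch only (Python; the prompts and dictionary keys are assumptions, not any standard package): it starts the clock as soon as the subtask has been read and records time, errors, and completion status.

    import time

    def run_timed_subtask(description: str) -> dict:
        # The Evaluator presses Enter the moment the subtask has
        # been read aloud; the clock starts then, per the protocol.
        input(f"SUBTASK: {description}\nPress Enter when read aloud... ")
        start = time.monotonic()
        status = input("Press Enter when done, or type 'a' if abandoned... ")
        elapsed = time.monotonic() - start
        errors = int(input("Errors observed: ") or "0")
        return {
            "subtask": description,
            "seconds": round(elapsed, 1),
            "errors": errors,
            "completed": status.strip().lower() != "a",
        }
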
NOTES
For consistency and repeatability, an instruction sheet is
needed.
explains that it is the software being tested, not the
Participant (the interface is the "subject" of testing)
explains how long the test will take
explains the basic procedure to be followed
explains the rights the Participant has, including ...
the right to stop the test at any time
the right to confidentiality
explains what is being measured and why
explains why a non-disclosure agreement is necessary (if
it is)
For ethical and legal reasons, a sign-off sheet is needed.
to witness the fact that the Participant has made an
"informed consent"
to witness the fact that the Participant has agreed not
to disclose specified proprietary information
to witness the fact that the Participant has given up
various rights to the products of the testing process
(e.g., video footage)
Other paperwork may be required depending on the situation.
survey forms
pre- and post-tests
heuristic evaluation checklists
Determine whether to emphasize qualitative data or quantitative data.
Identify tools to be used during testing.
Design the forms to be used during testing.
consent forms
instruction sheets
record forms
STEP 3 PILOT-TEST THE SESSION PROTOCOL(S).
Follow the "early stage" protocol described above.
Fix any problems that have been exposed.
problems in the protocol
problems in the script
problems in the forms
STEP 4 CONDUCT THE ACTUAL EVALUATION SESSIONS.
As a backup, start recording the session on videotape, aiming the camera so that it can "see" both the Participant's hands and the computer screen.
Follow the session protocol.
Observe the Participant as they go about their tasks.
Log any "critical incident" that occurs and consider following up such incidents with one or two open-ended, non-leading questions.
After NEGATIVE Incidents
Are you having a problem?
Are you stuck?
Do you need a hint?
Is that the result you wanted?
Is this more difficult than it should be?
What are your thoughts at this point?
What are you trying to do?
What did you think would happen?
After POSITIVE Incidents
Are you feeling more confident now?
Was there a specific clue that allowed you to solve the
problem?
What made you think this approach would work?
Take careful notes in real time. Contemporaneous note-taking is known to be more efficient than, say, taking notes afterward while watching the session on videotape.
Thank and reward the Participant, possibly paying them minimum wage + $1.
STEP 5 INTERPRET THE DATA COLLECTED.
Use all available data to identify interface problems.
failure rates on particular tasks
time required on particular tasks
failed benchmarks
unchecked items on usability checklists
Characterize problems.
According to CAUSE (if known)
According to FREQUENCY OF OCCURRENCE
According to SEVERITY
CRITICAL
Critical problems include all those that make it
impossible for the Participant to complete a task.
IMPLICATION: The product cannot be distributed with such
a problem, not even to beta testers.
SERIOUS
Serious problems include those that allow the Participant
to suffer the damaging effects of a mistake that a better
design might have prevented.
IMPLICATION: The product could, if necessary, be
distributed with such a problem but only to beta testers.
MINOR
Minor problems include those that cause the Participant
to become momentarily distracted, confused or
disoriented.
IMPLICATION: The product could, if necessary, be
distributed to customers with this problem.
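
These categories and their distribution implications can be encoded directly when preparing the report. A minimal sketch, assuming Python and illustrative names:

    from enum import IntEnum

    class Severity(IntEnum):
        # Ordered so that higher values sort first when prioritizing.
        MINOR = 1     # momentary distraction, confusion or disorientation
        SERIOUS = 2   # allows a damaging mistake a better design might prevent
        CRITICAL = 3  # makes it impossible to complete a task

    def may_distribute(severity: Severity, audience: str) -> bool:
        # Apply the IMPLICATION rules above; audience is
        # "beta" or "customers".
        if severity is Severity.CRITICAL:
            return False                  # not even to beta testers
        if severity is Severity.SERIOUS:
            return audience == "beta"     # beta testers only
        return True                       # minor problems may ship
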
STEP 6 PREPARE A REPORT.
Give a summary of findings, subtask by subtask, in a table under these headings:
task description
task completion status
task abandoned ___ times after ___ seconds (average) and ___
errors (average)
task completed successfully ___ times in ___ seconds
(average) after ___ errors (average)
NOTE: If there is concern about statistical "outliers," the
median and standard deviation may be reported as well.
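
A small helper along these lines (a Python sketch; names are illustrative) can fill in the table's averages and, per the note above, the median and standard deviation:

    from statistics import mean, median, stdev

    def summarize_subtask(times: list[float], errors: list[int]) -> dict:
        # times/errors hold one entry per session in which the
        # subtask was completed; abandoned runs are tallied separately.
        return {
            "completions": len(times),
            "mean_seconds": round(mean(times), 1),
            "median_seconds": round(median(times), 1),
            "stdev_seconds": round(stdev(times), 1) if len(times) > 1 else 0.0,
            "mean_errors": round(mean(errors), 2),
        }
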
If usability performance requirements were developed for specific tasks or subtasks, give a summary of relevant findings in a table under these headings:
general description of the task or subtask
specific benchmark applied
worst acceptable performance on this benchmark (maximum errors or maximum time)
planned target performance on this benchmark (acceptable errors or acceptable time)
best possible performance on this benchmark (minimum errors or minimum time)
observed performance on this benchmark (actual errors or actual time)
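
The four levels imply a simple verdict on each observed value. The sketch below is an illustration under stated assumptions (Python; it presumes lower numbers are better, as they are for both errors and time, so best <= planned <= worst):

    def benchmark_verdict(observed: float, worst: float,
                          planned: float, best: float) -> str:
        # Classify an observed measurement against the three
        # benchmark levels for a lower-is-better measure.
        if observed > worst:
            return "FAIL: worse than the worst acceptable level"
        if observed > planned:
            return "MARGINAL: acceptable, but misses the planned target"
        if observed > best:
            return "PASS: meets the planned target"
        return "PASS: at or near the best possible level"
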
If heuristic evaluation was used, list areas where design fails applicable guidelines.
List all problems exposed during testing in a table under these headings:
description of the problem
severity of the problem
frequency of the problem
possible remedy for the problem
List changes needed to the interface in priority order, considering these possibly conflicting rules:
Give a high priority to the more severe problems.
Give a high priority to the more common problems.
Give a high priority to problems that can be fixed cheaply.
Give a high priority to problems that can be fixed quickly.
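
Because these rules can conflict, one way to produce an initial ordering is a single weighted score; the formula and weights below are illustrative assumptions, not a standard method.

    def priority_score(severity: int, frequency: float,
                       fix_cost: float, fix_days: float) -> float:
        # severity: 1 (minor) to 3 (critical); frequency: fraction
        # of participants affected (0..1); fix_cost in dollars;
        # fix_days in working days. Higher scores mean fix sooner.
        benefit = severity * frequency                      # rules 1 and 2
        effort = 1.0 + fix_cost / 1000.0 + fix_days / 5.0   # rules 3 and 4
        return benefit / effort

    # Example: sort hypothetical change requests by priority.
    requests = [
        {"severity": 3, "frequency": 0.5, "fix_cost": 2000, "fix_days": 10},
        {"severity": 1, "frequency": 0.9, "fix_cost": 100, "fix_days": 1},
    ]
    requests.sort(key=lambda r: priority_score(**r), reverse=True)
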
Recommend changes.
Review session videotapes and prepare a collage of short clips to support these change requests.