---
title: Improving Data Quality via Pre-Task Participant Screening in Crowdsourced GUI Experiments
tags: 
author: [Nakamura Laboratory (Meiji University)](https://docswell.com/user/nkmr-lab)
site: [Docswell](https://www.docswell.com/)
thumbnail: https://bcdn.docswell.com/page/3JK9WNMLJD.jpg?width=480
description: Improving Data Quality via Pre-Task Participant Screening in Crowdsourced GUI Experiments by Nakamura Laboratory (Meiji University)
published: April 20, 2026
canonical: https://docswell.com/s/nkmr-lab/KN7LVR-2026-04-20-091327
---
# Page. 1

![Page Image](https://bcdn.docswell.com/page/3JK9WNMLJD.jpg)

Improving Data Quality via Pre-Task Participant Screening in Crowdsourced GUI Experiments
Takaya Miyama, Satoshi Nakamura (Meiji University)
Shota Yamanaka (LY Corporation)
Figure: Pre-task → Main task. Screening improves model fit (R²).


# Page. 2

![Page Image](https://bcdn.docswell.com/page/LE3W1VY6E5.jpg)

Background: Crowdsourced GUI experiments
Advantages
• Fast recruitment: > 1,000 participants in a few hours.
• Large samples: help evaluate performance models and rare events (e.g., pointing errors) [1].
Disadvantages
• Low observability: inattentive/nonconforming behavior can reduce data quality [2].
• Results may differ from lab: faster but less accurate performance [3] → can distort model evaluation.
Need a way to screen out inattentive/nonconforming participants for reliable model evaluation.
[1] Yamanaka, HCOMP 2021; [2] Brühlmann+, Methods in Psychology, 2020; [3] Findlater+, CHI 2017


# Page. 3

![Page Image](https://bcdn.docswell.com/page/8EDKX85M7G.jpg)

What does “inattentive/nonconforming” look like?
• Conforming: careful, accurate
• Partially conforming: faster, less accurate
• Highly nonconforming: minimal effort, random actions


# Page. 4

![Page Image](https://bcdn.docswell.com/page/V7PKP8GQJ8.jpg)

What does “inattentive/nonconforming” look like?
• Conforming: careful, accurate


# Page. 5

![Page Image](https://bcdn.docswell.com/page/2JVV2N6PJQ.jpg)

What does “inattentive/nonconforming” look like?
• Partially conforming: faster, less accurate


# Page. 6

![Page Image](https://bcdn.docswell.com/page/5EGLRKNQJL.jpg)

What does “inattentive/nonconforming” look like?
• Highly nonconforming: minimal effort, random actions


# Page. 7

![Page Image](https://bcdn.docswell.com/page/4JQYVNQW7P.jpg)

What does “inattentive/nonconforming” look like?
• Conforming: careful, accurate
• Partially conforming: faster, less accurate
• Highly nonconforming: minimal effort, random actions
The same task, very different data quality.
→ Need screening before the main task.


# Page. 8

![Page Image](https://bcdn.docswell.com/page/K74WMGN1E1.jpg)

Approach: Pre-task screening before the main task
• Run a pre-task first; only passing participants proceed to the main task.
• Screen out inattentive/nonconforming participants (not selecting top performers).
Figure: Pre-task → Main task
• Pre-task: all participants start here.
• Screen out non-passing participants.
• Main task: only passing participants proceed.


# Page. 9

![Page Image](https://bcdn.docswell.com/page/LJ1Y8D55EG.jpg)

Pre-task: Size-adjustment
Resize the on-screen card image to match a physical card [4].
• Brief: < 10 seconds on average
• Task-relevant: accurate operation is relevant to GUI tasks (e.g., pointing).
• Screening rule: use the size-adjustment error between the on-screen card and the physical card.
→ passing if below threshold, non-passing otherwise.
Figure label: size-adjustment error
[4] Li+, Scientific Reports, 2020
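
The screening rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `PreTaskResult` record, its field names, and the function names are hypothetical, and treating an error exactly equal to the threshold as non-passing is an assumption drawn from the wording "below threshold".

```python
from dataclasses import dataclass


@dataclass
class PreTaskResult:
    participant_id: str  # hypothetical identifier
    error_mm: float      # absolute size-adjustment error in mm


def passes_screening(error_mm: float, threshold_mm: float) -> bool:
    """Passing if the error is below the threshold, non-passing otherwise."""
    # Assumption: an error equal to the threshold counts as non-passing.
    return error_mm < threshold_mm


def split_by_screening(results, threshold_mm):
    """Split pre-task results into (passing, non_passing) lists."""
    passing = [r for r in results if passes_screening(r.error_mm, threshold_mm)]
    non_passing = [r for r in results if not passes_screening(r.error_mm, threshold_mm)]
    return passing, non_passing


# Made-up example values, not data from the study.
results = [
    PreTaskResult("p1", 0.5),
    PreTaskResult("p2", 12.0),
    PreTaskResult("p3", 2.0),
]
passing, non_passing = split_by_screening(results, threshold_mm=2.0)
print([r.participant_id for r in passing])
print([r.participant_id for r in non_passing])
```

Only the participants in `passing` would proceed to the main task; the rule screens out large errors rather than ranking participants by performance.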


# Page. 10

![Page Image](https://bcdn.docswell.com/page/GJWGZY5W72.jpg)

Evaluation overview
1. Crowdsourced experiment (data collection): size-adjustment (pre-task) → pointing (main task).
2. Simulation: test whether the screening improves model fit.
Figure: Pre-task → Main task


# Page. 11

![Page Image](https://bcdn.docswell.com/page/4EZL1XN273.jpg)

Crowdsourced experiment: Pre-task (size-adjustment)
Resize the on-screen card image to match a physical card.
• Reference card: ISO/IEC 7810 ID-1 (e.g., credit, ID, transit cards); match the short side (53.98 mm).
• Device: iPhone-only (7+); infer device PPI, convert px → mm.
• Measure: absolute size-adjustment error (mm).
Figure label: size-adjustment error
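
The px → mm conversion and the error measure above can be illustrated as follows. This is a sketch under stated assumptions, not the study's code: the function names are hypothetical, the input length is assumed to be in physical pixels (a length in CSS pixels would first need to be multiplied by the device pixel ratio), and the 326 PPI in the usage example is simply one plausible iPhone value.

```python
MM_PER_INCH = 25.4
ID1_SHORT_SIDE_MM = 53.98  # ISO/IEC 7810 ID-1 short side, as stated on the slide


def px_to_mm(length_px: float, ppi: float) -> float:
    """Convert a length in physical pixels to millimetres for a given PPI."""
    if ppi <= 0:
        raise ValueError("ppi must be positive")
    return length_px / ppi * MM_PER_INCH


def size_adjustment_error_mm(short_side_px: float, ppi: float) -> float:
    """Absolute difference between the adjusted short side and 53.98 mm."""
    return abs(px_to_mm(short_side_px, ppi) - ID1_SHORT_SIDE_MM)


# Made-up example: a participant leaves the card's short side at 693 px
# on a display assumed to have 326 PPI.
error = size_adjustment_error_mm(693, ppi=326)
print(f"size-adjustment error: {error:.2f} mm")
```

Because the error is absolute, making the card too large and making it too small by the same amount are treated identically.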


# Page. 12

![Page Image](https://bcdn.docswell.com/page/Y76WL4967V.jpg)

Crowdsourced experiment: Main task (pointing)
Tap the two targets alternately.
• Design:
• W (mm): 9 levels (2.0, 2.8, 3.6, 4.4, 5.2, 6.0, 6.8, 7.6, 8.4)
• Trials: 360 per participant
• Measures: movement time (MT) and error rate (ER) → model fit (R²)
Figure labels: 30 mm, W


# Page. 13

![Page Image](https://bcdn.docswell.com/page/G75M1QN574.jpg)

Crowdsourced experiment: Data collection
• Platform: Yahoo! Crowdsourcing (no pre-screening)
• Participants: N = 519 analyzed
• Time: 5 min 27 s on average
Figure: Pre-task → Main task


# Page. 14

![Page Image](https://bcdn.docswell.com/page/9J291P5GER.jpg)

Crowdsourced experiment: Pre-task outcome
• 310 (60%) had ≤ 2 mm error (likely passing, conforming).
• 143 (28%) had ≥ 10 mm error (likely non-passing, highly inattentive/nonconforming).


# Page. 15

![Page Image](https://bcdn.docswell.com/page/DEY4Z5KGJM.jpg)

Crowdsourced experiment: Pre-task outcome
• 310 (60%) had ≤ 2 mm error (likely passing, conforming).
• 143 (28%) had ≥ 10 mm error (likely non-passing, highly inattentive/nonconforming).
The pre-task outcome is continuous (no single cutoff)
→ evaluate screening under different threshold values.
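
Since the outcome is continuous, one way to look at it is to count how many participants would pass at each candidate threshold. The sketch below assumes a plain list of absolute errors in mm; the function name is hypothetical, the error values are made up for illustration (they are not the study's data), and "below threshold" is read as a strict inequality.

```python
def pass_counts_by_threshold(errors_mm, thresholds_mm):
    """Return {T: number of participants whose error is below T}."""
    return {t: sum(1 for e in errors_mm if e < t) for t in thresholds_mm}


# Made-up errors for ten participants, not data from the study.
errors = [0.3, 0.8, 1.5, 1.9, 2.4, 4.0, 9.5, 12.0, 15.2, 30.0]
thresholds = range(1, 11)  # T = 1..10 mm, step 1

counts = pass_counts_by_threshold(errors, thresholds)
for t in thresholds:
    n_pass = counts[t]
    print(f"T = {t:2d} mm: {n_pass} passing, {len(errors) - n_pass} non-passing")
```

A stricter (smaller) T keeps fewer participants, which is the trade-off a threshold choice has to balance.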


# Page. 16

![Page Image](https://bcdn.docswell.com/page/VJNY3NR878.jpg)

Simulation: Does screening improve model fit?
If the pre-task can screen out participants likely to be nonconforming, mixing more non-passing participants should reduce model fit in the main task.
Parameters:
• N: simulated sample size (N = 80)
• T (mm): threshold on the pre-task outcome for defining passing / non-passing (T = 1–10, step 1)
• X (%): ratio of non-passing participants mixed into the sample (X = 0–100%, step 10).
Models:
• $MT = a + b \cdot \log_2\left(\frac{A}{W} + 1\right)$ [5]
• $ER = 1 - \operatorname{erf}\left(\frac{W}{2\sqrt{2}\,\sigma_y}\right)$ [6]
[5] Fitts, Journal of Experimental Psychology, 1954; [6] Yamanaka+, ISS 2020
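
One step of such a simulation might look like the sketch below: draw N participants with X% taken from the non-passing pool, then fit the MT model by least squares on the index of difficulty log2(A/W + 1). This is an illustration only. The slides do not describe the sampling or aggregation procedure, so the function names, the use of sampling without replacement, and the participant IDs are all assumptions.

```python
import math
import random


def draw_mixed_sample(passing, non_passing, n, x_percent, rng):
    """Draw n participants, x_percent of them from the non-passing pool."""
    n_non = round(n * x_percent / 100)
    n_pass = n - n_non
    if n_pass > len(passing) or n_non > len(non_passing):
        raise ValueError("pool too small for the requested mix")
    # Assumption: sampling without replacement within each pool.
    return rng.sample(passing, n_pass) + rng.sample(non_passing, n_non)


def fitts_id(a_mm, w_mm):
    """Index of difficulty log2(A / W + 1) used in the MT model."""
    return math.log2(a_mm / w_mm + 1)


def linear_fit_r2(xs, ys):
    """Least-squares fit y = a + b * x; returns (a, b, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical participant IDs; pool sizes are made up.
rng = random.Random(0)
passing_pool = [f"p{i}" for i in range(200)]
non_passing_pool = [f"n{i}" for i in range(100)]
sample = draw_mixed_sample(passing_pool, non_passing_pool, n=80, x_percent=30, rng=rng)
print(len(sample), sum(s.startswith("n") for s in sample))
```

In a full run, the mean MT per W level of the drawn sample would be passed to `linear_fit_r2` together with the corresponding `fitts_id` values, and the step would be repeated for every (T, X) pair.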


# Page. 17

![Page Image](https://bcdn.docswell.com/page/YE9P9RMXJ3.jpg)

Simulation results: How to read the R² heatmaps
• Each cell shows R² for a (T, X) pair.
• Right: X↑ (more non-passing mixed) / Down: T↑ (less strict screening).
Figure: R² heatmap. Rows: threshold T (mm); columns: non-passing ratio X (%), from 0% to 100%; color scale from high to low R². Icons illustrate the sample mix at 0%, 50%, and 100% (● passing, ◆ non-passing).
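
The layout of such a heatmap can be sketched as a nested list with one row per threshold T and one column per ratio X. The names below are hypothetical, and `placeholder_r2` is a made-up stand-in that merely decreases with T and X; it does not reproduce any value from the study.

```python
THRESHOLDS_MM = list(range(1, 11))        # T = 1..10 mm, step 1 (rows, top to bottom)
RATIOS_PERCENT = list(range(0, 101, 10))  # X = 0..100 %, step 10 (columns, left to right)


def build_r2_grid(r2_for):
    """Return grid[row][col] = r2_for(T, X); rows follow T, columns follow X."""
    return [[r2_for(t, x) for x in RATIOS_PERCENT] for t in THRESHOLDS_MM]


def best_cell(grid):
    """Return the (T, X) pair whose cell holds the highest R²."""
    best = max(
        ((grid[i][j], t, x)
         for i, t in enumerate(THRESHOLDS_MM)
         for j, x in enumerate(RATIOS_PERCENT)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


def placeholder_r2(t_mm, x_percent):
    # Made-up stand-in, NOT the study's data: it only decreases with T and X.
    return 1.0 - 0.001 * t_mm - 0.001 * x_percent


grid = build_r2_grid(placeholder_r2)
print(len(grid), "rows x", len(grid[0]), "columns")
print("best cell (T, X):", best_cell(grid))
```

With these parameter ranges the grid has 10 rows and 11 columns, so moving right in a row increases X and moving down a column increases T.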


# Page. 18

![Page Image](https://bcdn.docswell.com/page/GE8D9WL9ED.jpg)

Simulation results: ER model fit (R²)
• Clear degradation: R² drops as X↑ (more non-passing mixed) and T↑ (less strict screening).
• Best fit: the top-left cell (T = 1 mm, X = 0%).
Figure: R² heatmap for the ER model. Rows: threshold T (mm); columns: non-passing ratio X (%), from 0% to 100%; color scale from high to low R².


# Page. 19

![Page Image](https://bcdn.docswell.com/page/LELMWNLM7R.jpg)

Simulation results: ER model fit (R²)
• Clear degradation: R² drops as X↑ (more non-passing mixed) and T↑ (less strict screening).
• Best fit: the top-left cell (T = 1 mm, X = 0%).
Figure: R² heatmap for the ER model. Rows: threshold T (mm); columns: non-passing ratio X (%), from 0% to 100%; color scale from high (R² = 0.989) to low (R² = 0.853).


# Page. 20

![Page Image](https://bcdn.docswell.com/page/4JMY9XM6JW.jpg)

Simulation results: ER model fit (R²)
• Clear degradation: R² drops as X↑ (more non-passing mixed) and T↑ (less strict screening).
• Best fit: the top-left cell (T = 1 mm, X = 0%).
Figure: R² heatmap for the ER model. Rows: threshold T (mm); columns: non-passing ratio X (%), from 0% to 100%; color scale from high (R² = 0.989) to low (R² = 0.853).
Keeping T strict and X small improves model fit, reducing the risk of misleading model evaluation due to nonconforming data.
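
To make the ER fit concrete, the sketch below evaluates the error-rate model in the form ER = 1 − erf(W / (2√2·σ_y)) and computes R² between observed and predicted values. Under this form, ER is the probability that a normally distributed tap coordinate with standard deviation σ_y, centered on the target, falls outside a target of width W. The function names are hypothetical, and the σ_y value is made up; the slides do not state how σ_y was obtained or exactly how R² was computed.

```python
import math


def predicted_error_rate(w_mm, sigma_y_mm):
    """ER = 1 - erf(W / (2 * sqrt(2) * sigma_y)) for a target of width W."""
    return 1 - math.erf(w_mm / (2 * math.sqrt(2) * sigma_y_mm))


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# W levels from the main task; sigma_y = 1.5 mm is a made-up value.
widths_mm = [2.0, 2.8, 3.6, 4.4, 5.2, 6.0, 6.8, 7.6, 8.4]
predicted = [predicted_error_rate(w, sigma_y_mm=1.5) for w in widths_mm]
for w, er in zip(widths_mm, predicted):
    print(f"W = {w:.1f} mm -> predicted ER = {er:.3f}")
```

The predicted ER falls as W grows, so a sample containing participants who tap at random would pull the observed error rates away from this curve and lower R².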


# Page. 21

![Page Image](https://bcdn.docswell.com/page/PJR9GNVN79.jpg)

Simulation results: MT model fit (R²)
• Limited degradation: R² drops as X↑ and T↑ too, but the change is smaller than for ER.
→ accuracy of the pre-task operation is more clearly reflected in tap failure (ER) than in speed (MT).
Figure: R² heatmap for the MT model. Rows: threshold T (mm); columns: non-passing ratio X (%), from 0% to 100%; color scale from high to low R².


# Page. 22

![Page Image](https://bcdn.docswell.com/page/PEXQXNZ8JX.jpg)

Conclusion & future work
Main points:
• A brief pre-task (< 10 s size adjustment) enables screening using only pre-task outcomes.
• Strict screening and less nonconforming data improve data quality (model fit, R²).
Limitations:
• May miss participants who are conforming in the pre-task but nonconforming in the main task.
• Choosing a threshold is a trade-off (stricter → fewer nonconforming data, smaller N).
Next:
• Compare with traditional methods (e.g., gold tasks, attention checks).
• Test whether the screening can be applied to other GUI tasks (e.g., dragging, steering, crossing).
I would appreciate it if you could ask questions slowly and in simple English.


