
I made this project as part of the Data Visualisation course on Coursera.

The source data for the visualisation is information about 612 flaws in the inspected object. Each flaw is characterised by its size: height (H) and length (L) in mm.

Two systems were used to find flaws. Some flaws were found by both systems (dataset «Matched»), and some were found by only one of the systems (datasets «System A only» and «System B only»).

My task is to visualise the data in a way that shows that both systems could successfully find critical flaws (H > 2, L > 14).

So I made two attempts to solve this task.

Variant 1 :: XY graph

I simply made an XY chart in the (L, H) coordinate system for the three categories. The result is shown below.


Then I marked the critical area with a pink rectangle. One can see that most of the indications located in the critical area are matched ones.

I used position to show the distribution of indications, and hue and shape to show categories. Shape helps to distinguish overlapping points.

The L axis is logarithmic, so that the area with low length values is shown in more detail.
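To illustrate why a logarithmic L axis gives more room to short flaws, here is a minimal sketch that maps a length to a horizontal position on a linear and on a log10 axis. The axis range (1 to 1000 mm), the plot width (600 px), and the function names are assumptions made for the illustration; they are not taken from the original chart.

```python
import math


def linear_position(value, vmin, vmax, width):
    """Map value in [vmin, vmax] to a position in [0, width] on a linear axis."""
    return (value - vmin) / (vmax - vmin) * width


def log_position(value, vmin, vmax, width):
    """Map value in [vmin, vmax] to a position in [0, width] on a log10 axis."""
    if value <= 0 or vmin <= 0 or vmax <= vmin:
        raise ValueError("a log axis needs 0 < vmin < vmax and value > 0")
    span = math.log10(vmax) - math.log10(vmin)
    return (math.log10(value) - math.log10(vmin)) / span * width


# Hypothetical axis: lengths from 1 mm to 1000 mm on a 600 px wide plot.
L_MIN, L_MAX, WIDTH = 1.0, 1000.0, 600.0

for length in (1, 10, 14, 100, 1000):
    lin = linear_position(length, L_MIN, L_MAX, WIDTH)
    log = log_position(length, L_MIN, L_MAX, WIDTH)
    print(f"L = {length:>4} mm   linear: {lin:6.1f} px   log: {log:6.1f} px")
```

With these assumed numbers, the lengths from 1 to 10 mm occupy only a few pixels on the linear axis but a third of the log axis, which is the effect the logarithmic scale is meant to achieve.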

One problem with this visualisation is that it is hard to analyse the data quantitatively, that is, to evaluate what share of the flaws is critical and whether both systems show the same performance in this area.

So I decided to make a second variant.

Variant 2 :: Bars in the table

I transformed the data into the table shown below. I chose four areas with varying H and L and counted the flaws in each area.

Size             Matched   System A only   System B only   Total
H <= 2; L <= 14      131              43              46     220
H <= 2; L > 14       202              21               7     230
H > 2; L <= 14        15               1              34      50
H > 2; L > 14         89               9              14     112
Total                437              74             101     612
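The counts in the table can be checked and turned into shares with a few lines of Python. This is only a sketch: it assumes that the two "only" columns correspond to System A and System B in that order, and the names `area_of` and `summarise` are made up for the illustration. Note that "found by A" here means the share among flaws found by at least one system, which is not a true probability of detection, because flaws missed by both systems are unknown.

```python
# Counts copied from the table: area -> (matched, A only, B only).
# Assumption: the second and third columns are System A only and System B only.
COUNTS = {
    "H <= 2; L <= 14": (131, 43, 46),
    "H <= 2; L > 14": (202, 21, 7),
    "H > 2; L <= 14": (15, 1, 34),
    "H > 2; L > 14": (89, 9, 14),
}
CRITICAL = "H > 2; L > 14"


def area_of(height, length):
    """Return the table row label for one flaw (sizes in mm)."""
    h_part = "H > 2" if height > 2 else "H <= 2"
    l_part = "L > 14" if length > 14 else "L <= 14"
    return f"{h_part}; {l_part}"


def summarise(counts):
    """Compute per-area totals and shares from the count table."""
    total = sum(sum(row) for row in counts.values())
    summary = {}
    for area, (matched, a_only, b_only) in counts.items():
        n = matched + a_only + b_only
        summary[area] = {
            "total": n,
            "share_of_all": n / total,
            # Share of the flaws in this area that each system reported.
            "found_by_a": (matched + a_only) / n,
            "found_by_b": (matched + b_only) / n,
        }
    return total, summary


total, summary = summarise(COUNTS)
print(f"all flaws: {total}")
for area, s in summary.items():
    print(
        f"{area:<16} n={s['total']:>3}  share={s['share_of_all']:.1%}  "
        f"A={s['found_by_a']:.1%}  B={s['found_by_b']:.1%}"
    )
```

For a single flaw, `area_of(3.0, 20.0)` returns the critical-area label, so the same function could be used to build the counts from the raw list of flaws.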

Then I decided to show graphically how the data is split between these areas, using a table (2×2 matrix).

Each cell of the table contains a bar plot with three categories. The position of the top of each bar shows the number of flaws in the chosen area, and hue shows the category.

The cell for the critical area is marked with a red frame.
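A chart of this kind can also be produced as SVG by writing the markup directly. The sketch below is not how the original chart was made; the cell layout (H > 2 in the top row, L > 14 in the right column), the colours, the sizes, and the function name `bar_matrix_svg` are all assumptions chosen for the illustration.

```python
from html import escape

# Assumed layout: top row H > 2, right column L > 14.
# Each cell: (label, (matched, A only, B only)), counts copied from the table.
GRID = [
    [("H > 2; L <= 14", (15, 1, 34)), ("H > 2; L > 14", (89, 9, 14))],
    [("H <= 2; L <= 14", (131, 43, 46)), ("H <= 2; L > 14", (202, 21, 7))],
]
COLOURS = ("#4477aa", "#ee6677", "#228833")  # matched, A only, B only (arbitrary)
CRITICAL = "H > 2; L > 14"
CELL, PAD, BAR_W, GAP = 200, 30, 40, 10  # pixel sizes, chosen arbitrarily


def bar_matrix_svg(grid):
    """Return an SVG string with one small bar plot per cell of the 2x2 grid."""
    peak = max(n for row in grid for _, counts in row for n in counts)
    scale = (CELL - 2 * PAD) / peak  # one shared scale, so cells are comparable
    size = 2 * CELL
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for r, row in enumerate(grid):
        for c, (label, counts) in enumerate(row):
            x0, y0 = c * CELL, r * CELL
            critical = label == CRITICAL
            stroke, stroke_w = ("red", 3) if critical else ("gray", 1)
            # Cell frame: red and thicker for the critical area.
            parts.append(
                f'<rect x="{x0 + 2}" y="{y0 + 2}" width="{CELL - 4}" height="{CELL - 4}" '
                f'fill="none" stroke="{stroke}" stroke-width="{stroke_w}"/>'
            )
            # The label contains "<" and ">", so it must be escaped for XML.
            parts.append(
                f'<text x="{x0 + PAD}" y="{y0 + CELL - 10}" font-size="12">{escape(label)}</text>'
            )
            for i, n in enumerate(counts):
                h = n * scale
                bx = x0 + PAD + i * (BAR_W + GAP)
                by = y0 + CELL - PAD - h
                parts.append(
                    f'<rect class="bar" x="{bx}" y="{by:.1f}" width="{BAR_W}" '
                    f'height="{h:.1f}" fill="{COLOURS[i]}"/>'
                )
                # Print the count above the bar.
                parts.append(
                    f'<text x="{bx + BAR_W / 2}" y="{by - 3:.1f}" font-size="11" '
                    f'text-anchor="middle">{n}</text>'
                )
    parts.append("</svg>")
    return "\n".join(parts)


svg = bar_matrix_svg(GRID)
print(svg)
```

Saving the printed text to a file with the `.svg` extension should give an image that a browser can open. A shared vertical scale is used for all four cells so that bar heights can be compared between areas; a separate scale per cell would make small cells easier to read but would hide the difference in totals.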


Here one can see that:

— most of the unmatched flaws are located in the noncritical areas (especially in the areas with low heights); this suggests that for the area with H <= 2; L <= 14 the probability of detection is low;

— System B detects more flaws that are high and short (H > 2; L <= 14): 34 versus 1 for System A (or it overestimates the height).

I can see that this is not the best solution either, but I spent a tremendous amount of time on this task and my head is about to explode =)

All calculations and visualisations were done with OpenOffice software.
I could not make the visualisation in SVG.