week-15-16: line score

Introduction

The goal of these weeks is to revisit the scoring functions used by the algorithm.

We will first define some objects. Then we will briefly recall where the lineScore function is used. With that in place, we will look at different possibilities for the score function and how to implement them. Then we will see a way to compare the three most interesting functions. Finally, we will briefly discuss the code infrastructure needed for this line score evaluation.

 

Nomenclature

line: A string that goes from one pin to another. (A line is implemented as a list of pixel positions.)

resultImage: The image that holds the current result of the algorithm. At the beginning this image is white; then at each step the new line is drawn onto it.

diffImage: The virtual image produced by subtracting the current resultImage from the original image. This image is virtual because it can contain negative values, and a grayscale image normally has no negative values.

sketchImage: The sketchImage is produced by subtracting the current resultImage from the original image without going into negative values, so the minimal value is 0.
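These two derived images can be sketched in a few lines. This is an illustrative NumPy version under the assumption that images are grayscale arrays and that the subtraction operates on darkness values (255 minus the pixel value); the function names are ours, not from the project code.

```python
import numpy as np

def darkness(img):
    # Darkness of a grayscale pixel: 255 is fully black, 0 is white.
    return 255 - img.astype(np.int16)

def diff_image_darkness(original, result):
    # Virtual image: the darkness difference can be negative
    # where the result is already darker than the original.
    return darkness(original) - darkness(result)

def sketch_image_darkness(original, result):
    # Same subtraction, but clipped so the minimal value is 0.
    return np.clip(diff_image_darkness(original, result), 0, 255)
```

With a white resultImage, the sketchImage equals the darkness of the original, which matches the intuition that the algorithm starts by tracing the whole picture.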

 

Context


At each step, the base algorithm evaluates all the lines from the current pin to all other pins using a score function (we could view this as a cost function, but here the term score seems more appropriate).
This score function takes a line as input and returns the score of this line.
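The greedy step described above can be sketched as follows. `best_next_pin`, `lines_between`, and `line_score` are hypothetical names standing in for the project's actual helpers; this is only an outline of the selection logic.

```python
def best_next_pin(current_pin, pins, lines_between, line_score):
    """Evaluate the line from current_pin to every other pin and
    return the pin whose line gets the highest score."""
    best_pin, best_score = None, float("-inf")
    for pin in pins:
        if pin == current_pin:
            continue  # no line from a pin to itself
        score = line_score(lines_between(current_pin, pin))
        if score > best_score:
            best_pin, best_score = pin, score
    return best_pin, best_score
```

The algorithm then draws the chosen line onto the resultImage and repeats from the new pin.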

 

Functions

We have implemented the following lineScore functions. You can find the pseudocode below. The names are the same as in the code, but the implementation differs slightly. In particular, the diffImage does not "exist" as an object but is computed inside the function.

The first function, lineScore, was developed using the algorithm by Christian Siegel (https://github.com/christiansiegel/knitter). The other functions were developed entirely by us.

 

Function: lineScore

Input: set of pixels P
Output: score obtained for this set of pixels

Algorithm:

score = 0
counter = 0

For each pixel in P:
    color = pixel.getColorFromTheSketchImage()    //  encoded as RGB
    darkness = color.getDarkness()    // 255 - average of RGB
    score = score + darkness
    counter = counter + 1

finalScore = score / counter
return finalScore

Idea behind this score function:
If we choose the darkest line in the sketch image, this line will be added to the result image and the difference between the original image and the result image will become smaller.
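A minimal Python version of this pseudocode, assuming the sketchImage is already a grayscale NumPy array of darkness values (so the RGB-to-darkness conversion from the pseudocode is done beforehand); the names are ours:

```python
import numpy as np

def line_score(pixels, sketch_image):
    # pixels: iterable of (row, col) positions along the line.
    # sketch_image: array of darkness values in [0, 255].
    total, count = 0, 0
    for (r, c) in pixels:
        total += sketch_image[r, c]  # darkness of this pixel
        count += 1
    return total / count             # mean darkness along the line
```

Dividing by the pixel count makes lines of different lengths comparable.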

 

Function: lineScoreEquilibrate

Input: set of pixels P
Output: score obtained for this set of pixels

Algorithm:

score = 0
counter = 0

For each pixel in P:
    color = pixel.getColorFromTheSketchImage()  
    darkness = color.getDarkness()  -  0.3 * color.limit()  // color.limit() is the maximal darkness value
    score = score + darkness
    counter = counter + 1

finalScore = score / counter
return finalScore

Idea behind this score function and remarks:
The idea was to penalize lines that go through white pixels, so we shifted the whole range of possible darkness values. But, logically, we obtain the same result: we just shifted the score.
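The "same result" observation can be verified directly: subtracting the constant 0.3 * limit from every pixel subtracts the same constant from every line's mean, so the ordering of the lines never changes. A small self-contained check on toy darkness values (names are ours):

```python
def line_score(darkness_values):
    # Plain mean darkness.
    return sum(darkness_values) / len(darkness_values)

def line_score_equilibrate(darkness_values, limit=255, shift=0.3):
    # Same mean, with every pixel shifted down by a constant.
    return sum(d - shift * limit for d in darkness_values) / len(darkness_values)

lines = [[200, 180, 220], [50, 60, 40], [120, 130, 110]]
ranking_plain = sorted(range(3), key=lambda i: line_score(lines[i]))
ranking_shift = sorted(range(3), key=lambda i: line_score_equilibrate(lines[i]))
# Both rankings are identical: the constant shift cancels out.
```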

 

 

Function: lineScoreSignedDifferenceBetweenOriginalAndResult

Input: set of pixels P
Output: score obtained for this set of pixels

Algorithm:

score = 0
counter = 0

For each pixel in P:
    color = pixel.getColorFromDiffImage()  // we change the input image!
    darkness = color.getDarkness()    // The darkness is virtual and can be negative
    score = score + darkness
    counter = counter + 1

finalScore = score / counter
return finalScore

Idea behind this score function and remarks:
The idea was to also penalize pixels that become darker than in the original image.
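A sketch of this variant in Python, computing the signed darkness difference on the fly from the original and result images (as the real code does, since the diffImage does not exist as an object). The uint8 values are converted to int to avoid wraparound; names are ours.

```python
def line_score_signed_diff(pixels, original, result):
    # Signed darkness difference: negative where the result
    # is already darker than the original, so those pixels
    # lower the score of the line.
    total, count = 0, 0
    for (r, c) in pixels:
        d_orig = 255 - int(original[r, c])
        d_res = 255 - int(result[r, c])
        total += d_orig - d_res
        count += 1
    return total / count
```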

 

Function: lineScoreWeightedExtremity

Input: set of pixels P
Output: score obtained for this set of pixels

Algorithm:

maxFactor = 3

score = 0
counter = 0

For each pixel in P:
    color = pixel.getColorFromTheDiffImage()    
    darkness = color.getDarkness()   // the value is between -255 and 255
    factor = 1/color.limit() * absoluteValue(darkness)    // the value is between 0 and 1
    factor = power(maxFactor, factor)  // this creates an exponential weighting from 1 to maxFactor
    score = score + factor * darkness
    counter = counter + 1

finalScore = score / counter
return finalScore

Idea behind this score function and remarks:
The idea was to penalize or favor more strongly the values that are far from the original value.
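A compact Python version of this weighting, taking the signed darkness differences directly as input (the helper names and the scalar-list interface are ours):

```python
def line_score_weighted_extremity(diff_darkness_values, max_factor=3, limit=255):
    # diff_darkness_values: signed darkness differences in [-255, 255].
    total, count = 0, 0
    for d in diff_darkness_values:
        t = abs(d) / limit        # normalized magnitude in [0, 1]
        factor = max_factor ** t  # exponential weight in [1, max_factor]
        total += factor * d       # keep the sign, amplify the extremes
        count += 1
    return total / count
```

A pixel already matching the original (d = 0) contributes nothing, while a fully wrong pixel is amplified by max_factor.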
 

 

Function: lineScoreDelta

Input: list of pixels L    // assumed to be ordered
Output: score obtained for this list of pixels

Algorithm:

score = 0
counter = 0

For each pixel in list L\{last}:    // We skip the last pixel of the list because it has no next pixel
    nextPixel = pixel.getNext()
    darkness1 = pixel.getDarkness()
    darkness2 = nextPixel.getDarkness()
    diff = absoluteValue(darkness1 - darkness2)
    score = score + diff
    counter = counter + 1

finalScore = score / counter
return finalScore

Idea behind this score function and remarks:
The human eye is more sensitive to contrast than to lightness. The idea was to use this fact and search for the line with the least contrast. But used alone, the same lines are always chosen.
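A Python sketch of this function, operating on an ordered list of darkness values instead of pixel objects (names are ours):

```python
def line_score_delta(darkness_values):
    # Mean absolute darkness difference between consecutive pixels:
    # lower means less contrast along the line.
    diffs = [abs(a, ) if False else abs(a - b)
             for a, b in zip(darkness_values, darkness_values[1:])]
    return sum(diffs) / len(diffs)
```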

 

Metrics

We have used two metrics to evaluate the different score functions. The first one is the absolute error of the image. The second one is human preference.

Absolute error: This metric is simply the sum of the absolute darkness differences between the resultImage and the originalImage.

Human preference: We asked six persons to choose, for each image (abstract, city, .....), the version they prefer. The pictures used for this test are in the section "Best Score Images".

We like the idea of having two really different metrics. On one hand, the absolute error (AE) gives us a really easy and fair way to evaluate the result; it can be interpreted as the distance of the result image from the original image. We could also have used a mean squared error (MSE) or an error function based on neighbor contrast, but we did not go in this direction for time reasons. On the other hand, people's preferences give us a subjective metric, but in the end the images will be viewed by human beings.
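The absolute error metric can be computed in one line with NumPy. Note that the darkness difference equals the pixel-value difference in absolute value, since |(255 - o) - (255 - r)| = |r - o|, so we can work directly on the pixel values. This is an illustrative version, not the project's code.

```python
import numpy as np

def absolute_error(original, result):
    # Sum of absolute (darkness) differences over all pixels;
    # int16 avoids uint8 wraparound in the subtraction.
    return int(np.abs(original.astype(np.int16) - result.astype(np.int16)).sum())
```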


Data used for evaluation

We have run the algorithm with the following three lineScore functions: lineScore, lineScoreSignedDifferenceBetweenOriginalAndResult, and lineScoreWeightedExtremity.
We did not evaluate lineScoreEquilibrate because it produces exactly the same result as lineScore, and we did not evaluate lineScoreDelta because it needs to be combined with another function to avoid always looping over the same lines.

We used these three lineScore functions on 7 images, from 0 to 30002 strings. We used a square randomized grid (but always the same one!) to avoid the influence of the border and the Moiré effect.

For each of the images and for each of the functions, every 1000 strings we save an image and compute the absolute error.


Representation of data

To make the data more readable, we produced charts and GIFs. The charts are based on the absolute error for each image. We also wanted to see the differences between the two main categories of lineScore functions during the process, so we produced GIFs that show, every 1000 strings, the difference between the lineScore and the diffOriResult function.

We put all of them at the end of this page.

 

Analysis

Comparison between diffOriResult (D) and weightedExtremity3 (W)
First, we would like to highlight that the D and W functions are really close, both in the resulting images and in the underlying scoring procedure. In fact, W is an attempt to tune the D function: we just give more weight to the extreme values.
The human preference for W or D is not clear (see the table in the section "Human preferences results"). The curves of the charts have the same shape (see the charts in the section "Charts of absolute error by images"). The difference images are quite homogeneous (see the images in the section "Difference between scoring functions at the minimal absolute error point"). So we can state that W and D belong to the same category of scoring functions, and the comparison between these two is not very relevant. Thus, for the rest of the analysis, we will focus on the comparison between the lineScore and diffOriResult functions.

Comparison between lineScore (L) and diffOriResult (D)
We see from the charts that processes using the D function stabilize earlier than processes using the L function. This comes from the fact that all lines in D become bad earlier and the process starts to loop in a deterministic fashion. In contrast, L still tries to improve the picture until the sketchImage is totally white.
Besides, the humans strongly prefer the W and D functions over L: together, W and D get 38 votes and L only 4 votes. The absolute error difference between functions D and L at the minimal point is less than 1 percent in every case. This means that globally the error is roughly the same. If we look at an absolute error difference of around 1 percent between two images using the same scoring function, we just observe that one is slightly lighter than the other (see the images in the section "One percent difference"). But in the case of an absolute error difference of 1 percent between two images using different scoring functions, the image seems more contrasted, more precise, and is clearly preferred by humans. If we compute the difference of the two images, we observe that the differences are concentrated around white areas (see the images in the section "Difference between lineScore and diffOriResult at the minimal absolute error point"). This seems logical, because the D function cares about white areas, while the L function is more focused on respecting the black areas.
Finally, we can claim that our diffOriResult function clearly improves the existing algorithm.

 

 

Implementation needed for lineScore evaluation:

 

 

 

Best Score Images (used for the human test)

Images with the smallest absolute error for each series.

Left: lineScore    Middle: diffOriResult    Right: weightedExtremity


Abstract1, 27000 strings


City1, 30000 strings


dance1, 30000 strings


Elgreco1, 25000 strings


Starik1, 16000 strings


Tree3, 15000 strings


Tree4, 19000 strings

 

 

Difference between scoring functions at the minimal absolute error point:

The following images show the absolute error between lineScore and diffOriResult (left images) or between diffOriResult and weightedExtremity3 (right images), at the minimal absolute error point. They were computed using the compare tool of ImageMagick. More precisely, they were computed with the following command line:

>>> compare "$file1" "$file2" -metric mae -fuzz 1% -compose src -quality 100 -highlight-color IndianRed4 -lowlight-color White $outname  

where $file1 = lineScore image and $file2 = diffOriResult image for the left images,
and $file1 = diffOriResult image and $file2 = weightedExtremity3 image for the right images.
 

 


abstract1, 27000 strings


City1, 30000 strings


dance1, 30000 strings


elgreco1, 25000 strings


starik1, 16000 strings


tree3, 15000 strings


tree4, 19000 strings

 

 

 

Human preferences results:

IMAGE     abstract  city  dance  elgreco  starik  tree3  tree4
Person1   W         D     D      W        D       D      L
Person2   D         D     W      D        W       L      D
Person3   W         D     W      W        W       D      W
Person4   D         W     D      W        L       L      D
Person5   W         W     W      D        W       D      W
Person6   W         W     D      D        W       D      D

Total sum: 19 W, 19 D, 4 L
 

 

Charts of absolute error by images:

 

One percent difference with the same scoring function

These pictures differ by roughly one percent in absolute error value. They are simply 1000 strings apart; otherwise they use the same scoring function.


Left: n:abstract1-s:25001-p:240-ao:9-do:9-sf:equilibrate-e:34134255.jpg
Right: n:abstract1-s:24001-p:240-ao:9-do:9-sf:equilibrate-e:35072259.jpg


Left: n:abstract1-s:25001-p:240-ao:9-do:9-sf:diffOriResult-e:33484740.jpg
Right: n:abstract1-s:24001-p:240-ao:9-do:9-sf:diffOriResult-e:34507232.jpg


Left: n:dance1-s:28001-p:240-ao:9-do:9-sf:equilibrate-e:22705721.jpg
Right: n:dance1-s:27001-p:240-ao:9-do:9-sf:equilibrate-e:24203691.jpg



Left: n:starik1-s:14001-p:240-ao:9-do:9-sf:diffOriResult-e:29942740.jpg
Right: n:starik1-s:13001-p:240-ao:9-do:9-sf:diffOriResult-e:31255929.jpg

 

 

 

GIFs of differences between lineScore and diffOriResultLineScore:
 

For every image below:

Top left: image using lineScore
Top right: image using diffOriResult
Bottom left: difference between lineScore and diffOriResult at each step
Bottom right: original image
 








image1: image using lineScore
image2: image using diffOriResult
image3: difference between lineScore and diffOriResult at each step
image4: original image