SQLite

As an alternative to MySQL, we can also use SQLite, which is more efficient in non-distributed installations. Moreover, it is lightweight and non-intrusive (no installation required, only unzip one file). The command-line shell version can be downloaded here.

Assuming that you extracted sqlite3 in <crosswords>/db/, open a command prompt there. Extract the dictionary and adjacency matrix in the same directory.

First start sqlite3 with a new empty database "cw":

sqlite3 cw

Then, create the tables:

create table Words (
    wid int not null,
    word varchar(200) not null,
    primary key (wid),
    unique (word)
);

create table Neighbors (
    widfrom int not null,
    widto int not null,
    weight float not null,
    primary key (widfrom, widto),
    foreign key (widfrom)
        references Words(wid)
        on update cascade
        on delete cascade,
    foreign key (widto)
        references Words(wid)
        on update cascade
        on delete cascade
);

Then, import CSV data. This will take a while (15 minutes). After that, you can safely delete the CSV files to free some memory on your disk.

.separator ","
.import index Words
.import adjacency Neighbors

Let's create an index on Neighbors to speed up queries (this might take some time too):

create index n_from on Neighbors(widfrom);
create index n_to on Neighbors(widto);

The database is ready to be handle queries. You can check the size of tables:

select count(*) from Words union select count(*) from Neighbors;

Which will print:

618236
32053595

You can also filter words using some pattern:

select word
from Words
where word like 'IRON%'
limit 10;

Which should print the following:

IRON
IRON 54
IRON 56
IRON 57
IRON 58
IRON AGE
IRON BACTERIUM
IRON BLOOM
IRON CAGE
IRON CARBIDE

Let's get words that are associated to "cast" and "iron":

select word, score
from (
    select widfrom, sum(weight) as score
    from (
        select wid
        from Words
        where word in ('CAST', 'IRON')
    ) Inputs
    inner join Neighbors on wid = widto
    group by widfrom
    order by score desc
) Outputs
inner join Words on wid = widfrom
limit 20;

The following result should be displayed:

CAST IRON|2.0
CASTIRON|2.0
CAST|1.6
IRON|1.6
AGA COOKER|1.2
BE|1.2
BESSEMER STEEL|1.2
BLOOMING|1.2
CHILL|1.2
CLOUT|1.2
DUCTILE IRON|1.2
FISHBELLY|1.2
GIRYA|1.2
JAIL LOCK|1.2
KARAHI|1.2
KETTLEBELL|1.2
LANCE|1.2
OUT|1.2
PEARLITE|1.2
PIG|1.2

We can also filter according to some known pattern, for instance if we are trying to solve a crossword. Let's find what "eat mouse" and start by "c":

select word, score
from (
    select widfrom, sum(weight) as score
    from (
        select wid
        from Words
        where word in ('EAT', 'MOUSE')
    ) Inputs
    inner join Neighbors on wid = widto
    group by widfrom
    order by score desc
) Outputs
inner join Words on wid = widfrom
where word like 'C%'
limit 10;

You should get the following:

CHURCHMOUSE|1.0
CROWEATER|1.0
CAN|0.7
CAP|0.7
CAT|0.7
CATERPILLAR|0.7
CHEESE|0.7
CHURCH|0.7
CLICK|0.7
COLOR|0.7

We can find equivalents to some word:

select word
from (
    select widto
    from (
        select wid
        from Words
        where word = 'BE'
    ) Inputs
    inner join Neighbors on wid = widfrom
    where weight = 1
) Outputs
inner join Words on wid = widto
order by word;

We get the following, which is obviously not optimal:

AM
ARE
BE
BEED
BEEN
BEER
BEES
BEEST
BEING
BENGKULU
BERLIN
BES
GERMANY
INDONESIA
IS
WERE

To use it from Scala, Xerial is a JDBC driver for SQLite, which can be added by SBT:

libraryDependencies += "org.xerial" % "sqlite-jdbc" % "3.8.9.1"

On the server, we can configure Scala Play to use SQLite instead of MySQL. Note that no password is required.

db.default.driver=org.sqlite.JDBC
db.default.url="jdbc:sqlite:../db/cw"

 

Crossword and Items

To use the crossword page of the web application, you need to import crossword.csv and items.csw :

create table Crosswords (
    cwid int not null,
    source varchar(50),
    lang varchar(5),
    title varchar(50),
    url varchar(100),
    author varchar(50),
    cwdate date,
    difficulty int,
    primary key (cwid)
);
 
create table Items (
    cwid int not null,
    wid int not null,
    xcoord int not null,
    ycoord int not null,
    clue varchar(1000) not null,
    direction varchar(10) not null,
    primary key (cwid, xcoord, ycoord, direction),
 
    foreign key (cwid)
        references Crosswords(cwid)
        on update cascade
        on delete cascade,
    foreign key (wid)
        references Words(wid)
        on update cascade
        on delete cascade
);
 
.separator ","
.import crosswords.csv Crosswords
.import items.csv Items
 
create index item on Items(cwid);