Dating Articles With KL

How : Take a subset of articles from one year, treat this subset as a separate "year" file in its own right, and then compute the different metrics with this added "year" included.
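The procedure above can be sketched roughly as follows. This is only a minimal illustration, not the code that produced the numbers on this page: it assumes whitespace tokenization, add-one smoothing over a shared vocabulary, and KL(sample || year) as the metric. The names `word_distribution`, `kl_divergence` and `best_estimate`, as well as the toy corpus, are hypothetical.

```python
import math
from collections import Counter


def word_distribution(texts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary (assumed smoothing)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def best_estimate(sample_texts, corpus_by_year):
    """Return (year, divergence) for the year whose distribution is closest to the sample."""
    # Shared vocabulary, so that every distribution has the same support.
    vocab = {w for t in sample_texts for w in t.lower().split()}
    for texts in corpus_by_year.values():
        vocab.update(w for t in texts for w in t.lower().split())
    vocab = sorted(vocab)

    p = word_distribution(sample_texts, vocab)
    scores = {
        year: kl_divergence(p, word_distribution(texts, vocab))
        for year, texts in corpus_by_year.items()
    }
    year = min(scores, key=scores.get)
    return year, scores[year]


# Toy data, purely illustrative (not taken from the real corpus).
corpus = {
    1840: ["the steam engine and the railway", "the railway carriage"],
    1995: ["the internet and the computer", "the computer network"],
}
year, score = best_estimate(["a new railway engine"], corpus)
print("Best Est. :", year, score)
```

Removing the sampled articles from the test set, as in the second line of each block below, would correspond to dropping those texts from `corpus_by_year` before calling `best_estimate`.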

 

With 15 articles taken from 1840, 1880, 1920, 1960, 1995

Best Est. : 1840, 0.031660477138007924                     Best Est. : 1840, 0.01932141594350566

Articles removed from the test set :

Best Est. : 1899, 0.5329287459661379                     Best Est. : 1840, 0.5597024069836072

 

Best Est. : 1880, 0.043447215324982254                     Best Est. : 1880, 0.020543887035207067

Articles removed from the test set :

Best Est. : 1880, 0.46239086016971465                     Best Est. : 1880, 0.5223219897849468

 

Best Est. : 1920, 0.03990581959677868                     Best Est. : 1920, 0.022052117114189976

Articles removed from the test set :

Best Est. : 1920, 0.45956701697224045                     Best Est. : 1920, 0.5159421530851702

 

Best Est. : 1960, 0.04071517376390723                     Best Est. : 1960, 0.025952373391739208

Articles removed from the test set :

Best Est. : 1960, 0.17596041743309723                     Best Est. : 1960, 0.41523532824370085

 

Best Est. : 1995, 0.047007380332804216                     Best Est. : 1995, 0.027078483784840415

Articles removed from the test set :

Best Est. : 1995, 0.1913952267788147                     Best Est. : 1995, 0.24046060197740382

And here are the results when trying to represent a year from 15 articles :

 

Best Est. : 1847, 0.6293087038542783                     Best Est. : 1850, 0.6886133376755331

Best Est. : 1847, 0.6932575433754864                     Best Est. : 1850, 0.7003421853715898

 

Best Est. : 1850, 0.6632945152618476                     Best Est. : 1850, 0.7223024234233494

 

Best Est. : 1850, 0.7382428795259338                     Best Est. : 1847, 0.7431473183454298

 

Best Est. : 1850, 0.7958442463528926                     Best Est. : 1847, 0.7685063648973741