talkie-lm/talkie: talkie is a vintage language model from 1930
talkie is an inference library for the talkie 13B language model family developed by Alec Radford, Nick Levine, and David Duvenaud.
talkie-1930-13b-base is a 13b language model trained on pre-1931 English-language text.
talkie-1930-13b-it has been instruction-tuned using a novel instruction-following dataset built from pre-1931 reference works including etiquette manuals, letter-writing manuals, encyclopedias, and poetry collections. It has also undergone reinforcement learning using online DPO to improve instruction-following capabilities.
We also provide a 'modern' base model, talkie-web-13b-base, with the same architecture and training FLOPs as talkie-1930, but trained on FineWeb, to allow for controlled comparisons between modern and vintage models. Note that we need to be careful about the claims we make contrasting the behavior and capabilities of the models, because temporal coverage is not the only difference in the pretraining corpora. For example, the distribution of subject matters differs significantly.