*** START OF THE PROJECT GUTENBERG EBOOK 3202 ***
Moby (tm) Thesaurus II Documentation Notes
This documentation, the software and/or database are:
Public Domain material by grant from the author, January, 2001.
Moby (tm) Thesaurus for the MSDOS operating system is compressed and
distributed as a single zip file. After extraction, the vocabulary
files included with this product are in ordinary ASCII format with
CRLF (ASCII 13/10) delimiters.
MOBY Thesaurus II CONTENTS
This file (3202-0.txt)
Unabridged Moby Main Thesaurus file
(https://www.gutenberg.org/files/3202/files/mthesaur.txt)
Roget 1911 (https://www.gutenberg.org/files/3202/files/roget13a.txt)
NOTE: Accents have been stripped from words, e.g., 'etude' does not
mark the accent on the initial 'e'.
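Because accents are stripped from the stored words, a lookup program may want to strip them from a user's query before searching. The following is a minimal sketch of one way to do that in Python. The function name strip_accents is hypothetical and is not part of the Moby distribution.

```python
import unicodedata


def strip_accents(word):
    """Return `word` with combining accent marks removed."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("\u00e9tude"))  # prints: etude
```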
Moby Thesaurus is the largest and most comprehensive thesaurus data
source in English available for commercial use. This second edition
has been thoroughly revised adding more than 5,000 root words (to
total more than 30,000) with an additional _million_ synonyms and
related terms (to total more than 2.5 _million_ synonyms and related
terms). Although this thesaurus is provided in a very simple ASCII
format suitable for viewing, editing, and automatic parsing, most
users will consider reformatting schemes to represent the data in a
more economical form, such as a table of related terms whose index can
be shared by many roots. This is roughly the technique used by
thesauri in print form, which couple a large index with synonyms
grouped under abstract (and arbitrary) headings in the front matter.
Tables of related terms can be stored in, for example, LZ compressed
form until actually required by the application. Combining such schemes
could easily reduce the storage requirement of this data by an order
of magnitude or more. The supplementary file, roget13a.txt, provides a
small thesaurus already organized in this form that you may wish to use
as a guide when developing your own categories of synonyms. Also, of
course, uncommon words can be stripped out according to the developer's
criterion, keeping only the core and most frequently used information.
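The reformatting idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: every distinct term is stored once in a shared index, each record becomes a row of integer ids, and rows are kept compressed until needed. The names build_tables, pack, and unpack are hypothetical, and Python's zlib module (DEFLATE, an LZ77-based method) stands in for the unspecified "LZ compressed form".

```python
import zlib


def build_tables(records):
    """Store every term once and turn each record into a row of term ids.

    `records` maps a root word to its list of related terms.
    """
    term_ids = {}  # term -> integer id (the shared index)
    terms = []     # integer id -> term
    table = {}     # root id -> list of related-term ids

    def intern(term):
        if term not in term_ids:
            term_ids[term] = len(terms)
            terms.append(term)
        return term_ids[term]

    for root, related in records.items():
        root_id = intern(root)
        table[root_id] = [intern(term) for term in related]
    return terms, term_ids, table


def pack(related_ids):
    """Compress one row of ids; it stays compressed until it is needed."""
    raw = ",".join(str(i) for i in related_ids).encode("ascii")
    return zlib.compress(raw)


def unpack(blob):
    """Inverse of pack: recover the list of term ids."""
    raw = zlib.decompress(blob).decode("ascii")
    return [int(i) for i in raw.split(",")] if raw else []


# Tiny made-up sample, not real thesaurus data.
records = {"frill": ["ruffle", "trimming"], "ruffle": ["frill", "trimming"]}
terms, term_ids, table = build_tables(records)
packed = {root_id: pack(ids) for root_id, ids in table.items()}
row = unpack(packed[term_ids["frill"]])
print([terms[i] for i in row])  # prints: ['ruffle', 'trimming']
```

Compressing each row separately only pays off for long rows. Grouping many rows into one compressed block is a common alternative, and the actual saving depends on the data.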
Once unarchived, the database format is flat-file ASCII. Each record
is terminated by a carriage return/linefeed pair [ASCII 13/10] and has
the form shown in the example that follows. In this example, the root
word is 'frill'; the root is always the first word of the list. The
synonyms and related words are listed in ASCII alphabetical order after
the root. Each entry, including the root, is followed by a comma,
except the last entry in a record, which is followed by a carriage
return/linefeed [ASCII 13/10].
frill, addition, adornment, amenity, beading, beauties, bedizenment,
binding, bonus, bordering, bordure, bravery, chiffon, clinquant,
colors, colors of rhetoric, crease, creasing, crimp, crisp, decoration,
dog-ear, double, double over, doubling, duplication, duplication
of effort, duplicature, edging, elegant variation, embellishment,
embroidery, enfold, expletive, extra, extra added attraction, extra
dash, extravagance, fat, featherbedding, festoons, figure, figure
of speech, filigree, filling, fillip, fimbria, fimbriation, fine
writing, finery, flection, flexure, floridity, floridness, flounce,
flourish, floweriness, flowers of speech, flute, fold, fold over,
folderol, foofaraw, frilliness, frilling, frills, frills and furbelows,
fringe, frippery, froufrou, furbelow, fuss, gaiety, galloon, gather,
gaudery, gewgaw, gilding, gilt, gingerbread, hem, infold, interfold,
jazz, lagniappe, lap over, lapel, lappet, list, lushness, luxuriance,
luxury, motif, needlessness, ornament, ornamentation, ostentation,
overadornment, overlap, padding, paste, payroll padding, plait, plat,
pleat, pleonasm, plica, plicate, plication, plicature, ply, premium,
prolixity, purple patches, quill, redundance, redundancy, ruche,
ruching, ruff, ruffle, selvage, showiness, skirting, something extra,
stuffing, superaddition, superfluity, superfluousness, tautology,
tinsel, trappings, trickery, trimming, trumpery, tuck, turn over,
twill, twist, unnecessariness, valance, verbosity, welt, wrinkle
[carriage return]
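The record layout described above can be read with a few lines of Python. This is a minimal sketch, not the author's own tooling: the function names parse_record and load_thesaurus are hypothetical, and the sketch assumes that each record occupies one physical line in the data file (the example above is wrapped only for display). Whitespace around entries is stripped so that the sketch tolerates commas with or without a following space.

```python
def parse_record(line):
    """Split one thesaurus record into (root, related_terms)."""
    # Drop the terminating CRLF (or bare LF), then split on commas.
    fields = [field.strip() for field in line.rstrip("\r\n").split(",")]
    fields = [field for field in fields if field]  # ignore empty fields
    if not fields:
        raise ValueError("empty record")
    return fields[0], fields[1:]


def load_thesaurus(path):
    """Read every record of the file into a dict keyed by root word."""
    thesaurus = {}
    # newline="" leaves the line terminators untranslated;
    # errors="replace" tolerates any stray non-ASCII bytes.
    with open(path, encoding="ascii", errors="replace", newline="") as handle:
        for line in handle:
            if line.strip():
                root, related = parse_record(line)
                thesaurus[root] = related
    return thesaurus


sample = "frill,addition,adornment,amenity,beading\r\n"
root, related = parse_record(sample)
print(root, related[:2])  # prints: frill ['addition', 'adornment']
```

Calling load_thesaurus("mthesaur.txt") on the extracted main file would then give a dictionary from each root word to its list of synonyms and related terms.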
Part-of-speech information is not stored with this thesaurus. A
separate file (mposp10.zip), available from Project Gutenberg by
the same author, supplies a lexical database giving the part(s) of
speech for a large collection (>200,000) of English words and
phrases. It can be used in conjunction with this list to supply
POS information if needed by the particular application.
Quick Start
1) Ensure you have at least 26 MB of free disk space to hold the
contents of this zip file.
2) Create a directory to hold the files listed above.
3) On the PG Catalog page, click on the selection "More Files". You
will see a "files.zip" folder in the list. Download this zipped folder
to your computer. On your computer, open "files.zip", double-click on
its "files" subdirectory, and copy the contents into the destination
directory on your computer.
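The copying in step 3 can also be done with a short script. The following is a minimal sketch using Python's standard zipfile module. The function name extract_files and the destination directory name "moby" are hypothetical, and the sketch assumes the archive has already been downloaded as "files.zip". It copies every file in the archive into the destination directory, dropping the "files" subdirectory prefix.

```python
import zipfile
from pathlib import Path


def extract_files(archive, destination):
    """Copy every file in `archive` into `destination`, ignoring subfolders."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries such as "files/"
            name = Path(info.filename).name  # drop the folder prefix
            (dest / name).write_bytes(zf.read(info))
            written.append(name)
    return sorted(written)


# Example call (assumes files.zip is in the current directory):
# print(extract_files("files.zip", "moby"))
```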
*** END OF THE PROJECT GUTENBERG EBOOK 3202 ***