
dc.contributor.advisor | Σταματάτος, Ευστάθιος | el_GR
dc.contributor.author | Χουβαρδάς, Ιωάννης - Γεώργιος | el_GR
dc.coverage.spatial | Σάμος | el_GR
dc.date.accessioned | 2015-11-18T10:39:45Z
dc.date.available | 2015-11-18T10:39:45Z
dc.date.issued | 2006 | el_GR
dc.identifier.other | https://vsmart.lib.aegean.gr/webopac/FullBB.csp?WebAction=ShowFullBB&EncodedRequest=*AAmk*3D*D3w*10*C9*89*84*5D*5DJ*C9*197&Profile=Default&OpacLanguage=gre&NumberToRetrieve=50&StartValue=1&WebPageNr=1&SearchTerm1=2006%20.1.44709&SearchT1=&Index1=Keywordsbib&SearchMethod=Find_1&ItemNr=1 | el_GR
dc.identifier.uri | http://hdl.handle.net/11610/12497
dc.description.abstract | Automatic authorship identification offers a valuable tool for supporting crime investigation and security. It can be seen as a multi-class, single-label text categorization task. Automatic authorship identification depends on selecting stylistic features that capture an author's writing style independently of the content or genre of the text. Character n-grams have been used successfully to represent text for stylistic purposes in the literature. They seem to be able to capture nuances at the lexical, syntactic, and structural levels. To date, character n-grams of fixed length have been used for authorship identification. In this thesis, we introduce a new approach for selecting variable-length n-grams, inspired by previous work on selecting variable-length word sequences. We propose the use of variable-length n-grams to represent the stylistic information of the documents to be classified. We explore the significance of digits as stylistic features for distinguishing between authors and show that an increase in performance can be achieved using simple text pre-processing. Using a subset of the new Reuters corpus, consisting of texts on the same topic by 50 different authors, we show that the proposed feature selection method is at least as effective as information gain for selecting the most significant n-grams, although the feature sets produced by the two methods have few common members. | el_GR
dc.language.iso | en | el_GR
dc.subject | Feature Selection (Επιλογή Χαρακτηριστικών) | el_GR
dc.subject | Authorship identification (Αναγνώριση συγγραφέα) | el_GR
dc.subject | Feature Selection | el_GR
dc.subject | Authorship identification | el_GR
dc.subject.lcsh | Integrated software
dc.title | N-gram Feature Selection for Authorship Identification | el_GR
heal.type | masterThesis | el_GR
heal.academicPublisher | University of the Aegean. School of Sciences. Department of Information and Communication Systems Engineering. Technologies and Management of Information and Communication Systems. | el_GR
heal.academicPublisherID | aegean | el_GR
heal.fullTextAvailability | true | el_GR
dc.notes | The thesis has been digitized, but the author has NOT specified the access rights. | el_GR

