
May 22, 2017 | Author: Natalia Boyarskaya | Category: Russian Literature, Digital Humanities

Fluid geopoetics DATA project

Natalia Boyarskaya and Mikhail Maiatsky
UNIL, EPFL
[email protected] and [email protected]

   

Abstract

Fluid geopoetics DATA is a collaborative interdisciplinary research project that unites experts in literary studies, philosophy, geography, and computer science. Its goal is to examine the underexplored category of "negative geopoetics" in Russian literature (XIX-XXI centuries). The project resulted in a knowledge database that makes it possible to identify, examine, and classify "no-places" and negatively described places in Russian texts of this period. A platform for collaborative exploration, built on modern methods of automated text processing, has been developed and made available to the community. This paper describes the technical development of the project.

KEYWORDS: Data in literary studies, user-curated and dynamic ontology, negative geopoetics, semantic clouds, conceptualization and categorization, ontology-organized knowledge base, key phrase.

1 Introduction

Fluid geopoetics DATA is a collaborative interdisciplinary research project that unites experts in literary studies, philosophy, and geography from the University of Lausanne (UNIL team, led by Prof. Anastasia de la Fortelle) and computer scientists from the Swiss Federal Institute of Technology in Lausanne (EPFL team, led by Prof. Karl Aberer). The project was launched in February 2015 in the framework of the CROSS programme, which supports collaborations between researchers in the humanities at UNIL and specialists in sciences and engineering at EPFL.


This project extends the classical approach of geopoetics to the notion of negative geopoetics [1]. The latter brings to light an underexplored category of objects in Russian culture and literature of the XIX-XXI centuries: local spaces that were previously marginalized (as unsuitable for the perspective of a centrally placed observer), and places that refuse any cultural markers and resist any (mis)use by a global mythology. Negative geopoetics aims to examine, describe intertextually, and classify landscapes while focusing on splits and rifts, such as catastrophes and revelations of the Other and of the irreducible: wastelands; landscapes of catastrophe, whether natural (falling meteors) or sociogenic (Chernobyl); abandoned GULAG camps; various types of wetlands; junkyards, waste deposits, squats, and abandoned houses (occupied by migrants, tramps, or the homeless); short-stay places (flophouses, camping sites, refugee camps); and "no-places" ("a space in which neither identity, nor relation, nor history is symbolized" [2]). The current project builds on the earlier study «Obscure territories and "negative geopoetics"» («Territoires obscurs et "géopoétique negative"») (UNIL, 2014-2015).

This project also stems from the ongoing SNSF Sinergia project at EPFL, «Crowdsourced conceptualization of complex scientific knowledge and discovery of discoveries». The Sinergia project is a collaboration between physicists, complexity scientists, and computer scientists. Its main focus is the development and advancement of automated methods of information management in the natural sciences. Its results are implemented in the ScienceWISE platform (http://ScienceWISE.info), which supports importing, storing, and searching scientific data and provides a semantic recommender system. The platform allows a community of scientists working in a specific domain to generate dynamically, as part of their everyday work, a web-based interactive semantic environment for science, consisting of highly structured metadata directly connected to the body of research papers. The ScienceWISE Ontology underpins the whole system and results from a combination of automated tools and a large crowdsourcing effort. The system automatically splits large collections of texts into a hierarchy of research topics.

For the humanities scholars, it was important to adapt the automated analysis tools to their own large collection of texts relevant to negative geopoetics; for the computer scientists, the Fluid geopoetics DATA project offered a novel use case: applying semantic technologies and methods of complexity science to the field of humanities.

The main common objective of the project was the conceptualization of the corpus of documents prepared as part of UNIL's interdisciplinary project «Obscure territories and "negative geopoetics"». For this purpose we needed to build:

• a representative and comprehensive corpus of negative geopoetics data, and
• a high-quality ontology of geopoetical concepts that represents the field and is justified by usage.

[1] See: Forquenot de La Fortelle, A., "Sur quelques étrangers exotiques dans la prose contemporaine russe", Exotismes dans la culture russe, Études de Lettres, n° 283, UNIL, 2009, p. 253-262; Vinogradova, A., "Les espaces de la marginalité dans la littérature russe actuelle", 2010; Coldefy-Faucard, A., "La tentation de l'Arctique chez Boris Pilniak", Exotismes dans la culture russe, Études de Lettres, n° 283, UNIL, 2009, p. 217-226; Coldefy-Faucard, A., «Géographie du mythe», Revue des Deux Mondes, Paris, octobre-novembre 2010; Nadtochiy, E., «Χωρα, Snuff, Obscure Territories» (in Russian), «Sinij divan», 2016 (№ 20), p. 43-60.
[2] Augé, M. Non-Places: Introduction to an Anthropology of Supermodernity. London: Verso, 1995.

2 Technical development

2.1 ScienceWISE platform

ScienceWISE.info allows scientists to reorder each day's new articles according to their personal interests, so that the most interesting articles appear first; to bookmark and annotate these articles using a scientific ontology; to create and organize personal literature collections; and to perform semantic search over the scientific literature. The ScienceWISE platform (Fig. 1) includes a number of elements: (1) an expanding collection of field-specific, expert-community-ranked encyclopedia articles (mostly on physics); (2) an ontological structure (concepts and logical relations between them) encompassing this encyclopedia; (3) established connections between ontology entries and a vast collection of research papers; (4) an operational platform that allows scientists to annotate and conceptually index (bookmark) research papers, link them to the ontology, validate and dynamically update the ontology through annotation, etc.

ScienceWISE is more than a simple platform for entering or organizing information. It performs some reasoning on top of the existing ontology, carries out simple disambiguation of concepts, and provides tools to describe semantic relations (Fig. 2). The system consolidates all local inputs into the current ontology and creates a comprehensive, global, and dynamic knowledge system.


Fig. 1. High-level architecture of ScienceWISE

 

Fig. 2. Visual representation of the concept in the ontology, together with its patterns category (top), semantic relations to the other concepts (left and right arrows) and alternative definitions (bottom).

 

2.2 Fluid geopoetics DATA as a new project of the humanities branch of the ScienceWISE platform

The first application of the ScienceWISE platform in the humanities was an attempt to build the Digital Humanities ontology, using the archives of Digital Humanities journals and the papers of participants in the international conference that took place in July 2014 in Lausanne (https://dh2014.org). That project inherited the principles and the model of organisation of scientific information from the natural sciences.

The Fluid geopoetics DATA project takes a new step in Digital Humanities research. Its major challenge concerns the text database itself, which contains not papers and articles but fiction and literary texts. Successful adaptation of Fluid geopoetics DATA to the ScienceWISE infrastructure requires additional tools of semantic and linguistic analysis that correspond better to the richness and diversity of literary language (in red, Fig. 3).

  Fig.  3.  Negative  Geopoetics  DATA  /  ScienceWise  integration  schema.    

2.2.1 Creation of the corpus of texts

The Fluid geopoetics DATA text collection is based on the Russian e-library (Lib.ru) and remains open to new additions. It also contains a corpus of texts in English from Project Gutenberg (Gutenberg.org). This compilation allows us to demonstrate that the same semantic tools are applicable to collections in various European languages. The database contains more than 2000 authors and 27400 texts in Russian, and 6700 authors and 14500 texts in English. Authors can be ordered according to the frequency with which the concepts of our ontology occur in their documents (a measure that takes into account the total number of words in the author's work).

The collection contains texts dating back to 1562. The period filter allows any period to be selected, and the genre filter restricts the view to any genre of interest for research (Fig. 4).


Fig. 4. Time and genre filters at work
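The filtering and author ordering described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Text` record, the function names, and the toy corpus are hypothetical, and the score is assumed to be the number of ontology-concept hits divided by the author's total word count.

```python
from dataclasses import dataclass


@dataclass
class Text:
    author: str
    year: int
    genre: str
    tokens: list  # lower-cased word tokens of the text


def filter_texts(texts, start_year=None, end_year=None, genre=None):
    """Keep texts inside the chosen period and, optionally, of one genre."""
    kept = []
    for t in texts:
        if start_year is not None and t.year < start_year:
            continue
        if end_year is not None and t.year > end_year:
            continue
        if genre is not None and t.genre != genre:
            continue
        kept.append(t)
    return kept


def rank_authors(texts, concepts):
    """Order authors by concept hits per word, highest first (assumed measure)."""
    hits, words = {}, {}
    for t in texts:
        n_hits = sum(1 for w in t.tokens if w in concepts)
        hits[t.author] = hits.get(t.author, 0) + n_hits
        words[t.author] = words.get(t.author, 0) + len(t.tokens)
    scores = {a: hits[a] / words[a] for a in hits if words[a] > 0}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
corpus = [
    Text("Author A", 1890, "novel", "the swamp lay beyond the empty village".split()),
    Text("Author B", 1930, "story", "a long road and a bright morning in town".split()),
    Text("Author A", 1895, "story", "wasteland and swamp and ruin".split()),
]
concepts = {"swamp", "wasteland", "ruin"}
ranking = rank_authors(filter_texts(corpus, start_year=1850, end_year=1900), concepts)
```

Dividing by the total word count keeps prolific authors from dominating the ranking merely because they wrote more.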

For literary studies, and especially for literary history, it is very important to be able to trace the development of phenomena, trends, patterns, etc. The timeline feature makes these evolutionary aspects of negative geopoetics visible (Fig. 5).

Fig. 5. Timeline view of the evolution of negative geopoetics.

2.2.2 Ontology of negative geopoetical concepts

We started building the ontology of negative geopoetics from an initial list of manually selected concepts. Using the Dictionary of associations (wordassociations.ru) and the Dictionary of synonyms (sinonimus.ru), we semi-automatically produced semantic clouds for each of the primary concepts and expanded the initial list to 700 concepts (in green, Fig. 3). We then complemented the lexical (token-level) analysis with a meaning-based one and sorted the list into ten semantic centres. We designated these centres by artificial -ity words to emphasize their abstract, ideal, or «constructed» character (SWAMP-ity, EMPT-ity, CHAOS-ity, and the like). Among them is the DRIV-ity category, which helps us consider the aspect of movement and circulation within negative geopoetics. These concepts are geopoetically and negatively «marked». They form the basis of all further statistical calculations.

 

Fig.  6.  Concepts  categorization  
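The semi-automatic expansion of the seed list can be pictured with a short sketch. Everything here is assumed for illustration: the two dictionaries are modelled as plain Python mappings (the paper does not describe how the online dictionaries were queried), the words are English stand-ins, and the human validation step is represented by a hypothetical `accept` callback.

```python
def build_semantic_cloud(seed, synonyms, associations):
    """Collect candidate words around one seed concept, without duplicates."""
    cloud = []
    for dictionary in (synonyms, associations):
        for word in dictionary.get(seed, []):
            if word != seed and word not in cloud:
                cloud.append(word)
    return cloud


def expand_concepts(seeds, synonyms, associations, accept):
    """Map each seed to the candidates that the reviewer accepted."""
    expanded = {}
    for seed in seeds:
        cloud = build_semantic_cloud(seed, synonyms, associations)
        expanded[seed] = [w for w in cloud if accept(seed, w)]
    return expanded


# Toy dictionaries, invented for illustration only.
synonyms = {"swamp": ["marsh", "bog", "mire"], "wasteland": ["desert", "barrens"]}
associations = {"swamp": ["mud", "bog", "fog"], "wasteland": ["ruin", "desert"]}
rejected = {"fog"}  # stands in for a manual decision by a researcher

expanded = expand_concepts(
    ["swamp", "wasteland"], synonyms, associations,
    accept=lambda seed, word: word not in rejected,
)
```

Keeping the acceptance decision in a callback mirrors the "semi-automatic" character of the step: the machine proposes, the researcher disposes.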

There are other concepts that have no obvious negative semantics but are necessary for describing obscure territories and negative places. We grouped them into a «non-classifiable» subcategory called «AUXILIARIES». These concepts are excluded from the statistics but remain in the ontology, where they accompany marked concepts and enter into bigrams, trigrams, and longer n-grams.

Fig. 7. N-grams
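A minimal sketch of how auxiliaries might enter n-grams while staying out of the statistics is given below. The rule used here (an n-gram qualifies when all its words are known to the ontology and at least one is marked) is an assumption made for the example, as are the function names and the English toy words.

```python
from collections import Counter


def concept_ngrams(tokens, marked, auxiliaries, max_n=3):
    """Count n-grams (n >= 2) made of known words with at least one marked word."""
    known = marked | auxiliaries
    found = Counter()
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if all(w in known for w in gram) and any(w in marked for w in gram):
                found[gram] += 1
    return found


def marked_statistics(tokens, marked):
    """Count only marked concepts; auxiliaries never reach the statistics."""
    return Counter(w for w in tokens if w in marked)


# Toy sentence and word lists, invented for illustration only.
tokens = "the abandoned house stood near a rotten swamp road".split()
marked = {"abandoned", "rotten", "swamp"}
auxiliaries = {"house", "road"}

grams = concept_ngrams(tokens, marked, auxiliaries)
stats = marked_statistics(tokens, marked)
```

In this toy sentence "house" and "road" appear inside the extracted n-grams but are absent from `stats`.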

Using the method of user-curated and dynamic ontology elaborated by ScienceWISE experts, we gradually built the negative geopoetics ontology. Based on this ontology, the system extracts from any given text all the ontological concepts («FOUND CONCEPTS»). In addition, the system automatically identifies concepts that were not previously known and offers them to the user for validation. All these concepts are ordered by their relevance to the current text. Some of the most relevant concepts from all these lists are suggested as «CHOSEN CONCEPTS». This automatic suggestion can be manually validated and improved simply by moving concepts between the columns.

If a concept is missing, any researcher working with the text can easily add it to the ontology. In this way the ontology is developed collaboratively (Fig. 9).

 

 

Fig. 9. Mechanism of user-curated and dynamic ontology at work: conceptualisation and categorization of a new word.
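The workflow above (found concepts, newly proposed concepts, and a ranked suggestion) can be sketched as follows. The paper does not specify how relevance is computed or how unknown concepts are detected, so this sketch assumes relative frequency as the relevance score and a simple frequency threshold for new candidates; all names are hypothetical.

```python
from collections import Counter


def suggest_concepts(tokens, ontology, stopwords, top_k=2, min_count=2):
    """Return (found, candidates, chosen) for one text.

    found:      ontology concepts present in the text, with relevance scores
    candidates: frequent unknown words offered for validation (assumed heuristic)
    chosen:     the top_k most relevant entries from both lists
    """
    counts = Counter(tokens)
    total = len(tokens)
    found = {w: c / total for w, c in counts.items() if w in ontology}
    candidates = {
        w: c / total
        for w, c in counts.items()
        if w not in ontology and w not in stopwords and c >= min_count
    }
    ranked = sorted({**found, **candidates}.items(), key=lambda kv: (-kv[1], kv[0]))
    chosen = [w for w, _ in ranked[:top_k]]
    return found, candidates, chosen


# Toy text and ontology, invented for illustration only.
tokens = "the swamp and the ruin and the swamp and the dump near the dump".split()
ontology = {"swamp", "ruin"}
stopwords = {"the", "and", "near"}

found, candidates, chosen = suggest_concepts(tokens, ontology, stopwords)

# A researcher validates a proposed concept by adding it to the ontology.
for word in candidates:
    ontology.add(word)
```

After the validation step, the next text processed against the same ontology would report "dump" among its found concepts, which is the sense in which the ontology grows through use.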

We applied to the negative geopoetics database the state-of-the-art algorithms for concept discovery developed by the ScienceWISE researchers. As a result, we obtained a set of concepts that have been used to describe negatively marked places in the corpus: not just a list of free keywords, but a list of key phrases. The algorithm makes it possible to detect various word-level representations of the same concept within a literary text.

Following the ScienceWISE principle of the ontology-organized knowledge base, we treat literary writing (like scientific papers on physics) as «bags of concepts» (not just «bags of words», which would be significantly reductive). The modularity-based community detection technique makes it possible to determine automatically the number of communities and their hierarchy.
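To make the idea of modularity-based community detection concrete, here is a small pure-Python sketch. It is not the ScienceWISE algorithm, which the paper does not detail: it assumes an unweighted graph whose edges link concepts that share a text, and it uses a simple greedy agglomerative strategy that keeps merging communities while Newman's modularity Q increases. The bags of concepts are invented, and the graph is assumed to have at least one edge.

```python
from itertools import combinations


def cooccurrence_edges(bags):
    """Undirected edges between concepts that share at least one text."""
    edges = set()
    for bag in bags:
        for a, b in combinations(sorted(set(bag)), 2):
            edges.add((a, b))
    return edges


def modularity(edges, community_of):
    """Modularity Q = sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    inside, degree_sum = {}, {}
    for a, b in edges:
        for node in (a, b):
            c = community_of[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community_of[a] == community_of[b]:
            c = community_of[a]
            inside[c] = inside.get(c, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


def greedy_communities(edges):
    """Merge communities pairwise for as long as a merge increases Q."""
    nodes = sorted({n for e in edges for n in e})
    community_of = {n: n for n in nodes}  # start with singletons
    best_q = modularity(edges, community_of)
    while True:
        labels = sorted(set(community_of.values()))
        best_merge = None
        for a, b in combinations(labels, 2):
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            break
        community_of = best_merge
    groups = {}
    for node, c in community_of.items():
        groups.setdefault(c, set()).add(node)
    return list(groups.values()), best_q


# Toy bags of concepts, one per text, invented for illustration only.
bags = [
    ["swamp", "bog", "mire"],
    ["swamp", "bog"],
    ["dump", "junkyard", "scrap"],
    ["dump", "scrap"],
    ["mire", "dump"],
]
communities, q = greedy_communities(cooccurrence_edges(bags))
```

The number of communities is not fixed in advance; it falls out of the point at which no merge improves Q, and the sequence of merges gives a hierarchy of nested groups.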


Fig. 10. Representation of the negative geopoetics concept MARE (БОЛОТО in Russian) together with different kinds of semantic relations.

2.3 As a result, the Negative Geopoetics DATA allows us:

• To determine automatically what kind of negative place each literary text describes;
• To assign the text to one or several categories of negative geopoetics (…-ity);
• To show the ranks of the negative concepts used in a text, by an author, or during a certain period of literary history;
• To find negatively depicted toponyms and discover among them those used metaphorically (Sahara, Siberia…);
• To provide a list of the different morphological groups of negative geopoetics, such as negative place action (to rot, to decay, to perish…), negative place epithet (burnt-out, fetid, deserted, disused…), etc.;
• To determine the relation between a negative place and a character, or a negative place and a narrator;
• To identify the general topic (city, village, industry, transport, war…).

We are also working on the possibility:

• To recognize a stylistic context (description of emotions, judgement, subjective perception of a landscape…);
• To trace the correspondence with genres (war stories, local legends, travellers' notes…).
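As an illustration of the second capability in the list above, assigning a text to one or several categories, a rough sketch follows. The mapping of words to categories and the `min_share` threshold are hypothetical; the paper does not state the actual assignment rule.

```python
from collections import Counter


def assign_categories(tokens, category_of, min_share=0.25):
    """Return (category, share) pairs whose share of concept hits is large enough."""
    hits = Counter(category_of[w] for w in tokens if w in category_of)
    total = sum(hits.values())
    if total == 0:
        return []  # no marked concept in this text
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cat, n / total) for cat, n in ranked if n / total >= min_share]


# Toy concept-to-category mapping, invented for illustration only.
category_of = {
    "swamp": "SWAMP-ity",
    "bog": "SWAMP-ity",
    "void": "EMPT-ity",
    "desert": "EMPT-ity",
    "disorder": "CHAOS-ity",
}
tokens = "a bog and a swamp beyond the void and more swamp".split()
categories = assign_categories(tokens, category_of)
```

Returning a ranked list rather than a single label reflects the fact that a text may belong to several categories at once.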


3 Conclusion

The researchers from the University of Lausanne obtained a powerful and rewarding tool that can be further developed and improved in the process of collaborative exploration. All scholars in the humanities are free to use the database in comparative literary studies as well as for the exploration of any other subject.

The main advantage and innovation of the negative geopoetics database is the possibility of analysing not just one text but a large corpus or a number of smaller sub-corpora. Using the database, we were able to find a number of results specific to literary data as such: plenty of synonyms and a strengthening of horizontal relations. These results will be reported elsewhere.

The ScienceWISE researchers had the opportunity to test their methods in the field of literary studies and to draw conclusions about the compatibility of these methods with a corpus of literary texts in a given language.

The project opens many interesting possibilities. For example, it would be interesting to evaluate the co-occurrence of negative geopoetical elements: are they independent or concomitant? We leave this for future work.
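One possible way to approach the co-occurrence question, offered only as a sketch and not as the authors' planned method, is pointwise mutual information (PMI) over documents: a value near zero suggests independence, a clearly positive value suggests that two concepts tend to appear together. The function name and the toy documents are hypothetical.

```python
import math


def pmi(docs, a, b):
    """Pointwise mutual information (base 2) of concepts a and b over documents."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_ab == 0:
        return float("-inf")  # the two concepts never co-occur
    return math.log2((n_ab * n) / (n_a * n_b))


# Toy documents, each reduced to its set of concepts (illustration only).
concomitant = [{"swamp", "fog"}, {"swamp", "fog"}, {"swamp"}, {"ruin"}]
independent = [{"swamp", "fog"}, {"swamp"}, {"fog"}, {"ruin"}]

together = pmi(concomitant, "swamp", "fog")   # positive: they attract each other
neutral = pmi(independent, "swamp", "fog")    # zero: consistent with independence
```

On a real corpus a significance test would be needed before drawing conclusions, since PMI is unstable for rare concepts.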

4 Acknowledgement  

This work was supported by the CROSS programme (EPFL-UNIL collaborative grant) and by the Swiss National Science Foundation.
