247 Hesburgh Library, Navari Family Center for Digital Scholarship
*To protect the health and wellness of our community, this event has been canceled. We will share more information about rescheduling, as appropriate, at a later date.*
Text mining, the process of extracting information from unstructured text, requires that everyday files (PDF, Word, HTML, etc.) first be converted into plain text. Once your files are in plain text format (no bold, italics, underlining, etc.), they are ready for automated processing and computer analysis.
This hands-on workshop will demonstrate a free, Java-based program called Tika that performs this conversion. Attendees will install Tika and learn how to convert just about any file into plain text. We hope they will leave with the knowledge and confidence to use the text mining services available on the internet.
There are no prerequisites, but participants should bring their own laptops.
Open to all graduate and undergraduate students, postdocs, faculty, and staff.