dc.contributor.author: Baker, James
dc.contributor.author: Milligan, Ian
dc.description: This article, published by the Editorial Board of the Programming Historian, is made available under a Creative Commons Attribution 2.0 Generic License. Available at:
dc.description.abstract: This lesson will look at how research data, when organised in a clear and predictable manner, can be counted and mined using the Unix shell. The lesson builds on the lessons “Preserving Your Research Data: Documenting and Structuring Data” and “Introduction to the Bash Command Line”. Depending on your confidence with the Unix shell, it can also be used as a standalone lesson or refresher. Having accumulated research data for one project, a historian might ask different questions of that same data when returning to it during a subsequent project. If this data is spread across multiple files - a series of tabulated data, a set of transcribed text, a collection of images - it can be counted and mined using simple Unix commands. The Unix shell gives you access to a range of powerful commands that can transform how you count and mine research data. This lesson will introduce you to a series of commands for counting and mining tabulated data, though they only scratch the surface of what the Unix shell can do. By learning just a few simple commands you will be able to undertake tasks that are impossible in LibreOffice Calc, Microsoft Excel, or other similar spreadsheet programs. These commands can be easily extended for use with non-tabulated data. This lesson will also demonstrate that the options available to you for manipulating, counting and mining data will often depend as much on the amount of metadata, or descriptive text, contained in the filenames of the data you are using as on the range of Unix commands you have learnt to use. Thus, even if it is not a prerequisite for working with the Unix shell, taking the time to structure your research data and file-naming conventions in a consistent and predictable manner is certainly a significant step towards getting the most out of Unix commands and being able to count and mine your research data.
For the value of taking the time to make your data consistent and predictable beyond matters of preservation, see “Preserving Your Research Data: Documenting and Structuring Data”. [en]
dc.publisher: The Editorial Board of the Programming Historian [en]
dc.rights: Attribution 2.0 Generic [*]
dc.subject: Guides and tutorials [en]
dc.subject: Research data [en]
dc.subject: Data mining [en]
dc.title: Counting and Mining Research Data with Unix [en]
dc.type: Technical Report [en]
dcterms.bibliographicCitation: James Baker and Ian Milligan. “Counting and Mining Research Data with Unix.” Programming Historian, September 2014. [en]
uws.contributor.affiliation1: Faculty of Arts [en]

Attribution 2.0 Generic
Except where otherwise noted, this item's license is described as Attribution 2.0 Generic


