- Download OpenRefine and Sublime Text
- Learn RegEx
- Use Sublime
- Use RegEx in Sublime
- Use OpenRefine
- Use RegEx in OpenRefine
Download these in the background while you are completing the next tasks
- Download OpenRefine for Windows here
  - Unzip it and double-click `openrefine.exe`. If that does not work, try double-clicking `refine.bat` instead.
- Download OpenRefine for macOS here
  - Open the download, drag the icon into the Applications folder, and double-click it.
- Download Sublime for Windows here
  - Run the installer
- Download Sublime for macOS here
  - Drag Sublime into your Applications folder
Complete the RegEx tutorial here: https://regexone.com/
You can refer to http://www.regular-expressions.info/tutorial.html for more detailed information about regular expressions
You can practice your regular expressions here: https://regexr.com
- Download the zip file here
- Extract it into a new folder on your Desktop
- Open the folder in Sublime
After completing each task, check your solution by comparing it with the answer file
- In `file1.csv` remove all the instances of the character `.`
- In `file2.csv` replace all the instances of the character `.` with `-`
- In `file3.csv` find all the upper-case text, and make it lower case
  - Make sure your search is case-sensitive
  - Try looking in the command palette
    - macOS: ⌘ + ⇧ + P
    - Windows: Ctrl + ⇧ + P
- In `file4.csv` replace all instances of the words `one`, `two`, and `three` with the numbers `1`, `2`, and `3` respectively
- In `file5.csv` replace all the repeating instances of `a` with a single `a`
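If you want to check your patterns outside Sublime, the five tasks above can be sketched in Python. This is only an illustration under some assumptions: the function names and sample strings are hypothetical, the real contents of the CSV files are not shown here, and the word-boundary check in the fourth function is a guess at what the task intends. In Sublime itself you would probably run the word replacements as three separate find-and-replace passes.

```python
import re

def remove_dots(text):
    # file1-style task: delete every literal "."
    # (escaped, because an unescaped "." matches any character)
    return re.sub(r"\.", "", text)

def dots_to_dashes(text):
    # file2-style task: replace every literal "." with "-"
    return re.sub(r"\.", "-", text)

def lowercase_upper_runs(text):
    # file3-style task: find runs of upper-case letters (case-sensitive)
    # and convert each run to lower case
    return re.sub(r"[A-Z]+", lambda m: m.group(0).lower(), text)

def words_to_digits(text):
    # file4-style task: swap the whole words one/two/three for digits.
    # \b is an assumption: it keeps words like "someone" untouched.
    digits = {"one": "1", "two": "2", "three": "3"}
    return re.sub(r"\b(one|two|three)\b", lambda m: digits[m.group(1)], text)

def collapse_repeated_a(text):
    # file5-style task: squeeze two or more consecutive "a" into one
    return re.sub(r"a{2,}", "a", text)

print(remove_dots("1.2.3"))                # 123
print(dots_to_dashes("1.2.3"))             # 1-2-3
print(lowercase_upper_runs("Hello WORLD")) # hello world
print(words_to_digits("one,two,three"))    # 1,2,3
print(collapse_repeated_a("baaad"))        # bad
```

The same patterns (`\.`, `[A-Z]+`, `a{2,}`) can be typed into Sublime's Find and Replace panel once regular expression mode is switched on.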
- Watch the introduction video here: http://www.youtube.com/watch?v=B70J_H_zAWM
- Download some data from Reaper. If you are having issues, download the sample file here
- Create a new project and import the file from Reaper
- Make some changes to the file
- Export the file as a csv
- In `file6.csv` trim the whitespace in the first column and capitalise the second
  - Look in the `Common Transformations` menu
- In `file7.csv` split the created date into `created_year`, `created_month`, `created_day` and `created_time`
  - Look in the `Edit Column` menu
- In `file8.csv` split all the URLs into a new column using this code:

      import re
      # Join every URL found in the cell into one comma-separated string
      if value is not None:
          return ",".join(re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', value))

  - Use `Add column based on this column`
  - Change the language to `Python / Jython`
- In `file9.csv` split all the hashtags into a new column using this code:

      import re
      # Join every hashtag found in the cell into one comma-separated string
      if value is not None:
          return ",".join(re.findall(r"#(?:\[[^\]]+\]|\S+)", value))

  - Use `Add column based on this column`
  - Change the language to `Python / Jython`
- In `file10.csv` fix the formatting issues in Sublime Text, then import it into OpenRefine
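To see what the URL and hashtag expressions from the `file8.csv` and `file9.csv` tasks return before pasting them into OpenRefine, you can try them in plain Python. The sketch below reuses the two patterns unchanged. The names `URL_PATTERN`, `HASHTAG_PATTERN`, and `extract_joined` are hypothetical, and the sample strings are made up. In OpenRefine the cell contents arrive as `value` and the expression body uses a bare `return`, whereas here the logic is wrapped in an ordinary function.

```python
import re

# Patterns copied from the two OpenRefine tasks above
URL_PATTERN = r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
HASHTAG_PATTERN = r"#(?:\[[^\]]+\]|\S+)"

def extract_joined(pattern, value):
    # Mirror the OpenRefine expression: empty cells (None) are skipped,
    # otherwise every match is joined into one comma-separated string
    if value is None:
        return None
    return ",".join(re.findall(pattern, value))

# Made-up sample cells
tweet = "see http://example.com/a and https://example.org/b?x=1 now"
tags = "love #python and #[data science] today"

print(extract_joined(URL_PATTERN, tweet))
# http://example.com/a,https://example.org/b?x=1
print(extract_joined(HASHTAG_PATTERN, tags))
# #python,#[data science]
```

Both patterns are greedy about punctuation: the URL pattern accepts `,` and `)` as URL characters, and the hashtag pattern takes everything up to the next whitespace, so a URL or hashtag followed directly by a comma keeps that comma in the result.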