A workshop and workbook
avian – 2 matches
Avian – 12 matches
avian – with the case-insensitive flag – 14 matches
[A-Z]\w*
[A-Z]+ – match only “all caps” words. BUT this is not quite right. It doesn’t work. Do you know why?
\b[A-Z]+\b – match on a word boundary using an anchor class: \b
\b[A-Z]{2,}\b – Abbreviations are usually 2 or more upper case characters.
Note
Character sets [ ] are denoted by square brackets
* , + , ? and { } define repetition
\b is an anchor denoting a word boundary
Match the last words of sentences
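For anyone following along outside Regexr, the patterns in this exercise can be compared with a minimal Python sketch. The sample text is made up for illustration and is not the workshop data, so the match counts will differ from those quoted here.

```python
import re

# Hypothetical sample text (an assumption, not the workshop data).
text = "Birds sing. Write to me@example.org today. Owls hoot. "

# The three candidate patterns from this exercise.
patterns = {
    "unescaped dot": r"\w+.",
    "escaped dot": r"\w+\.",
    "escaped dot + whitespace": r"\w+\.\s",
}

# Collect every match for each pattern so they can be compared side by side.
results = {name: re.findall(p, text) for name, p in patterns.items()}

for name, matches in results.items():
    print(f"{name}: {matches}")
```

On this sample, the escaped-dot pattern still picks up "example." from the email address, while adding \s leaves only the three sentence-ending words.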
\w+. – This doesn’t work because “.” matches any character
\w+\. – We escape the period “.” with the escape character \
\w+\.\s – More precise this time. Matching on 56 words. Using \s allows us to stop matching email addresses by matching whitespace \s
Note
\w is a “word” character
\s is a “space” character
. is a meta character (introduced above)
Find all years
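The year patterns in this exercise can also be tried in Python. This is a minimal sketch on an invented sentence, not the workshop data. One assumption to flag: Python's re.findall returns only the group contents when a pattern contains a capturing group, so the sketch writes the alternation as a non-capturing group, (?:19|20), to get the whole year back.

```python
import re

# Hypothetical sample text (an assumption, not the workshop data).
text = "Opened in 1987, renamed in 2004; call 55512345 or room 1200, code 3021."

# Any four digits in a row, including pieces of longer numbers.
four_digits = re.findall(r"\d{4}", text)

# Four digits standing alone between word boundaries.
bounded = re.findall(r"\b\d{4}\b", text)

# Four-digit numbers starting with 19 or 20 (non-capturing group).
centuries = re.findall(r"\b(?:19|20)\d\d\b", text)

print(four_digits)
print(bounded)
print(centuries)
```

Here the word boundaries remove the fragments of the long number, and the 19|20 alternation removes the room number and the code.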
\d\d\d\d – a lot of matches here
\d{4} – more succinct but has the same meaning as above
\b\d{4}\b – word boundaries \b help but there are still some false positives
\b(19|20)\d\d\b – better and works for the twentieth and twenty-first centuries
Note
| for alternation, alternatives
( ) grouping
{ } multiplier
Find a phone number
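A minimal Python sketch of the phone number exercise, using three invented numbers rather than the workshop data. The first two patterns are taken as written in this exercise. The third is an adaptation, not the workbook's exact pattern: Python's re module rejects [\d-.] as a bad character range, so the hyphen is moved to the end of the set, the separator dot is escaped, and the group is made non-capturing.

```python
import re

# Hypothetical phone numbers (assumptions for illustration only).
samples = ["(555) 123-4567", "555 123-4567", "555.123.4567"]

patterns = {
    "strict": r"\(\d{3}\) \d{3}-\d{4}",
    "permissive": r"\(?\d+\)? ?[\d-]{5,}\d",
    # Adapted for Python: hyphen last in the set, dot escaped.
    "with dots (adapted)": r"\(?\d+(?:\)|\.)? ?[\d.-]{5,}\d",
}

# For each pattern, list the samples it matches in full.
accepted = {
    name: [s for s in samples if re.fullmatch(p, s)]
    for name, p in patterns.items()
}

for name, matched in accepted.items():
    print(f"{name}: {matched}")
```

Each pattern accepts one more format than the one before it, which is the trade-off between precision and permissiveness that this exercise explores.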
\(\d{3}\) \d{3}-\d{4} – Very specific. This works as long as phone numbers are formatted consistently
\(?\d+\)? ?[\d-]{5,}\d – more permissive
\(?\d+(\)|.)? ?[\d-.]{5,}\d – more permissive still. Allows for . instead of - as a separator
Note
\(? indicates optionality, matching zero or one occurrence
\w+@[\w\.]+
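As a rough illustration, the email pattern above can be run in Python on an invented sentence (the addresses are placeholders, not workshop data):

```python
import re

# Hypothetical sample text (an assumption, not the workshop data).
text = "Contact ana@example.org or j.smith@mail.example.com. Not an address: @home"

# One or more word characters, an @, then word characters and dots.
emails = re.findall(r"\w+@[\w\.]+", text)

print(emails)
```

On this sample the pattern finds both addresses but drops the "j." before "smith" and keeps the sentence-ending period after ".com", which suggests the character sets would need tuning for real data.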
Note
^(\w+ ?)+$ – match repeating words + optional space
Note
+ can be applied to a group
^ prior to a match pattern means “begins with”
$ following a match pattern means “ends with”
honour – 14 matches
honou?r – optional “u” and still 14 matches
hon(our|ourable|esty?) – honour, honourable, honest, honesty; for 66 matches
^(ACT|SCENE) [IVXLCDM]+ – literal word, space, roman numerals; for 20 matches
^[A-Z]+$
^.*\? – from start of line to question mark
$0 code.
(\w+) (\w+) – in the Expression pane will highlight all names
"$0" – in the Substitution pane will reproduce the text pattern matched within forward slashes (Expression pane / /)
- $2, $1 – swap the order of the first and last name and precede the whole name with a dash ‘-’
<b>$2</b>, $1 – bold the last name and add a comma
Please note this is actual Twitter stream data about a politician; the tweets may be offensive.
You can do this exercise in Regexr.com, or copy the textbox data and paste it into RegEx101.com
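The name-swapping substitutions earlier in this workbook use the $0, $1 and $2 references of the Substitution pane. As a hedged sketch, the same rewrites can be done with Python's re.sub, where group references are written \1 and \2 and the whole match is \g<0>. The names below are invented for illustration.

```python
import re

# Hypothetical list of names (an assumption, not the workshop data).
names = "Jane Austen\nMark Twain"

pattern = r"(\w+) (\w+)"

# Wrap each whole match in double quotes (like "$0").
quoted = re.sub(pattern, r'"\g<0>"', names)

# Swap first and last name and prefix with a dash (like - $2, $1).
swapped = re.sub(pattern, r"- \2, \1", names)

# Bold the last name and add a comma (like <b>$2</b>, $1).
bolded = re.sub(pattern, r"<b>\2</b>, \1", names)

print(quoted)
print(swapped)
print(bolded)
```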
Please complete the paper Feedback Form
Presenter