Regular Expressions for Numbers in BASH

Introduction

This is a short post on how to recognize numbers such as simple integers, real numbers, and special codes such as zip codes and credit card numbers, and how to extract these numbers from unstructured text, in the popular bash (Bourne Again Shell) shell and scripting language. Bash is the default shell in the Terminal application on Macintosh computers running Mac OS X 10.10 (Yosemite), the system used for this post. It is also available through Cygwin and is used on many variants of Unix and Linux.

Regular expressions are a compact, efficient way of representing patterns of characters, including the letters of the English alphabet and digits. There is extensive information on the web on regular expressions. Interested readers can start with the Wikipedia page on regular expressions. The goal of this post is to illustrate specifically how to recognize common types of numbers using regular expressions in BASH. The regular expressions in the examples will also work in some other environments.
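
As a first taste, here is a minimal sketch of a regular expression test in bash. The helper name is_all_digits is made up for this example. Keeping the pattern in a variable and expanding it unquoted to the right of =~ is a common idiom, since bash 3.2 and later treat a quoted right-hand side as a literal string rather than a pattern.

```shell
#!/bin/bash
# Minimal sketch: test whether a string consists only of digits.
# is_all_digits is a hypothetical helper name used only for this example.
is_all_digits() {
    # keep the pattern in a variable and expand it unquoted after =~
    local pattern='^[0-9]+$'
    [[ $1 =~ $pattern ]]
}

for candidate in 42 4x2 ""; do
    if is_all_digits "$candidate"; then
        echo "'$candidate' is all digits"
    else
        echo "'$candidate' is not all digits"
    fi
done
```

The function returns the exit status of the [[ ... ]] test, so it can be used directly in an if statement, just like the tests in the scripts in this post.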

The example scripts below were tested on a MacBook Air with Mac OS X 10.10.3 (Yosemite) and this version of the bash shell:

GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)
Copyright (C) 2007 Free Software Foundation, Inc.

NOTE: If you are new to programming or regular expressions, regular expressions are not derived from the English language or the basic arithmetic taught in schools. Generally, aspects of programming that are not close to English or standard mathematics are harder to master and require more practice, drilling, and continued use. Programming languages such as BASIC or Python that are closer to English are easier to learn and master than languages such as C or C++ that make heavy use of custom computer notations and terms not found in standard English or standard school mathematics. Regular expressions are way out there in the cryptic computer notation wilderness.

The example bash scripts are listed below. They are also available at GitHub. The HTTPS URL for the Git repository at GitHub is:

https://github.com/jmcgowan79/mathbash.git

To get the scripts, you must have Git installed and configured on your computer. Then:

$ git clone https://github.com/jmcgowan79/mathbash.git
$ cd mathbash
$ chmod ugo+x *.sh             # make scripts executable
mathbash$ ./test_isnumber.sh   # to test the installation

Regular Expressions for Numbers

This script recognizes various types of numbers including integers, real numbers, complex numbers, and special codes including zip codes, telephone numbers, and credit card numbers.

isnumber.sh

#!/bin/bash
# test if a string is a number, report type of the number (e.g. INTEGER, REAL, etc.)
# exit code is 0 for success -- string is a NUMBER
# exit code is 1 for failure -- string is NOT A NUMBER
#
# illustrates regular expressions for recognizing number strings
#
# bash is sometimes in /usr/local/bin/bash
#
# (C) 2015 John F. McGowan, Ph.D. 

if [[ "$#" -ne 1 || "$1" == "-h" || "$1" == "-?" ]]; then
    echo "Usage: $(basename "$0") <possible_number_string>"
    echo "  -- reports number type of possible_number_string "
    echo "  -- POSITIVE INTEGER, NON-NEGATIVE INTEGER, SIGNED INTEGER"
    echo "  -- HEXADECIMAL, REAL NUMBER, VECTOR, ZIP CODE, TELEPHONE NUMBER "
    echo "  -- CREDIT CARD NUMBER"
    echo " "
    echo "  -- ILLUSTRATES REGULAR EXPRESSIONS FOR RECOGNIZING NUMBER STRINGS"
    echo " "
    echo '  -- use bash$ echo $? to test exit code'
    echo "  -- use enclosing quotes for string with spaces such as credit card numbers"
    echo " "
    echo " Author: John F. McGowan, Ph.D. ([email protected])"
    echo " "
    exit 0
fi

# Unix/bash exit code of 0 means success (is a number in this case)
is_number=1  # start no number found

number_string=$1

# regular expressions match patterns of characters
#
# caret ^ represents the start of a string or line outside of brackets
# dollar $ represents the end of a string or line
# square brackets such as [1-9] match any one of the characters in the brackets
# [abc] for example can be "a," "b," or "c"
# hyphen inside brackets indicates a range of characters
# typically digits or letters
# [1-9] represents the digits in range 1,2,3,...9
# [a-c] represents the letters a,b,c
#
# inside brackets caret ^ negates the list of characters
# for example [^0-9] represents all characters EXCEPT 0,1,2,...9
#
# ? indicates 0 or 1 of preceding pattern
# * indicates 0 or more of preceding pattern
# + indicates 1 or more of preceding pattern
# . matches any single character
#
# (...) is a group
# for example, (ab)? matches (nothing) or ab
# for example, (ab)* matches (nothing), ab, abab, ...
# for example, (ab)+ matches ab, abab, ababab, ...
# (...){n,m} indicates from n to m repetitions of the pattern
# for example, (ab){2,3} matches only abab and ababab
# the backslash is used to escape the characters with special meanings
# \^ \$ \( \) \[ \] \{ \} \* \. \? \+
#
# =~ is the regular expression pattern matching operator in bash

# positive integers/counting numbers (1,2,3,...)
if [[ $number_string =~ ^[1-9][0-9]*$ ]]; then
    echo "POSITIVE INTEGER"
    is_number=0
fi

# add zero to numbers
# zero was remarkably difficult to invent
# the ancient Babylonians had a place-value
# number system based on 60 (not 10) which
# included an implicit zero, but the explicit
# symbol for zero took many more centuries to
# invent

# non-negative integers (0,1,2,...)
if [[ $number_string =~ ^[0-9]+$ ]]; then
    echo "NON-NEGATIVE INTEGER"
    is_number=0
fi

# negative numbers are even less obvious

# explicitly signed integers (..., -2, -1, +0, +1, +2, ...); the leading + or - is required
if [[ $number_string =~ ^[+-][0-9]+$ ]]; then
    echo "SIGNED INTEGER"
    is_number=0
fi

# hexadecimal numbers are used with computers
# and low-level programming of computers

# hexadecimal (base 16) numbers such as AA12 or 0x12ab etc.
if [[ $number_string =~ ^(0[xX])?[0-9a-fA-F]+$ ]]; then
    # also recognize C format hex numbers such as 0xaf12
    echo "HEXADECIMAL NUMBER (INTEGER)"
    is_number=0
fi

# fractions such as 1/2, 1/3 date to antiquity but the
# concept of real numbers such as square root of 2
# proved difficult to grasp.  The ancient Greeks
# knew a proof that the square root of 2 could not
# be a ratio of two integers, but were apparently
# unable to make the leap to real numbers.

# real numbers/decimal numbers (0.0, ..., 0.5, ..., 1.0, ..., 3.1415...,...)
real_regexp="[+-]?([0-9]+|[0-9]+\.[0-9]*|\.[0-9]+)"
if [[ $number_string =~ ^$real_regexp$ ]]; then
    echo "REAL NUMBER"
    is_number=0
fi

# vectors are usually used to represent a magnitude
# with a direction such as the direction and speed
# of the wind or an ocean current (early uses of
# the vector concept)

# vector with enclosing parentheses, e.g. (1, 2, 3)
vector_regexp="\(( *$real_regexp, *)+$real_regexp *\)"
if [[ $number_string =~  ^$vector_regexp$ ]]; then
    echo "VECTOR"
    is_number=0
fi

# vector with enclosing brackets, e.g. [1, 2, 3]
vector_regexp="\[( *$real_regexp, *)+$real_regexp *\]"
if [[ $number_string =~  ^$vector_regexp$ ]]; then
    echo "VECTOR"
    is_number=0
fi

# vector with enclosing curly braces, e.g. {1, 2, 3}
vector_regexp="\{( *$real_regexp, *)+$real_regexp *\}"
if [[ $number_string =~  ^$vector_regexp$ ]]; then
    echo "VECTOR"
    is_number=0
fi

# the imaginary numbers turned up in roots of
# polynomials and are now used in everything from
# electrical engineering and cryptography to
# quantum mechanics, but remain mysterious.

# pure imaginary numbers i = square root(-1)
if [[ $number_string =~ ^$real_regexp[iI]$ ]]; then
    echo "PURE IMAGINARY NUMBER";
    is_number=0
fi

# complex numbers (1.1 + 2i, -1 + 2.1i, ...)
#
complex_regexp="$real_regexp( *[+-] *($real_regexp)?[iI])?"
if [[ $number_string =~ ^$complex_regexp$ ]]; then
    echo "COMPLEX NUMBER"
    is_number=0
fi

# large integers are frequently used as unique identifiers

# zip code (United States)
if [[ $number_string =~ ^[0-9]{5,5}(-[0-9]{4,4})?$ ]]; then
    echo "ZIP CODE (USA)"
    is_number=0
fi

# telephone number (USA)

if [[ $number_string =~ ^(([0-9]( |-))?[0-9]{3,3} +|([0-9]( |-))?\([0-9]{3,3}\) *)?[0-9]{3,3}( |-)[0-9]{4,4}$ ]]; then
    echo "TELEPHONE NUMBER (USA)"
    is_number=0
fi

# credit card number  (16 digits)
# the outer group makes ^ and $ apply to both alternatives

if [[ $number_string =~ ^([0-9]{16,16}|([0-9]{4,4} ?){4,4})$ ]]; then
    echo "CREDIT CARD NUMBER"
    # remove spaces from credit card number
    number_string_cleaned=${number_string// /}
    if [[ $number_string_cleaned =~ ^4[0-9]{6,}$ ]]; then
        echo "PROBABLE VISA CARD (VISA CARDS START WITH 4)"
    fi
    if [[ $number_string_cleaned =~ ^5[1-5][0-9]{5,}$ ]]; then
        echo "PROBABLE MASTER CARD"
    fi
    is_number=0
fi

# report if string is not a number
#
if [[ $is_number == 1 ]]; then
    echo "NOT A NUMBER"
fi

exit $is_number
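
One point that is easy to miss when a pattern combines the anchors ^ and $ with the alternation bar |: alternation has the lowest precedence, so ^A|B$ means "starts with A, or ends with B" unless the alternatives are wrapped in a group. The sketch below compares the two forms on a 16-digit card-style pattern. The variable names loose and strict and the helper matches_pattern are made up for this illustration.

```shell
#!/bin/bash
# Sketch: how ^, $ and | interact. Both patterns are illustrative only.
loose='^[0-9]{16}|([0-9]{4} ?){4}$'      # read as (^A) OR (B$)
strict='^([0-9]{16}|([0-9]{4} ?){4})$'   # anchors apply to both alternatives

# matches_pattern is a hypothetical helper:
# exit status 0 if string $1 matches regular expression $2
matches_pattern() {
    [[ $1 =~ $2 ]]
}

for candidate in "1234 5678 9012 3456" "12345678901234567890" "id 1234567890123456"; do
    for name in loose strict; do
        # ${!name} expands to the value of the variable whose name is in $name
        if matches_pattern "$candidate" "${!name}"; then
            echo "$name matches '$candidate'"
        else
            echo "$name rejects '$candidate'"
        fi
    done
done
```

The loose pattern accepts the 20-digit string (it starts with 16 digits) and the string with the "id " prefix (it ends with four groups of four digits), while the strict pattern accepts only the properly formed 16-digit candidate.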

How to Extract Numbers from Text using Regular Expressions in BASH

This short script demonstrates how to use the regular expressions for numbers to extract numbers and numeric data from unstructured text, a common problem in this age of the Internet. Note that bash stores the text matched by the first sub-pattern enclosed in parentheses in the regular expression in the special array element BASH_REMATCH[1].
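
A small sketch of what ends up in BASH_REMATCH after a successful match may help here. The sample text and the pattern are invented for this illustration. Element 0 holds the text matched by the whole pattern, and elements 1, 2, ... hold the text matched by each group, numbered by opening parenthesis.

```shell
#!/bin/bash
# Sketch: what bash stores in BASH_REMATCH after a successful =~ match.
# The sample text and the pattern are made up for this illustration.
text="width=42px"
pattern='([a-z]+)=([0-9]+)'

if [[ $text =~ $pattern ]]; then
    echo "whole match:  ${BASH_REMATCH[0]}"   # text matched by the entire pattern
    echo "first group:  ${BASH_REMATCH[1]}"   # first parenthesized sub-pattern
    echo "second group: ${BASH_REMATCH[2]}"   # second parenthesized sub-pattern
fi
```

Here the whole match is width=42, the first group is width, and the second group is 42. The trailing px is not part of the match because the pattern is not anchored with $.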

extract_number.sh

#!/bin/bash
#
# example of extracting number from text using regular expressions in bash
# -- we frequently want to extract numerical data from unstructured text
#
# illustrates regular expressions for recognizing number strings
#
# bash is sometimes in /usr/local/bin/bash
#
# (C) 2015 John F. McGowan, Ph.D. ([email protected])


if [[ "$1" == "-h" || "$1" == "-?" ]]; then
    echo "Usage: `basename $0`  extract number from string "
    echo "       -- exit code 1 if no number found"
    echo "       -- exit code 0 if a number is found"
    echo "       -- reports number if found"
    echo " "
    echo " Author: John F. McGowan, Ph.D. ([email protected])"
    echo " "
    exit 0
fi

found_number=1  # haven't found number yet
#
# inside brackets, caret negates the list of characters
# [^0-9] matches all characters except for 0,1,2...9
#
# real numbers/decimal numbers (0.0, ..., 0.5, ..., 1.0, ..., 3.1415...,...)
real_regexp="[+-]?([0-9]+|[0-9]+\.[0-9]*|\.[0-9]+)"
complex_regexp="$real_regexp( *[+-] *($real_regexp)?[iI])?"

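# skip leading characters that cannot start a number; the bracket list
# includes . and + so a leading sign or decimal point stays with the number
if [[ $1 =~ [^0-9.+-]*($complex_regexp) ]]; then
    echo "${BASH_REMATCH[1]}"
    found_number=0
fi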

exit $found_number
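
The script above reports only the first number it finds. As a rough sketch of one way to list every real number in a string, the loop below matches, prints the match, removes everything up to and including that match, and tries again. The function name extract_all_numbers is hypothetical and not part of the scripts in the repository, and the sketch handles real numbers only, not complex numbers.

```shell
#!/bin/bash
# Sketch: collect every real number in a string, not only the first.
# extract_all_numbers is a hypothetical helper, not part of the scripts above.
extract_all_numbers() {
    local rest=$1
    local real_regexp='[+-]?([0-9]+\.[0-9]*|[0-9]+|\.[0-9]+)'
    while [[ $rest =~ $real_regexp ]]; do
        echo "${BASH_REMATCH[0]}"
        # drop everything up to and including this match, then search again
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

extract_all_numbers "Shipped 3 boxes at -4.5 C for 12.75 dollars"
```

For the sample sentence this prints 3, -4.5, and 12.75 on separate lines. Every match contains at least one digit, so the remaining text gets shorter on each pass and the loop always terminates.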

Tester for Is Number Script

This is a script to run a series of tests on the isnumber.sh script to verify that it is installed and working correctly. Note that bash is not always located at /bin/bash. It may also be at /usr/local/bin/bash or other locations on your computer’s file system.

#!/bin/bash
#
# test script for isnumber.sh
#
# Author: John F. McGowan Ph.D. ([email protected])
# (C) 2015 John F. McGowan
#

# test non-numbers
#
ntests=0
nfails=0

echo "NOT A NUMBER TESTS"
report=`./isnumber.sh dog`
result=$?  # need to assign this to result immediately after isnumber.sh exits
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "PASSED"    
else
    echo "FAILED"
    nfails=`expr $nfails + 1`
fi

report=`./isnumber.sh 123x`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "PASSED"    
else
    echo "FAILED"
    nfails=`expr $nfails + 1`
fi

report=`./isnumber.sh 1.2.3`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "PASSED"    
else
    echo "FAILED"
    nfails=`expr $nfails + 1`
fi

report=`./isnumber.sh 1.2.i`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "PASSED"
else
    echo "FAILED"
    nfails=`expr $nfails + 1`
fi

# test numbers
#
echo "NUMBER TESTS"
# integer
report=`./isnumber.sh 1`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

# zero
report=`./isnumber.sh 0`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

# negative integer
report=`./isnumber.sh -1`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh 1.23`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh .1`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh 0.1`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh af`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh 0xaf`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh "(1,2,3)"`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh "[1,2,3]"`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh "{1,2, 3}"`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh 12i`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi


report=`./isnumber.sh 12.i`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

report=`./isnumber.sh .12i`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi


report=`./isnumber.sh 1.2i`
result=$?
ntests=`expr $ntests + 1`
if [[ $result == "1" ]]; then
    echo "FAILED"
    nfails=`expr $nfails + 1`
else
    echo "PASSED"
fi

echo " "
echo "SUMMARY"
echo "---------------------------------"
echo "FAILED $nfails OF $ntests TESTS";
if [[ $nfails == 0 ]]; then
    echo "PASSED ALL TESTS!!!!"
fi

# the end
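
The tester repeats the same few lines for every case. A table-driven variant is sketched below as one possible way to shorten it. The names expect_status and is_integer are hypothetical, and is_integer is only a stand-in for ./isnumber.sh so that the example runs on its own.

```shell
#!/bin/bash
# Sketch: a table-driven variant of the tester.
# expect_status and is_integer are hypothetical names; is_integer stands in
# for ./isnumber.sh so that the example runs on its own.
ntests=0
nfails=0

is_integer() {
    local pattern='^[+-]?[0-9]+$'
    [[ $1 =~ $pattern ]]
}

# expect_status EXPECTED COMMAND [ARGS...]
# runs COMMAND and compares its exit code with EXPECTED
expect_status() {
    local expected=$1
    shift
    "$@" > /dev/null
    local result=$?          # capture the exit code immediately
    ntests=$((ntests + 1))
    if [[ $result -eq $expected ]]; then
        echo "PASSED"
    else
        echo "FAILED: $*"
        nfails=$((nfails + 1))
    fi
}

expect_status 1 is_integer dog
expect_status 1 is_integer 1.2.3
expect_status 0 is_integer 0
expect_status 0 is_integer -1

echo "FAILED $nfails OF $ntests TESTS"
```

To test the real script, replace is_integer with ./isnumber.sh in the expect_status lines, for example: expect_status 0 ./isnumber.sh "(1,2,3)". Quoting each test string keeps the shell from splitting or expanding it before the script sees it.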

Example Successful Output from Tester

This is an example of the output when all the tests pass.

$ ./test_isnumber.sh 
NOT A NUMBER TESTS
PASSED
PASSED
PASSED
PASSED
NUMBER TESTS
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
 
SUMMARY
---------------------------------
FAILED 0 OF 19 TESTS
PASSED ALL TESTS!!!!

© 2015 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].
