Data Extract
Data Extract Benford's Law Fraud detectionDownloads First digits lawSite Map
Home Page | Population | Sampling | Fraud | Other | Tutorial | Macros
Data Extract

Command Overview

Note: this feature is only available in version 1.3 and above

The extract command is used to select data which meets a specified criteria based upon data in the input source. The command supports three sources of data for testing:

  1. Data file in tab separated value format
  2. WorkSheet
  3. Selected range within a worksheet

The extract command can be used to "drill down" and focus on just on certain data, e.g. errors, based upon a variety of criteria such as:

  • numeric values, e.g. less than $200, between $50 - $100, etc
  • text values, e.g. division name of "AB", last name starts with B, etc
  • date values, e.g. between 1-1-2005 and 6-30-2006, weekday of Saturday, etc.
  • computed values, e.g. tax > invoice amount * .05, selling price < cost, etc.
  • complex criteria, e.g. based upon a combination of one or more of the above

The key to the extract command is the "condition" statement, which defines the criteria which must be met in order for the input data to be extracted to the output result. A condition statement is specified as cond="<condition statement or statements>".

A condition statement is one which evaluates to true or false, depending upon the input data. All variables are specified with a leading "$" to indicate that they are a variable (and not a text literal). The variable names correspond with the top row of the data in a worksheet or with the first row of data contained in a file in tab separated value format.

Examples of conditions include:

  • $InvoiceAmount > 100
  • left($SupplierNo,2) = "72"
  • ($Cost - $AccumDepr) <> $BookValue

Syntax

The command is entered on the command line as the command name ("extract", or just "ex") followed by one or more command parameters. Each command parameter consists of an argument name, immediately followed by an equal sign ("=") and then immediately followed by a value. The value may be enclosed with quotation marks, and must be enclosed by quotation marks if the value consists of any embedded spaces.

An example of a command parameter to specify that the file name to be processed is test.txt is entered as "file=test.txt" (without the quotations). Each argument on the command line must consist of only valid parameter names. If a required parameter name is missing, then the command line will be colored yellow, an error message will be displayed on the Excel status bar and an audible tone will be sounded.

The following is a listing of all parameter values for the extract command:

 
Argument name Description Example
Common Parameters (Required for all)
cond the condition to be evaluated in order to determine if the row of data is to be extracted $DueDate < "6/10/2007"
recap the worksheet where the results are to be stored $Results
File Data Source (ds=file)
file the name of the file to be processed \test\invoice.txt
Worksheet Data Source (ds=rng)
sheet the name of the worksheet to be processed CMAData
ulc upper left hand corner of the range to be processed a2
Worksheet Selected Range (ds=sel)
(none) no other parameters N/A
Database (ds=db)
(N/A) Not supported, convert to tab separated file N/A
Parameters for the extract command

 

Some examples

The command to be perform an extraction the file named \test\invoice\tran.txt , storing the results on the worksheet named "Extract Data " is as follows:

extract ds=file file=\test\invoice\tran.txt recap="Extract Data" cond="$Amount > 100 and $DueDate < "6/30/2007" "

The same command for a data range on a sheet would be:

extract ds=rng sheet=t_CMAData ulc=a1 recap=Extract Data" cond="$Amount > 100 and $DueDate < "6/30/2007" "

The same command for a selected data range on a sheet would be:

extract ds=rng recap=Extract Data" cond="$Amount > 100 and $DueDate < "6/30/2007" "

 

Described below are some of the elements of a condition, and their description

Operator Description Example
Mathematical Operators
+ Addition $Amount + $SalesTax
- Subtraction $InvoiceAmount - $Discount
* Multiplication $Units * $Quantity
/ Division $SalesTax / $InvoiceAmount
min Minimum min($InvPaid, $InvoiceDue)
max maximum max($DiscAllowed, $DiscTaken)
abs Absolute value abs($Discount)
sqr Square Root sqr($EOQ)
log Logarithm base 10 log($ElapsedTime)
ln Naperian Log (base E) ln($TimePeriod)
Text Operators
left The left portion of the text left($SupplierNumber)
right The right portion of the text right($SupplierNumber)
ld The Levenshtein distance between two pieces of text (used to determine similarity) ld($VendorAddress,$EmployeeAddress)
Comparison Operators
< Less than $Amount < 100
> Greater than $Amount > 100
<= Less than or equal to $Amount <= 100
>= Greater than or equal to $Amount >= 100
<> Not equal to $Amount <> 100
= Equal to $Amount = 100
Date Operators
wd Weekday 1=Sunday, 2=Monday wd($InvoiceDate) = 6
we Falls on a weekend we($Invoicedate)
qtr Quarter number the selected date falls in (can be based on federal, state, calendar or fiscal year) qtr($DatePaid,"F")
ho date is a federal holiday ho($InvoiceDate)

 

 

 

 

Main Form

Complete Auditor resources
Links | Privacy Policy | FAQ | Downloads | Trademarks
Web Page last updated on 09-21-2007
© EZ-R Stats, LLC 2005-2007