Calculate and Create Columns in Datasets

Apply math formulas, aggregate functions, and text manipulations to dynamically engineer new columns in your CSV and Excel files.

Drag & Drop your file here

or browse files | paste raw data


[ Google AdSense Slot (728x90) ]
Learn More

How to Calculate Column

A complete guide to configuring your data pipeline.

Step 1: Naming the New Variable

After importing your file, open the Create Column (Formula) tool. In the first input box, declare the exact name for the new header you are generating (e.g., Total_Profit or Uppercase_Names).

Step 2: Authoring the Mathematical Formula

In the formula box, use the $ symbol to reference existing columns by their numerical position. For example, to multiply the second column by the third column, type $2 * $3. To add a 20% tax to the fourth column, type $4 * 1.20.

Step 3: Utilizing Built-In Functions

You can bypass manual math by utilizing the Insert Function dropdown. We support aggregate row calculations (e.g., mean($1, $2, $3) to find the average of three columns) and string manipulations (e.g., upper($1) to force all text in the first column to uppercase).

Technical Specifications & Use Cases

Feature engineering — the process of using domain knowledge to extract new variables from raw data — is vital for predictive modeling and financial reporting. However, writing complex formulas across millions of rows in traditional spreadsheet software introduces severe latency and the risk of localized formula corruption.

flowingTable executes variable mutation via the high-speed numexpr engine embedded in our Python architecture. By utilizing vectorized operations, mathematical formulas and string concatenations are applied simultaneously to the entire array, bypassing iterative loops. This guarantees absolute computational consistency across the entire matrix, allowing analysts to instantly generate standard deviations, sums, or concatenated string identifiers without writing raw Python or R scripts.


Frequently Asked Questions

How do I reference a column in a formula if it has no header name — for example, if my file has no header row?

All columns can be referenced by their positional index using the dollar-sign prefix syntax, regardless of whether a header row is present. The first column is always $1, the second is $2, and so on. If your file does have named headers and you prefer to use them, you can also reference columns by their exact header name as a string in certain function contexts. The $N positional syntax is the most reliable method when working with files that have auto-generated, numeric, or locale-specific header names.

Can I apply string functions like uppercase or lowercase conversion inside a formula?

Yes. The built-in function library includes string manipulation operations that can be inserted via the 'Insert Function' dropdown. upper($1) converts all characters in the first column to uppercase, while lower($1) converts them to lowercase. You can also concatenate columns with a separator using concat($1, ' - ', $2) to generate composite identifiers. These string functions apply vectorized operations across the entire column simultaneously, which is significantly faster than writing cell-by-cell formulas in a spreadsheet editor.

What happens if my formula references a column that contains text values instead of numbers?

If you apply a mathematical operator (like multiplication or addition) to a column that contains non-numeric text, the engine will raise a type mismatch error and display a descriptive message indicating which column caused the conflict. The original dataset is not modified in this case. To resolve the issue, first use the Substitute tool to clean the affected column — removing currency symbols, unit labels, or other text artifacts — so that the values can be interpreted as numeric types before the formula is reapplied.

How do I write a formula to calculate the percentage difference between two columns?

To compute the percentage change between an 'Old Value' column ($2) and a 'New Value' column ($3), enter the formula (($3 - $2) / $2) * 100 in the formula field. This produces the percentage difference as a decimal number (for example, 15.0 for a 15% increase). If your Old Value column contains any zeros, those rows will produce a division-by-zero result. You can handle this by first using the Filter tool to remove zero-value rows, or by using the Handle Missing Values tool to substitute zeros with a NaN sentinel before running the formula.