Select or Drop Columns in CSV Files
Isolate specific variables or drop unwanted columns and rows from your dataset by header name, position, or range.
How to Slice Data
A complete guide to configuring your data pipeline.
Step 1: Selecting the Columns
Open the Select Rows/Columns tool. In the Columns input box, define the exact columns you want to isolate. You can call them by their exact header name (e.g., Sales), by their numerical index (e.g., $1 for the first column), or by ranges (e.g., $3-$5).
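The three reference styles can be illustrated outside the tool with Python's standard csv module. This is a minimal sketch, not flowingTable's implementation: the sample data and the select_columns helper are hypothetical, and $n positions are assumed to be 1-based, as the $1 example suggests.

```python
import csv
import io

# Hypothetical sample file used only for illustration.
SAMPLE = """Date,Region,Sales,Cost,Profit
2024-01-01,North,100,60,40
2024-01-02,South,80,50,30
"""

def select_columns(text, positions):
    """Keep only the given 1-based column positions, in the order listed."""
    rows = list(csv.reader(io.StringIO(text)))
    return [[row[p - 1] for p in positions] for row in rows]

header = SAMPLE.splitlines()[0].split(",")

by_name = select_columns(SAMPLE, [header.index("Sales") + 1])  # like "Sales"
by_index = select_columns(SAMPLE, [1])                         # like "$1"
by_range = select_columns(SAMPLE, list(range(3, 6)))           # like "$3-$5"
```

Here a header name is first translated to its position, so all three styles end up as the same kind of positional lookup.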
Step 2: Inverting the Selection (Dropping)
If your goal is to remove specific columns rather than keep them, check the "Remove columns" checkbox. For example, if you type $2 and check this box, the engine keeps the rest of your dataset and removes the second column from the output.
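Inverting a selection amounts to keeping every column whose position is not in the list. The sketch below shows that idea in plain Python; the drop_columns helper and the sample rows are hypothetical, and it builds a new table rather than modifying the input.

```python
def drop_columns(rows, positions):
    """Return a copy of rows without the given 1-based column positions."""
    drop = {p - 1 for p in positions}  # convert to zero-based indices
    return [
        [cell for i, cell in enumerate(row) if i not in drop]
        for row in rows
    ]

rows = [
    ["Date", "Region", "Sales"],
    ["2024-01-01", "North", "100"],
]

# Comparable to typing $2 with "Remove columns" checked.
result = drop_columns(rows, [2])
```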
Step 3: Row Slicing
You can perform the same operation horizontally on rows. In the Rows input box, type the numerical indices you want to keep or drop (e.g., 1-100 to keep the first hundred rows, or 200- to keep everything from row 200 to the end of the file).
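A row range such as 1-100 or 200- can be read as a 1-based, inclusive slice over the data rows. The following sketch assumes exactly that interpretation; slice_rows is a hypothetical helper, not the tool's parser.

```python
def slice_rows(data_rows, spec):
    """Apply a 1-based, inclusive row spec such as "1-100", "200-" or "7"."""
    start_text, dash, end_text = spec.partition("-")
    start = int(start_text)
    if not dash:
        end = start                # single index, e.g. "7"
    elif end_text:
        end = int(end_text)        # closed range, e.g. "1-100"
    else:
        end = len(data_rows)       # open-ended range, e.g. "200-"
    return data_rows[start - 1:end]

# 300 hypothetical data rows (header already removed).
data = [[str(n)] for n in range(1, 301)]

first_hundred = slice_rows(data, "1-100")
tail = slice_rows(data, "200-")
```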
Technical Specifications & Use Cases
Dimensionality reduction is often the first step in data processing. Massive analytical datasets frequently contain dozens of metadata columns (like internal database IDs, timestamps, or system logs) that are irrelevant to the current statistical hypothesis.
Loading and processing these wide datasets in full consumes unnecessary RAM and slows down data visualization tools. flowingTable lets data scientists and analysts slice tables quickly by column name, position, or range. By dropping columns before exporting, you reduce file size and ensure your downstream machine learning pipelines or CRM dashboards ingest only high-signal variables.
Frequently Asked Questions
What is the difference between selecting specific columns and dropping specific columns?
Selecting columns is an inclusive operation: you list the columns you want to keep and the engine discards everything else. Dropping columns is an exclusive operation: you list the columns you want to remove and the engine retains everything else. Selecting is the shorter route when you need only a small subset of a wide dataset (for example, keeping 5 out of 80 columns). Dropping is the shorter route when you need most of the dataset but want to remove a few irrelevant or sensitive columns (for example, removing internal audit logs or personally identifiable information before sharing the file).
How do I select or drop a consecutive range of columns using their position numbers?
You can specify a positional range using the $start-$end syntax in the column input field. For example, entering $3-$7 targets columns 3 through 7 inclusive. You can also combine individual references with ranges in the same input: typing $1, $3-$5, Revenue will select the first column, columns 3 through 5, and the column named 'Revenue', regardless of its position. This syntax makes it practical to slice wide datasets without listing every column name individually.
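To make the mixed syntax concrete, here is one way such an input could be resolved into column positions. It is a sketch under stated assumptions: resolve_columns is a hypothetical function, tokens are assumed to be comma-separated, and any token that does not start with $ is treated as a header name.

```python
def resolve_columns(spec, header):
    """Turn a spec like "$1, $3-$5, Revenue" into zero-based column indices."""
    indices = []
    for token in (part.strip() for part in spec.split(",")):
        if not token:
            continue
        if token.startswith("$") and "-" in token:
            # Positional range, inclusive on both ends: $3-$5 -> 3, 4, 5
            start, end = token.split("-", 1)
            first = int(start.lstrip("$"))
            last = int(end.lstrip("$"))
            indices.extend(range(first - 1, last))
        elif token.startswith("$"):
            # Single 1-based position: $1 -> index 0
            indices.append(int(token[1:]) - 1)
        else:
            # Header name, looked up wherever it sits in the file
            indices.append(header.index(token))
    return indices

header = ["ID", "Name", "Q1", "Q2", "Q3", "Revenue"]
picked = resolve_columns("$1, $3-$5, Revenue", header)
names = [header[i] for i in picked]
```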
Can I use this tool to reorder my columns into a new sequence without removing any of them?
Yes. To reorder columns, use the Select tool in inclusive mode and list all column references in the desired output order. The engine will reconstruct the DataFrame with columns arranged in the exact sequence you specified in the input field. For example, entering Region, Revenue, CustomerName, Date will output the table with those four columns in that specific left-to-right order, even if the original file had them arranged completely differently.
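The same reordering can be reproduced with Python's csv.DictReader and csv.DictWriter, where the fieldnames list fixes the output order. The reorder function and sample data below are hypothetical and only illustrate the idea.

```python
import csv
import io

def reorder(text, order):
    """Rewrite CSV text with its columns in the given left-to-right order."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=order,
        extrasaction="ignore",   # silently skip columns not listed in order
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()

source = "Date,CustomerName,Revenue,Region\n2024-01-01,Acme,100,North\n"
reordered = reorder(source, ["Region", "Revenue", "CustomerName", "Date"])
```

Because every column is listed, nothing is removed; leaving a name out of the list would drop that column instead.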
Is it possible to extract a specific range of rows by their row index number?
Yes. The Rows input box accepts numerical index ranges using the same range syntax as columns, without the $ prefix. Entering 1-500 keeps only the first 500 data rows (excluding the header), while entering 1001- keeps every row from row 1001 to the end of the file. This is useful for splitting a large dataset into sequential chunks for batch processing, or for quickly discarding historical records at the beginning of a time-series file that predate your analysis window.
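For batch processing, the sequence of range strings to enter can be worked out ahead of time. The helper below is a hypothetical convenience, not part of the tool; it assumes you already know the number of data rows.

```python
def chunk_specs(total_rows, chunk_size):
    """List 1-based, inclusive row ranges that cover total_rows in order."""
    specs = []
    for start in range(1, total_rows + 1, chunk_size):
        end = min(start + chunk_size - 1, total_rows)
        specs.append(f"{start}-{end}")
    return specs

specs = chunk_specs(1200, 500)
```

For 1,200 data rows in chunks of 500 this yields 1-500, 501-1000, and 1001-1200, each of which can be typed into the Rows input box in turn.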