Merge and Join Datasets Online
Perform relational Inner, Left, Right, and Outer joins on any tabular data file without writing SQL or Python code.
How to Merge Tables
A complete guide to configuring your data pipeline.
Step 1: Establishing the Primary and Secondary Tables
Merging requires two distinct datasets. Upload your first table, then use the New Flow button in the left sidebar to upload your second table into a separate branch. In the right sidebar, open the Merge Tables tool. You will map your first upload to "Table 1 (Left)" and your second upload to "Table 2 (Right)".
Step 2: Defining the Relational Keys
A relational merge requires a common identifier between both files. Using the Key Column dropdowns, select the exact column header in Table 1 that matches the corresponding identifier column in Table 2 (e.g., matching a 'Sample_ID' column in Table 1 to an 'ID_Number' column in Table 2).
Step 3: Selecting the Join Type
You must instruct the engine on how to handle non-matching rows using the Merge Type dropdown:
- Inner Join: Retains strictly the rows where the key exists in both tables.
- Left Join: Keeps all rows from Table 1, attaching data from Table 2 where available, and filling gaps with empty values (NaN).
- Outer Join: Combines all records from both tables, aligning matching keys and retaining all unmatched vectors from both sides.
Technical Specifications & Use Cases
Combining disparate data sources via relational joins is a notoriously fragile operation in standard spreadsheet software, often resulting in misaligned rows. By executing these operations through a strict pandas.merge() backend, flowingTable mathematically guarantees row integrity.
This is highly critical in advanced data science pipelines. For instance, when integrating multi-omics datasets, researchers must frequently perform inner joins on sequencing read identifiers to flawlessly align RNA expression matrices with matching genomic variant data. Our architecture supports exact memory-mapped joins, bypassing the visual limitations of the DOM and ensuring that multi-dimensional data arrays remain perfectly synchronized during the cross-referencing process.
Frequently Asked Questions
What is the practical difference between an Inner Join and an Outer Join?
An Inner Join is the most restrictive operation: it retains only the rows whose key value exists in both Table 1 and Table 2 simultaneously, discarding all unmatched records from either side. An Outer (Full) Join is the most inclusive: it retains every row from both tables, aligns the ones that share a matching key, and fills the gaps for unmatched rows with empty values. Use Inner Join when you need a clean, complete dataset with no missing data. Use Outer Join when you need a comprehensive combined view and are comfortable handling null values in the result.
Can I perform a merge if my two tables use different column names for the same identifier?
Yes, this is explicitly supported. The Key Column dropdowns allow you to select a different column from each table to act as the join key. For example, you can map 'CustomerID' in Table 1 to 'client_id' in Table 2, and the engine will correctly align the rows based on the shared underlying values, even though the header names differ between the two files.
How does the tool handle column names that exist in both tables under the same header?
When a non-key column name exists in both tables, the pandas merge engine automatically appends a suffix to each to prevent ambiguity. Columns from Table 1 receive a '_x' suffix and columns from Table 2 receive a '_y' suffix (for example, 'Status_x' and 'Status_y'). This ensures that no data is silently overwritten during the merge and that you can identify the origin table for each column in the output.
Is it possible to merge more than two tables in a single session?
Yes. After completing the first merge, the resulting output table becomes your new working dataset. You can then load a third file via the New Flow button and perform a second merge operation on the combined result. This chaining approach allows you to perform multi-table relational joins sequentially, replicating the behavior of SQL queries that join three or more database tables without writing any code.