Publisher review: The Shuffle Merge Files script solves the problem of shuffle-merging files, that is, interlacing many small text files into one large text file while preserving the order of the lines within each small file.
In a scientific simulation process, it is not uncommon to need to combine multiple source files into one final file, which will become input for another stage in the process, while preserving the order of the lines from within the source files.
For example, when simulating the way checked messages arrive at a centralized component (e.g., a central database server in a distributed banking service), the final file needs to combine all source files in a random way (e.g., the messages arrive at their own pace, disturbed by the transfer over the Internet), while preserving the order between lines of the same source file (e.g., the receiving end of the messaging service ensures that messages from the same client arrive in a fixed order). This recipe solves this problem under the following assumptions:
o the source files are named "Prefix_X", with the same Prefix, and X being a 0-based integer index of the file (e.g., 0, 1, ..., n-1 for n source files)
o the shuffle-merge (output) file is "sm-Prefix"
The script prints a verification message every 10K lines parsed, and appears to run in under 10 s on an input set of 1M lines.
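The script's own source is not reproduced on this page. As a rough illustration of the behavior described above, here is a minimal Python sketch, not the author's implementation. The function name shuffle_merge and its parameters (directory, seed, report_every) are hypothetical; only the "Prefix_X" / "sm-Prefix" naming and the 10K-line progress message come from the description. The sketch assumes all source files can be held open at once, and it picks the next source uniformly among the files that still have lines, which is one possible notion of "random" and may differ from the original.

```python
import os
import random


def shuffle_merge(prefix, n_files, directory=".", seed=None, report_every=10_000):
    """Interleave Prefix_0 .. Prefix_{n-1} into sm-Prefix.

    Lines from the same source file keep their relative order.
    All names and parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Assumption: the number of source files is small enough to open them all.
    sources = [
        open(os.path.join(directory, f"{prefix}_{i}"), "r")
        for i in range(n_files)
    ]
    out_path = os.path.join(directory, f"sm-{prefix}")
    written = 0
    try:
        with open(out_path, "w") as out:
            active = list(sources)  # files that may still have lines
            while active:
                idx = rng.randrange(len(active))
                line = active[idx].readline()
                if not line:
                    # This source is exhausted; stop drawing from it.
                    active.pop(idx)
                    continue
                # Guard against a final line that lacks a newline.
                out.write(line if line.endswith("\n") else line + "\n")
                written += 1
                if report_every and written % report_every == 0:
                    print(f"{written} lines written")
    finally:
        for f in sources:
            f.close()
    return written
```

For example, shuffle_merge("msgs", 3) would read msgs_0, msgs_1 and msgs_2 from the current directory and write sm-msgs. Because each file is read sequentially with readline(), the per-file order is preserved no matter which file is drawn at each step.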
Shuffle Merge Files 1.0 is a Python script in the File Management scripts category, designed by Alexandru Iosup.
It runs on the following operating systems: Windows / Linux / Mac OS / BSD / Solaris.