Class ExternalSort
- java.lang.Object
-
- org.apache.jackrabbit.oak.commons.sort.ExternalSort
-
public class ExternalSort extends Object
Source copied from a publicly available library.- See Also:
- https://code.google.com/p/externalsortinginjava
Goal: offer a generic external-memory sorting program in Java. It must be : - hackable (easy to adapt) - scalable to large files - sensibly efficient. This software is in the public domain. Usage: java org/apache/oak/commons/sort//ExternalSort somefile.txt out.txt You can change the default maximal number of temporary files with the -t flag: java org/apache/oak/commons/sort/ExternalSort somefile.txt out.txt -t 3 You can change the default maximum memory available with the -m flag: java org/apache/oak/commons/sort/ExternalSort somefile.txt out.txt -m 8192 For very large files, you might want to use an appropriate flag to allocate more memory to the Java VM: java -Xms2G org/apache/oak/commons/sort/ExternalSort somefile.txt out.txt By (in alphabetical order) Philippe Beaudoin, Eleftherios Chetzakis, Jon Elsas, Christan Grant, Daniel Haran, Daniel Lemire, Sugumaran Harikrishnan, Jerry Yang, First published: April 2010 originally posted at http://lemire.me/blog/archives/2010/04/01/external-memory-sorting-in-java/
-
-
Field Summary
Fields Modifier and Type Field Description static Comparator<String>defaultcomparator
-
Constructor Summary
Constructors Constructor Description ExternalSort()
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static voiddisplayUsage()static longestimateBestSizeOfBlocks(File filetobesorted, int maxtmpfiles, long maxMemory)static voidmain(String[] args)static <T> intmerge(BufferedWriter fbw, Comparator<T> cmp, boolean distinct, List<org.apache.jackrabbit.oak.commons.sort.BinaryFileBuffer<T>> buffers, Function<T,String> typeToString)This merges several BinaryFileBuffer to an output writer.static <T> intmergeSortedFiles(List<File> files, BufferedWriter fbw, Comparator<T> cmp, Charset cs, boolean distinct, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType)This merges a bunch of temporary flat files and deletes them on success or error.static <T> intmergeSortedFiles(List<File> files, BufferedWriter fbw, Comparator<T> cmp, Charset cs, boolean distinct, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType)This merges a bunch of temporary flat files and deletes them on success or error.static intmergeSortedFiles(List<File> files, File outputfile)This merges a bunch of temporary flat filesstatic intmergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp)This merges a bunch of temporary flat filesstatic intmergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, boolean distinct)This merges a bunch of temporary flat filesstatic intmergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs)This merges a bunch of temporary flat filesstatic intmergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs, boolean distinct)This merges a bunch of temporary flat filesstatic intmergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs, boolean distinct, boolean append, boolean usegzip)This merges a bunch of temporary flat filesstatic intmergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs, boolean distinct, boolean append, Compression algorithm)static <T> intmergeSortedFiles(List<File> files, File outputfile, Comparator<T> cmp, Charset cs, boolean distinct, boolean append, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType)This merges a bunch of temporary flat files and deletes them on success or error.static <T> intmergeSortedFiles(List<File> files, File outputfile, Comparator<T> cmp, Charset cs, boolean distinct, boolean append, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType)static voidsort(File input, File output)static FilesortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory)static FilesortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip)static FilesortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip, Predicate<String> filterPredicate)Sort a list and save it to a temporary filestatic FilesortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory, Predicate<String> filterPredicate)Sort a list and save it to a temporary filestatic <T> FilesortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip, Function<T,String> typeToString)static <T> FilesortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip, Function<T,String> typeToString, Predicate<T> filterPredicate)Sort a list and save it to a temporary filestatic <T> FilesortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, Compression algorithm, Function<T,String> typeToString)static <T> FilesortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, Compression algorithm, Function<T,String> typeToString, @Nullable Predicate<T> filterPredicate)Sort a list and save it to a temporary file.static <T> List<File>sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType)static <T> List<File>sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate)static <T> List<File>sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType)static <T> List<File>sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate)static List<File>sortInBatch(File file)static List<File>sortInBatch(File file, Comparator<String> cmp)static List<File>sortInBatch(File file, Comparator<String> cmp, boolean distinct)static List<File>sortInBatch(File file, Comparator<String> cmp, boolean distinct, Predicate<String> filterPredicate)This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.static List<File>sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct)static List<File>sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip)static List<File>sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Predicate<String> filterPredicate)This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.static List<File>sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm)static List<File>sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Predicate<String> filterPredicate)static List<File>sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, Predicate<String> filterPredicate)This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.static List<File>sortInBatch(File file, Comparator<String> cmp, Predicate<String> filterPredicate)This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.static <T> List<File>sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType)static <T> List<File>sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate)This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.static <T> List<File>sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType)static <T> List<File>sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate)static List<File>sortInBatch(File file, Predicate<String> filterPredicate)This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.
-
-
-
Field Detail
-
defaultcomparator
public static Comparator<String> defaultcomparator
-
-
Method Detail
-
sort
public static void sort(File input, File output) throws IOException
- Throws:
IOException
-
estimateBestSizeOfBlocks
public static long estimateBestSizeOfBlocks(File filetobesorted, int maxtmpfiles, long maxMemory)
-
sortInBatch
public static List<File> sortInBatch(File file, Predicate<String> filterPredicate) throws IOException
This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.- Parameters:
file- some flat filefilterPredicate- predicate to keep data which need to be sorted- Returns:
- a list of temporary flat files
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file) throws IOException
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, Predicate<String> filterPredicate) throws IOException
This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.- Parameters:
file- some flat filecmp- string comparatorfilterPredicate- predicate to keep data which need to be sorted- Returns:
- a list of temporary flat files
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp) throws IOException
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, boolean distinct, Predicate<String> filterPredicate) throws IOException
This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later.- Parameters:
file- some flat filecmp- string comparatordistinct- Passtrueif duplicate lines should be discarded.filterPredicate- predicate to keep data which need to be sorted- Returns:
- a list of temporary flat files
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, boolean distinct) throws IOException
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Predicate<String> filterPredicate) throws IOException
This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later. You can specify a bound on the number of temporary files that will be created.- Parameters:
file- some flat filecmp- string comparatormaxtmpfiles- maximal number of temporary filescs- character set to use (can use Charset.defaultCharset())tmpdirectory- location of the temporary files (set to null for default location)distinct- Passtrueif duplicate lines should be discarded.numHeader- number of lines to preclude before sorting startsusegzip- use gzip compression for the temporary filesfilterPredicate- predicate to keep data which need to be sorted- Returns:
- a list of temporary flat files
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip) throws IOException
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm) throws IOException
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Predicate<String> filterPredicate) throws IOException
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate) throws IOException
This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later. You can specify a bound on the number of temporary files that will be created.- Parameters:
file- some flat filecmp- string comparatormaxtmpfiles- maximal number of temporary filescs- character set to use (can use Charset.defaultCharset())tmpdirectory- location of the temporary files (set to null for default location)distinct- Passtrueif duplicate lines should be discarded.numHeader- number of lines to preclude before sorting startsusegzip- use gzip compression for the temporary filestypeToString- function to map string to custom type. User for coverting line to custom type for the purpose of sortingstringToType- function to map custom type to string. Used for storing sorted content back to filefilterPredicate- predicate to keep data which need to be sorted- Returns:
- a list of temporary flat files
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(File file, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate) throws IOException
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate) throws IOException
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
- Throws:
IOException
-
sortInBatch
public static <T> List<File> sortInBatch(BufferedReader fbr, long actualFileSize, Comparator<T> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, int numHeader, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType, Predicate<T> filterPredicate) throws IOException
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct, Predicate<String> filterPredicate) throws IOException
This will simply load the file by blocks of lines, then sort them in-memory, and write the result to temporary files that have to be merged later. You can specify a bound on the number of temporary files that will be created.- Parameters:
file- some flat filecmp- string comparatormaxtmpfiles- maximal number of temporary filescs- character set to use (can use Charset.defaultCharset())tmpdirectory- location of the temporary files (set to null for default location)distinct- Passtrueif duplicate lines should be discarded.filterPredicate- predicate to keep data which need to be sorted- Returns:
- a list of temporary flat files
- Throws:
IOException
-
sortInBatch
public static List<File> sortInBatch(File file, Comparator<String> cmp, int maxtmpfiles, long maxMemory, Charset cs, File tmpdirectory, boolean distinct) throws IOException
- Throws:
IOException
-
sortAndSave
public static File sortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory, Predicate<String> filterPredicate) throws IOException
Sort a list and save it to a temporary file- Parameters:
tmplist- data to be sortedcmp- string comparatorcs- charset to use for output (can use Charset.defaultCharset())tmpdirectory- location of the temporary files (set to null for default location)filterPredicate- predicate to keep data which need to be sorted- Returns:
- the file containing the sorted data
- Throws:
IOException
-
sortAndSave
public static File sortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory) throws IOException
- Throws:
IOException
-
sortAndSave
public static File sortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip, Predicate<String> filterPredicate) throws IOException
Sort a list and save it to a temporary file- Parameters:
tmplist- data to be sortedcmp- string comparatorcs- charset to use for output (can use Charset.defaultCharset())tmpdirectory- location of the temporary files (set to null for default location)distinct- Passtrueif duplicate lines should be discarded.filterPredicate- predicate to keep data which need to be sorted- Returns:
- the file containing the sorted data
- Throws:
IOException
-
sortAndSave
public static File sortAndSave(List<String> tmplist, Comparator<String> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip) throws IOException
- Throws:
IOException
-
sortAndSave
public static <T> File sortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip, Function<T,String> typeToString, Predicate<T> filterPredicate) throws IOException
Sort a list and save it to a temporary file- Parameters:
tmplist- data to be sortedcmp- string comparatorcs- charset to use for output (can use Charset.defaultCharset())tmpdirectory- location of the temporary files (set to null for default location)distinct-usegzip- assumes we used gzip compression for temporary filestypeToString- function to map string to custom type. User for coverting line to custom type for the purpose of sortingfilterPredicate- predicate to keep data which need to be sorted- Returns:
- the file containing the sorted data
- Throws:
IOException
-
sortAndSave
public static <T> File sortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, boolean usegzip, Function<T,String> typeToString) throws IOException
- Throws:
IOException
-
sortAndSave
public static <T> File sortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, Compression algorithm, Function<T,String> typeToString, @Nullable @Nullable Predicate<T> filterPredicate) throws IOException
Sort a list and save it to a temporary file. In case this method is directly used and path filters predicates are provided, try to avoid usage of ArrayList (tmplist) as removal from ArrayList is O(n) operation.- Parameters:
tmplist- data to be sortedcmp- string comparatorcs- charset to use for output (can use Charset.defaultCharset())tmpdirectory- location of the temporary files (set to null for default location)distinct-algorithm- compression algorithm to use for the temporary filestypeToString- function to map string to custom type. User for coverting line to custom type for the purpose of sortingfilterPredicate- predicate to keep data which need to be sorted- Returns:
- the file containing the sorted data
- Throws:
IOException
-
sortAndSave
public static <T> File sortAndSave(List<T> tmplist, Comparator<T> cmp, Charset cs, File tmpdirectory, boolean distinct, Compression algorithm, Function<T,String> typeToString) throws IOException
- Throws:
IOException
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile) throws IOException
This merges a bunch of temporary flat files- Parameters:
files-outputfile- file- Returns:
- The number of lines sorted. (P. Beaudoin)
- Throws:
IOException
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp) throws IOException
This merges a bunch of temporary flat files- Parameters:
files-outputfile- file- Returns:
- The number of lines sorted. (P. Beaudoin)
- Throws:
IOException
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, boolean distinct) throws IOException
This merges a bunch of temporary flat files- Parameters:
files-outputfile- file- Returns:
- The number of lines sorted. (P. Beaudoin)
- Throws:
IOException
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs, boolean distinct, boolean append, boolean usegzip) throws IOException
This merges a bunch of temporary flat files- Parameters:
files- TheListof sortedFiles to be merged.distinct- Passtrueif duplicate lines should be discarded. (elchetz@gmail.com)outputfile- The outputFileto merge the results to.cmp- TheComparatorto use to compareStrings.cs- TheCharsetto be used for the byte to character conversion.append- Passtrueif result should append toFileinstead of overwrite. Default to be false for overloading methods.usegzip- assumes we used gzip compression for temporary files- Returns:
- The number of lines sorted. (P. Beaudoin)
- Throws:
IOException- Since:
- v0.1.4
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs, boolean distinct, boolean append, Compression algorithm) throws IOException
- Throws:
IOException
-
mergeSortedFiles
public static <T> int mergeSortedFiles(List<File> files, File outputfile, Comparator<T> cmp, Charset cs, boolean distinct, boolean append, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
This merges a bunch of temporary flat files and deletes them on success or error.- Parameters:
files- TheListof sortedFiles to be merged.outputfile- The outputFileto merge the results to.cmp- TheComparatorto use to compareStrings.cs- TheCharsetto be used for the byte to character conversion.distinct- Passtrueif duplicate lines should be discarded. (elchetz@gmail.com)append- Passtrueif result should append toFileinstead of overwrite. Default to be false for overloading methods.usegzip- assumes we used gzip compression for temporary filestypeToString- function to map string to custom type. User for coverting line to custom type for the purpose of sortingstringToType- function to map custom type to string. Used for storing sorted content back to file- Throws:
IOException- Since:
- v0.1.4
-
mergeSortedFiles
public static <T> int mergeSortedFiles(List<File> files, File outputfile, Comparator<T> cmp, Charset cs, boolean distinct, boolean append, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
- Throws:
IOException
-
mergeSortedFiles
public static <T> int mergeSortedFiles(List<File> files, BufferedWriter fbw, Comparator<T> cmp, Charset cs, boolean distinct, boolean usegzip, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
This merges a bunch of temporary flat files and deletes them on success or error.- Parameters:
files- TheListof sortedFiles to be merged.fbw- Buffered writer used to store the sorted contentcmp- TheComparatorto use to compareStrings.cs- TheCharsetto be used for the byte to character conversion.distinct- Passtrueif duplicate lines should be discarded. (elchetz@gmail.com)usegzip- assumes we used gzip compression for temporary filestypeToString- function to map string to custom type. User for coverting line to custom type for the purpose of sortingstringToType- function to map custom type to string. Used for storing sorted content back to file- Throws:
IOException- Since:
- v0.1.4
-
mergeSortedFiles
public static <T> int mergeSortedFiles(List<File> files, BufferedWriter fbw, Comparator<T> cmp, Charset cs, boolean distinct, Compression algorithm, Function<T,String> typeToString, Function<String,T> stringToType) throws IOException
This merges a bunch of temporary flat files and deletes them on success or error.- Parameters:
files- TheListof sortedFiles to be merged.fbw- Buffered writer used to store the sorted contentcmp- TheComparatorto use to compareStrings.cs- TheCharsetto be used for the byte to character conversion.distinct- Passtrueif duplicate lines should be discarded. (elchetz@gmail.com)algorithm- algorithm for compression by default assumes we used gzip compression for temporary filestypeToString- function to map string to custom type. User for coverting line to custom type for the purpose of sortingstringToType- function to map custom type to string. Used for storing sorted content back to file- Throws:
IOException- Since:
- v0.1.4
-
merge
public static <T> int merge(BufferedWriter fbw, Comparator<T> cmp, boolean distinct, List<org.apache.jackrabbit.oak.commons.sort.BinaryFileBuffer<T>> buffers, Function<T,String> typeToString) throws IOException
This merges several BinaryFileBuffer to an output writer.- Parameters:
fbw- A buffer where we write the data.cmp- A comparator object that tells us how to sort the lines.distinct- Passtrueif duplicate lines should be discarded. (elchetz@gmail.com)buffers- Where the data should be read.typeToString- function to map string to custom type. User for converting line to custom type for the purpose of sorting- Throws:
IOException
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs, boolean distinct) throws IOException
This merges a bunch of temporary flat files- Parameters:
files- TheListof sortedFiles to be merged.distinct- Passtrueif duplicate lines should be discarded. (elchetz@gmail.com)outputfile- The outputFileto merge the results to.cmp- TheComparatorto use to compareStrings.cs- TheCharsetto be used for the byte to character conversion.- Returns:
- The number of lines sorted. (P. Beaudoin)
- Throws:
IOException- Since:
- v0.1.2
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile, Comparator<String> cmp, Charset cs) throws IOException
This merges a bunch of temporary flat files- Parameters:
files-outputfile- filecs- character set to use to load the strings- Returns:
- The number of lines sorted. (P. Beaudoin)
- Throws:
IOException
-
displayUsage
public static void displayUsage()
-
main
public static void main(String[] args) throws IOException
- Throws:
IOException
-
-