Read Content from Files which are inside Zip file
I am trying to create a simple Java program that reads and extracts the content of the files inside a ZIP file. The ZIP file contains 3 files (txt, pdf, docx). I need to read the contents of all these files, and I am using Apache Tika for this purpose. Can somebody help me achieve this? Here is what I have tried so far, without success:
```java
public class SampleZipExtract {

    public static void main(String[] args) {

        List<String> tempString = new ArrayList<String>();
        StringBuffer sbf = new StringBuffer();

        File file = new File("C:\\Users\\xxx\\Desktop\\abc.zip");
        InputStream input;
        try {
            input = new FileInputStream(file);
            ZipInputStream zip = new ZipInputStream(input);
            ZipEntry entry = zip.getNextEntry();

            BodyContentHandler textHandler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            Parser parser = new AutoDetectParser();

            while (entry != null) {
                if (entry.getName().endsWith(".txt") || entry.getName().endsWith(".pdf")
                        || entry.getName().endsWith(".docx")) {
                    System.out.println("entry=" + entry.getName() + " " + entry.getSize());
                    parser.parse(input, textHandler, metadata, new ParseContext());
                    tempString.add(textHandler.toString());
                }
            }
            zip.close();
            input.close();

            for (String text : tempString) {
                System.out.println("Apache Tika - Converted input string : " + text);
                sbf.append(text);
                System.out.println("Final text from all the three files " + sbf.toString());
            }
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (TikaException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
```
Why not pass the zip file straight to Apache Tika? It’ll then call the recursing parser you supply for each file in the zip, so you don’t have to do anything special!
That’s what I was wondering, but I couldn’t find a good tutorial on how to do that. I am also a little worried about the issue described at javamex.com/tutorials/compression/zip_problems.shtml; I’m not sure whether Tika addresses it.
61 MB for Tika? 61 MB just for working with ZIP files, which can be done in ~10 lines?! My app with 15+ activities weighs less than 4 MB. I think it’s disrespectful to users to ship apps this big just for trivial tasks.
6 Answers
If you’re wondering how to get the file content from each ZipEntry, it’s actually quite simple. Here’s some sample code:
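```java
public static void main(String[] args) throws IOException {
    // try-with-resources closes the ZipFile and every stream it handed out
    try (ZipFile zipFile = new ZipFile("C:/test.zip")) {
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            InputStream stream = zipFile.getInputStream(entry);
            // read the entry's content from stream here
        }
    }
}
```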
Once you have the InputStream you can read it however you want.
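As a minimal sketch of reading each stream, the class below collects every file entry's content into a String. ZipFileContents and readTextEntries are hypothetical names; the sketch assumes the entries hold UTF-8 text and Java 9+ for readAllBytes(). Binary formats such as the PDF and DOCX files in the question would still need a parser like Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipFileContents {

    // Maps each non-directory entry name to its content decoded as UTF-8
    // (an assumption: only valid for text entries).
    static Map<String, String> readTextEntries(String zipPath) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipFile zipFile = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                try (InputStream stream = zipFile.getInputStream(entry)) {
                    // readAllBytes() requires Java 9 or later
                    contents.put(entry.getName(),
                            new String(stream.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        return contents;
    }
}
```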
Is there a way to pass a byte[] array (content.getBytes()) to the ZipFile constructor? If not, how can we do this?
@Simple-Solution I think the easiest way to do that is to write the byte array into a new File and give that File instance to the constructor.
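An alternative that avoids the temporary file, assuming sequential access is enough: wrap the bytes in a ByteArrayInputStream and read them with ZipInputStream instead of ZipFile. This is a sketch; ZipBytesReader is a hypothetical name and readAllBytes() needs Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipBytesReader {

    // Reads the text of every file entry from a ZIP archive held in memory.
    static Map<String, String> readTextEntries(byte[] zipBytes) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // readAllBytes() stops at the end of the current entry, not the archive
                    contents.put(entry.getName(),
                            new String(zis.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        return contents;
    }
}
```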
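As of Java 7, the NIO API provides a better and more generic way of accessing the contents of ZIP or JAR files. It is now a unified API: through a ZIP file system, the entries of an archive are exposed as Path objects that you can handle with the same Files methods you use for ordinary files.
To extract all of the files contained in a ZIP file with this API, you could do the following. Note that the URI passed to FileSystems.newFileSystem must use the jar: scheme (for example, jar:file:/path/to/archive.zip).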
In Java 8
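```java
private void extractAll(URI fromZip, Path toDirectory) throws IOException {
    try (FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.emptyMap())) {
        for (Path root : zipFs.getRootDirectories()) {
            try (Stream<Path> paths = Files.walk(root)) {
                paths.forEach(path -> {
                    // map the path inside the ZIP onto the destination directory
                    Path target = toDirectory.resolve(root.relativize(path).toString());
                    try {
                        // Files.walk visits each directory before its contents
                        if (Files.isDirectory(path)) {
                            Files.createDirectories(target);
                        } else {
                            Files.copy(path, target);
                        }
                    } catch (IOException e) {
                        // lambdas cannot throw checked exceptions
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
    }
}
```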
In Java 7
```java
private void extractAll(URI fromZip, final Path toDirectory) throws IOException {
    // Java 7 cannot infer Map<String, ?> here, hence the explicit type arguments
    try (FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.<String, Object>emptyMap())) {
        for (final Path root : zipFs.getRootDirectories()) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    // You can do anything you want with the path here;
                    // this copies it to the matching place under toDirectory
                    Files.copy(file, toDirectory.resolve(root.relativize(file).toString()));
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                    // create each sub-directory of the destination directory
                    // before copying files into it
                    Files.createDirectories(toDirectory.resolve(root.relativize(dir).toString()));
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }
}
```
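For reading or writing a single entry, the same ZIP file system lets you treat the entry like an ordinary file. The sketch below assumes the JDK's built-in zipfs provider, which is selected by the jar: URI scheme; its "create" option makes newFileSystem create the archive if it does not exist yet. ZipFsDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;

class ZipFsDemo {

    // Builds the "jar:" URI that the ZIP file system provider expects.
    static URI zipUri(Path zipFile) {
        return URI.create("jar:" + zipFile.toUri());
    }

    // Writes one text entry, creating the archive first if it is missing.
    static void writeEntry(Path zipFile, String entryName, String text) throws IOException {
        Map<String, String> env = Collections.singletonMap("create", "true");
        try (FileSystem zipFs = FileSystems.newFileSystem(zipUri(zipFile), env)) {
            Path entry = zipFs.getPath(entryName);
            if (entry.getParent() != null) {
                Files.createDirectories(entry.getParent());
            }
            Files.write(entry, text.getBytes(StandardCharsets.UTF_8));
        } // closing the file system writes the archive to disk
    }

    // Reads one entry back exactly as if it were an ordinary file.
    static String readEntry(Path zipFile, String entryName) throws IOException {
        try (FileSystem zipFs = FileSystems.newFileSystem(zipUri(zipFile),
                Collections.<String, Object>emptyMap())) {
            return new String(Files.readAllBytes(zipFs.getPath(entryName)), StandardCharsets.UTF_8);
        }
    }
}
```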
Because entry is never reassigned inside the while loop, the loop never terminates once the first entry is non-null.
Instead of that null check, you can advance the stream in the loop condition:
```java
ZipEntry entry = null;
while ((entry = zip.getNextEntry()) != null) {
    // Rest of your code
}
```
@Shatir This would also work if you’d prefer to keep the assignment out of the loop condition: ZipEntry entry = zip.getNextEntry(); while (entry != null) { /* do stuff */ entry = zip.getNextEntry(); }
Here is sample code you can use to let Tika take care of container files for you: http://wiki.apache.org/tika/RecursiveMetadata
From what I can tell, the accepted solution will not work for nested ZIP files. Tika, however, handles such cases as well.
My way of achieving this is to create a wrapper class around ZipInputStream that exposes only the stream of the current entry:
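```java
public class ZippedFileInputStream extends InputStream {

    private final ZipInputStream is;

    public ZippedFileInputStream(ZipInputStream is) {
        this.is = is;
    }

    @Override
    public int read() throws IOException {
        return is.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // bulk reads avoid InputStream's byte-at-a-time default
        return is.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // close only the current entry, not the ZipInputStream
        is.closeEntry();
    }
}
```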
```java
ZipInputStream zipInputStream = new ZipInputStream(new FileInputStream("SomeFile.zip"));
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
    ZippedFileInputStream archivedFileInputStream = new ZippedFileInputStream(zipInputStream);
    // ... perform whatever logic you want here with ZippedFileInputStream

    // note that this will only close the current entry stream and not the ZipInputStream
    archivedFileInputStream.close();
}
zipInputStream.close();
```
One advantage of this approach: InputStreams are often passed as arguments to methods that process them, and those methods tend to close the stream as soon as they are done with it. With the wrapper, such a close() only ends the current entry, so you can keep iterating over the ZipInputStream.
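For comparison, here is a sketch of the same idea built on java.io.FilterInputStream, which already delegates every read method to the wrapped stream, so only close() has to be overridden. This is a substitute for the class in this answer, not the answer's own code: EntryInputStream and EntryConsumerDemo are hypothetical names, and readAndClose stands in for any library method that closes the stream it receives (Java 9+ for readAllBytes()).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipInputStream;

// Delegates every read to the ZipInputStream, but close() only ends the current entry.
class EntryInputStream extends FilterInputStream {

    EntryInputStream(ZipInputStream zis) {
        super(zis);
    }

    @Override
    public void close() throws IOException {
        ((ZipInputStream) in).closeEntry();
    }
}

class EntryConsumerDemo {

    // Hypothetical stand-in for a library method that closes what it is given.
    static String readAndClose(InputStream in) throws IOException {
        try (InputStream s = in) {
            return new String(s.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Reads every entry through the wrapper; the ZipInputStream survives each close().
    static List<String> readAll(byte[] zipBytes) throws IOException {
        List<String> texts = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (zis.getNextEntry() != null) {
                texts.add(readAndClose(new EntryInputStream(zis)));
            }
        }
        return texts;
    }
}
```

Passing zis directly to readAndClose instead would close the whole archive after the first entry, and the next getNextEntry() call would fail.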
I did mine like this (JDK 15); remember to change the URL and the ZIP file names to match yours.
```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Enumeration;
import java.util.Scanner;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class Main {
    public static void main(String[] args) throws IOException {
        Scanner getkw = new Scanner(System.in);
        System.out.println("Please paste URL ::");
        String url = getkw.nextLine();
        System.out.println("Please enter name of file you want to save as ::");
        String kfile = getkw.nextLine();
        getkw.close();

        Main dinit = new Main();
        System.out.println(dinit.dloader(url, kfile));

        try (ZipFile vanilla = new ZipFile(new File("Vanilla.zip"))) {
            Enumeration<? extends ZipEntry> entries = vanilla.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                File outFile = new File(entry.getName());
                if (entry.isDirectory()) {
                    outFile.mkdirs();
                    continue;
                }
                if (outFile.getParentFile() != null) {
                    outFile.getParentFile().mkdirs();
                }
                // write this entry's bytes (not the whole ZIP file) to disk
                try (InputStream stream = vanilla.getInputStream(entry);
                     FileOutputStream outter = new FileOutputStream(outFile)) {
                    outter.write(stream.readAllBytes());
                }
            }
        }
    }

    private String dloader(String kurl, String fname) {
        String status;
        // download from the URL the user pasted into Vanilla.zip
        try (InputStream in = new URL(kurl).openStream();
             FileOutputStream out = new FileOutputStream(new File("Vanilla.zip"))) {
            out.write(in.readAllBytes());
            status = "Status: Good";
        } catch (MalformedURLException e) {
            status = "Status: MalformedURLException occurred";
        } catch (IOException e) {
            status = "Status: IOException occurred";
        }
        String path = "\\tkwgter5834\\";
        extractor(fname, "tkwgter5834", path);
        return status;
    }

    private String extractor(String fname, String dir, String path) {
        File folder = new File(dir);
        if (!folder.exists()) {
            folder.mkdir();
        }
        return "";
    }
}
```
Java ZipInputStream
This tutorial shows how to read ZIP files in Java with ZipInputStream.
Java ZipInputStream
ZipInputStream is a Java class that implements an input stream filter for reading files in the ZIP file format. It has support for both compressed and uncompressed entries.
ZIP
ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed. Java Archive (JAR) is built on the ZIP format.
ZipInputStream constructors
ZipInputStream has the following constructors:
ZipInputStream(InputStream in)
ZipInputStream(InputStream in, Charset charset)
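The Charset argument controls how entry names are decoded when the archive does not mark them as UTF-8. A sketch assuming an archive whose names were written in ISO-8859-1 (ZipCharsetDemo and its methods are hypothetical names); ZipOutputStream sets the UTF-8 name flag only when its own charset is UTF-8, so the reader must be told the matching charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipCharsetDemo {

    // Creates an archive of empty entries whose names are encoded with the given charset.
    static byte[] zipWithNames(Charset charset, String... names) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos, charset)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
        }
        return bos.toByteArray();
    }

    // Lists entry names, decoding them with the charset given to the
    // two-argument ZipInputStream constructor.
    static List<String> listNames(byte[] zipBytes, Charset charset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes), charset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```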
ZipInputStream getNextEntry
The ZipInputStream’s getNextEntry reads the next ZIP file entry and positions the stream at the beginning of the entry data.
Java read ZIP example
The following example reads the contents of a ZIP file.
```java
package com.zetcode;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.time.LocalDate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class JavaReadZip {

    private static final long MILLS_IN_DAY = 86400000L;

    public static void main(String[] args) throws IOException {

        String fileName = "src/resources/myfile.zip";

        try (FileInputStream fis = new FileInputStream(fileName);
             BufferedInputStream bis = new BufferedInputStream(fis);
             ZipInputStream zis = new ZipInputStream(bis)) {

            ZipEntry ze;

            while ((ze = zis.getNextEntry()) != null) {

                // %s for the LocalDate: %d only accepts integral types
                System.out.format("File: %s Size: %d last modified: %s%n",
                        ze.getName(), ze.getSize(),
                        LocalDate.ofEpochDay(ze.getTime() / MILLS_IN_DAY));
            }
        }
    }
}
```
The example reads the given ZIP file with ZipInputStream and prints its contents to the terminal. We print the file names, their size, and the last modification time.
String fileName = "src/resources/myfile.zip";
The ZIP file is located in the src/resources/ directory.
try (FileInputStream fis = new FileInputStream(fileName);
We create a FileInputStream from the file. FileInputStream is used for reading streams of raw bytes.
BufferedInputStream bis = new BufferedInputStream(fis);
For better performance, we wrap the FileInputStream in a BufferedInputStream.
ZipInputStream zis = new ZipInputStream(bis))
A ZipInputStream is created from the buffered FileInputStream. The try-with-resources statement closes the streams when they are no longer needed.
while ((ze = zis.getNextEntry()) != null)
In a while loop, we go through the entries of the ZIP file with the getNextEntry method. It returns null when there are no more entries.
System.out.format("File: %s Size: %d last modified: %s%n", ze.getName(), ze.getSize(), LocalDate.ofEpochDay(ze.getTime() / MILLS_IN_DAY));
The getName method returns the name of the entry, getSize returns the uncompressed size of the entry (or -1 if it is not known), and getTime returns the last modification time of the entry in milliseconds since the epoch, which we convert to a LocalDate.
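One caveat for the size column: when an archive was written by a streaming tool (ZipOutputStream does this for DEFLATED entries whose sizes were not set in advance), the sizes are stored in a data descriptor after the entry data. ZipInputStream then reports -1 from getSize() until the entry has been read to its end. A minimal sketch that shows this, with hypothetical class and method names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class EntrySizeDemo {

    // Returns {getSize() right after getNextEntry(), getSize() after reading the entry}.
    static long[] sizesBeforeAndAfterReading(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize(); // -1 when the size lives in a data descriptor
            zis.readAllBytes();            // reading to the end also reads the descriptor
            long after = entry.getSize();  // now the real uncompressed size
            return new long[] {before, after};
        }
    }
}
```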
Java decompress ZIP example
In the next example, we decompress a ZIP file in Java.
```java
package com.zetcode;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class JavaUnzip {

    public static void main(String[] args) throws IOException {

        byte[] buffer = new byte[2048];

        Path outDir = Paths.get("src/resources/output/");
        String zipFileName = "src/resources/myfile.zip";

        try (FileInputStream fis = new FileInputStream(zipFileName);
             BufferedInputStream bis = new BufferedInputStream(fis);
             ZipInputStream stream = new ZipInputStream(bis)) {

            ZipEntry entry;
            while ((entry = stream.getNextEntry()) != null) {

                Path filePath = outDir.resolve(entry.getName()).normalize();
                // reject entries such as "../evil.txt" that would escape outDir
                if (!filePath.startsWith(outDir.normalize())) {
                    throw new IOException("Bad ZIP entry: " + entry.getName());
                }

                if (entry.isDirectory()) {
                    Files.createDirectories(filePath);
                    continue;
                }
                Files.createDirectories(filePath.getParent());

                try (FileOutputStream fos = new FileOutputStream(filePath.toFile());
                     BufferedOutputStream bos = new BufferedOutputStream(fos, buffer.length)) {

                    int len;
                    while ((len = stream.read(buffer)) > 0) {
                        bos.write(buffer, 0, len);
                    }
                }
            }
        }
    }
}
```
The example uses ZipInputStream to read the contents of the given ZIP file, and FileOutputStream and BufferedOutputStream to write the contents into a directory.
Path outDir = Paths.get("src/resources/output/");
This is the directory where we extract the contents of the ZIP file.
while ((entry = stream.getNextEntry()) != null)
In the first while loop, we go through the entries of the ZIP file.
while ((len = stream.read(buffer)) > 0)
In the second while loop, we read the entries and write them to the output stream.
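Since Java 7, the inner buffer loop can also be replaced with Files.copy(InputStream, Path, CopyOption...), which reads the current entry until read() returns -1 and does not close the ZipInputStream. The sketch below is a variant of the example, not the tutorial's code: NioUnzip and unzip are hypothetical names, and it also creates parent directories and rejects entry names such as ../x that would resolve outside the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class NioUnzip {

    // Extracts every entry into outDir and returns the number of files written.
    // Closing the ZipInputStream at the end also closes zipData.
    static int unzip(InputStream zipData, Path outDir) throws IOException {
        int files = 0;
        Path root = outDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // copies only the current entry and leaves zis open
                    Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }
}
```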
In this article we have presented the Java ZipInputStream class. We have created two examples to read a ZIP file and to decompress a ZIP file.
Author
My name is Jan Bodnar and I am a passionate programmer with many years of programming experience. I have been writing programming articles since 2007. So far, I have written over 1400 articles and 8 e-books. I have over eight years of experience in teaching programming.
Class ZipInputStream
This class implements an input stream filter for reading files in the ZIP file format. Includes support for both compressed and uncompressed entries.
Field Summary
Fields declared in class java.util.zip.InflaterInputStream
Fields declared in class java.io.FilterInputStream
Constructor Summary
Method Summary
Methods declared in class java.util.zip.InflaterInputStream
Methods declared in class java.io.FilterInputStream
Methods declared in class java.io.InputStream
Methods declared in class java.lang.Object
Field Details
LOCSIG, EXTSIG, CENSIG, ENDSIG, LOCHDR, EXTHDR, CENHDR, ENDHDR, LOCVER, LOCFLG, LOCHOW, LOCTIM, LOCCRC, LOCSIZ, LOCLEN, LOCNAM, LOCEXT, EXTCRC, EXTSIZ, EXTLEN, CENVEM, CENVER, CENFLG, CENHOW, CENTIM, CENCRC, CENSIZ, CENLEN, CENNAM, CENEXT, CENCOM, CENDSK, CENATT, CENATX, CENOFF, ENDSUB, ENDTOT, ENDSIZ, ENDOFF, ENDCOM
Constructor Details
ZipInputStream(InputStream in)
ZipInputStream(InputStream in, Charset charset)
Method Details
getNextEntry
closeEntry
available
Returns 0 after EOF has been reached for the current entry data; otherwise always returns 1. Programs should not count on this method to return the actual number of bytes that could be read without blocking.
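A small sketch that illustrates this contract (the class and method names are hypothetical): available() reports 1 right after getNextEntry() even when the entry still holds many bytes, and 0 only after the entry has been read to its end.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;

class AvailableDemo {

    // Returns {available() before reading the first entry, available() after reading it}.
    static int[] availableBeforeAndAfter(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            zis.getNextEntry();
            int before = zis.available(); // 1, however many bytes the entry still holds
            zis.readAllBytes();           // read the entry to its end
            int after = zis.available();  // 0 once EOF of the entry has been reached
            return new int[] {before, after};
        }
    }
}
```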
read
Reads from the current ZIP entry into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
skip
close
createZipEntry
Java is a trademark or registered trademark of Oracle and/or its affiliates in the US and other countries.
Copyright © 1993, 2023, Oracle and/or its affiliates, 500 Oracle Parkway, Redwood Shores, CA 94065 USA.
All rights reserved. Use is subject to license terms and the documentation redistribution policy.
How to read file from ZIP using InputStream?
I need to get the content of a file from a ZIP archive (only one file, and I know its name) over SFTP. The only thing I have is the ZIP's InputStream. Most examples show how to get the content using this statement:
ZipFile zipFile = new ZipFile("location");
But as I said, I don't have the ZIP file on my local machine, and I don't want to download it. Is an InputStream enough to read it? UPD: This is what I do:
```java
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import com.jcraft.jsch.Channel;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SFTP {

    public static void main(String[] args) {
        String SFTPHOST = "host";
        int SFTPPORT = 3232;
        String SFTPUSER = "user";
        String SFTPPASS = "mypass";
        String SFTPWORKINGDIR = "/dir/work";

        Session session = null;
        Channel channel = null;
        ChannelSftp channelSftp = null;
        try {
            JSch jsch = new JSch();
            session = jsch.getSession(SFTPUSER, SFTPHOST, SFTPPORT);
            session.setPassword(SFTPPASS);
            java.util.Properties config = new java.util.Properties();
            config.put("StrictHostKeyChecking", "no");
            session.setConfig(config);
            session.connect();
            channel = session.openChannel("sftp");
            channel.connect();
            channelSftp = (ChannelSftp) channel;
            channelSftp.cd(SFTPWORKINGDIR);
            ZipInputStream stream = new ZipInputStream(channelSftp.get("file.zip"));
            ZipEntry entry = stream.getNextEntry();
            System.out.println(entry.getName()); // Yes, I got its name, now I need to get content
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            // disconnect the channel before its session; skip anything never opened
            if (channelSftp != null) channelSftp.disconnect();
            if (channel != null) channel.disconnect();
            if (session != null) session.disconnect();
        }
    }
}
```
There's no reason that shouldn't work; you just have to iterate over the ZipEntries and save each one from the stream.
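Following that advice for the case in the question, where only one entry with a known name is needed: scan the stream with getNextEntry(), which skips over the unread data of the previous entry, and read just the matching entry. This is a sketch; NamedEntryReader and readEntry are hypothetical names. It works with any InputStream, such as the one returned by channelSftp.get("file.zip"), although the archive is still streamed from the server up to that entry.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class NamedEntryReader {

    // Returns the bytes of the entry called entryName, or Optional.empty() if the
    // archive has no such entry. Closing the ZipInputStream also closes zipData.
    static Optional<byte[]> readEntry(InputStream zipData, String entryName) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return Optional.of(zis.readAllBytes()); // Java 9+
                }
            }
            return Optional.empty();
        }
    }
}
```

With the question's code, a call could look like NamedEntryReader.readEntry(channelSftp.get("file.zip"), "report.txt"), where "report.txt" stands in for the real entry name.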
7 Answers
Below is a simple example of how to extract a ZIP file. It does not check whether an entry is a directory, which you will need to add, but it is the simplest version.
The step you are missing is reading the input stream and writing the contents to a buffer which is written to an output stream.
```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.Date;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipReader {

    // Expands the zip file passed as argument 1 into the
    // directory provided in argument 2
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("zipreader zipfile outputdir");
            return;
        }

        // create a buffer to improve copy performance later.
        byte[] buffer = new byte[2048];

        // open the zip file stream
        InputStream theFile = new FileInputStream(args[0]);
        ZipInputStream stream = new ZipInputStream(theFile);
        String outdir = args[1];

        try {
            // now iterate through each item in the stream. The getNextEntry
            // call will return a ZipEntry for each file in the stream
            ZipEntry entry;
            while ((entry = stream.getNextEntry()) != null) {
                String s = String.format("Entry: %s len %d added %TD",
                        entry.getName(), entry.getSize(),
                        new Date(entry.getTime()));
                System.out.println(s);

                // Once we get the entry from the stream, the stream is
                // positioned ready to read the raw data, and we keep
                // reading until read returns 0 or less.
                String outpath = outdir + "/" + entry.getName();
                FileOutputStream output = null;
                try {
                    output = new FileOutputStream(outpath);
                    int len = 0;
                    while ((len = stream.read(buffer)) > 0) {
                        output.write(buffer, 0, len);
                    }
                } finally {
                    // we must always close the output file
                    if (output != null) output.close();
                }
            }
        } finally {
            // we must always close the zip file.
            stream.close();
        }
    }
}
```
Code excerpt came from the following site: