How to reduce the size of the jar file by excluding language profiles? #27

seinecle · 2020-02-24T16:55:37Z

I need to run this library in a memory-constrained environment: less than 200 MB for the unzipped package. How can I exclude rare language profiles from the library?

An alternative: can the size be significantly decreased by minifying the JSON files used for each language?

Note: I am using the Maven build of lingua.

pemistahl · 2020-02-25T11:57:37Z

Hi @seinecle, thanks for your question. Are you talking about a limit on RAM (1) or on hard disk space (2)?

For (1): It is enough to specify, within the LanguageDetectorBuilder, only those languages that you want to take part in the classification process. It is guaranteed that only the language models for the specified languages are loaded into RAM. The language model JSON files have already been minified, so they already have the smallest possible file size.

For (2): Actually, I have not yet thought about this use case of limited hard disk space. One way to handle it could be to write a custom Gradle task that lets you specify which languages the jar file should contain, and then build the jar file in a subsequent step. With such a jar, you must not try to load the language models of the excluded languages into RAM; otherwise an exception would be thrown saying that the requested language models could not be found.
Alternatively, you could unpack the jar file (it is actually just a renamed zip archive with some metadata), remove some of the language models in the resources folder, and zip it up again. But I'm not sure whether this would work.
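
The unpack, remove, and rezip idea can also be done in one pass with the JDK's `java.util.zip` classes, copying every jar entry except the unwanted language models. What follows is only a minimal sketch, not something tested against a real lingua jar. The class name `JarSlimmer`, the `language-models/` prefix, and the one-folder-per-language layout are assumptions made for illustration; check the actual entry names in your jar (for example with `jar tf`) and adjust `MODEL_PREFIX` accordingly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: copies a jar while skipping unwanted language models.
final class JarSlimmer {

    // ASSUMPTION: models are stored as <MODEL_PREFIX><language>/<file>.
    // Verify this against the entry names in your own jar.
    static final String MODEL_PREFIX = "language-models/";

    // Entries outside the model folder are always kept; model entries are
    // kept only if their language folder is in keptLanguages.
    static boolean shouldKeep(String entryName, Set<String> keptLanguages) {
        if (!entryName.startsWith(MODEL_PREFIX)) {
            return true;
        }
        String rest = entryName.substring(MODEL_PREFIX.length());
        if (rest.isEmpty()) {
            return true; // the model directory entry itself
        }
        int slash = rest.indexOf('/');
        String language = slash < 0 ? rest : rest.substring(0, slash);
        return keptLanguages.contains(language);
    }

    // Copies source to target entry by entry and returns the number of
    // entries that were left out.
    static int slim(Path source, Path target, Set<String> keptLanguages)
            throws IOException {
        int removed = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!shouldKeep(entry.getName(), keptLanguages)) {
                    removed++;
                    continue; // getNextEntry() skips the unread data
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: JarSlimmer <in.jar> <out.jar> <language>...");
            return;
        }
        Set<String> kept = new HashSet<>(Arrays.asList(args).subList(2, args.length));
        int removed = slim(Paths.get(args[0]), Paths.get(args[1]), kept);
        System.out.println("Removed " + removed + " entries");
    }
}
```

As noted above, a jar slimmed this way would only be usable if the detector is built from the languages that were kept, since requesting a removed model would fail at load time.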

By the way, why are you using the old Maven-based version? Because the jar file is smaller?

seinecle · 2020-02-25T13:04:38Z

Thanks @pemistahl! My issue is the second case (hard disk space). I am integrating lingua in my Java project by using the Maven dependency:

<!-- https://mvnrepository.com/artifact/com.github.pemistahl/lingua -->
<dependency>
    <groupId>com.github.pemistahl</groupId>
    <artifactId>lingua</artifactId>
    <version>0.6.1</version>
</dependency>

~~Is there a better way?~~ I realize my first comment in this issue was mistaken: I use the lingua library via a Maven dependency, but the library itself is not built with Maven. Sorry for the confusion.

Not sure my use case is a common one. For reference, the optimaize library (no longer maintained; thanks for mentioning lingua in an issue, btw!) is less than 6 MB in size. I had to revert to it because lingua does not fit in my allocated storage space.

pemistahl added the question label Feb 25, 2020

pemistahl changed the title ~~diminishing the size of the jar by excluding some language profiles~~ How to reduce the size of the jar file by excluding language profiles? Feb 25, 2020
