How to reduce the size of the jar file by excluding language profiles? #27

Open
seinecle opened this issue Feb 24, 2020 · 2 comments

@seinecle seinecle commented Feb 24, 2020

I need to run this library in a space-constrained environment: the unzipped package must stay under 200 MB. How can I exclude rare language profiles from the library?

An alternative: can the size be significantly decreased by minifying the JSON files used for each language?

Note: I am using the Maven build of lingua.

@pemistahl pemistahl added the question label Feb 25, 2020
@pemistahl pemistahl changed the title from "diminishing the size of the jar by excluding some language profiles" to "How to reduce the size of the jar file by excluding language profiles?" Feb 25, 2020

Owner

@pemistahl pemistahl commented Feb 25, 2020

Hi @seinecle, thanks for your question. Are you talking about a limit on RAM (1) or on hard disk space (2)?

For (1): It is enough to specify in the LanguageDetectorBuilder only those languages that should take part in the classification process. It is guaranteed that only the language models for the specified languages are loaded into RAM. The language model JSON files have already been minified, so minifying them again will not reduce their size any further.

For (2): Actually, I have not yet thought about the use case of limited hard disk space. One way to handle it could be a custom Gradle task that lets you specify which languages the jar file should contain when it is built. You would then have to make sure never to load the language models of the excluded languages into RAM; otherwise an exception would be thrown because the requested language models could not be found.
Alternatively, you could unpack the jar file (it is just a zip archive with some added metadata), remove some of the language models from the resources folder, and zip it up again. But I'm not sure whether this would work.
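
The unpack-and-rezip idea can be sketched in plain Java with java.util.zip, copying the jar entry by entry and skipping the unwanted models instead of unpacking to disk. This is only an illustration and not something shipped with lingua: the class name JarSlimmer, the method stripLanguages, and above all the assumed resource layout language-models/<iso-code>/ are assumptions, so list the jar contents first (for example with `jar tf`) and adjust MODEL_DIR to match.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of lingua.
final class JarSlimmer {

    // ASSUMPTION: models live under language-models/<iso-code>/ inside the jar.
    static final String MODEL_DIR = "language-models/";

    /**
     * Copies inputJar to outputJar, skipping every entry that belongs to one
     * of the excluded language codes. Returns the number of skipped entries.
     */
    static int stripLanguages(Path inputJar, Path outputJar, Set<String> excludedCodes)
            throws IOException {
        int skipped = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(inputJar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(outputJar))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (isExcluded(entry.getName(), excludedCodes)) {
                    skipped++;
                    continue; // getNextEntry() skips the rest of this entry
                }
                // A fresh ZipEntry lets the output stream recompute sizes and CRC.
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
        return skipped;
    }

    /** True if entryName is MODEL_DIR/<code>/... for an excluded code. */
    static boolean isExcluded(String entryName, Set<String> excludedCodes) {
        if (!entryName.startsWith(MODEL_DIR)) {
            return false;
        }
        String rest = entryName.substring(MODEL_DIR.length());
        int slash = rest.indexOf('/');
        return slash > 0 && excludedCodes.contains(rest.substring(0, slash));
    }

    // Usage: java JarSlimmer <input.jar> <output.jar> <code> [<code> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: JarSlimmer <input.jar> <output.jar> <code>...");
            return;
        }
        Set<String> codes = new HashSet<>(Arrays.asList(args).subList(2, args.length));
        int skipped = stripLanguages(Paths.get(args[0]), Paths.get(args[1]), codes);
        System.out.println("Skipped " + skipped + " entries");
    }
}
```

As noted above, the slimmed jar would only be usable if the LanguageDetectorBuilder is given just the languages that remain, since requesting a removed model is expected to throw an exception.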

By the way, why are you using the old Maven-based version? Because the jar file is smaller?

Author

@seinecle seinecle commented Feb 25, 2020

Thanks @pemistahl! My issue is the second case (hard disk space). I am integrating lingua into my Java project through the Maven dependency:

<!-- https://mvnrepository.com/artifact/com.github.pemistahl/lingua -->
<dependency>
    <groupId>com.github.pemistahl</groupId>
    <artifactId>lingua</artifactId>
    <version>0.6.1</version>
</dependency>

Is there a better way? I realize my first comment in this issue was mistaken: I pull lingua in as a Maven dependency, but the library itself is not built with Maven. Sorry for the confusion.

I'm not sure my use case is a common one. For reference, the optimaize library (no longer maintained; thanks for mentioning lingua in an issue there, btw!) is less than 6 MB. I had to revert to it because lingua does not fit within my allocated space.
