logtrix

Examples

Parsing a log file

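import java.nio.file.Paths;
// Assumes both classes live in the same package as CrawlSummary
import org.netpreserve.logtrix.CrawlDataItem;
import org.netpreserve.logtrix.CrawlLogIterator;

// try-with-resources closes the log when iteration finishes
try (CrawlLogIterator log = new CrawlLogIterator(Paths.get("crawl.log"))) {
    for (CrawlDataItem line : log) {
        System.out.println(line.getStatusCode());
        System.out.println(line.getURL());
    }
}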
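
To make concrete what each parsed item carries, here is a minimal standalone sketch that pulls the status code and URL out of a single raw line. It assumes a Heritrix-style crawl.log layout (whitespace-separated fields, with the timestamp first, the status code second, the size third and the URL fourth). The class `CrawlLineSketch` and its methods are hypothetical helpers for illustration and are not part of logtrix.

```java
final class CrawlLineSketch {
    // Split a raw log line into its whitespace-separated fields.
    static String[] fields(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4) {
            throw new IllegalArgumentException("expected at least 4 fields: " + line);
        }
        return f;
    }

    // Assumption: the second field is the fetch status code.
    static int statusCode(String line) {
        return Integer.parseInt(fields(line)[1]);
    }

    // Assumption: the fourth field is the fetched URL.
    static String url(String line) {
        return fields(line)[3];
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        String line = "2019-04-03T10:15:30.123Z   200       1234 http://example.org/ - - text/html";
        System.out.println(statusCode(line));
        System.out.println(url(line));
    }
}
```

In practice `CrawlLogIterator` does this work for you; the sketch only shows where `getStatusCode()` and `getURL()` values plausibly come from.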

Grouping the summary by domain, host, or a custom key

// `log` is an open CrawlLogIterator, as in the parsing example above
CrawlSummary.byRegisteredDomain(log);
CrawlSummary.byHost(log);
CrawlSummary.byKey(log, item -> item.getCaptureBegan().toString().substring(0, 4)); // by year
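
The idea behind grouping by a key function can be sketched without the library: every item is mapped to a string key, and items that share a key are tallied together. The class `GroupByKeySketch` and its `countByKey` method below are hypothetical and only count items; the real summaries presumably collect more than a count per group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class GroupByKeySketch {
    // Count items per key; keyFn plays the role of the lambda passed to byKey.
    static <T> Map<String, Long> countByKey(Iterable<T> items, Function<T, String> keyFn) {
        Map<String, Long> counts = new TreeMap<>();
        for (T item : items) {
            counts.merge(keyFn.apply(item), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up capture timestamps standing in for crawl log items.
        List<String> captureTimes = Arrays.asList(
                "2018-12-31T23:59:59Z", "2019-04-03T10:15:30Z", "2019-04-04T08:00:00Z");
        // Group by the first four characters, i.e. the year.
        System.out.println(countByKey(captureTimes, t -> t.substring(0, 4)));
        // prints {2018=1, 2019=2}
    }
}
```

Any function from an item to a string works as a key, so the same pattern covers grouping by year, by MIME type, or by URL path prefix.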

Limiting results to the top N

CrawlSummary.build(log).topN(10); // keep only the top 10 status codes, MIME types, etc.
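
As a rough illustration of what a top-N cut means for one category of counts, the sketch below keeps the N most frequent entries of a count map, ordered from most to least frequent. The class `TopNSketch` is hypothetical and independent of logtrix; how the library breaks ties or orders its output is not something this sketch claims to reproduce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TopNSketch {
    // Return the n entries with the highest counts, in descending order of count.
    static Map<String, Long> topN(Map<String, Long> counts, int n) {
        Map<String, Long> top = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .forEachOrdered(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }

    public static void main(String[] args) {
        // Made-up status code counts, for illustration only.
        Map<String, Long> statusCounts = new LinkedHashMap<>();
        statusCounts.put("200", 120L);
        statusCounts.put("404", 15L);
        statusCounts.put("301", 40L);
        statusCounts.put("503", 3L);
        System.out.println(topN(statusCounts, 2));
        // prints {200=120, 301=40}
    }
}
```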

Working with status codes

StatusCodes.describe(404);      // "Not found"
StatusCodes.describe(-4);       // "HTTP timeout"
StatusCodes.isError(-4);        // true
StatusCodes.isServerError(503); // true
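
One plausible way to read these results is sketched below. It assumes that negative codes denote crawler-level failures (such as the -4 timeout above) and that HTTP 4xx and 5xx responses also count as errors. The class `StatusCodeSketch` and these rules are assumptions chosen to agree with the examples above, not the actual implementation of `StatusCodes`.

```java
final class StatusCodeSketch {
    // Assumption: negative codes are failures reported by the crawler itself.
    static boolean isCrawlerFailure(int code) {
        return code < 0;
    }

    // HTTP 4xx range.
    static boolean isClientError(int code) {
        return code >= 400 && code <= 499;
    }

    // HTTP 5xx range.
    static boolean isServerError(int code) {
        return code >= 500 && code <= 599;
    }

    // Assumption: an error is any crawler failure or any HTTP 4xx/5xx response.
    static boolean isError(int code) {
        return isCrawlerFailure(code) || isClientError(code) || isServerError(code);
    }

    public static void main(String[] args) {
        System.out.println(isError(-4));        // prints true
        System.out.println(isServerError(503)); // prints true
        System.out.println(isError(200));       // prints false
    }
}
```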

Command-line interface

Output a JSON crawl summary grouped by registered domain:

java org.netpreserve.logtrix.CrawlSummary -g registered-domain crawl.log

For more options:

java org.netpreserve.logtrix.CrawlSummary --help
