Invalid format exception in scanJobLog #239
Comments
ato added the bug label Mar 11, 2019
added a commit to ukwa/heritrix3 that referenced this issue Mar 11, 2019
I have a patch but haven't tested it yet.
@anjackson happy to test with that.
@ruebot Awesome, thanks. I've popped a SNAPSHOT build here: Can you give it a spin?
Good to go on my end:

$ tail -f job.log
2019-03-01T12:01:32.130Z INFO Job instantiated
2019-03-01T12:01:42.293Z INFO Job launched
2019-03-01T12:01:43.923Z INFO PREPARING 20190301120143
2019-03-01T12:01:43.955Z INFO PAUSED 20190301120143
2019-03-01T12:01:54.079Z INFO RUNNING 20190301120143
2019-03-01T12:02:01.454Z WARNING nowhere to log added seed: http://calendars.students.yorku.ca/ (in thread 'ToeThread #21: http://calendars.registrar.yorku.ca/'; in processor 'candidates')
2019-03-01T12:02:07.490Z WARNING nowhere to log added seed: https://calendars.students.yorku.ca/ (in thread 'ToeThread #21: http://calendars.students.yorku.ca/'; in processor 'candidates')
2019-03-11T12:26:03.810Z INFO CHECKPOINTED cp00001-20190311122554
2019-03-11T12:26:06.453Z INFO PAUSING 20190301120143
2019-03-11T12:26:07.702Z INFO PAUSED 20190301120143

$ cat heritrix_out.log
Tue Mar 12 08:32:54 EDT 2019 Starting heritrix
Linux wombat 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
JAVA_OPTS= -Xmx256m
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256586
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 256586
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Oracle Corporation OpenJDK Runtime Environment 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12
Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore adhoc.keystore -destkeystore adhoc.keystore -deststoretype pkcs12".
Using ad-hoc HTTPS certificate with fingerprint...
SHA1:E7:65:F2:C6:5A:90:A0:79:61:FF:28:1F:24:81:A6:74:D0:38:2C:0B
Verify in browser before accepting exception.
2019-03-12 12:32:55.523 INFO thread-1 org.archive.crawler.framework.Engine.addJobDirectory() added crawl job: academic-calendars
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
engine listening at port 9191
operator login set per command-line
NOTE: We recommend a longer, stronger password, especially if your web
interface will be internet-accessible.
Heritrix version: 3.4.0-SNAPSHOT-2019-03-11T21:36:31Z
log4j:WARN No appenders could be found for logger (freemarker.cache).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
ato commented Mar 11, 2019
@ruebot encountered the following exception after checkpointing and restarting Heritrix:
Looks like a bug here: heritrix3/engine/src/main/java/org/archive/crawler/framework/CrawlJob.java, line 168 in aa705be.
If the job log is larger than 100KB, startPosition is set to 100KB from the end, which might be in the middle of a line. If that partial line still happens to match Pattern.compile("(\\S+) (\\S+) Job launched"), then an incomplete timestamp may be parsed, causing the exception.

@anjackson suggests the following fix:
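
Separately from that patch, the failure mode described above can be reproduced in a small standalone sketch. This is not the Heritrix code and not the suggested fix: the class name `JobLogScanSketch`, the `ANCHORED` pattern, and the helper methods are hypothetical, and `java.time.Instant.parse` stands in for whatever timestamp parser `CrawlJob` actually uses. Only the unanchored pattern is taken from the issue. The sketch simulates a scan that starts in the middle of a "Job launched" line, then shows two possible mitigations: anchoring the pattern to a complete timestamp at the start of a line, or discarding the first (possibly partial) line of the tail.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobLogScanSketch {
    // Pattern quoted in the issue: it can match anywhere, including inside a truncated line.
    static final Pattern UNANCHORED = Pattern.compile("(\\S+) (\\S+) Job launched");

    // Hypothetical stricter variant: a complete timestamp must begin the line.
    static final Pattern ANCHORED = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z) (\\S+) Job launched",
            Pattern.MULTILINE);

    // Returns the timestamp text of the last "Job launched" match, or null if none.
    static String lastLaunchTimestamp(Pattern pattern, String logTail) {
        Matcher m = pattern.matcher(logTail);
        String last = null;
        while (m.find()) {
            last = m.group(1);
        }
        return last;
    }

    // Alternative mitigation: drop everything up to the first newline, since a
    // scan that starts at an arbitrary byte offset may begin mid-line.
    static String dropPartialFirstLine(String logTail) {
        int newline = logTail.indexOf('\n');
        return newline < 0 ? "" : logTail.substring(newline + 1);
    }

    // Stand-in for the real timestamp parser (an assumption, not Heritrix's code).
    static boolean isParseable(String timestamp) {
        try {
            Instant.parse(timestamp);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String fullLine = "2019-03-01T12:01:42.293Z INFO Job launched\n";
        // Simulate a start position that lands 8 characters into the line.
        String truncated = fullLine.substring(8);

        String bad = lastLaunchTimestamp(UNANCHORED, truncated);
        System.out.println("unanchored match: " + bad + " parseable=" + isParseable(bad));

        System.out.println("anchored match on truncated tail: "
                + lastLaunchTimestamp(ANCHORED, truncated));

        String cleaned = dropPartialFirstLine(truncated + fullLine);
        System.out.println("after dropping partial line: "
                + lastLaunchTimestamp(UNANCHORED, cleaned));
    }
}
```

With the truncated tail, the unanchored pattern still matches and captures `01T12:01:42.293Z`, which the stand-in parser rejects. The anchored pattern finds no match there, and dropping the partial first line lets the original pattern see only complete lines.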