I recently ran into a problem while analyzing a log file:
"A 10 GB log file consisting of a single line needs to be read, and every IP address in it must be printed."
Issue: The file cannot be read line by line, because the single line is the entire file and reading it would exhaust memory. So I read it character by character instead.
Solution:
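#!/usr/bin/python
import re

def getIP(path="./ipaddr"):
    # A single character that may belong to a dotted-quad address
    ip_char = re.compile(r'[\d.]')
    # A complete dotted-quad candidate (octet ranges 0-255 are not checked)
    ip_full = re.compile(r'\d{1,3}(\.\d{1,3}){3}$')
    out = []
    with open(path, "r") as f:
        while True:
            c = f.read(1)
            if c and ip_char.match(c):
                out.append(c)
                continue
            # Any other character, or end of file, ends the current token
            token = "".join(out).strip(".")
            out = []
            if ip_full.match(token):
                yield token
            if not c:
                break

# Print addresses one at a time instead of collecting them all in a list
for ipad in getIP():
    print(ipad)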
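One possible simplification, as a rough sketch rather than a tested solution for the real log: read the file in fixed-size chunks and run a single regex over each chunk, holding back any trailing run of digits and dots so that an address split across two chunks is not lost. The names `iter_ips`, `IP_RE` and `chunk_size` are made up for this example, and the pattern only checks the dotted-quad shape, not that each octet is in the range 0-255.

```python
import re

# Dotted-quad candidate that is not glued to other digits.
# Assumption: octet ranges (0-255) are not validated.
IP_RE = re.compile(r'(?<!\d)\d{1,3}(?:\.\d{1,3}){3}(?!\d)')


def iter_ips(path, chunk_size=1 << 20):
    """Yield IPv4-looking strings without loading the whole file."""
    tail = ""
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            # Hold back the trailing run of digits/dots: it may be an
            # address that continues in the next chunk.
            cut = len(buf)
            while cut > 0 and (buf[cut - 1].isdigit() or buf[cut - 1] == "."):
                cut -= 1
            for m in IP_RE.finditer(buf[:cut]):
                yield m.group(0)
            tail = buf[cut:]
    # Whatever is left at end of file is a complete token.
    for m in IP_RE.finditer(tail):
        yield m.group(0)
```

Usage would look like `for ip in iter_ips("./ipaddr"): print(ip)`. Memory stays near `chunk_size` as long as the file does not contain an extremely long unbroken run of digits and dots, since such a run is carried over in full.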
Any ideas to simplify this?