[ale] Any MySQL Experts around?
<li><em>date</em>: Tue Feb 17 05:42:37 2004</li>
<li><em>from</em>: ron at Opus1.COM (Ronald Chmara)</li>
<li><em>in-reply-to</em>: <<a href="msg00517.html">[email protected]</a>></li>
<li><em>references</em>: <<a href="msg00517.html">[email protected]</a>></li>
<li><em>subject</em>: [ale] Any MySQL Experts around?</li>
I disagree, because I've worked on a project where 2.4 million dollars
(per year) hinged on doing this...
> I've worked on writing reporting solutions for
> huge sets of log data, and I've been a systems admin and DBA (MySQL and
> Informix). There may be a solution I've not seen but in my experience
> you're not going to get good performance doing what you plan to do.
*nod*
Doing this right required a lot of CPU, or a really strong design. We
chose the former; the latter cost too much for queries that were run
only once a month (a 4-hour return time vs. 1.2 seconds, but the cost
estimates for implementing the 1.2-second solution were fairly large).
> Storing all your log files on a central server is an excellent idea. I
> would store them as text files if you can. What I would suggest for
> better reporting is to decide what needs reporting and write a perl
> script (most efficient afaik for text processing) to parse your files
> and store only *aggregate* data in a database.
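The quoted suggestion could be sketched roughly like this (in Python
rather than perl, and with an assumed common-log format and field
layout; names here are illustrative, not from the original posts):

```python
# Hypothetical sketch: parse raw access-log lines and keep only
# aggregate counts, not the raw rows. The quoting convention
# (request string between the first pair of double quotes) is the
# Apache common log format, assumed here for illustration.
from collections import Counter

def aggregate(lines):
    """Count requests per (method, path) pair from common-log lines."""
    counts = Counter()
    for line in lines:
        try:
            # Request portion sits between the first pair of quotes,
            # e.g. "GET /index.html HTTP/1.0"
            request = line.split('"')[1]
            method, path, _ = request.split()
        except (IndexError, ValueError):
            continue  # skip malformed lines rather than abort
        counts[(method, path)] += 1
    return counts

sample = [
    '127.0.0.1 - - [17/Feb/2004:05:42:37] "GET /index.html HTTP/1.0" 200 512',
    '127.0.0.1 - - [17/Feb/2004:05:42:38] "GET /index.html HTTP/1.0" 200 512',
    '127.0.0.1 - - [17/Feb/2004:05:42:39] "POST /cgi-bin/form HTTP/1.0" 200 64',
]
totals = aggregate(sample)  # only these counts would hit the database
```

Only the small `totals` table ever reaches the database; the raw text
files stay on disk as the archival copy.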
Note that pre-processing via perl (or whatever) can *significantly*
shrink a data set. Example: if you have 20 directories off of the
parent root of a data site, tracking (and searching) 20 integers is
much smaller (and faster) than tracking and searching 20 text fields.
Same with GET/PUT/POST, same with User agents, daemons, user, etc...
even form fields and query strings can be managed the same way.
Aggregation is not required, but the same rules to limit a data set might
help. Don't repeat data. Don't repeat things like <host> [apache],
POST, <homedir>, <childdir>, etc. If a value occurs fewer than 65,000 times
in a db, turn it into an integer token. Heck, I'd tokenize it after 600
occurrences. I prefer a result after 1.2 seconds. :-)
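The tokenizing idea above might look something like this (a minimal
sketch; the class and its names are hypothetical, not from the
original post):

```python
# Hypothetical sketch: replace each repeated string (method, directory,
# user agent, ...) with a small integer ID, so the database stores and
# compares integers instead of repeated text fields.

class Tokenizer:
    """Assigns a stable integer ID to each distinct string."""
    def __init__(self):
        self.ids = {}       # string -> integer token
        self.strings = []   # integer token -> string (for decoding)

    def token(self, s):
        if s not in self.ids:
            self.ids[s] = len(self.strings)
            self.strings.append(s)
        return self.ids[s]

    def lookup(self, t):
        return self.strings[t]

methods = Tokenizer()
rows = [methods.token(m) for m in ["GET", "POST", "GET", "GET", "PUT"]]
# Five text fields collapse to five small integers drawn from three values.
```

In a real schema the token table would live in the database and the
65,000 figure corresponds roughly to what fits in an unsigned SMALLINT
column, which is presumably why the post picks that threshold.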
Of course, db's with bad joining (cough) can perform poorly in such
situations.
-Bop
</pre>
<!--X-Body-of-Message-End-->
<!--X-MsgBody-End-->
<!--X-Follow-Ups-->
<hr>
<!--X-Follow-Ups-End-->
<!--X-References-->
<ul><li><strong>References</strong>:
<ul>
<li><strong><a name="00517" href="msg00517.html">[ale] Any MySQL Experts around?</a></strong>
<ul><li><em>From:</em> jtaylor at onlinea.com (J.M. Taylor)</li></ul></li>
</ul></li></ul>
<!--X-References-End-->
<!--X-BotPNI-->
<ul>
<li>Prev by Date:
<strong><a href="msg00553.html">[ale] Windows 2000 source</a></strong>
</li>
<li>Next by Date:
<strong><a href="msg00555.html">[ale] OT: Delta Electronics</a></strong>
</li>
<li>Previous by thread:
<strong><a href="msg00544.html">[ale] Any MySQL Experts around?</a></strong>
</li>
<li>Next by thread:
<strong><a href="msg00518.html">[ale] OT: MS disabling features and changing old standards.</a></strong>
</li>
<li>Index(es):
<ul>
<li><a href="maillist.html#00554"><strong>Date</strong></a></li>
<li><a href="threads.html#00554"><strong>Thread</strong></a></li>
</ul>
</li>
</ul>
<!--X-BotPNI-End-->
<!--X-User-Footer-->
<!--X-User-Footer-End-->
</body>
</html>