Tim Allison
@tallison314159.bsky.social
📤 85
📥 138
📝 34
Files, search, crawling, security.
#ApacheTika
among others...
In 4 hours (noon EST), I'm hosting a demo with office hours for #ApacheTika in belated celebration of World Digital Preservation Day #wdpd2025!
www.meetup.com/apache-tika-...
Please DM me for the meeting info.
Apache Tika -- What's New/Office Hours, Thu, Nov 13, 2025, 12:00 PM | Meetup
This will be an expansion of my presentation at the Digital Preservation Bake Off (Tools Demonstration) #iPres2025 and a late entry to celebrate World Digital Preservation
https://www.meetup.com/apache-tika-community/events/311746184/
19 days ago
0
1
0
reposted by
Tim Allison
American Dialect Society
26 days ago
Make your 2025 words-of-the-year nominations for the only vote that matters!
bit.ly/2025WOTYNOMS
0
5
7
reposted by
Tim Allison
Eric Geller
27 days ago
New: Google says it has discovered at least 5 malware families that use AI to rewrite their code and generate new capabilities on the fly, suggesting AI-powered malware is finally starting to take off.
cloud.google.com/blog/topics/...
Report also has interesting stories about state actors' AI use.
0
71
55
If you're attending #iPres2025, make sure to check out @petervwyatt.bsky.social's tutorial on Monday: "A forensic spotlight on PDF/A"!
twelve.eventsair.com/QuickEventWe...
iPRES 2025 - TUTORIAL 3: A forensic spotlight on PDF/A
https://twelve.eventsair.com/QuickEventWebsitePortal/ipres2025/ipres/Agenda/AgendaItemDetail?id=98e546c8-1739-48a5-ae35-43891b76f307
about 1 month ago
0
0
0
reposted by
Tim Allison
ToxSec
about 1 month ago
Is AI fueling the old 'Dead Internet' conspiracy theory? Yes! AI is building a fake internet just for you.
#ai
#psychology
#cybersecurity
#society
#internet
www.toxsec.com/p/ai-is-buil...
The Dead Internet - AI is Building a Fake Internet Just for You
How Generative AI is Fueling the "Dead Internet Theory," Creating an Authenticity Crisis, and Why AI Detection Can't Save Us.
https://www.toxsec.com/p/ai-is-building-a-fake-internet-just
1
6
2
In belated celebration of World Digital Preservation Day, I'm throwing a "What's new with Apache Tika/Office hours" meetup: November 13, noon EST. Everyone interested in files is welcome to join!
#ApacheTika
#wdpd2025
#digipres
#fileForensics
#reverseEngineering
www.meetup.com/apache-tika-...
Apache Tika -- What's New/Office Hours, Thu, Nov 13, 2025, 12:00 PM | Meetup
This will be an expansion of my presentation at the Digital Preservation Bake Off (Tools Demonstration) #iPres2025 and a late entry to celebrate World Digital Preservation
https://www.meetup.com/apache-tika-community/events/311746184
about 1 month ago
0
5
5
reposted by
Tim Allison
DistrictCon
about 1 month ago
We're officially announcing our speakers for DistrictCon Year 1! Check out our incredible lineup: www.districtcon.org/speakers
This also includes our Day 1 & Day 2 Keynotes from Ian Levy and Dan Ridge. And don't forget, GA tickets go on sale November 16! See you in January! 🪩
0
11
16
reposted by
Tim Allison
Charlie Hull
about 1 month ago
It's your responsibility - but how do you even get started fixing search? A blog for Search Product Managers and other search leads
thesearchjuggler.com/its-your-res...
It's your responsibility - but how do you even start fixing search? - Charlie Hull - The Search Juggler
How to get started fixing search - looking for zero result searches, low click queries and how to prioritise
https://thesearchjuggler.com/its-your-responsibility-but-how-do-you-even-start-fixing-search/
0
3
1
So, the news for #ApacheTika and #ipres2025: I implemented fully recursive extraction of raw embedded files from the command line.
issues.apache.org/jira/browse/...
about 1 month ago
1
1
0
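To illustrate what "fully recursive extraction of raw embedded files" means in general, here is a minimal Python sketch. It is not Tika's implementation or its command-line interface: it assumes the only container format is ZIP, and the function names (`extract_recursive`, `make_zip`) and the `!/` path separator are hypothetical choices made for this example.

```python
import io
import zipfile


def extract_recursive(data: bytes, prefix: str = "", max_depth: int = 10) -> dict[str, bytes]:
    """Return {path: raw bytes} for every embedded file, descending into nested ZIPs.

    Hypothetical helper for illustration only. max_depth bounds the recursion
    so that deeply nested archives cannot recurse without limit.
    """
    found: dict[str, bytes] = {}
    if max_depth <= 0 or not zipfile.is_zipfile(io.BytesIO(data)):
        return found
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            raw = zf.read(info)
            path = f"{prefix}{info.filename}"
            # Keep the raw bytes of the embedded file itself...
            found[path] = raw
            # ...and then look inside it in case it is a container too.
            found.update(extract_recursive(raw, prefix=path + "!/", max_depth=max_depth - 1))
    return found


def make_zip(members: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP from {name: content}; used only to create sample input."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    return buf.getvalue()


if __name__ == "__main__":
    inner = make_zip({"note.txt": b"hello"})
    outer = make_zip({"inner.zip": inner, "readme.txt": b"top"})
    for path, raw in extract_recursive(outer).items():
        print(path, len(raw))
```

The sketch keeps both the nested container's raw bytes and the files found inside it, which is the distinction between extracting raw embedded files and only extracting their text.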
reposted by
Tim Allison
Elizabeth Lopatto
about 2 months ago
goddamn is there anything Wikipedia editors can’t do
www.nytimes.com/2025/10/17/n...
Wikipedia Volunteers Avert Tragedy by Taking Down Gunman at Conference
https://www.nytimes.com/2025/10/17/nyregion/wikipedia-conference-gunman.html
4
2497
507
reposted by
Tim Allison
Ian Coldwater 📦💥
about 2 months ago
Everyone tests in production. Some people just don’t know it yet
3
66
16
Looking forward to some baking with #ApacheTika! News soon on some #conferenceDrivenDevelopment. #ipres2025
about 2 months ago
1
3
1
Amazing work, as always, @seeinglogic.bsky.social! #AIxCC
about 2 months ago
0
2
0
And, y, I'm late to the game, but I'm really excited for this course, @softwaredoug.bsky.social!
about 2 months ago
0
1
0
reposted by
Tim Allison
Alexander Reelsen
about 2 months ago
F3: The Open-Source Data File Format for the Future
Packaging WASM code to read an evolving file format with the data. Interesting approach and a good idea to test the sandbox abilities of the execution engine. Also mentions of a lot of alternatives to parquet/ORC.
https://db.cs.cmu.edu/papers/2025/zeng-sigmod2025.pdf
1
1
1
reposted by
Tim Allison
Jennifer Ouellette
about 2 months ago
A biological 0-day? Threat-screening tools may miss AI-designed proteins.
arstechnica.com/science/2025...
A biological 0-day? Threat-screening tools may miss AI-designed proteins.
Ordering DNA for AI-designed toxins doesn’t always raise red flags.
https://arstechnica.com/science/2025/10/do-ai-designed-proteins-create-a-biosecurity-vulnerability/
0
8
6
reposted by
Tim Allison
Andreas Lehmkühler
2 months ago
The new bugfix release 2.0.35 of #Apache #PDFBox is available
pdfbox.apache.org/download.html
Apache PDFBox | Download
The Apache PDFBox™ library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract ...
https://pdfbox.apache.org/download.html
0
2
1
reposted by
Tim Allison
ToxSec
2 months ago
Anyone in #bugbounty looking to connect?
0
2
1
reposted by
Tim Allison
Alexander Doria
2 months ago
New 7-8B OCR model release from Alibaba. Integrated structured data approach looks promising for specialized use cases with complex visual inputs.
huggingface.co/Logics-MLLM/...
2
29
4
reposted by
Tim Allison
Doug Turnbull
2 months ago
Tomorrow I'll be talking about vector retrieval, continuing Cheat at Search Essentials. Full details on my blog article
softwaredoug.com/blog/2025/07...
Free course: Cheat at Search Essentials
A free introductory search course for anyone who wants better search without all the hard work
https://softwaredoug.com/blog/2025/07/31/cheat-at-search-essentials
1
1
1
reposted by
Tim Allison
2 months ago
📣This #WebArchiveWednesday, plan your proposal for #iipcWAC26, “Sustainable #WebArchiving,” at KBR, Royal Library of Belgium!
netpreserve.org/ga2026/CfP
🗓️ Deadline for proposals: OCT 15
#webarchives #DigitalPreservation #DigitalHumanities
0
0
5
reposted by
Tim Allison
Doug Turnbull
2 months ago
Recording for BM25 + Lexical Search now up
maven.com/p/e9fbe4/che...
Cheat at Search Essentials: BM25 + Lexical
It's often said with chat interfaces and RAG, search has become the hard problem. Search has a long history and means more than vector databases. Let's learn how BM25 and similar techniques compliment...
https://maven.com/p/e9fbe4/cheat-at-search-essentials-bm25-lexical
1
2
2
reposted by
Tim Allison
Doug Turnbull
2 months ago
This Wednesday I'll be discussing how to Cheat at Query Understanding using LLMs with Jason Liu. If you want a taste of "Cheat at Search with LLMs", please come hang out!
maven.com/p/eebe98
Cheating at Query Understanding with LLMs
LLMs transformed query understanding from months-long NLP projects into simple prompting tasks. Students learn practical skills for modern search, RAG, and e-commerce systems. This positions you for h...
https://maven.com/p/eebe98
0
1
1
reposted by
Tim Allison
WIRED
2 months ago
The annual award ceremony features miniature operas, scientific demos, and 24/7 lectures.
www.wired.com/story/say-he...
Say Hello to the 2025 Ig Nobel Prize Winners
https://www.wired.com/story/say-hello-to-the-2025-ig-nobel-prize-winners/
1
59
12
reposted by
Tim Allison
Fredrik Dahlgren
3 months ago
Great paper on finding and exploiting parser differentials between ZIP parsers to bypass signature validation, malware detection, or VSCode extension ID validation.
www.usenix.org/conference/u...
0
15
4
reposted by
Tim Allison
Adrien Grand
3 months ago
Lucene 10.3 is out with 40% faster lexical search, 15% faster dense vector search and 30% faster terms dictionary lookups.
lucene.apache.org/core/corenew...
Lucene™ Core News
Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for...
https://lucene.apache.org/core/corenews.html#apache-lucenetm-1030-available
1
3
3
reposted by
Tim Allison
Apache Software Foundation (The ASF)
3 months ago
🚨 Breaking News from Community Over Code 🚨 Introducing The ASF’s New Logo
buff.ly/DzgT82w
#CommunityOverCode
#opensource
0
26
20
reposted by
Tim Allison
Matthew Martin
3 months ago
Bluesky- "If you can't cite peer reviewed literature, your opinion is morally equivalent to fart noises <links to papers>" Anyhow, I just want to show off my LLM side projects, there really isn't a forum for that anymore.
0
3
1
reposted by
Tim Allison
Matthew Martin
3 months ago
People on Twitter - "LLMs are gods and I command them so I am a god and people will finally give me the respect I crave" People on Mastodon - "<frothing> slop <pant shitting> LLMs :( <howler monkey sounds> stochastic parrot <growling noises> by the way, Github is the root of all social evils"
1
2
2
W00t!
3 months ago
0
1
0
reposted by
Tim Allison
DistrictCon
3 months ago
🚨T I C K E T D R O P D A T E S 🚨 you asked, we're answering 😉
Early Bird: Sep 15 (Mon), noon EST
GA: Nov 16, 2025 (Sat), noon EST
www.eventbrite.com/e/districtco...
DistrictCon Year 1
DistrictCon is a DC hacker con, focusing on hacking together and exchanging ideas over typical talk tracks.
https://www.eventbrite.com/e/districtcon-year-1-tickets-1467291561559
0
9
7
reposted by
Tim Allison
Paul Ford
3 months ago
This is supposed to be ironic but I saw it and went “Yeah!”
12
315
52
reposted by
Tim Allison
Fredrik Dahlgren
3 months ago
This is a great post on how to bypass code signing (e.g. for malware persistence or to introduce backdoors) by tampering with V8 heap snapshots. All Electron apps (like Slack, 1Password, and Signal) and Chromium based browsers were vulnerable to this issue.
blog.trailofbits.com/2025/09/03/s...
Subverting code integrity checks to locally backdoor Signal, 1Password, Slack, and more
A vulnerability in Electron applications allows attackers to bypass code integrity checks by tampering with V8 heap snapshot files, enabling local backdoors in applications like Signal, 1Password, and...
https://blog.trailofbits.com/2025/09/03/subverting-code-integrity-checks-to-locally-backdoor-signal-1password-slack-and-more/
0
3
1
reposted by
Tim Allison
Doug Turnbull
3 months ago
Trey Grainger and I are offering an "AI Powered Search" course in November. Hope to see you there. 30% off in Sept.
maven.com/search-schoo...
1
1
1
reposted by
Tim Allison
Alexander Reelsen
3 months ago
Started watching a few videos about MCP. Learned that there are semi official MCP SDKs, where Spring contributed the Java one. Any 1 hour talk to get more up to speed? Curious about learning more, but as condensed as possible - afraid that knowledge outdates fast 😂
Learn how to build an MCP Server in Java
🔍 Discover how to implement a Model Context Protocol (MCP) server using only the core Java SDK. This tutorial expands on our MCP series by showing you a more lightweight, flexible approach for…
https://www.youtube.com/watch?v=Y_Rk6QgWUbE
0
1
1
reposted by
Tim Allison
Dan Jurafsky
3 months ago
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here:
web.stanford.edu/~jurafsky/sl...
Speech and Language Processing
https://web.stanford.edu/~jurafsky/slp3/
3
149
64
W00t! The #aixcc stage talks recently dropped: aicyberchallenge.com/def-con-33/ #defcon #defcon33
DEF-CON-33 – aicyberchallenge.com
https://aicyberchallenge.com/def-con-33/
3 months ago
0
0
1
reposted by
Tim Allison
Dare Obasanjo
3 months ago
Microsoft just dropped a COPILOT function for Excel so you can run Gen AI on spreadsheets. They warn not to trust it for math or legal/compliance work since it hallucinates. Which of course means people will and the results will be both hilarious and catastrophic.
Microsoft Excel adds Copilot AI to help fill in spreadsheet cells
Get ready for some AI spreadsheeting.
https://www.theverge.com/news/761338/microsoft-excel-ai-copilot-spreadsheet-cell-filling
7
203
34
reposted by
Tim Allison
Mark Griffin
3 months ago
ICYMI: 5 systems built to compete in DARPA's AI Cyber Challenge are now Open Source: archive.aicyberchallenge.com
Everything from prompt templates, to terraform code, to implementations of very recent research techniques, it's all there.
AIxCC Competition Archive | AIxCC Competition Archive
The comprehensive archive of DARPA's Artificial Intelligence Cyber Challenge
https://archive.aicyberchallenge.com/
0
1
1
reposted by
Tim Allison
Phrack Zine
4 months ago
At long last - Phrack 72 has been released online for your reading pleasure! Check it out:
phrack.org
0
123
66
reposted by
Tim Allison
Hazel Weakly
4 months ago
Fun fact: Linux allows file names to be any byte sequence except / and the null character. You can also mix character encodings in the same directory
#!/usr/bin/env bash
prefix="$(dd if=/dev/urandom bs=128 count=1)"
touch "$(printf '%s\n%s' "you can't hurt me I'm already dead" "$prefix")"
12
58
15
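The file-name claim in the post above can be checked with a short Python sketch. It assumes a Linux system with a file system that stores names as raw bytes (ext4 or tmpfs, for example); other platforms or file systems may reject some of these names. The helper names `create_odd_names` and `rejected` are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def create_odd_names(directory: bytes) -> list[bytes]:
    """Create files whose names mix encodings or are not valid UTF-8.

    Returns the directory listing as raw bytes, sorted.
    """
    names = [
        "café".encode("utf-8"),    # b"caf\xc3\xa9"
        "café".encode("latin-1"),  # b"caf\xe9", not valid UTF-8
        b"line\nbreak",            # a newline is a legal name byte
        b"\xff\xfe raw bytes",     # arbitrary bytes, not decodable as UTF-8
    ]
    for name in names:
        # Passing bytes paths avoids any decoding by Python.
        with open(os.path.join(directory, name), "wb"):
            pass
    return sorted(os.listdir(directory))


def rejected(directory: bytes, name: bytes) -> bool:
    """Return True if a file with this name cannot be created in the directory."""
    try:
        with open(os.path.join(directory, name), "wb"):
            pass
    except (OSError, ValueError):
        return True
    return False


if __name__ == "__main__":
    d = os.fsencode(tempfile.mkdtemp())
    print(create_odd_names(d))
    # A NUL byte is refused outright; "/" is read as a path separator,
    # so this fails because the parent directory does not exist.
    print(rejected(d, b"nul\x00byte"), rejected(d, b"no_such_dir/child"))
```

Working with `bytes` paths throughout is the design choice that matters here: a `str` path would force Python to pick an encoding, which is exactly what such names do not have.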
reposted by
Tim Allison
daniel:// stenberg://
4 months ago
An Open Source sustainability story in two slides. (for a coming talk of mine)
Slide 1: car brands using #curl
Slide 2: car brands sponsoring or paying for #curl support
15
225
392
reposted by
Tim Allison
Nik McLaughlin
4 months ago
So much to unpack here from @micahflee.com but for now I’m just taking it as a permanent cure to any imposter syndrome I ever feel ever.
youtu.be/KFYyfrTIPQY?...
"We are currently clean on OPSEC": The Signalgate Saga (DEFCON 33)
YouTube video by Micah Lee
https://youtu.be/KFYyfrTIPQY?si=LJcaU3mZ2KGGYH4F
0
33
6
reposted by
Tim Allison
WIRED
4 months ago
Security researchers found a weakness in OpenAI’s Connectors, which let you hook up ChatGPT to other services, that allowed them to extract data from a Google Drive without any user interaction.
A Single Poisoned Document Could Leak ‘Secret’ Data Via ChatGPT
https://wrd.cm/4op28AP
7
251
118
reposted by
Tim Allison
Gareth Watkins
4 months ago
Somebody on LinkedIn said what we're all thinking.
491
25008
5765
reposted by
Tim Allison
Saber
4 months ago
Phrack #72 release reveals TTPs, backdoors and targets of a Chinese/North Korean state actor mimicking Kimsuky
A copy of his workstation is available for all researchers to analyze!
Article: data.ddosecrets.com/APT%20Down%2...
Data dump: ddosecrets.com/article/apt-...
APT Down - The North Korea Files - Distributed Denial of Secrets
Approximately 9 GB of files exfiltrated from a North Korean threat actor's computer. The data is being released alongside Phrack, and South Korean victims were notified prior to publication. Resear...
https://ddosecrets.com/article/apt-down-the-north-korea-files
0
21
12
#GreenMLOps
4 months ago
0
0
0
reposted by
Tim Allison
lcamtuf
4 months ago
I'm sorry folks, the spec made it clear
5
93
16
If you’re at @defcon.bsky.social today and want to learn how we developed the challenges and scoring algorithm, please join us at the #AIxCC stage at 10:30!
4 months ago
1
1
0
reposted by
Tim Allison
Fredrik Dahlgren
4 months ago
We’re open sourcing our AI reasoning system Buttercup, which placed second in DARPA's AI Cyber Challenge! It runs on your laptop and works with any OSS-fuzz/ClusterFuzz compatible project.
blog.trailofbits.com/2025/08/08/b...
Buttercup is now open-source!
Now that DARPA’s AI Cyber Challenge (AIxCC) has officially ended, we can finally make Buttercup, our CRS (Cyber Reasoning System), open source!
https://blog.trailofbits.com/2025/08/08/buttercup-is-now-open-source/
1
17
7