I checked out what you get when you export your content from Twitter, Facebook, Google+, LinkedIn, WordPress and LiveJournal, with an eye for both private archives and migrating to your own site.

Tired of Twitter? Fed up with Facebook? Irritated by Instagram?

If you want to leave a major social network, but keep your content — or even just make sure you have your own backup in case the site shuts down, purges accounts, or changes its TOS *cough* LiveJournal *cough*, you can usually get some of your info. But not all of them give it to you in a way that’s useful.

Twitter

You get a CSV spreadsheet containing all your tweets since the dawn of Twitter, with the text in one column, ID in another, timestamp, reply-to, and so on. It’s pretty easy to import this into another system. (I pulled mine into a test WordPress site using the WP All-Import plugin.)

Links in the text appear as the t.co shortened URL, with the “real” URL in another column. Of course, if the “real” URL was also a shortener, you’ll just see bit.ly or whatever. And if you’ve been on Twitter long enough, you may find that some of your older links use shorteners that don’t exist anymore (or have purged their archives), like ping.fm or tr.im.
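If you're importing the spreadsheet somewhere else, you may want to swap the t.co links for the expanded ones first. Here's a minimal Python sketch of that idea. The column names (`text`, `expanded_urls`) are assumptions, and so is the idea that the expanded URLs are comma-separated and listed in the same order as the links in the text, so check the header row of your own export before relying on it.

```python
import csv
import re

# Matches the shortened links Twitter puts in the tweet text.
TCO_LINK = re.compile(r"https?://t\.co/\w+")

def expand_links(text, expanded_urls):
    """Swap each t.co link for the next expanded URL, in order of appearance."""
    urls = iter(u.strip() for u in (expanded_urls or "").split(",") if u.strip())
    # If there are more t.co links than expanded URLs, leave the extras alone.
    return TCO_LINK.sub(lambda m: next(urls, m.group(0)), text)

def read_expanded(csv_path):
    """Yield tweet text with links expanded. Column names are assumptions."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield expand_links(row["text"], row.get("expanded_urls"))
```

This won't help with links that pointed to a second shortener, and it will mis-split any expanded URL that contains a comma.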

You also get an offline web app with an index.html that allows you to view all your tweets month by month without visiting the site.

But you don’t get any of your uploaded media, or direct messages. So if you mostly use Twitter for text-based microblogging, you’re fine, but if you use it for photo sharing or private conversations, you’re out of luck.

Update: Retweets are sometimes incomplete in the spreadsheet. The text field is a constructed manual retweet — “RT @otheruser: Text of the original tweet” — but it’s truncated to fit in 140 characters (even if the original was made after the 280-character update). So if adding the username pushes it over the limit, or if it was longer to begin with, you don’t actually get the entire original tweet in the spreadsheet. I suspect this means retweets don’t actually use that field, and get the content straight from the original tweet by ID.
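One way to find the retweets worth re-checking is to flag any row that has an original tweet ID and whose text is right at the limit. This is only a rough sketch: the column names (`retweeted_status_id`, `text`) are assumptions about the spreadsheet, and the length test is a guess at what truncation looks like, so treat the result as a list of candidates to look up by ID.

```python
def possibly_truncated_retweets(rows, limit=140):
    """Return the original-tweet IDs for retweets whose text may be cut off."""
    flagged = []
    for row in rows:
        original_id = (row.get("retweeted_status_id") or "").strip()
        text = row.get("text") or ""
        # Only retweets carry an original ID; text at the limit is suspect.
        if original_id and text.startswith("RT @") and len(text) >= limit:
            flagged.append(original_id)
    return flagged
```

You can feed it the rows from `csv.DictReader`, since each row is a dictionary keyed by column name.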

Facebook

You get all your photos, videos and messages, organized by folders, but the names are all just numeric IDs. You do get an offline web application that includes names and indexes.

It also has your entire timeline in one giant HTML file. But it only includes the text, the type of update, and the timestamps. If you posted a link, it doesn’t include the link. If you posted a photo, it doesn’t link to the photo.

And you don’t get comments, either your comments on other people’s posts, or their comments on yours.

Worst, though? It doesn’t indicate the privacy of each post. That means you can’t take the timeline and import it to a new system unless you separate the public and private posts one by one.

Update March 2018: Apparently if you use the Facebook Messenger app on Android, there’s a good chance Facebook also has your SMS messages and call history. This is probably not something you expected them to have.

Google Plus

Google Takeout allows you to export various categories of data, including your Google+ stream, circles, +1s and page posts.

Each post is exported as a separate HTML file, named after the first line. Comment threads are included, along with timestamps, a permalink to the original post, and a visibility indicator. It only marks Public vs. Limited, but that’s better than you get from Facebook.

The HTML files are suitable for publishing as-is, and marked up so that it shouldn’t be hard to write an import tool for a CMS. (I’m planning on writing a script to convert them to WordPress’ XML format.)

Images aren’t included in the G+ stream download, and are instead hotlinked on photo posts and galleries. I haven’t checked, but I suspect any images you uploaded to Google+ will be included in your Google Photos download.

There is an index of all your posts…but it’s in alphabetical order.

Bonus: Google Buzz

When Google shut down Buzz a few years ago, they generated archives and put them in each person’s Drive account. They did one cool thing, which was to create two sets of archives: One complete, the other containing only public posts.

The format? Long PDFs, dozens of pages each, with all your posts, labeled by source (Buzz, Twitter, a specific site, etc.)…with the letters scrambled. Apparently they left the “reduce file size” option turned on when they generated them. This means you can’t copy/paste or search in the PDF itself, but you can open it in Google Docs and it’ll convert the text back, at which point you can do both. But that doesn’t preserve links or media, which you have to get out of the original PDF…

LinkedIn

LinkedIn generates two phases of archives. The first one, available within minutes of requesting it, contains your profile info, your messages, contacts and invitations in CSV files.

The complete archive, available within 24 hours, actually lives up to the name. Everything is in a set of CSV files: Your contacts, your shares, your group posts, your group comments, even your behind-the-scenes info like ad targeting categories and recent login records. (One word of warning: They’re encoded as UTF-16, so if the tool you use to import afterward isn’t expecting that, you may need to convert them.)
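If your import tool only understands UTF-8, the conversion is short. This is a minimal sketch in Python; the function name is mine, and it assumes the file starts with a byte-order mark, which is what Python's `utf-16` codec uses to work out the byte order.

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 CSV file as UTF-8, leaving line endings untouched."""
    # newline="" stops Python from rewriting line endings inside the CSV.
    with open(src_path, encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Writing to a new file rather than converting in place means you still have the original if something goes wrong.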

I’m not sure how photos and video are handled, as I’ve never uploaded either to LinkedIn (other than my profile picture, which landed in a folder called Media Files).

LiveJournal

LiveJournal’s own export tool will export a month at a time into a CSV or XML file, which includes your posts and their metadata (timestamps, moods, etc.), but not comments, userpics or photos.

There are other tools available using the API, which might be able to get more data. I’ve looked at two:

The WordPress importer will pull in all your posts, and the comments on them, and makes a note of moods, music, etc. (you can use my LJ-Moods plugin to display them). It doesn’t transfer any images you’ve uploaded.

Dreamwidth’s importer seems more complete – LJ and Dreamwidth are based on the same code, after all – and is able to natively handle moods, userpics, etc. But it doesn’t transfer your media library either.

WordPress

WordPress exports a giant XML file containing all your posts, their comments, and their metadata. You can import it into another WordPress instance, and have virtually the same blog. Or you can merge two blogs together by importing both. (I’ve moved posts with comment threads from one blog to another by putting them in a category, exporting the category, and then importing them on the new blog.)
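Because it's plain XML, you can also inspect the export without WordPress. Here's a small Python sketch that lists each item's title and how many comments came along with it. It assumes the usual RSS-style layout (`item` elements with a `title`, and comments as namespaced `comment` children); the namespace version varies between exports, so the sketch matches on the local tag name instead.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]

def summarize_export(root):
    """Return (title, comment count) for every item in the export."""
    summary = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        comments = sum(1 for child in item if local_name(child.tag) == "comment")
        summary.append((title, comments))
    return summary
```

For a real file you would pass in `ET.parse(path).getroot()`. Keep in mind that the items can include pages and attachments as well as posts.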

It doesn’t include your media library, but if you import the file to a new site before closing down the old one, the importer should offer to pull in all of the images and other media that are actually used in posts.

Plus on a self-hosted site you have a lot of tools available: backup plugins that will include everything, SFTP access through your web host, etc.

Update: Tumblr

I didn’t initially include Tumblr because it doesn’t have an exporter…but WordPress has an importer that does a good job of transferring your blog directly from Tumblr to a WordPress blog. (Look on your WordPress dashboard under Tools/Import.) It even imports images (though sometimes it imports a single-image post as a gallery for some reason). The original URL is stored in a custom field, and you can leave it connected and import new items when you want to bring them in.

Some gotchas: It can only map to one author, but you get to choose which one. It puts everything in the default category. Videos don’t get imported, even if you’ve just embedded a YouTube video.

Update: Mastodon

With the 2.3.0 update (March 2018), Mastodon has added its first archive tool. It’s essentially complete, but it’s only machine-readable so far. You get a pair of files in ActivityPub format (based on JSON), one containing your profile and one containing your formatted posts. You also get a folder structure containing any images and videos you’ve uploaded, and your icon and header image.

If you’re willing to slog through the JSON files, you can figure out which image goes to which post, but it’s still a pain.
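A short script can do the slogging for you. This Python sketch builds a lookup from each post to its media files. The file name `outbox.json` and the keys it reads (`orderedItems`, `object`, `attachment`, `url`) are assumptions taken from the ActivityStreams vocabulary that ActivityPub builds on, so compare them with your own archive first.

```python
import json

def load_outbox(path):
    """Read the posts file from the archive (file name is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def media_by_post(outbox):
    """Map each post's ID to the media URLs attached to it."""
    mapping = {}
    for activity in outbox.get("orderedItems", []):
        post = activity.get("object")
        if not isinstance(post, dict):
            continue  # e.g. a boost, which only points at someone else's post
        attachments = post.get("attachment") or []
        if isinstance(attachments, dict):
            attachments = [attachments]  # a lone attachment may not be in a list
        urls = [a["url"] for a in attachments if isinstance(a, dict) and "url" in a]
        if urls:
            mapping[post.get("id")] = urls
    return mapping
```

Calling `media_by_post(load_outbox("outbox.json"))` should give you a dictionary you can match against the files in the media folders.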

But this is a first pass, aimed more at portability (keep your own backups or move your data to another instance or service) than readability. ActivityPub is a new standard, so there aren’t many converters yet, but that’s likely to improve.

Others

Instagram doesn’t have an export tool, so you have to rely on third-party solutions.

Flickr allows you to bulk-download photos from your Camera Roll, and it helpfully uses the title to name the files, but it doesn’t export the description, tags, or comments.

Mastodon currently only lets you export your contacts and block lists, but archiving and migration (from one Mastodon instance to another) are on the roadmap.

Spectrum on the Floor (Not Pink Floyd)

You’ve probably heard about Instagram’s new terms of service, which claim the right to sell your photos. [Update: Instagram has posted a “that’s not what we meant!” statement and promised to revise that section.]

To help us deliver interesting paid or sponsored content or promotions, you agree that a business or other entity may pay us to display your username, likeness, photos (along with any associated metadata), and/or actions you take, in connection with paid or sponsored content or promotions, without any compensation to you.

Monetization is one thing, but selling my creative output, using it or my likeness for advertising, without my permission? That’s stepping over the line. Add this to the recent decision to hide image previews from Twitter, and a pattern emerges of a service that was once open and free starting to close ranks.

I’m not personally worried about Instagram in particular. I’ve only really dabbled in it over the last few months, treating it most of the time as a first draft for Flickr. I have maybe 50 photos and a handful of followers, and most of the people I follow there are also on other networks. If Instagram doesn’t back down or clarify the language [Update: they did], I can easily repost the photos I want to keep online and go somewhere else.

I am worried about the trend it highlights: You can’t always rely on social media.

And I am worried about the fact that these changes were announced after the Facebook acquisition went through, and after Facebook revised their terms so that they no longer have to put new terms of service to a vote. I’ve got a lot more invested in Facebook than I have in Instagram.

Where Have All The Photos Gone?

I used to blog about web browsers at Spread Firefox and Opera Watch. Both sites are long gone. Countless articles I’ve linked to have vanished as publishers restructured or went out of business.

I’ve got an extensive LiveJournal from a few years back. It’s still there, but when I let my paid account lapse, I started moving over some of the less personal, more tech- and entertainment-focused posts (like convention reports) to this site, just in case a BOFH deletes it, or they change their terms of service to something unacceptable.

The question “Who owns your data?” has been repeated so often over the years that I can’t look up the post I’m thinking about, which advocated open file formats over proprietary ones (like Microsoft Office) on the basis that you should always be able to find a reader for a text document, but if you lose access to Word, or if Microsoft decides to drop support for an older format, you’re at their mercy.

The problem with social networks as services is that, like with those proprietary file types, you’re at their mercy. Want to search for a three-year-old Tweet? Tough. Facebook changed their privacy settings again? Oops. Twitter decides they don’t want apps like yours to exist, so they close off part of their API? Bye! The site you posted all your photos to decides to close up shop? *Poof!* There go your photos.

So What’s the Alternative?

When it comes down to it, the only way to be sure you aren’t going to be exploited or abandoned is to do it yourself.

Blogging is basically the same as social networking, except distributed:

  • People publish written posts, photos, videos, and more.
  • Other people comment on them.
  • You can “share” a post by linking to it, and pingbacks/trackbacks will let them know you’ve done so.
  • You can subscribe to someone’s updates through RSS, and services like RSSCloud and PubSubHubbub can make updates appear quickly.
  • Services like OpenID make it possible to authenticate visitors, which means you can start locking down who gets to see what.

The upside is that you, not Facebook or Google or Twitter, have full control of your content. The downside is that you have to exercise that control. You have to maintain the infrastructure, you have to guard against attackers, you have to filter out spam, you have to do your own backups, and you have to know at least something about the system under the hood.

We keep going to social networks because they’re so damn convenient. They take care of all that, and make your stuff easier for people to discover as a bonus.

But when you leave the network — or when it leaves you — what happens to all your photos, status updates, rants, raves, and commentary?

Who owns your profile?

The Top 10 Reasons I Will Not Follow You in Return on Twitter is making its way around…well…Twitter today. Just reading the title makes me wonder: why would someone expect to be followed in return? I guess it comes down to this question: What does it mean to follow someone? Is it different from friending them? And just what does “friend” mean in this context, anyway?

The way social networking sites use the term “Friend” has always bugged me. The actual software for Facebook, MySpace, or LiveJournal seems to use it to mean two distinct things:

  • An actual friend, someone with whom you interact on a personal basis.
  • An entity whose posts you’re following because you’re interested in the content, rather than invested in the person.

Wishful thinking aside, reading Neil Gaiman’s blog regularly doesn’t make me his friend.

Okay, so “Friend” is shorthand, but it brings in a load of connotations, blending the two meanings. People will freak out when a stranger “friends” them, will feel insulted if someone that they’ve friended doesn’t friend them back, or will feel rejected if someone de-friends them. I’ve heard it suggested that one reason people move from one social network to another is to start over with a clean slate of friends, and not have to worry about the drama of removing anyone from their current friends list.

Twitter, with the simple and direct term, “Follower,” doesn’t seem like it would bring in the same level of baggage. To me, clicking “Follow” doesn’t feel like it has the same emotional weight as marking someone as a friend. I follow accounts that I find interesting, and that I actually have a chance of keeping up with. If someone follows me, I don’t feel obligated to follow them, and if I follow someone else, I don’t expect them to follow me.

So I was perplexed when I started seeing new followers showing up on my personal Twitter account who clearly had only done a keyword search on my latest tweet, or looked at who I was following. What were they expecting? That I would look at the “XYZ is following you!” email and trace it to their website? That I would follow them back?

It didn’t make any sense to me.

Of course, now I’m sure they were expecting me to follow them back. As this article suggests, a lot of people do see “Follow” as a synonym for “Friend”, and they were most likely trying to game that system.

In other words, despite the terminology, Twitter’s stuck with the same old baggage that clogs up other social networks.

To all three of you reading this via LiveJournal syndication, sorry for filling up your friends list with repeats. I changed the footer on the feed, and didn’t realize that LiveJournal will repost articles as new if there’s any change.

I guess this means I’ll have to be careful about things like fixing typos on the latest 15 posts.